The stable operation of transmission lines is paramount for modern power systems. Because these lines span vast and geographically complex terrain, traditional manual inspection is often inefficient and hazardous. The spread of unmanned drone technology offers a viable way to assist with these critical tasks. Current research applies drones primarily to visual inspection with image recognition algorithms, but these approaches seldom integrate real-time environmental and operational data from the equipment itself. Relying on visual data alone can lead to suboptimal fault diagnosis and offers little predictive capability. To address this gap, we propose a comprehensive drone-assisted inspection and maintenance framework that deeply integrates environmental perception. Our contribution is twofold: first, we design an intelligent multi-sensor data acquisition module mounted on the unmanned drone to gather a holistic dataset; second, we introduce enhancements to the YOLOv5 object detection algorithm that make it lightweight enough for deployment on drone platforms and more accurate at identifying power equipment faults.
The core hardware of our system is built upon a commercial unmanned drone platform selected for its endurance and payload capacity. To transcend the limitations of vision-only inspection, we engineered an Intelligent Data Acquisition Module. This module expands the drone’s sensing capabilities by integrating a suite of sensors: current and voltage sensors for monitoring line load, infrared and contact temperature sensors for detecting thermal anomalies, and environmental sensors for humidity and ambient temperature. Simultaneously, a high-resolution camera captures visual data. A central data acquisition terminal synchronizes and tags all numerical sensor readings with their corresponding image frames and GPS coordinates, creating a rich, multi-modal dataset. This fusion of visual and parametric data forms the foundation for our “environmental perception” capability. For physical intervention, the unmanned drone is equipped with a robotic pod system capable of performing simple maintenance tasks, such as clearing minor obstructions, guided by the system’s analysis.
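The text does not specify how the acquisition terminal aligns numerical sensor readings with image frames. One plausible scheme is nearest-timestamp matching on a clock shared by the sensors and the camera; the minimal sketch below assumes exactly that. The record types `SensorReading` and `TaggedFrame`, the helper `tag_frames`, the channel names, and the `max_skew` tolerance are all hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class SensorReading:
    t: float       # timestamp in seconds on a clock shared with the camera (assumed)
    channel: str   # e.g. "current", "ir_temp", "humidity" (hypothetical names)
    value: float


@dataclass
class TaggedFrame:
    frame_id: int
    t: float       # capture timestamp in seconds
    gps: tuple     # (latitude, longitude)
    readings: dict = field(default_factory=dict)


def tag_frames(frames, readings, max_skew=0.5):
    """Attach to each frame the nearest-in-time reading of every sensor channel.

    Readings more than max_skew seconds away from a frame are not attached.
    """
    series = {}
    for r in sorted(readings, key=lambda r: r.t):
        series.setdefault(r.channel, []).append(r)
    times = {ch: [r.t for r in rs] for ch, rs in series.items()}

    for frame in frames:
        for ch, rs in series.items():
            k = bisect_left(times[ch], frame.t)
            # The closest reading is either just before or just after frame.t.
            near = [rs[i] for i in (k - 1, k) if 0 <= i < len(rs)]
            best = min(near, key=lambda r: abs(r.t - frame.t))
            if abs(best.t - frame.t) <= max_skew:
                frame.readings[ch] = best.value
    return frames
```

Bounding the time skew keeps a stale reading, for example from a sensor that briefly dropped out, from being attached to a frame it does not describe.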

The software pipeline begins by preprocessing the captured images. Raw images from field inspections are prone to noise, which can degrade detection accuracy, so we apply a cascaded two-stage denoising approach. First, a median filter removes salt-and-pepper noise while largely preserving edges; for a kernel window $K$ of pixel offsets (typically 3×3 or 5×5), it is defined as:
$$ z(x,y) = \text{median}\{I(x-i, y-j)\}, \quad (i,j) \in K $$
where $I$ is the input image and $z$ is the filtered output. This is followed by a Non-Local Means (NLM) filter to address Gaussian noise while preserving texture. For a pixel $i$, its denoised value $NLM(i)$ is a weighted average of all pixels $j$ in a search region $\Omega_i$:
$$ NLM(i) = \sum_{j \in \Omega_i} w(i, j) \cdot I(j) $$
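The weight $w(i,j)$ increases with the similarity between the patches centered at pixels $i$ and $j$, and the weights over $\Omega_i$ are normalized so that $\sum_{j \in \Omega_i} w(i,j) = 1$. This two-stage process is intended to give the subsequent detection algorithm cleaner input.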
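A minimal NumPy sketch of the cascade, for illustration only. The helpers `median_filter`, `nl_means`, and `denoise` are hypothetical; the NLM weights use the common form $w(i,j) \propto \exp(-d^2(i,j)/h^2)$, with $d^2$ the mean squared difference between patches, and the patch size, search-window size, and $h$ are assumed values rather than the paper's settings. Production code would more likely call OpenCV's `cv2.medianBlur` and `cv2.fastNlMeansDenoising`.

```python
import numpy as np


def median_filter(img, k=3):
    """k x k median filter; borders are handled by edge replication."""
    r = k // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    # One shifted copy of the image per kernel offset (i, j); median across them.
    shifted = [padded[r + i:r + i + h, r + j:r + j + w]
               for i in range(-r, r + 1) for j in range(-r, r + 1)]
    return np.median(np.stack(shifted), axis=0)


def nl_means(img, patch=3, search=7, h=10.0):
    """Pixelwise non-local means: NLM(i) = sum over j in Omega_i of w(i, j) * I(j)."""
    pr, sr = patch // 2, search // 2
    padded = np.pad(img.astype(float), pr + sr, mode="reflect")
    rows, cols = img.shape
    out = np.empty((rows, cols))
    for y in range(rows):
        for x in range(cols):
            cy, cx = y + pr + sr, x + pr + sr  # pixel i in padded coordinates
            ref = padded[cy - pr:cy + pr + 1, cx - pr:cx + pr + 1]
            weights, values = [], []
            for dy in range(-sr, sr + 1):  # every pixel j in the search window
                for dx in range(-sr, sr + 1):
                    ny, nx = cy + dy, cx + dx
                    cand = padded[ny - pr:ny + pr + 1, nx - pr:nx + pr + 1]
                    d2 = np.mean((ref - cand) ** 2)  # patch dissimilarity
                    weights.append(np.exp(-d2 / h ** 2))
                    values.append(padded[ny, nx])
            weights = np.asarray(weights)
            # Divide by the weight sum so the normalized weights sum to 1.
            out[y, x] = weights @ np.asarray(values) / weights.sum()
    return out


def denoise(img):
    """Cascade described in the text: median filter first, then non-local means."""
    return nl_means(median_filter(img, k=3))
```

The pixelwise loop makes `nl_means` slow, since its cost grows with image size times search-window size times patch size, which is why optimized implementations are preferable for full-resolution inspection frames.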
The choice of object detection algorithm is critical for real-time operation on a resource-constrained unmanned drone. While YOLOv5 offers an excellent balance of speed and accuracy, its standard backbone network is computationally heavy. To make the model lightweight, we replace the original backbone with ShuffleNetV2, a network designed for efficient inference on mobile devices. We further optimize the convolution operations within ShuffleNetV2. Ignoring bias terms, a standard convolution layer has parameter count $G_s$:
$$ G_s = G^2 \times C_{in} \times C_{out} $$
where $G$ is the kernel size, $C_{in}$ is the number of input channels, and $C_{out}$ is the number of output channels. We decompose this into a depthwise convolution followed by a pointwise convolution (a depthwise separable convolution variant), reducing the parameter count to $G_g$:
$$ G_g = G^2 \times C_{in} + C_{in} \times C_{out} $$
The ratio $\alpha$ of the parameters in the improved versus standard convolution is:
$$ \alpha = \frac{G_g}{G_s} = \frac{G^2 + C_{out}}{G^2 \times C_{out}} = \frac{1}{C_{out}} + \frac{1}{G^2} $$
Since $C_{out}$ is typically much larger than $G^2$, $\alpha \approx 1/G^2$, roughly 1/9 for a 3×3 kernel. For the same stride and output size, the number of multiply-accumulate operations shrinks by the same ratio, which makes the decomposition well suited to an unmanned drone's onboard processor.
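The parameter counts can be checked numerically. This sketch uses hypothetical helper names, ignores bias terms, and assumes $C_{in} = C_{out}$ for simplicity (the ratio $\alpha$ does not depend on $C_{in}$ in any case); it evaluates $G_s$, $G_g$, and $\alpha$ for a 3×3 kernel at a few channel widths.

```python
def conv_params(G, c_in, c_out):
    """G_s: weights in a standard G x G convolution (bias ignored)."""
    return G * G * c_in * c_out


def dw_separable_params(G, c_in, c_out):
    """G_g: depthwise G x G conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    return G * G * c_in + c_in * c_out


def alpha(G, c_in, c_out):
    """Ratio G_g / G_s, which simplifies to 1/c_out + 1/G**2."""
    return dw_separable_params(G, c_in, c_out) / conv_params(G, c_in, c_out)


if __name__ == "__main__":
    for c in (32, 128, 512):
        print(c, conv_params(3, c, c), dw_separable_params(3, c, c), f"{alpha(3, c, c):.4f}")
```

The printed ratios are 0.1424, 0.1189, and 0.1131, approaching $1/9 \approx 0.1111$ as the channel count grows.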
To boost detection accuracy, particularly for small targets such as damaged insulators or bird nests, we refine the loss function and integrate an attention mechanism. The default CIoU loss in YOLOv5 accounts for overlap area, center-point distance, and aspect ratio. We replace it with the EIoU loss, which substitutes separate width and height penalties for CIoU's aspect-ratio term and thus directly minimizes the width and height differences; its authors report faster convergence and more accurate localization. The loss is defined as:
$$ \mathcal{L}_{EIoU} = 1 - IoU + \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{w_c^2 + h_c^2} + \frac{\rho^2(w, w^{gt})}{w_c^2} + \frac{\rho^2(h, h^{gt})}{h_c^2} $$
Here, $\mathbf{b}$ and $\mathbf{b}^{gt}$ are the centers of the predicted and ground-truth boxes, $\rho(\cdot,\cdot)$ is the Euclidean distance (for the scalar width and height terms, simply the absolute difference), $w, h$ and $w^{gt}, h^{gt}$ are the widths and heights of the predicted and ground-truth boxes, and $w_c, h_c$ are the width and height of the smallest box enclosing both. Furthermore, we incorporate the Convolutional Block Attention Module (CBAM) into the network. CBAM infers attention maps sequentially, first along the channel dimension and then along the spatial dimension, allowing the model to focus on “what” is important and “where” it is. The channel attention $M_c$ applies a shared MLP to the globally average-pooled and max-pooled channel descriptors, while the spatial attention $M_s$ applies a 7×7 convolution to the concatenation of the channel-wise average-pooled and max-pooled maps of the channel-refined features:
$$ \begin{aligned} M_c &= \sigma(\text{MLP}(\text{AvgPool}(F)) + \text{MLP}(\text{MaxPool}(F))) \\ M_s &= \sigma(f^{7\times7}([\text{AvgPool}(F'); \text{MaxPool}(F')])) \end{aligned} $$
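where $F$ is the input feature map, $F' = M_c \otimes F$ is the channel-refined feature map, $\otimes$ denotes element-wise multiplication (with the attention map broadcast over the missing dimensions), $\sigma$ is the sigmoid function, and $f^{7\times7}$ is a 7×7 convolution. The module's output is $F'' = M_s \otimes F'$. This is intended to help our unmanned drone's vision system concentrate on critical fault features amid cluttered backgrounds.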
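To make the two formulas concrete, here is a minimal NumPy sketch of both. It is not the training code: the box format `(cx, cy, w, h)`, the helper names `eiou_loss` and `cbam`, the ReLU hidden layer of the shared MLP (as in the original CBAM paper), and zero padding for the 7×7 convolution are assumptions, and there is no batching or gradient computation.

```python
import numpy as np


def eiou_loss(pred, gt):
    """EIoU loss between two boxes, each given as (cx, cy, w, h)."""
    def corners(b):
        cx, cy, w, h = b
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    px1, py1, px2, py2 = corners(pred)
    gx1, gy1, gx2, gy2 = corners(gt)
    inter = (max(0.0, min(px2, gx2) - max(px1, gx1))
             * max(0.0, min(py2, gy2) - max(py1, gy1)))
    iou = inter / (pred[2] * pred[3] + gt[2] * gt[3] - inter)
    # w_c, h_c: width and height of the smallest box enclosing both boxes.
    wc = max(px2, gx2) - min(px1, gx1)
    hc = max(py2, gy2) - min(py1, gy1)
    centre_dist2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    return (1 - iou
            + centre_dist2 / (wc ** 2 + hc ** 2)
            + (pred[2] - gt[2]) ** 2 / wc ** 2
            + (pred[3] - gt[3]) ** 2 / hc ** 2)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cbam(F, W1, W2, kernel):
    """CBAM on one feature map.

    F: (C, H, W) input; W1: (C // r, C) and W2: (C, C // r) are the shared MLP
    weights; kernel: (2, 7, 7) weights of the spatial-attention convolution.
    """
    def mlp(v):
        return W2 @ np.maximum(W1 @ v, 0.0)  # shared MLP with a ReLU hidden layer

    # Channel attention M_c from global average- and max-pooled descriptors.
    Mc = sigmoid(mlp(F.mean(axis=(1, 2))) + mlp(F.max(axis=(1, 2))))
    F1 = Mc[:, None, None] * F  # F' = M_c ⊗ F, broadcast over H and W

    # Spatial attention M_s: pool across channels, stack, then a 7x7 convolution.
    pooled = np.stack([F1.mean(axis=0), F1.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    padded = np.pad(pooled, ((0, 0), (k // 2, k // 2), (k // 2, k // 2)))
    h, w = F.shape[1:]
    conv = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            conv[y, x] = np.sum(padded[:, y:y + k, x:x + k] * kernel)
    Ms = sigmoid(conv)
    return Ms[None, :, :] * F1  # F'' = M_s ⊗ F', broadcast over channels
```

Two quick sanity checks: `eiou_loss` returns 0 for identical boxes, and with all CBAM weights set to zero both attention maps equal $\sigma(0) = 0.5$, so `cbam` returns $0.25\,F$.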
We validated our proposed system using a dataset collected from power grids in Northwestern China, comprising 6,330 images and 16,530 correlated sensor data samples. The dataset was annotated and split 9:1 for training and testing. We conducted an ablation study to evaluate the contribution of each system component, using standard metrics: Accuracy ($P_z$), Precision ($P_j$), and Recall ($P_r$), defined as:
$$ P_z = \frac{TP + TN}{TP + TN + FP + FN}, \quad P_j = \frac{TP}{TP + FP}, \quad P_r = \frac{TP}{TP + FN} $$
where $TP, TN, FP, FN$ are true positives, true negatives, false positives, and false negatives, respectively.
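The three metrics follow directly from the confusion counts. The sketch below uses made-up counts purely for illustration; how true negatives are counted for an object detector is not specified in the text, so `tn` is simply taken as given, and the helper name `detection_metrics` is hypothetical.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (P_z, P_j, P_r): accuracy, precision, and recall from confusion counts."""
    total = tp + tn + fp + fn
    p_z = (tp + tn) / total if total else 0.0
    p_j = tp / (tp + fp) if tp + fp else 0.0  # guard: no positive predictions
    p_r = tp / (tp + fn) if tp + fn else 0.0  # guard: no positive ground truth
    return p_z, p_j, p_r


# Hypothetical counts, not taken from the paper's experiments.
p_z, p_j, p_r = detection_metrics(tp=90, tn=5, fp=3, fn=2)
print(f"accuracy={p_z:.4f} precision={p_j:.4f} recall={p_r:.4f}")
```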
| Model Configuration | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|
| Baseline YOLOv5 (Y) | 85.42 | 85.11 | 84.54 |
| Y + Intelligent Data Acquisition Module (TY) | 87.97 | 87.08 | 86.84 |
| TY + Denoising (TZY) | 89.27 | 89.01 | 88.73 |
| TZY + Lightweight Backbone (TZQY) | 93.17 | 93.04 | 92.78 |
| TZQY + CBAM & EIoU (TZQCY) | 98.52 | 98.13 | 97.96 |

The ablation study shows that each added component improves all three metrics. The intelligent data collection provides richer context, denoising cleans the input, the lightweight backbone improves efficiency on the unmanned drone, and CBAM with EIoU sharpens feature focus and box regression; this last step yields the largest single gain (+5.35 percentage points in accuracy). The final model (TZQCY) achieved an average inference time of 0.09 seconds per image (about 11 images per second), which is suitable for real-time aerial inspection.
We further compared our final TZQCY model against other established object detection algorithms. As the results below show, TZQCY leads the strongest baseline, Attention-SSD, by roughly 13 percentage points on each metric.

| Algorithm Model | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|
| YOLOv3 | 82.42 | 82.12 | 80.91 |
| YOLOv4 | 83.24 | 83.14 | 83.09 |
| Faster R-CNN | 81.54 | 81.18 | 80.54 |
| Attention-SSD | 85.34 | 85.26 | 84.94 |
| Our TZQCY Model | 98.52 | 98.13 | 97.96 |

Finally, we tested the practical utility of the system by integrating the detection model with the robotic maintenance pod. Guided by the detections, the unmanned drone performed automated clearance of ice and of tree branches encroaching on the lines. For each detection model, we measured the clearance success rate $P_c = Z_c / Z$, where $Z_c$ is the number of obstacles cleared and $Z$ is the total number of obstacles targeted.

| Algorithm Model | Ice Clearance Rate (%) | Branch Clearance Rate (%) |
|---|---|---|
| YOLOv3 | 85.32 | 85.24 |
| YOLOv4 | 86.25 | 86.46 |
| Faster R-CNN | 84.34 | 84.52 |
| Attention-SSD | 87.56 | 88.42 |
| Our TZQCY Model | 97.24 | 97.25 |

In conclusion, this work presents a holistic, environment-aware system for drone-assisted power inspection and maintenance. By designing a multi-sensor data acquisition module and an improved, lightweight YOLOv5-based detection algorithm (TZQCY), we address key limitations of current vision-only drone inspections. The system achieves high fault identification accuracy at a speed suitable for real-time operation on an unmanned drone, and it also enables preliminary autonomous maintenance actions. This integrated approach lays the groundwork for more intelligent, predictive, and efficient power grid management, with the potential to reduce risk and operational cost while enhancing reliability.
