In modern power systems, the rapid expansion of grid infrastructure has escalated the demand for efficient and accurate inspection techniques to ensure operational safety and reliability. Among critical components, insulators are pivotal for supporting and securing transmission lines, but prolonged exposure to outdoor environments renders them susceptible to defects such as string drops, surface burns, string tilts, and surface cracks, which can compromise system integrity. Traditional manual inspections are labor-intensive, time-consuming, and prone to human error, highlighting the need for automated solutions. Unmanned aerial vehicle (UAV) technology has transformed grid equipment monitoring by enabling high-resolution remote sensing image acquisition from aerial perspectives. However, UAV remote sensing images pose several challenges: spectral heterogeneity within homogeneous regions, small targets that occupy few pixels, and complex backgrounds affected by clouds, illumination, and occlusion, all of which hinder defect recognition. To address these limitations, I propose an automated defect identification method tailored to UAV remote sensing images of power grid equipment. The approach integrates frequency-domain enhancement, a coordinate attention mechanism, and an optimized loss function to improve feature extraction and recognition accuracy, and it performs robustly in complex scenarios.

The core of my method lies in enhancing UAV remote sensing images to accentuate defect features. A UAV image, denoted \(F(x, y)\), is transformed into the frequency domain using a forward transformation \(\beta\) (for example, the two-dimensional discrete Fourier transform), as shown below:
$$F_Z(x, y) = \beta\{F(x, y)\}$$
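Here, \(F_Z(x, y)\) represents the frequency-domain characteristics of the UAV image. To refine these characteristics, a correction coefficient \(\gamma\) is applied. Because the transform is linear, a single scalar \(\gamma\) would merely rescale the whole image, so \(\gamma\) is effective only as a frequency-dependent gain (for example, one that boosts the high-frequency components carrying edges and fine texture):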
$$F_X(x, y) = F_Z(x, y) \times \gamma$$
where \(F_X(x, y)\) denotes the corrected frequency-domain features. Subsequently, an inverse transformation \(\beta^{-1}\) is performed to revert the image to the spatial domain, yielding an enhanced UAV drone remote sensing image \(F_L(x, y)\):
$$F_L(x, y) = \beta^{-1}\{F_X(x, y)\}$$
This frequency-domain enhancement suppresses noise and amplifies subtle defect patterns, facilitating feature extraction. UAVs can capture images from favorable angles and altitudes, but enhancement remains necessary to compensate for environmental variability.
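A minimal NumPy sketch of the enhancement chain \(F_L = \beta^{-1}\{\beta\{F\} \times \gamma\}\) follows. It assumes that \(\beta\) is the 2-D discrete Fourier transform and models \(\gamma\) as a hypothetical high-frequency-emphasis gain (1 at zero frequency, rising smoothly toward `boost`); the function name and the `boost` and `cutoff` parameters are illustrative choices, not values from the method.

```python
import numpy as np

def frequency_enhance(image: np.ndarray, boost: float = 1.5, cutoff: float = 0.1) -> np.ndarray:
    """Sketch of F_L = beta^{-1}{ beta{F} * gamma }, assuming beta is the 2-D DFT.

    gamma is a hypothetical frequency-dependent gain: exactly 1 at DC (so mean
    brightness is preserved), approaching `boost` for frequencies well above `cutoff`.
    """
    h, w = image.shape[:2]
    # Normalized frequency coordinates (cycles per sample) along each axis.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius_sq = fx**2 + fy**2
    # All-pass term plus a Gaussian high-pass term scaled by (boost - 1).
    gamma = 1.0 + (boost - 1.0) * (1.0 - np.exp(-radius_sq / (2.0 * cutoff**2)))
    if image.ndim == 3:
        gamma = gamma[:, :, None]  # same gain applied to every channel
    F_Z = np.fft.fft2(image, axes=(0, 1))   # forward transform beta
    F_X = F_Z * gamma                        # frequency-domain correction
    F_L = np.fft.ifft2(F_X, axes=(0, 1))     # inverse transform beta^{-1}
    # gamma is symmetric in frequency, so the result is real up to rounding.
    return np.real(F_L)

rng = np.random.default_rng(0)
tile = rng.random((64, 64))   # stand-in for a grayscale UAV image tile
enhanced = frequency_enhance(tile, boost=1.5, cutoff=0.1)
```

Applied to an \(H \times W \times C\) array, `fft2` over the first two axes transforms each channel independently, so the same function handles colour images.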
To extract defect features from the enhanced UAV images, I incorporate a Coordinate Attention (CA) module. This module embeds spatial coordinate information into its attention maps, focusing on regions relevant to defects. For an input feature map \(x_k\), global average pooling is applied separately along the width and height dimensions to produce two direction-aware descriptors:
$$Z^k_h(h) = \frac{1}{W} \sum_{0\leq i < W} x_k(h, i)$$
$$Z^k_w(w) = \frac{1}{H} \sum_{0\leq j < H} x_k(j, w)$$
where \(Z^k_h(h)\) and \(Z^k_w(w)\) are the height-direction and width-direction descriptors, and \(W\) and \(H\) are the width and height of the feature map. These descriptors are then concatenated and passed through a shared convolution \(F\) followed by a nonlinear activation function \(l\) to produce an intermediate feature map \(f\):
$$f = l\{F[Z^k_w(w), Z^k_h(h)]\}$$
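The map \(f\) is then split along the spatial dimension into \(f^h\) and \(f^w\), and attention weights along each direction are obtained by a convolution followed by Sigmoid activation \(\sigma\):
$$g_h = \sigma[F_h(f^h)]$$
$$g_w = \sigma[F_w(f^w)]$$
where \(g_h\) and \(g_w\) are the attention weights along the height and width directions, respectively. The output feature map \(Y(i, j)\) reweights the input at each position by both weights:
$$Y(i, j) = x_k(i, j) \times g_h(i) \times g_w(j)$$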
This CA module enhances the model’s ability to localize small defects in UAV drone images by emphasizing both channel and spatial dependencies, which is vital for identifying imperfections like insulator surface burns or cracks that occupy few pixels.
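The forward pass of a CA block can be sketched in NumPy for a single feature map of shape \((C, H, W)\), following the original coordinate attention formulation in which the output reweights the input by both directional weights. This is an illustrative sketch under stated assumptions: ReLU stands in for the nonlinearity \(l\), the 1×1 convolutions \(F\), \(F_h\), \(F_w\) are written as matrix products, and biases and batch normalization are omitted; the weight arguments are hypothetical, not trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_shared, w_h, w_w):
    """Coordinate attention on one feature map x of shape (C, H, W).

    w_shared: (C_mid, C) weights of the shared 1x1 convolution F.
    w_h, w_w: (C, C_mid) weights of the 1x1 convolutions F_h and F_w.
    """
    C, H, W = x.shape
    z_h = x.mean(axis=2)                    # (C, H): average over width  -> Z_h(h)
    z_w = x.mean(axis=1)                    # (C, W): average over height -> Z_w(w)
    z = np.concatenate([z_h, z_w], axis=1)  # (C, H + W)
    f = np.maximum(w_shared @ z, 0.0)       # 1x1 conv F, then ReLU as l
    f_h, f_w = f[:, :H], f[:, H:]           # split back into the two directions
    g_h = sigmoid(w_h @ f_h)                # (C, H) weights along height
    g_w = sigmoid(w_w @ f_w)                # (C, W) weights along width
    # Y(i, j) = x(i, j) * g_h(i) * g_w(j), broadcast within each channel.
    return x * g_h[:, :, None] * g_w[:, None, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8, 12))     # C=16 channels, H=8, W=12
c_mid = 4                                   # hypothetical reduced channel count
out = coordinate_attention(
    feat,
    0.1 * rng.standard_normal((c_mid, 16)),
    0.1 * rng.standard_normal((16, c_mid)),
    0.1 * rng.standard_normal((16, c_mid)),
)
```

Because each weight lies in \((0, 1)\), the block can only attenuate positions, and positions where both \(g_h(i)\) and \(g_w(j)\) are large are attenuated least, which is how the row and column information localizes small targets.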
For automated defect identification, I design a comprehensive loss function that guides the convolutional neural network during training. The loss comprises three components: bounding box regression loss, confidence loss, and classification loss. The bounding box regression uses Complete Intersection over Union (CIoU) loss to measure the discrepancy between predicted and target boxes. Given a predicted box \(A\) and target box \(B\), the IoU loss is defined as:
$$L_{IoU} = 1 - IoU$$
$$IoU = \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert}$$
The CIoU loss \(L_{CIoU}\) extends this by incorporating center distance and aspect ratio consistency:
$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b_1)}{Y_w^2 + Y_h^2} + \alpha v$$
where \(\rho(b, b_1)\) is the Euclidean distance between the centers \(b\) and \(b_1\) of the predicted and target boxes, \(Y_w\) and \(Y_h\) are the width and height of the smallest box enclosing both (so \(Y_w^2 + Y_h^2\) is its squared diagonal), \(\alpha\) is a trade-off weight, and \(v\) is a penalty term for aspect ratio:
$$\alpha = \frac{v}{(1 - IoU) + v}$$
$$v = \frac{4}{\pi^2} \left( \arctan \frac{w_{gt}}{h_{gt}} - \arctan \frac{w}{h} \right)^2$$
Here, \(w_{gt}\) and \(h_{gt}\) are the width and height of the ground-truth box, while \(w\) and \(h\) are those of the predicted box. The confidence loss \(L_{ojb}\) evaluates the objectness score:
$$L_{ojb} = \sum_{i=0}^{S^2} \sum_{j=0}^{Z} f_k \{ -\lg(g_0) + \lambda_2(l)[-\lg(1-g_0)] \}$$
$$g_0 = \sigma[IoU(g_{pre}, g_{truth})]$$
where \(S\) is the grid size, \(Z\) is the number of bounding boxes, \(f_k\) is the ground-truth category, \(g_0\) is the confidence score, and \(\lambda_2(l)\) is a negative sample loss weight. The classification loss \(L_{cls}\) uses binary cross-entropy (BCE):
$$L_{cls} = \sum_{i=0}^{S^2} \sum_{j=0}^{Z} \mathrm{BCE}(f_i, Z_j)$$
The overall loss function \(L\) integrates these components:
$$L = L_{cls} + L_{ojb} + L_{CIoU}$$
This loss function optimizes the network for precise defect localization and classification in UAV drone images, ensuring robustness against background clutter.
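The CIoU term is the most involved part of the loss, so a plain-Python sketch for a single box pair may help. It assumes boxes are given as corner coordinates `(x1, y1, x2, y2)` with positive width and height; training code would instead compute this in a vectorized, differentiable framework. The function name is illustrative.

```python
import math

def ciou_loss(pred, target):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    w, h = px2 - px1, py2 - py1
    w_gt, h_gt = tx2 - tx1, ty2 - ty1

    # IoU = |A ∩ B| / |A ∪ B|
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = w * h + w_gt * h_gt - inter
    iou = inter / union

    # Squared distance rho^2(b, b_1) between the box centers.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    # Squared diagonal Y_w^2 + Y_h^2 of the smallest enclosing box.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    diag2 = cw ** 2 + ch ** 2

    # Aspect-ratio penalty v and its weight alpha (alpha is 0 when v is 0,
    # which also avoids 0/0 for identical boxes).
    v = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / diag2 + alpha * v

# Two non-overlapping unit boxes: IoU = 0, rho^2 = 4, enclosing diagonal^2 = 10.
loss = ciou_loss((0, 0, 1, 1), (2, 0, 3, 1))   # 1 + 4/10 = 1.4
```

The example shows why CIoU helps small targets: plain IoU loss is stuck at 1 for any disjoint pair, whereas the center-distance term still tells the optimizer which way to move the box.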
The defect identification process employs convolutional and transposed-convolution layers. Given feature maps from the CA module, each convolution layer \(i\) convolves the previous layer's output with a kernel and adds a bias:
$$Y_i = \kappa^i_j * Y_{i-1} + \lambda^i_j$$
where \(Y_i\) is the output feature matrix of layer \(i\), \(\kappa^i_j\) is the convolution kernel, \(\lambda^i_j\) is the bias term, and \(*\) denotes convolution; the kernels and biases are learned by minimizing the loss \(L\) defined above. Writing the bias-free convolution as multiplication by a sparse matrix \(C_i\) built from \(\kappa^i_j\), so that \(Y_i = C_i Y_{i-1}\), transposed convolution applies \(C_i^{\mathsf{T}}\) to map a low-resolution feature map back to the higher resolution for upsampling:
$$\hat{Y}_{i-1} = C_i^{\mathsf{T}} Y_i$$
This upsampling enables the network to generate high-resolution defect maps, automating the detection of anomalies in UAV remote sensing images. UAV data provide timely and comprehensive coverage of grid assets, while these algorithmic steps improve recognition accuracy.
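The relationship between convolution and transposed convolution is easiest to see in one dimension. The sketch below writes a stride-2, bias-added convolution as a dense matrix \(C\) (so that the bias-free part is \(C x\)) and upsamples with \(C^{\mathsf{T}}\). The kernel, bias, and input are hypothetical values, and, as in most deep learning frameworks, "convolution" here means cross-correlation.

```python
import numpy as np

def conv_matrix(kernel, n_in, stride=1):
    """Dense matrix C such that C @ x is a 'valid' 1-D cross-correlation of x with kernel."""
    k = len(kernel)
    n_out = (n_in - k) // stride + 1
    C = np.zeros((n_out, n_in))
    for r in range(n_out):
        start = r * stride
        C[r, start:start + k] = kernel   # each row is the kernel shifted by `stride`
    return C

kernel = np.array([1.0, 2.0, 1.0])   # hypothetical kernel kappa
x = np.arange(9, dtype=float)        # layer input Y_{i-1}
bias = 0.5                           # hypothetical bias lambda

C = conv_matrix(kernel, n_in=x.size, stride=2)
y = C @ x + bias    # forward convolution: 9 samples -> 4 samples
up = C.T @ y        # transposed convolution: 4 samples -> 9 samples
```

Note that \(C^{\mathsf{T}}\) restores the spatial size, not the original values: transposed convolution is the adjoint of convolution, not its inverse, and in a trained network its own kernel is learned separately.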
To validate the method, I conducted experiments on a combined dataset of insulator images: 693 images sourced online and 373 images captured during UAV flights, for a total of 1,066 images containing various defects. The UAV was equipped with a high-resolution camera and a positioning system, flying at altitudes of up to 4,000 meters with a flight time of 15 minutes. The forward (along-track) overlap was set to 40% and the side (cross-track) overlap to 70% to ensure image quality. Sample defects from the dataset are illustrated in the figure above, showcasing real-world scenarios. The training set composition is summarized in Table 1, detailing the distribution of defect types.
| Training Set | String Drop | Surface Burn | String Tilt | Surface Crack |
|---|---|---|---|---|
| Set 1 | 19 | 34 | 45 | 26 |
| Set 2 | 21 | 19 | 17 | 6 |
| Set 3 | 51 | 26 | 36 | 67 |
| Set 4 | 101 | 84 | 96 | 50 |
| Set 5 | 98 | 91 | 73 | 106 |
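As a quick consistency check, the entries of Table 1 can be summed by set and by defect type. The sketch assumes each entry counts images; under that reading the grand total matches the 1,066 images stated above.

```python
import numpy as np

# Table 1 counts: rows = training sets 1-5;
# columns = string drop, surface burn, string tilt, surface crack.
counts = np.array([
    [19, 34, 45, 26],
    [21, 19, 17, 6],
    [51, 26, 36, 67],
    [101, 84, 96, 50],
    [98, 91, 73, 106],
])

per_set = counts.sum(axis=1)      # [124, 63, 180, 331, 368]
per_defect = counts.sum(axis=0)   # [290, 254, 267, 255]
total = int(counts.sum())         # 1066
```

The per-defect totals are fairly balanced (254 to 290), so no single defect class dominates training.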
Performance metrics include precision \(P\) and recall \(R\), defined as:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
where \(TP\) is true positives, \(FP\) is false positives, and \(FN\) is false negatives. The training and validation loss curves, plotted over iterations, show consistent convergence without overfitting, as losses stabilize after 125 epochs. This indicates the method’s effectiveness in learning from UAV drone image data.
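Precision, recall, and the average precision (AP) summarized by a P-R curve can be computed from ranked detections as sketched below. This uses the all-point interpolated AP, which is one common convention (VOC and COCO variants differ in how recall is sampled); it is an illustration, not necessarily the exact evaluation protocol behind the reported numbers. Each detection is assumed to be pre-labelled as a true or false positive at the chosen IoU threshold.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(scores, is_tp, n_gt):
    """All-point interpolated AP from ranked detections.

    scores: confidence of each detection; is_tp: 1 if the detection matched an
    unused ground-truth box at the chosen IoU threshold, else 0;
    n_gt: number of ground-truth boxes.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))   # highest confidence first
    hits = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(hits)
    fp_cum = np.cumsum(1.0 - hits)
    recall = tp_cum / n_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # Interpolate: precision at recall r is the best precision at any recall >= r.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Area under the stepwise P-R curve.
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))

p, r = precision_recall(tp=8, fp=2, fn=1)    # 0.8, 0.888...
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], n_gt=4)   # 0.625
```

Averaging AP over the defect classes gives the mean average precision (mAP) used in the comparisons that follow.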
The precision-recall (P-R) curves for different defect types demonstrate high average precision (AP) values: 0.988 for surface burns, 0.955 for string drops, 0.965 for string tilts, and 0.975 for surface cracks, all exceeding 0.95. These results underscore the method’s capability to accurately identify defects in UAV drone images. To benchmark performance, I compared the proposed method against three existing algorithms: ER-YOLO-based defect recognition, SSIM-Sobel with multi-feature fusion, and lightweight target detection. The comparison, based on the mean average precision at an IoU threshold of 0.95 (mAP@0.95), is presented in Table 2.
| Recognition Algorithm | mAP@0.95 (%) |
|---|---|
| Proposed Method | 99.2 |
| ER-YOLO-based Algorithm | 91.3 |
| SSIM-Sobel with Multi-feature Fusion | 89.9 |
| Lightweight Target Detection Algorithm | 92.5 |
The proposed method achieves an mAP@0.95 of 99.2%, 6.7 percentage points above the strongest alternative (the lightweight target detection algorithm, 92.5%). I attribute this advantage to the frequency-domain enhancement and the CA module, which improve feature discrimination in complex UAV images. Qualitative results on UAV remote sensing images show that all methods identify obvious defects such as string drops and tilts, but only the proposed method reliably detects subtle defects such as surface burns in cluttered backgrounds, thanks to the enlarged effective receptive field that the attention mechanism provides.
Further analysis concerns the mathematical formulation of the defect identification process. The frequency-domain transformation can be extended to multi-spectral UAV images by applying a separate correction to each band. If \(F(x, y, c)\) denotes the value of band \(c\) at pixel \((x, y)\), the enhancement becomes:
$$F_Z(x, y, c) = \beta_c\{F(x, y, c)\}$$
$$F_X(x, y, c) = F_Z(x, y, c) \times \gamma_c$$
$$F_L(x, y, c) = \beta_c^{-1}\{F_X(x, y, c)\}$$
where \(\beta_c\) and \(\gamma_c\) are the transform and correction coefficient for band \(c\). This multi-channel approach leverages the full spectral information from UAV sensors, enhancing defect contrast. Additionally, the CA module can be adapted for multi-scale feature extraction by incorporating pyramid pooling, which benefits small targets in UAV images. The directional descriptors can then be computed at several resolutions:
$$Z^k_{h,s}(h) = \frac{1}{W_s} \sum_{i} x_{k,s}(h, i)$$
$$Z^k_{w,s}(w) = \frac{1}{H_s} \sum_{j} x_{k,s}(j, w)$$
where \(s\) denotes the scale level, and \(W_s\) and \(H_s\) are the dimensions at that scale. This multi-scale CA module captures defects of varying sizes, a common challenge in UAV drone imagery due to distance variations.
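One way to obtain the per-scale descriptors \(Z^k_{h,s}\) and \(Z^k_{w,s}\) is sketched below. It assumes the pyramid is built by repeated 2×2 average pooling of a \((C, H, W)\) feature map with even side lengths; the actual pyramid construction is not specified here, and the function names are illustrative.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling with stride 2 on a (C, H, W) map (H and W assumed even)."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def multiscale_directional_pool(x, n_scales=3):
    """Return [(Z_h at scale s, Z_w at scale s)] for s = 0 .. n_scales - 1."""
    out = []
    for s in range(n_scales):
        # Z_{h,s}: average over width (length H_s); Z_{w,s}: average over height (length W_s).
        out.append((x.mean(axis=2), x.mean(axis=1)))
        if s < n_scales - 1:
            x = avg_pool2(x)   # next, coarser pyramid level
    return out

feat = np.random.default_rng(0).random((8, 32, 32))
levels = multiscale_directional_pool(feat, n_scales=3)   # lengths 32, 16, 8
```

Coarser levels summarize larger regions, so a defect that spans many pixels at one scale and only a few at another still produces a distinct peak at some level.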
The loss function can be further improved by adding a regularization term that limits overfitting on UAV data. Let \(L_{reg}\) be a weight decay term proportional to the squared \(L_2\) norm of the model parameters \(\theta\):
$$L_{reg} = \lambda \|\theta\|^2$$
The total loss then becomes:
$$L_{total} = L_{cls} + L_{ojb} + L_{CIoU} + L_{reg}$$
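where \(\lambda\) is a hyperparameter controlling the regularization strength. This regularization improves generalization to unseen UAV images. Moreover, the confidence loss can be refined with focal loss to address class imbalance, which is common in defect datasets where positive samples are scarce. With \(t_{ij} = 1\) for positive samples and \(t_{ij} = 0\) for negative samples, the modified confidence loss \(L_{ojb}'\) is:
$$L_{ojb}' = \sum_{i,j} \left[ -t_{ij} (1-g_0)^{\gamma_f} \lg(g_0) - (1-t_{ij})\, g_0^{\gamma_f} \lg(1-g_0) \right]$$
with \(\gamma_f\) as a focusing parameter (distinct from the correction coefficient \(\gamma\) of the enhancement step); setting \(\gamma_f = 0\) recovers the standard cross-entropy form. These enhancements strengthen the method's robustness for UAV applications.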
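Both refinements are short to express numerically. The sketch below implements a binary focal loss on confidence scores, where positives are weighted by \((1-g_0)^{\gamma_f}\) and negatives by \(g_0^{\gamma_f}\), together with an \(L_2\) weight-decay penalty. It uses the natural logarithm (a base-10 logarithm only rescales the loss), and the default values of `gamma_f` and `lam` are hypothetical.

```python
import numpy as np

def focal_objectness_loss(g0, target, gamma_f=2.0, eps=1e-12):
    """Binary focal loss summed over predictions.

    g0: predicted confidences in (0, 1); target: 1 for positive, 0 for negative.
    """
    g0 = np.clip(np.asarray(g0, dtype=float), eps, 1 - eps)   # avoid log(0)
    t = np.asarray(target, dtype=float)
    pos = -t * (1 - g0) ** gamma_f * np.log(g0)          # down-weights easy positives
    neg = -(1 - t) * g0 ** gamma_f * np.log(1 - g0)      # down-weights easy negatives
    return float(np.sum(pos + neg))

def l2_penalty(params, lam=5e-4):
    """L_reg = lambda * ||theta||^2 summed over a list of parameter arrays."""
    return lam * sum(float(np.sum(p ** 2)) for p in params)

conf = [0.9, 0.2, 0.05]   # confidences for one positive and two negatives
labels = [1, 0, 0]
bce_like = focal_objectness_loss(conf, labels, gamma_f=0.0)   # plain cross-entropy
focal = focal_objectness_loss(conf, labels, gamma_f=2.0)      # smaller: easy cases shrink
```

Because well-classified predictions contribute little once \(\gamma_f > 0\), the abundant easy background cells no longer swamp the gradient from the scarce defect samples.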
In terms of computational efficiency, the method is designed to be deployable on edge devices for real-time UAV drone inspections. The convolution operations can be optimized using depthwise separable convolutions, reducing parameters while maintaining accuracy. For an input feature map of size \(D_f \times D_f \times M\), a standard convolution with \(N\) kernels of size \(D_k \times D_k \times M\) has computational cost:
$$C_{std} = D_f \times D_f \times M \times N \times D_k \times D_k$$
Depthwise separable convolution splits this into depthwise and pointwise convolutions:
$$C_{ds} = D_f \times D_f \times M \times D_k \times D_k + D_f \times D_f \times M \times N$$
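Relative to standard convolution, this reduces the cost by the factor \(C_{ds}/C_{std} = 1/N + 1/D_k^2\), roughly an 8 to 9 times saving for \(3 \times 3\) kernels, which is crucial for processing high-resolution UAV images in real time during flights. Additionally, the frequency-domain enhancement can be implemented with fast Fourier transform (FFT) algorithms, with complexity \(O(n \log n)\) for an image of \(n\) pixels, ensuring scalability.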
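A worked example of the two cost formulas, using hypothetical layer sizes (an \(80 \times 80\) feature map, \(M = 128\) input channels, \(N = 256\) output channels, \(3 \times 3\) kernels):

```python
def conv_costs(d_f, m, n, d_k):
    """Multiply-accumulate counts for standard vs. depthwise separable convolution."""
    c_std = d_f * d_f * m * n * d_k * d_k
    c_ds = d_f * d_f * m * d_k * d_k + d_f * d_f * m * n   # depthwise + pointwise
    return c_std, c_ds

c_std, c_ds = conv_costs(d_f=80, m=128, n=256, d_k=3)
ratio = c_ds / c_std   # equals 1/N + 1/D_k^2
```

Here \(C_{std} = 1{,}887{,}436{,}800\) and \(C_{ds} = 217{,}088{,}000\), a ratio of \(1/256 + 1/9 \approx 0.115\), or about an 8.7 times reduction. Most of the saving comes from the \(1/D_k^2\) term, so it grows with kernel size.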
The experimental setup also involved ablation studies to assess individual components. Table 3 summarizes the impact of each module on mAP@0.95 when applied to UAV drone images.
| Configuration | Frequency-Domain Enhancement | CA Attention Module | CIoU Loss | mAP@0.95 (%) |
|---|---|---|---|---|
| Baseline | No | No | No | 85.6 |
| Variant 1 | Yes | No | No | 90.1 |
| Variant 2 | No | Yes | No | 92.4 |
| Variant 3 | Yes | Yes | No | 96.8 |
| Proposed | Yes | Yes | Yes | 99.2 |
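The results indicate that each component contributes, with the full model performing best. Relative to the baseline, frequency-domain enhancement alone adds 4.5 percentage points and the CA module alone adds 6.8; combining the two yields 96.8% (+11.2), and adding the CIoU loss raises the result to 99.2% (+2.4). The frequency-domain enhancement improves image quality, the CA module refines feature extraction, and the CIoU loss optimizes bounding box regression, together enhancing UAV-based defect identification.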
For broader applicability, the method can be extended to other grid equipment beyond insulators, such as transformers or towers, by adapting the feature extraction process. UAV images of these assets often share similar challenges, such as small defects and complex backgrounds. The network, including the CA module, can be trained on multi-class datasets to recognize various defect types. Suppose there are \(K\) equipment classes, each with defect labels. The classification loss can then be generalized to multi-class cross-entropy:
$$L_{cls}^{multi} = -\sum_{k=1}^{K} y_k \log(\hat{y}_k)$$
where \(y_k\) is the ground-truth label and \(\hat{y}_k\) is the predicted probability for class \(k\). This flexibility makes the method suitable for comprehensive UAV drone inspection systems.
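A numerically stable way to evaluate this loss is sketched below, assuming (as is typical, though not stated above) that \(\hat{y}\) is the softmax of the network's class logits. Computing the log-softmax directly avoids taking the logarithm of a probability that has underflowed to zero.

```python
import numpy as np

def multiclass_ce(logits, y_onehot):
    """L = -sum_k y_k log(y_hat_k), with y_hat = softmax(logits) (assumed)."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                              # shift for numerical stability
    log_y_hat = z - np.log(np.exp(z).sum())      # log-softmax
    return float(-np.sum(np.asarray(y_onehot, dtype=float) * log_y_hat))

# K = 4 hypothetical classes; uniform logits give log(4) ≈ 1.386 for any true class.
uniform = multiclass_ce([0.0, 0.0, 0.0, 0.0], [0, 1, 0, 0])
confident = multiclass_ce([0.0, 6.0, 0.0, 0.0], [0, 1, 0, 0])   # much smaller
```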
In deployment, UAV fleets can be coordinated to cover large grid networks autonomously. The inspection workflow involves: (1) UAV path planning using GPS waypoints, (2) image acquisition with overlap for stereo vision, (3) real-time transmission to ground stations or onboard processing, (4) defect identification via the proposed algorithm, and (5) alert generation for maintenance crews. Path planning can be posed as an optimization problem: given \(N\) waypoints with coordinates \((x_i, y_i, z_i)\), the visiting order is chosen to minimize the total flight distance
$$D = \sum_{i=1}^{N-1} \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2 + (z_{i+1} - z_i)^2}$$
subject to constraints like altitude limits and battery life. This ensures efficient UAV drone operations for capturing high-quality remote sensing images.
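The objective \(D\) and a simple way to reduce it can be sketched as follows. Finding the ordering that exactly minimizes \(D\) is a traveling-salesman-type problem, so the sketch uses a greedy nearest-neighbour heuristic purely for illustration; it does not model the altitude or battery constraints, and the function names are hypothetical.

```python
import numpy as np

def path_length(waypoints):
    """Total 3-D distance D along the waypoints in the given order."""
    p = np.asarray(waypoints, dtype=float)
    return float(np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1)))

def nearest_neighbor_order(waypoints, start=0):
    """Greedy visiting order: always fly next to the closest unvisited waypoint."""
    p = np.asarray(waypoints, dtype=float)
    unvisited = set(range(len(p))) - {start}
    order = [start]
    while unvisited:
        last = p[order[-1]]
        nxt = min(unvisited, key=lambda i: np.linalg.norm(p[i] - last))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

towers = [(0, 0, 30), (10, 0, 30), (1, 0, 30), (5, 0, 30)]   # hypothetical waypoints (m)
order = nearest_neighbor_order(towers)                       # [0, 2, 3, 1]
d_greedy = path_length([towers[i] for i in order])           # 1 + 4 + 5 = 10 m
d_listed = path_length(towers)                               # 10 + 9 + 4 = 23 m
```

Greedy ordering is fast but can be far from optimal on larger layouts; a constrained solver would replace it in practice.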
The method’s scalability is further demonstrated through simulations on synthetic UAV drone images. By generating augmented data with varying lighting and weather conditions, the model’s robustness is tested. Data augmentation techniques include rotation, scaling, and color jittering, which mimic real-world UAV drone image variations. The training loss with augmentation \(L_{aug}\) can be expressed as:
$$L_{aug} = \mathbb{E}_{(x,y) \sim \mathcal{D}_{aug}} [L(x, y)]$$
where \(\mathcal{D}_{aug}\) is the augmented dataset. This reduces overfitting and improves generalization to diverse UAV drone environments.
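The expectation in \(L_{aug}\) is approximated in practice by averaging the loss over randomly augmented samples. The sketch below is a Monte Carlo estimate for a single image using a hypothetical augmentation (random 90-degree rotation, horizontal flip, and a brightness scale as a simple stand-in for colour jittering). For detection, geometric augmentations must also transform the box labels; this sketch passes the label through unchanged, so it only fits image-level labels.

```python
import numpy as np

def random_augment(image, rng):
    """Hypothetical augmentation for an (H, W) or (H, W, C) image with values in [0, 1]."""
    out = np.rot90(image, k=int(rng.integers(4)), axes=(0, 1))   # 0/90/180/270 degrees
    if rng.random() < 0.5:
        out = out[:, ::-1]                                       # horizontal flip
    scale = rng.uniform(0.8, 1.2)                                # brightness jitter
    return np.clip(out * scale, 0.0, 1.0)

def augmented_loss(loss_fn, image, label, n_samples=32, seed=0):
    """Monte Carlo estimate of E_{(x, y) ~ D_aug}[L(x, y)] for one training sample."""
    rng = np.random.default_rng(seed)
    values = [loss_fn(random_augment(image, rng), label) for _ in range(n_samples)]
    return float(np.mean(values))

img = np.full((8, 8), 0.5)
est = augmented_loss(lambda x, y: float(x.mean()), img, label=None, n_samples=64)
```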
Future work could integrate deep learning with physical models for defect prognosis. For example, crack propagation in insulators can be modeled using fracture mechanics, with the stress intensity factor \(K_I\) evaluated from crack dimensions measured in UAV images:
$$K_I = \sigma \sqrt{\pi a} f\left(\frac{a}{W}\right)$$
where \(\sigma\) is the applied stress, \(a\) is the crack length, \(W\) is the characteristic width of the component (not the image width used earlier), and \(f\) is a geometric correction factor. By combining such models with UAV data, predictive maintenance becomes feasible. Additionally, advances in UAV sensing, such as hyperspectral imaging, could provide richer data for defect analysis. Hyperspectral images have hundreds of bands, allowing materials to be characterized by their spectral signatures; the response \(S\) recorded over a band \([\lambda_1, \lambda_2]\) is:
$$S = \int_{\lambda_1}^{\lambda_2} R(\lambda) I(\lambda)\, d\lambda$$
where \(R(\lambda)\) is the reflectance and \(I(\lambda)\) is the illumination spectrum. Integrating such data with the proposed method could enable detection of early-stage defects that are invisible in RGB UAV images.
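Both prognostic quantities reduce to short computations once their inputs are available. The sketch evaluates \(K_I = \sigma \sqrt{\pi a}\, f(a/W)\) with a placeholder geometric factor \(f = 1\) (the real \(f\) depends on crack and component geometry), and approximates the band integral with the trapezoidal rule over sampled wavelengths. Units are assumptions: with \(\sigma\) in MPa and \(a\), \(W\) in metres, \(K_I\) is in MPa·√m.

```python
import math
import numpy as np

def stress_intensity(sigma, a, W, geometry=lambda r: 1.0):
    """K_I = sigma * sqrt(pi * a) * f(a / W); the default f = 1 is only a placeholder."""
    return sigma * math.sqrt(math.pi * a) * geometry(a / W)

def band_response(wavelengths, reflectance, illumination):
    """Trapezoidal approximation of S = integral of R(lambda) * I(lambda) d(lambda)."""
    lam = np.asarray(wavelengths, dtype=float)
    y = np.asarray(reflectance, dtype=float) * np.asarray(illumination, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(lam)) / 2.0)

k1 = stress_intensity(sigma=100.0, a=0.01, W=0.1)   # about 17.7 MPa*sqrt(m)
wl = np.linspace(400.0, 500.0, 11)                  # hypothetical band samples (nm)
s = band_response(wl, np.full(11, 0.5), np.full(11, 2.0))   # 0.5 * 2 * 100 = 100
```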
In conclusion, the automated defect identification method for UAV remote sensing images of power grid equipment demonstrates high accuracy and robustness. By leveraging frequency-domain enhancement, coordinate attention, and an optimized loss function, it effectively identifies insulator defects such as string drops, surface burns, tilts, and cracks, even in complex backgrounds. Experimental results on a combined dataset show an mAP@0.95 of 99.2%, outperforming existing algorithms. The method's components work together to enhance feature extraction and localization, making it suitable for real-world UAV inspection scenarios. With further extensions to multi-class equipment and integration with prognostic models, this approach holds promise for advancing smart grid maintenance and ensuring reliable power transmission through efficient UAV-based monitoring.
