YOLO-WRC-based UAV Detection Method for Spontaneous Combustion in Open-pit Coal Seams

In recent years, the application of unmanned aerial vehicles (UAVs) in industrial monitoring has gained significant traction, particularly in open-pit mining areas. Rapid advances in UAV technology have enabled cost-effective, flexible, and high-precision data acquisition, surpassing traditional measurement and remote-sensing techniques. However, detecting spontaneous combustion in open-pit coal seams from UAV platforms presents unique challenges: the lack of specialized detection models for high-temperature points, low recognition accuracy for small-scale and multi-scale thermal anomalies, and confusion between exhaust-pipe temperatures from machinery and actual coal-seam hotspots. To address these issues, this study proposes an enhanced detection method, YOLO-WRC, which integrates wavelet transform convolution, a reparameterized generalized feature pyramid network, a lightweight distributed focus detection head, and the PIoUv2 loss function. The approach aims to improve the robustness and adaptability of UAV systems in identifying spontaneous combustion hotspots under complex environmental conditions.

The use of UAV platforms for monitoring open-pit coal mines offers advantages such as reduced operational costs, real-time data collection, and enhanced safety by minimizing human exposure to hazardous areas. In China, where coal mining is a critical industry, leveraging UAVs for early detection of spontaneous combustion can mitigate resource waste, environmental pollution, and economic losses. However, existing infrared image-based detection methods often struggle with the nuances of thermal data, such as subtle temperature variations and scale diversity. This study refines the YOLOv8n model, a lightweight and efficient framework suitable for deployment on resource-constrained UAV systems, by incorporating novel modules that enhance feature extraction, fusion, and regression capabilities.

To provide a comprehensive understanding, this article is structured as follows: first, the methodological improvements in YOLO-WRC are detailed, including the integration of wavelet transform convolution, the reconstruction of the neck network, the introduction of a focused detection head, and the adoption of an advanced loss function. Second, the experimental setup is described, covering dataset creation, evaluation metrics, and comparison with mainstream models. Third, results are analyzed through ablation studies and visualization outputs. Finally, conclusions are drawn regarding the efficacy of the proposed method for UAV applications in open-pit coal seam monitoring.

The core innovation of YOLO-WRC lies in its modular enhancements to the baseline YOLOv8n model. The backbone network incorporates wavelet transform convolution (WTConv) to replace standard convolutions in the C2f modules, forming C2f-WT blocks. This modification leverages Daubechies-6 discrete wavelet transform basis functions to decompose feature maps into multiple frequency components, allowing the model to capture both low-frequency information (e.g., broad temperature trends) and high-frequency details (e.g., local thermal anomalies). The process can be represented mathematically as follows: for an input feature map \(X\), wavelet transformation yields components such as \(X_{LL}^{(1)}\), \(X_{LH}^{(1)}\), \(X_{HL}^{(1)}\), and \(X_{HH}^{(1)}\), where subscripts denote low (L) and high (H) frequencies. Convolution is then applied separately to each component, enhancing the model’s ability to discern multi-scale hotspots. The wavelet transform operation is defined as:

$$ \text{WT}(X) = \{X_{LL}, X_{LH}, X_{HL}, X_{HH}\} $$

where each component is processed through dedicated convolutional layers. This approach not only expands the receptive field but also reduces computational costs, making it well suited to UAV platforms with limited onboard resources.
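As a minimal illustration of the frequency decomposition behind C2f-WT, the sketch below implements a single-level 2-D wavelet split in NumPy. It uses the Haar basis rather than the Daubechies-6 filters the paper adopts, but the sub-band structure is the same:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar wavelet decomposition of a feature map.

    Returns the four half-resolution sub-bands (LL, LH, HL, HH).
    The paper uses Daubechies-6 filters; Haar is shown for brevity.
    """
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-low: local average (broad trends)
    lh = (a - b + c - d) / 2.0  # low-high: horizontal detail
    hl = (a + b - c - d) / 2.0  # high-low: vertical detail
    hh = (a - b - c + d) / 2.0  # high-high: diagonal detail
    return ll, lh, hl, hh
```

A uniform-temperature region lands entirely in the LL band while edges and local anomalies populate LH, HL, and HH, which is what lets the dedicated convolutions specialize on broad temperature trends versus local hotspots.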

For the neck network, the original feature pyramid network (FPN) in YOLOv8n is replaced with a reparameterized generalized feature pyramid network (RepGFPN). This restructuring enhances feature extraction and fusion by incorporating cross-layer connections and branch structures within CSPStage modules. The RepGFPN uses reparameterized convolutions (RepConv) to merge multiple convolution operations into a single inference-time operation, reducing latency while maintaining accuracy. The feature fusion process in RepGFPN can be summarized as:

$$ F_{\text{out}} = \text{Concat}(\text{Conv}_{1\times1}(F_{\text{branch1}}), \text{RepConv}(F_{\text{branch2}})) $$

where \(F_{\text{branch1}}\) retains local and global information via convolution sampling, and \(F_{\text{branch2}}\) undergoes iterative feature fusion. This design mitigates information loss from shallow to deep layers, improving the detection of small and easily confused targets, a common issue in UAV infrared imagery.
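The latency saving from RepConv comes from kernel fusion: parallel branches that are linear in the input collapse into a single convolution at inference time. The single-channel sketch below shows RepVGG-style fusion of a 3x3 and a 1x1 branch; it is illustrative, not the authors' exact implementation:

```python
import numpy as np

def conv2d(x, k):
    """Naive 'valid' 2-D cross-correlation (single channel, no padding)."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def fuse_branches(k3, k1):
    """Reparameterization: a 3x3 branch plus a 1x1 branch summed at
    inference time are equivalent to one 3x3 kernel with the 1x1
    weight added at the center tap."""
    fused = k3.copy()
    fused[1, 1] += k1[0, 0]
    return fused
```

Because both branches are linear, the fused kernel produces exactly the same output as running the branches separately and summing, but with one convolution instead of two.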

The detection head is upgraded to a concentrated layerwise localization attention head (CLLAHead), which integrates cross-layer latent attention (CLLA) and distribution focal loss (DFL). CLLA performs local self-attention within restricted regions, reducing computational overhead while adaptively fusing contextual information across different feature levels. The attention mechanism computes query (\(Q\)), key (\(K\)), and value (\(V\)) vectors from unified feature maps, with weights derived from a softmax function:

$$ \text{Attention}(Q, K, V) = \text{Softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V $$

where \(d_k\) is the dimensionality of \(K\). Meanwhile, DFL transforms bounding box regression from a single real-value prediction to a discrete probability distribution, enhancing localization precision for ambiguous targets. For a coordinate \(x\), the predicted logits \(z_i\) are converted to probabilities \(P(i)\) via softmax:

$$ P(i) = \frac{\exp(z_i)}{\sum_{j=0}^{15} \exp(z_j)} $$

and the final coordinate is computed as the expectation:

$$ \text{coord} = \sum_{i=0}^{15} P(i) \cdot i $$

This is particularly beneficial for UAV systems operating in low-resolution or noisy infrared environments.
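The attention step above is ordinary scaled dot-product attention; a NumPy sketch of the computation (illustrative, not the CLLA module itself):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v).
    Returns the attended output and the attention weights.
    """
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V, weights
```

Restricting the rows of \(K\) and \(V\) to a local neighbourhood of each query, as CLLA does, shrinks the \(QK^T\) matrix and hence the computational overhead.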

The regression loss function is changed from CIoU to PIoUv2, which introduces a target-size adaptive penalty factor to better handle multi-scale objects. The PIoUv2 loss is defined as:

$$ \mathcal{L}_{\text{PIoUv2}} = 1 - \text{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v $$

where \(\rho\) is the Euclidean distance between predicted and ground-truth box centers, \(c\) is the diagonal length of the smallest enclosing box, and \(v\) is a consistency measure for aspect ratios. The adaptive factor \(\alpha\) adjusts based on target dimensions, improving regression performance for varying hotspot scales in open-pit coal seams.
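The displayed loss can be evaluated directly for axis-aligned boxes. The sketch below implements its three terms for boxes in (x1, y1, x2, y2) format; the scalar `alpha` stands in for the target-size adaptive factor, whose exact form is not reproduced here:

```python
import numpy as np

def iou_penalty_loss(box_p, box_g, alpha=1.0):
    """Regression loss as displayed above: 1 - IoU, plus the normalized
    center distance rho^2 / c^2, plus the aspect-ratio term alpha * v.
    `alpha` is a placeholder for the target-size adaptive factor."""
    # Intersection area and IoU
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)
    # Squared center distance over squared enclosing-box diagonal
    cpx, cpy = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cgx, cgy = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    ex1, ey1 = min(box_p[0], box_g[0]), min(box_p[1], box_g[1])
    ex2, ey2 = max(box_p[2], box_g[2]), max(box_p[3], box_g[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # Aspect-ratio consistency term v
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / np.pi ** 2) * (np.arctan(wg / hg) - np.arctan(wp / hp)) ** 2
    return 1 - iou + rho2 / c2 + alpha * v
```

A perfect prediction makes all three terms vanish, while a displaced or mis-shaped box is penalized through the distance and aspect-ratio terms even when the IoU is unchanged.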

To validate the proposed YOLO-WRC method, a dedicated dataset named UAV-OCST was created using a DJI M350 UAV equipped with a Zenmuse H20T thermal camera. The dataset simulates open-pit coal seam environments with artificial hotspots and exhaust-pipe mimics, collected under diverse conditions (e.g., day/night, seasonal variations, and adverse weather). It comprises 10,000 annotated images split into training, validation, and test sets at a 7:2:1 ratio. The experimental environment consisted of an Intel Core i7 CPU, an NVIDIA RTX 4060 GPU, and the PyTorch framework; training used the SGD optimizer with momentum 0.937, an initial learning rate of 0.01, a batch size of 16, and 300 epochs. Evaluation metrics included precision, recall, F1 score, mAP@0.5, parameter count, FLOPs, and inference speed (frames per second).
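A 7:2:1 split of the annotated images can be reproduced in a few lines; this is a generic sketch, not the authors' tooling:

```python
import random

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle a list of image paths and split it into train/val/test
    subsets at the given ratios (7:2:1 by default, as in UAV-OCST)."""
    rng = random.Random(seed)       # fixed seed for a reproducible split
    items = items[:]                # copy so the caller's list is untouched
    rng.shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Fixing the shuffle seed keeps the split reproducible across runs, which matters when comparing models on the same test partition.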

The following table summarizes the comparison of feature extraction modules, demonstrating the superiority of C2f-WT in the YOLO-WRC framework:

| Module | Precision (%) | Recall (%) | F1 (%) | mAP@0.5 (%) |
|---|---|---|---|---|
| C2f (Baseline) | 86.9 | 87.9 | 88.04 | 92.2 |
| C2f-MLCA | 87.2 | 88.6 | 87.90 | 92.4 |
| C2f-Light | 87.3 | 85.2 | 86.24 | 93.0 |
| C2f-UIB | 86.4 | 88.6 | 87.49 | 92.7 |
| C2f-EMIB | 87.5 | 88.2 | 87.85 | 92.3 |
| C2f-CA | 87.3 | 89.5 | 88.38 | 92.6 |
| C2f-CIB | 87.6 | 88.6 | 88.10 | 92.1 |
| C2f-WT (Proposed) | 87.9 | 89.4 | 88.75 | 93.1 |

Another table compares different detection heads, highlighting the effectiveness of CLLAHead in balancing precision and recall:

| Detection Head | Precision (%) | Recall (%) | F1 (%) | mAP@0.5 (%) |
|---|---|---|---|---|
| C2f-WT Only | 87.9 | 89.4 | 88.75 | 93.1 |
| HATHead | 86.8 | 87.8 | 87.30 | 93.5 |
| RepHead | 87.7 | 88.7 | 88.20 | 93.4 |
| FRMHead | 80.2 | 81.5 | 80.84 | 88.6 |
| AFPNHead | 84.3 | 86.7 | 85.48 | 89.6 |
| RFAHead | 85.6 | 89.3 | 87.41 | 90.5 |
| OBB | 86.8 | 88.3 | 87.54 | 92.6 |
| CLLAHead (Proposed) | 87.8 | 89.8 | 89.08 | 93.7 |

Ablation studies were conducted to assess the incremental contributions of each module in YOLO-WRC. The table below presents results where A denotes C2f-WT, B denotes RepGFPN, C denotes CLLAHead, and D denotes PIoUv2 loss function:

| Model Configuration | Precision (%) | Recall (%) | F1 (%) | mAP@0.5 (%) | Params (M) | FLOPs (G) | Speed (FPS) |
|---|---|---|---|---|---|---|---|
| Baseline (YOLOv8n) | 86.9 | 87.9 | 88.04 | 92.2 | 3.02 | 8.2 | 234.25 |
| A only | 87.9 | 89.4 | 88.75 | 93.1 | 2.62 | 6.9 | 233.64 |
| B only | 87.4 | 88.5 | 87.95 | 92.5 | 3.29 | 8.4 | 252.96 |
| C only | 87.3 | 88.7 | 87.99 | 92.3 | 3.03 | 7.6 | 242.83 |
| D only | 83.2 | 89.0 | 86.00 | 94.1 | 3.06 | 8.9 | 202.41 |
| A+B | 87.8 | 89.8 | 89.08 | 93.7 | 2.90 | 7.4 | 223.95 |
| A+C | 87.4 | 89.4 | 86.90 | 93.5 | 2.64 | 6.3 | 205.21 |
| A+D | 84.9 | 89.0 | 86.90 | 94.6 | 2.62 | 6.9 | 205.21 |
| B+C | 86.7 | 88.6 | 87.64 | 91.8 | 3.30 | 8.5 | 243.53 |
| B+D | 85.9 | 89.3 | 87.57 | 94.3 | 3.29 | 8.4 | 246.45 |
| C+D | 87.1 | 88.9 | 87.99 | 93.6 | 3.03 | 7.6 | 239.56 |
| A+B+C | 88.4 | 89.8 | 89.10 | 94.4 | 2.92 | 7.5 | 215.63 |
| A+B+D | 87.8 | 88.9 | 88.35 | 93.9 | 2.90 | 7.4 | 226.75 |
| B+C+D | 87.6 | 89.4 | 88.49 | 94.1 | 3.30 | 8.5 | 245.43 |
| YOLO-WRC (A+B+C+D) | 88.2 | 90.1 | 89.13 | 95.4 | 2.92 | 7.5 | 216.32 |

These results indicate that the full YOLO-WRC model achieves the highest recall and mAP@0.5, with reductions in parameters and FLOPs, making it suitable for deployment on UAVs. The integration of all modules contributes synergistically to improved detection performance.

Comparison with mainstream models further validates the superiority of YOLO-WRC. The table below includes traditional and state-of-the-art detectors, emphasizing their relevance for UAV applications:

| Model | Precision (%) | Recall (%) | F1 (%) | mAP@0.5 (%) | Params (M) | FLOPs (G) | Speed (FPS) |
|---|---|---|---|---|---|---|---|
| Faster RCNN | 71.6 | 79.4 | 75.30 | 79.6 | 41.20 | 156.2 | 38.60 |
| SSD | 78.6 | 80.1 | 79.34 | 84.2 | 26.50 | 73.1 | 51.50 |
| YOLOv5 | 89.1 | 88.8 | 88.95 | 93.1 | 25.28 | 7.1 | 262.34 |
| YOLOv6 | 89.2 | 89.1 | 89.14 | 93.9 | 162.36 | 44.2 | 189.35 |
| YOLOv8n | 86.9 | 87.9 | 88.04 | 92.2 | 3.02 | 8.2 | 234.25 |
| YOLOv8s | 88.0 | 89.2 | 88.60 | 92.3 | 11.14 | 28.4 | 177.89 |
| YOLOv9c | 87.8 | 89.4 | 88.59 | 91.9 | 253.49 | 102.5 | 65.36 |
| YOLOv10n | 88.8 | 89.3 | 89.04 | 92.9 | 2.61 | 8.2 | 335.24 |
| NFZ-YOLOv10 | 88.3 | 89.1 | 88.70 | 93.5 | 2.15 | 6.2 | 368.17 |
| ITD-YOLO | 87.9 | 88.6 | 88.25 | 92.1 | 12.30 | 28.7 | 185.65 |
| YOLO-MCSL | 88.1 | 89.2 | 88.65 | 93.6 | 3.79 | 7.5 | 226.31 |
| YOLO-WRC (Proposed) | 88.2 | 90.1 | 89.13 | 95.4 | 2.92 | 7.5 | 216.32 |

YOLO-WRC outperforms the others in recall and mAP@0.5, demonstrating high robustness for identifying abnormal high-temperature points in open-pit coal seams. Its balanced parameter efficiency and speed make it a compelling choice for integration into UAV systems for real-time monitoring.

Visualization results further confirm that YOLO-WRC achieves higher confidence scores, detects targets missed by baseline models, and excels in recognizing small-sized and easily confused objects. For instance, in infrared images with subtle thermal variations, YOLO-WRC accurately distinguishes between coal-seam hotspots and machinery exhausts, reducing false positives. This capability is crucial for UAV operations, where environmental factors such as weather and lighting can degrade image quality.

The adoption of advanced loss functions also plays a key role. Comparing PIoUv2 with alternatives like CIoU, DIoU, and GIoU, the following table highlights its effectiveness in the YOLO-WRC context:

| Loss Function | Precision (%) | Recall (%) | F1 (%) | mAP@0.5 (%) |
|---|---|---|---|---|
| CIoU | 88.4 | 89.4 | 88.90 | 94.1 |
| DIoU | 87.6 | 89.3 | 88.44 | 93.2 |
| GIoU | 89.2 | 89.6 | 89.40 | 93.7 |
| PIoUv2 (Proposed) | 88.2 | 90.1 | 89.13 | 95.4 |

In summary, the YOLO-WRC method offers significant improvements for UAV-based detection of spontaneous combustion in open-pit coal seams. By leveraging wavelet transform convolution, the model enhances feature representation for multi-scale targets; RepGFPN improves feature fusion and reduces information loss; CLLAHead focuses on localized attention and precise regression; and PIoUv2 adapts to varying object scales. These innovations collectively address the limitations of existing approaches, such as low accuracy for small hotspots and confusion between similar thermal sources.

For UAV applications, this method provides a practical solution that balances accuracy, efficiency, and deployability. Future work could extend the dataset to more diverse mining scenarios, optimize the model for edge computing on UAVs, and explore integration with other sensors such as multispectral cameras. As China continues to invest in smart mining technologies, advances in UAV-based detection will play a pivotal role in enhancing safety and sustainability in the coal industry.

The mathematical formulations underlying YOLO-WRC can be generalized for other thermal imaging tasks. For example, the wavelet transform convolution process can be expressed as a convolution operation in the frequency domain:

$$ Y = \sum_{c} \text{WT}^{-1}(\text{Conv}(\text{WT}(X_c))) $$

where \(X_c\) is the input channel, WT denotes the wavelet transform, Conv is component-wise convolution, and WT\(^{-1}\) is the inverse transform. This allows for efficient computation on the embedded systems commonly carried by UAV platforms.
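The round trip \(\text{WT}^{-1}(\text{Conv}(\text{WT}(X)))\) can be sketched end to end. Here the per-sub-band convolution is reduced to a scalar gain and Haar filters stand in for Daubechies-6, so with unit gains the pipeline recovers the input exactly:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar analysis into (LL, LH, HL, HH) sub-bands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: reassemble the full-resolution map."""
    H, W = ll.shape
    x = np.empty((2 * H, 2 * W))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def wtconv(x, gains=(1.0, 1.0, 1.0, 1.0)):
    """Y = WT^{-1}(Conv(WT(X))), with the per-sub-band convolution
    simplified to a scalar gain purely for illustration."""
    bands = haar_dwt2(x)
    filtered = [g * b for g, b in zip(gains, bands)]
    return haar_idwt2(*filtered)
```

Setting the detail-band gains below 1 suppresses high-frequency noise while preserving broad temperature structure, which is the kind of frequency-selective processing the learned filters perform.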

Furthermore, the distribution focal loss (DFL) component ensures robust bounding box regression by modeling uncertainty in coordinates. The probability distribution \(P(i)\) over discrete bins enables smoother gradient flow during training, which is particularly beneficial for noisy infrared data. The loss for regression can be written as:

$$ \mathcal{L}_{\text{DFL}} = -\sum_{i} y_i \log(P(i)) $$

where \(y_i\) is the ground-truth distribution. Combined with the attention mechanisms in CLLAHead, this facilitates accurate localization of thermal anomalies captured by UAV-mounted infrared cameras.
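Putting the DFL pieces together, the sketch below decodes 16 logits into an expected coordinate and computes the cross-entropy against the two integer bins bracketing a fractional target (the standard DFL target construction; illustrative, not the authors' code):

```python
import numpy as np

def dfl_decode(logits):
    """Softmax the 16 regression logits into P(i) over bins 0..15 and
    return (P, expected coordinate sum_i P(i) * i)."""
    z = logits - logits.max()  # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p, float(np.sum(p * np.arange(len(p))))

def dfl_loss(logits, target):
    """Cross-entropy of P against a two-point ground-truth distribution
    on the integer bins bracketing `target` (target in [0, 15))."""
    p, _ = dfl_decode(logits)
    lo = int(np.floor(target))
    w_hi = target - lo  # weight on the upper neighbour
    return -((1 - w_hi) * np.log(p[lo]) + w_hi * np.log(p[lo + 1]))
```

Because the target mass is shared between the two neighbouring bins in proportion to proximity, the loss stays smooth as the true coordinate moves between bins, which helps gradient flow on noisy infrared data.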

In conclusion, the YOLO-WRC framework represents a step forward in automated detection systems for open-pit coal seam monitoring. Its design considerations, such as lightweight modules and adaptive loss functions, make it well suited for deployment on UAV fleets, contributing to proactive hazard management and operational efficiency in mining regions. Continued research in this area will likely yield further innovations, reinforcing the role of UAV technology in industrial automation and environmental protection.
