Corn’s global significance across food security, bioenergy, and industrial sectors faces severe threats from pest infestations, particularly the Asian corn borer (Ostrinia furcalis' congener, Ostrinia furnacalis). Traditional field scouting suffers from limited scalability and subjectivity. While deep learning has advanced pest detection, existing approaches exhibit critical gaps: lab-based methods lack field scalability, while conventional UAV remote sensing struggles with small-target resolution. We introduce an agricultural UAV-enabled framework that pairs low-altitude (5 m) close-range RGB imaging with YOLO-ESN (You Only Look Once Enhanced Small object Network), shifting the detection target from elusive insects to visually persistent boreholes, a practical adaptation for field deployment.
Materials and Methods
Data Acquisition via Agricultural UAV
Field campaigns utilized DJI Mavic 3M agricultural UAVs equipped with 5.28 MP RGB cameras across maize fields in an arid continental climate (100.82°E, 38.43°N). Flight protocols maintained a 5 m altitude, ensuring 1.18 cm/pixel ground resolution, and captured 467 raw images (≈11.7 MB per JPEG). Post-processing split the raw images into 2,802 sub-images (1,760×1,485 pixels), categorized by infestation severity: severe (191), moderate (769), and mild (1,842). Class imbalance was mitigated with the Synthetic Minority Over-sampling Technique (SMOTE) and Random Under-Sampling (RUS), balancing each class to ≈800 samples.
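The two-step resampling above can be sketched in NumPy. This is a minimal illustration, not the paper's pipeline: true SMOTE interpolates between k-nearest neighbours, whereas here synthetic samples are interpolated between random same-class pairs, and the (n, d) feature matrix is a stand-in for whatever per-image representation is resampled.

```python
import numpy as np

def balance_classes(X, y, target=800, seed=None):
    """Balance each class to `target` samples: random under-sampling (RUS)
    for over-represented classes, SMOTE-style interpolation for the rest."""
    rng = np.random.default_rng(seed)
    Xb, yb = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        if len(Xc) >= target:                      # RUS: drop random samples
            Xc = Xc[rng.choice(len(Xc), target, replace=False)]
        else:                                      # SMOTE-style interpolation
            extra = target - len(Xc)
            i = rng.integers(0, len(Xc), extra)
            j = rng.integers(0, len(Xc), extra)
            lam = rng.random((extra, 1))
            synth = Xc[i] + lam * (Xc[j] - Xc[i])  # x_i + lam * (x_j - x_i)
            Xc = np.vstack([Xc, synth])
        Xb.append(Xc)
        yb.append(np.full(len(Xc), c))
    return np.vstack(Xb), np.concatenate(yb)
```

With the paper's class counts (191/769/1,842), the severe class is oversampled, the mild class under-sampled, and all three end at 800 samples.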
YOLO-ESN Architectural Innovations
Building on YOLOv11, our YOLO-ESN introduces:
1. ELA (Enhanced Lightweight Attention) in Backbone: Replaces C2PSA for efficient spatial-channel feature enhancement. For input feature map \(X\), horizontal/vertical 1D convolutions generate attention weights:
$$Y_w = \text{Sigmoid}\left(\text{GN}\left(\text{Conv1D}\left(\text{AvgPool}_w(X)\right)\right)\right)$$
$$Y_h = \text{Sigmoid}\left(\text{GN}\left(\text{Conv1D}\left(\text{AvgPool}_h(X)\right)\right)\right)$$
Output refinement: \(Y = X \otimes Y_w \otimes Y_h\) where \(\otimes\) denotes element-wise multiplication. GN = GroupNorm.
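The ELA gating above can be sketched as a PyTorch module. This is an illustrative sketch, not the paper's exact layer: the kernel size, group count, and the use of a single shared depthwise `Conv1d` for both axes are assumptions.

```python
import torch
import torch.nn as nn

class ELA(nn.Module):
    """ELA sketch: strip-pool along each spatial axis, pass the 1-D
    descriptor through Conv1d + GroupNorm + Sigmoid, then gate the input.
    kernel_size=7 and 16 GN groups are assumptions (channels must divide)."""
    def __init__(self, channels, kernel_size=7, groups=16):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2,
                              groups=channels, bias=False)
        self.gn = nn.GroupNorm(min(groups, channels), channels)
        self.act = nn.Sigmoid()

    def forward(self, x):                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        # AvgPool_h: average over width  -> per-row descriptor (B, C, H)
        y_h = self.act(self.gn(self.conv(x.mean(dim=3))))
        # AvgPool_w: average over height -> per-column descriptor (B, C, W)
        y_w = self.act(self.gn(self.conv(x.mean(dim=2))))
        # Y = X (*) Y_w (*) Y_h via broadcast element-wise multiplication
        return x * y_h.view(b, c, h, 1) * y_w.view(b, c, 1, w)
```

The attention map is separable, so the cost grows with H + W rather than H × W, which is what makes the block cheap enough to replace C2PSA in the backbone.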
2. P2 Detection Head: Augments Head with 320×320 high-resolution output leveraging shallow features for sub-8px boreholes.
3. Lightweight C3k2-SCConv in Neck: Integrates Spatial/Channel Reconstruction Convolution (SCConv) into C3k2. Spatial Reconstruction Unit (SRU) computes channel weights \(w_i\):
$$X_{\text{out}} = \text{GN}(X) = \gamma\frac{X - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$
$$w_i = \frac{\gamma_i}{\sum_{j=1}^C \gamma_j}, \quad i,j=1,2,\ldots,C$$
Channel Reconstruction Unit (CRU) partitions features via adaptive threshold \(\alpha\), optimizing computational efficiency.
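The SRU weighting step above amounts to normalising the learned GroupNorm scale factors into per-channel importance scores. A minimal sketch, with the absolute value and the mean-based split into informative/redundant channels as assumptions:

```python
import torch
import torch.nn as nn

def sru_channel_weights(gn: nn.GroupNorm) -> torch.Tensor:
    """SRU weighting sketch: w_i = gamma_i / sum_j gamma_j over the
    GroupNorm scale parameters (|gamma| taken, since gamma may train
    negative). Channels with larger gamma carry more spatial information."""
    gamma = gn.weight.abs()
    return gamma / gamma.sum()

# Channels whose weight exceeds the mean are treated as informative; the
# remainder are candidates for the cheaper CRU reconstruction path.
gn = nn.GroupNorm(4, 16)
w = sru_channel_weights(gn)
informative = w >= w.mean()
```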
4. NWD+EIoU Loss: Replaces CIoU to enhance small-target sensitivity. The Wasserstein distance between the Gaussian distributions fitted to the predicted and target boxes is
$$W_2 = \sqrt{(c_{px} - c_{tx})^2 + (c_{py} - c_{ty})^2 + \frac{(w_p - w_t)^2 + (h_p - h_t)^2}{4}}$$
which the Normalized Wasserstein Distance (NWD) maps to a scale-insensitive loss \(\mathcal{L}_{\text{NWD}} = 1 - \exp(-W_2/C)\), with \(C\) a dataset-dependent constant. The Efficient IoU (EIoU) width/height penalty is
$$\mathcal{L}_{\text{wh}} = \frac{(w_p - w_t)^2}{w_c^2} + \frac{(h_p - h_t)^2}{h_c^2}$$
where \(w_c, h_c\) are the dimensions of the smallest enclosing box. The two terms combine as
$$\mathcal{L} = \lambda_1\mathcal{L}_{\text{NWD}} + \lambda_2\mathcal{L}_{\text{EIoU}}$$
where \(\lambda_1, \lambda_2\) balance the contributions.
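A runnable sketch of the combined loss, assuming the standard formulations of NWD (with \(\mathcal{L}_{\text{NWD}} = 1 - \exp(-W_2/C)\)) and EIoU; the weights `lam1`, `lam2` and the constant `C` are illustrative values, not the paper's:

```python
import torch

def nwd_eiou_loss(pred, target, lam1=0.5, lam2=0.5, C=12.8, eps=1e-7):
    """Combined small-object loss sketch. Boxes are (cx, cy, w, h),
    shape (N, 4)."""
    cxp, cyp, wp, hp = pred.unbind(-1)
    cxt, cyt, wt, ht = target.unbind(-1)

    # NWD: Wasserstein distance between box Gaussians -> exp similarity
    w2 = ((cxp - cxt)**2 + (cyp - cyt)**2
          + ((wp - wt)**2 + (hp - ht)**2) / 4).clamp(min=eps).sqrt()
    l_nwd = 1 - torch.exp(-w2 / C)

    # EIoU: IoU + centre-distance + width/height penalties
    x1p, y1p, x2p, y2p = cxp - wp/2, cyp - hp/2, cxp + wp/2, cyp + hp/2
    x1t, y1t, x2t, y2t = cxt - wt/2, cyt - ht/2, cxt + wt/2, cyt + ht/2
    iw = (torch.min(x2p, x2t) - torch.max(x1p, x1t)).clamp(min=0)
    ih = (torch.min(y2p, y2t) - torch.max(y1p, y1t)).clamp(min=0)
    inter = iw * ih
    iou = inter / (wp*hp + wt*ht - inter + eps)
    wc = torch.max(x2p, x2t) - torch.min(x1p, x1t) + eps  # enclosing box
    hc = torch.max(y2p, y2t) - torch.min(y1p, y1t) + eps
    rho2 = (cxp - cxt)**2 + (cyp - cyt)**2
    l_eiou = (1 - iou + rho2 / (wc**2 + hc**2)
              + (wp - wt)**2 / wc**2 + (hp - ht)**2 / hc**2)

    return (lam1 * l_nwd + lam2 * l_eiou).mean()
```

Unlike IoU-only losses, the NWD term still produces a useful gradient when a tiny predicted box does not overlap its target at all, which is the failure mode for sub-8-pixel boreholes.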
Infestation Quantification
A decision tree classified sub-images using borehole counts \(N\):
- Mild: \(0 \leq N < 19\)
- Moderate: \(19 \leq N \leq 66\)
- Severe: \(N > 66\)
Thresholds were selected to minimize the mean squared error \(\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) between predicted and reference severity labels. Geospatial heatmaps visualized field-scale infestation distributions.
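The decision rule above reduces to a two-threshold lookup on the per-sub-image borehole count:

```python
def classify_severity(n_boreholes: int) -> str:
    """Map a sub-image borehole count N to the paper's severity classes:
    mild (0 <= N < 19), moderate (19 <= N <= 66), severe (N > 66)."""
    if n_boreholes < 19:
        return "mild"
    if n_boreholes <= 66:
        return "moderate"
    return "severe"
```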
Experimental Configuration
Training used NVIDIA RTX 4090 GPUs with PyTorch 1.10.0. Phase 1 (30 epochs) applied SGD optimizer (lr=0.01, momentum=0.937). Phase 2 (500 epochs) employed AdamW (lr=0.01) at 1,280×1,280 resolution. Metrics included:
- Precision: \(\frac{\text{TP}}{\text{TP} + \text{FP}}\)
- Recall: \(\frac{\text{TP}}{\text{TP} + \text{FN}}\)
- mAP@50: mean Average Precision at IoU=0.5
- mAP@50:95: mAP averaged over IoU 0.5–0.95 (0.05 step)
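The metrics above can be computed from matched detections. A minimal sketch: precision and recall from TP/FP/FN counts at a fixed IoU matching threshold, and AP as the area under the monotone-envelope precision-recall curve (all-point interpolation, COCO style); the exact matching and interpolation details of the paper's evaluation are not specified, so this is an assumption.

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(recalls, precisions):
    """AP via all-point interpolation: take the monotonically decreasing
    precision envelope, then sum rectangle areas between recall steps."""
    r = np.concatenate([[0.0], np.asarray(recalls), [1.0]])
    p = np.concatenate([[0.0], np.asarray(precisions), [0.0]])
    p = np.maximum.accumulate(p[::-1])[::-1]   # precision envelope
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

mAP@50 averages this AP over classes at IoU = 0.5; mAP@50:95 additionally averages over matching thresholds 0.5, 0.55, ..., 0.95.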
Results and Analysis
Model Performance
YOLO-ESN achieved convergence at 100 epochs (validation loss=1.4). Final metrics:
| Metric | Value |
|---|---|
| Precision | 80.2% |
| Recall | 82.1% |
| mAP@50 | 88.6% |
| mAP@50:95 | 40.5% |
| Parameters | 8.37M |
| GFLOPs | 28.0 |
| FPS | 32.48 |
Ablation Study
Component contributions were evaluated incrementally:
| Components | Params (M) | mAP@50 (%) | mAP@50:95 (%) |
|---|---|---|---|
| YOLOv11 (Baseline) | 9.46 | 81.0 | 35.6 |
| + ELA | 8.46 | 81.3 | 34.6 |
| + P2 Head | 9.57 | 85.1 | 36.8 |
| + SCConv | 9.22 | 81.9 | 35.7 |
| + NWD+EIoU | 9.46 | 82.9 | 36.8 |
| YOLO-ESN (All) | 8.37 | 88.6 | 40.5 |
Full integration reduced parameters by 11.52% relative to the baseline while improving mAP@50 by 7.6 percentage points and mAP@50:95 by 4.9 percentage points.
Comparative Analysis
Benchmarking against state-of-the-art models:
| Model | Precision (%) | Recall (%) | mAP@50 (%) | Params (M) |
|---|---|---|---|---|
| YOLOv8 | 72.4 | 79.2 | 81.3 | 11.14 |
| YOLOv11 | 73.9 | 79.8 | 81.0 | 9.46 |
| YOLOv12 | 67.9 | 74.6 | 77.5 | 9.10 |
| Faster R-CNN | 70.1 | 70.6 | 73.7 | 137.10 |
| SSD | 65.4 | 67.2 | 70.8 | 26.29 |
| YOLO-ESN | 80.2 | 82.1 | 88.6 | 8.37 |
YOLO-ESN outperformed all compared models by 7.3 to 17.8 percentage points in mAP@50 while using the fewest parameters.
Field Deployment Validation
The decision tree achieved F1-scores of 0.906 (mild), 0.803 (moderate), and 0.842 (severe). Heatmaps generated from agricultural UAV data enabled precision spray planning, identifying high-risk zones (7.56% severe) and reducing chemical usage.
Conclusion
YOLO-ESN establishes an effective paradigm for agricultural UAV-based corn borer monitoring by targeting boreholes rather than insects. Integrating ELA attention, high-resolution P2 detection, and optimized loss functions achieved 88.6% mAP@50 with 8.37M parameters—ideal for edge deployment on agricultural drones. Future work will expand to multi-pest scenarios and real-time agricultural UAV fleet integration, advancing precision agroecology.
