An Enhanced Low-Altitude Detection Model for Pest-Infected Trees Using Agricultural Drones

The health of forest ecosystems is fundamental to environmental security and human well-being: forests act as critical barriers against pollution and as major carbon sinks. However, the increasing prevalence of pests and diseases threatens this stability, causing significant economic and ecological losses. Timely, accurate detection of infected trees is therefore paramount for effective forest management and protection. Traditional manual surveys are labor-intensive and inefficient, and satellite remote sensing offers limited spatial resolution, so agricultural drone-based inspection presents a promising alternative. Agricultural drones offer flexibility, cost-effectiveness, and the ability to capture high-resolution imagery from low altitudes, enabling rapid localization of affected areas.

Nevertheless, deploying high-precision detection models on agricultural drones for real-time, low-altitude operation poses significant challenges. The core issues involve a trade-off between detection accuracy and computational efficiency. Models must be lightweight enough to run on the constrained hardware of an agricultural drone while maintaining high precision to identify multi-scale targets (from single trees to clusters) amidst complex backgrounds, varying lighting, and potential occlusions. Common problems include missed detections of small or obscured targets and false positives triggered by confusing features like shadows, bare patches, or dead wood.

To address these challenges, this work proposes an improved object detection algorithm named DCA-YOLO, specifically designed for low-altitude agricultural drone applications. The model is built upon the YOLOv8n architecture and incorporates several key enhancements to optimize both performance and efficiency for the task of detecting pest-infected trees.

Core Methodological Improvements

The DCA-YOLO model integrates four principal modifications to the base YOLOv8n framework: a novel attention mechanism, a lightweight backbone, an enhanced feature fusion neck, and an optimized loss function.

1. Dynamic Channel Attention (DCA) Mechanism

Capturing discriminative features from agricultural drone imagery is crucial. We introduce the Dynamic Channel Attention (DCA) module to dynamically recalibrate channel-wise feature responses. It consists of two synergistic components: a Dynamic Channel Attention Module and a Dual-Branch Collaborative Attention Module.

The Dynamic Channel Attention Module uses a 1D convolution with a kernel size \(K\) that adapts based on the number of channels \(C\) to efficiently model cross-channel dependencies:

$$
K = \psi(C) = \bigg| \frac{\log_2(C) + b}{\gamma} \bigg|_{odd}
$$

where \(b\) and \(\gamma\) are constants, and \(|\cdot|_{odd}\) denotes the nearest odd number. Given an input feature map \(x\), the channel weights \(y\) are computed as:

$$
y = \sigma(\text{Conv1D}_{K}(\text{GAP}(x)))
$$

Here, \(\text{GAP}\) is Global Average Pooling, \(\text{Conv1D}_{K}\) is the 1D convolution with adaptive kernel \(K\), and \(\sigma\) is the Sigmoid activation function. The output is a channel-wise recalibration: \(\text{Output} = x \otimes y\), where \(\otimes\) denotes element-wise multiplication.
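As a concrete illustration, the adaptive-kernel channel attention above can be sketched in a few lines of NumPy. This is a simplified stand-in, not the trained module: the fixed averaging kernel substitutes for the learned Conv1D weights, and the constants `gamma=2`, `b=1` follow common ECA-style defaults, which the text does not specify.

```python
import numpy as np

def adaptive_kernel(c, gamma=2, b=1):
    # Nearest odd kernel size derived from the channel count C.
    k = int(abs((np.log2(c) + b) / gamma))
    return k if k % 2 == 1 else k + 1

def channel_attention(x, gamma=2, b=1):
    """x: feature map of shape (C, H, W); returns the recalibrated map."""
    c = x.shape[0]
    k = adaptive_kernel(c, gamma, b)
    gap = x.mean(axis=(1, 2))                     # Global Average Pooling -> (C,)
    kernel = np.ones(k) / k                       # stand-in for learned Conv1D weights
    conv = np.convolve(gap, kernel, mode="same")  # cross-channel interaction of width K
    y = 1.0 / (1.0 + np.exp(-conv))               # Sigmoid gate per channel
    return x * y[:, None, None]                   # channel-wise recalibration x ⊗ y
```

Note how the kernel size grows slowly with channel count: 64 channels give K = 3, 256 channels give K = 5, so deeper layers get a slightly wider cross-channel receptive field at negligible cost.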

The Dual-Branch Collaborative Attention Module complements this by employing depthwise separable convolutions to generate Query (Q), Key (K), and Value (V) embeddings. A multi-head attention mechanism with a learnable temperature parameter \(\tau\) is then applied:

$$
\text{Attention}(Q, K, V) = \text{Softmax}\left(\frac{QK^T}{\sqrt{d_k} \cdot \tau}\right)V
$$

This allows the model to jointly focus on both global semantic context and fine-grained local details, which is vital for distinguishing infected trees from complex backgrounds in agricultural drone footage. The DCA module is strategically placed in the detection heads for the P2 and P4 feature levels to enhance multi-scale perception.
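The temperature-scaled attention of the dual-branch module reduces to a small NumPy sketch. The batch layout and the `softmax` helper are illustrative assumptions; in the model, Q, K, and V come from depthwise separable convolutions and `tau` is a learnable parameter rather than a fixed argument.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temperature_attention(Q, K, V, tau=1.0):
    """Scaled dot-product attention with a temperature parameter tau.
    Q, K, V: arrays of shape (..., n_tokens, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / (np.sqrt(d_k) * tau)
    return softmax(scores, axis=-1) @ V
```

A small `tau` sharpens the attention toward the best-matching keys, while a large `tau` flattens it toward a uniform average of V; learning `tau` lets each head choose its own point on that spectrum.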

2. Lightweight Backbone with GhostNetV2

To ensure the model can run efficiently on an agricultural drone's edge computing device, we replace the original CSPDarknet backbone with GhostNetV2. GhostNetV2 introduces a Decoupled Fully Connected (DFC) attention mechanism within its Ghost modules. The DFC attention expands the receptive field efficiently by performing separate horizontal and vertical full connections, capturing long-range spatial dependencies crucial for identifying sparse or gradual discoloration patterns on tree crowns. This results in a significant reduction in parameters and computational load while preserving essential feature representation capacity. The core bottleneck structure integrates both the ghost operation for efficiency and the DFC branch for enhanced feature learning.
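The decoupled horizontal/vertical gating idea behind DFC attention can be sketched as below. This is a simplified illustration: the actual GhostNetV2 realizes the two full connections as strided depthwise convolutions on a downsampled feature map, whereas the dense mixing matrices `w_v` and `w_h` here are stand-ins for those learned weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dfc_attention(x, w_v, w_h):
    """Decoupled fully connected attention sketch.
    x: (C, H, W) feature map; w_v: (H, H) vertical mixing weights;
    w_h: (W, W) horizontal mixing weights (stand-ins for learned layers)."""
    v = np.einsum('ij,cjw->ciw', w_v, x)  # vertical full connection: mix each column over H
    h = np.einsum('ij,chj->chi', w_h, v)  # horizontal full connection: mix each row over W
    return x * sigmoid(h)                 # gate the ghost features with the attention map
```

Because the two passes are decoupled, each output position aggregates an entire row and column (O(H + W) weights per position) instead of a full H×W neighborhood, which is what makes the long-range receptive field affordable.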

3. Enhanced Feature Pyramid Network (PC-BiFPN)

Low-altitude agricultural drone images contain targets of vastly different scales. To better fuse multi-scale features, we replace the original PANet with a modified Bidirectional Feature Pyramid Network (BiFPN). Our version, PC-BiFPN, incorporates a dedicated path for a higher-resolution P2 feature layer (160×160 for a 640×640 input) to bolster small-target detection. Furthermore, it employs fast normalized fusion, assigning learnable weights to different input features during fusion:

$$
O = \frac{\sum_i w_i \cdot \text{Resize}(P_i)}{\sum_i w_i + \epsilon}
$$

where \(w_i\) are learnable weights for each input feature level \(P_i\), \(\text{Resize}\) denotes up/down-sampling, and \(\epsilon\) is a small constant. This allows the network to adaptively emphasize more important feature levels from the agricultural drone feed for the detection task.
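Assuming the input feature maps have already been resized to a common resolution, the fast normalized fusion above reduces to a few lines. The ReLU clamp on the weights is the convention from the original BiFPN formulation, which keeps the normalization well-defined without a softmax.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted fusion of already-resized feature maps (BiFPN-style).
    features: list of arrays with identical shape; weights: learnable scalars."""
    w = np.maximum(weights, 0.0)   # ReLU keeps every weight non-negative
    w = w / (w.sum() + eps)        # fast normalization: divide by sum + epsilon
    return sum(wi * f for wi, f in zip(w, features))
```

Compared with a softmax over the weights, this normalization is cheaper (no exponentials) while still letting training shift emphasis toward the most informative pyramid levels.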

4. Inner-IoU Loss Function

Accurate bounding box regression is challenging for small or irregular targets. We replace the default loss with Inner-IoU loss. It introduces an auxiliary bounding box \(B_{inner}\) derived from the ground truth box \(B_{gt} = (x_{gt}, y_{gt}, w_{gt}, h_{gt})\) by applying a scale factor \(r_{ratio}\) that dynamically adjusts based on the current IoU:

$$
r_{ratio} = r_{min} + (r_{max} - r_{min}) \cdot (1 - \text{IoU})
$$

$$
B_{inner} = (x_{gt}, y_{gt}, w_{gt} \cdot r_{ratio}, h_{gt} \cdot r_{ratio})
$$

The Inner-IoU is then computed between the prediction \(B_{pred}\) and this auxiliary box:

$$
\text{IoU}_{inner} = \frac{|B_{pred} \cap B_{inner}|}{|B_{pred} \cup B_{inner}|}, \quad \mathcal{L}_{\text{Inner-IoU}} = 1 - \text{IoU}_{inner}
$$

This mechanism forces the model to focus more on the central region of targets, improving localization accuracy for objects captured by the agricultural drone, especially when they are small or partially occluded.
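For a single pair of center-format boxes, the computation can be sketched as follows. The ratio bounds `r_min=0.5` and `r_max=1.5` are illustrative assumptions; the text does not specify the values used in training.

```python
def iou(box_a, box_b):
    """IoU of two (x, y, w, h) center-format boxes."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def inner_iou_loss(pred, gt, r_min=0.5, r_max=1.5):
    """Inner-IoU loss: the auxiliary box shrinks as the current IoU improves."""
    r = r_min + (r_max - r_min) * (1.0 - iou(pred, gt))
    inner = (gt[0], gt[1], gt[2] * r, gt[3] * r)   # scaled auxiliary box
    return 1.0 - iou(pred, inner)
```

Note the dynamic behavior: a poor prediction (low IoU) yields an enlarged auxiliary box, giving usable gradients early on, while a near-perfect prediction shrinks the auxiliary box toward the target's center, so the loss never saturates at zero and keeps pulling the box onto the core of the object.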

Experimental Framework and Results

We constructed a dataset primarily from the PDT (Pests and Diseases Tree) dataset, comprising low-altitude agricultural drone imagery. After cleaning and extensive augmentation (including rotation, color jitter, blur simulation, and mosaic augmentation) to mimic real-world flight conditions, we used 3,500 images for training and 700 for validation/testing. All models were trained for 150 epochs with an input size of 640×640.

Ablation Study on Model Components

An ablation study was conducted to evaluate the contribution of each proposed module. Starting from the baseline YOLOv8n, we incrementally added components. The results, measured by precision (P), recall (R), mean Average Precision at IoU=0.5 (mAP@0.5), number of parameters, GFLOPs, and model size, are summarized below:

| Model Configuration | P (%) | R (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Params (M) | GFLOPs | Size (MB) |
|---|---|---|---|---|---|---|---|
| YOLOv8n (Baseline) | 82.7 | 81.2 | 88.3 | 60.8 | 3.01 | 8.7 | 6.2 |
| + GhostNetV2 Backbone | 80.8 | 79.6 | 87.6 | 59.8 | 1.86 | 5.7 | 4.5 |
| + GhostNetV2 + P2 Head | 84.1 | 82.3 | 89.2 | 61.5 | 1.93 | 6.2 | 4.7 |
| + GhostNetV2 + PC-BiFPN | 82.1 | 81.9 | 88.4 | 61.1 | 1.92 | 5.7 | 4.4 |
| + GhostNetV2 + DCA | 84.1 | 83.7 | 89.4 | 61.5 | 2.21 | 5.8 | 4.7 |
| + GhostNetV2 + PC-BiFPN + P2 + DCA | 86.2 | 86.8 | 91.2 | 66.2 | 2.34 | 6.3 | 5.1 |
| DCA-YOLO (All Components) | 87.6 | 87.4 | 93.1 | 67.6 | 2.34 | 6.4 | 5.2 |

The table shows progressive improvement. The GhostNetV2 backbone alone achieves significant lightweighting (38.2% fewer parameters, 34.5% fewer GFLOPs) at the cost of a modest accuracy drop. The subsequent additions of the P2 head, PC-BiFPN, and DCA attention recover and then surpass the baseline accuracy. The final DCA-YOLO model achieves a 4.8 percentage point increase in mAP@0.5 and a 6.2 point increase in recall over the baseline, while retaining 22.3% fewer parameters and 26.4% fewer GFLOPs. This demonstrates an excellent balance between accuracy and efficiency for agricultural drone deployment.

Comparison with State-of-the-Art Models

We compared DCA-YOLO against other popular lightweight object detection models on our test set. The results underscore its superiority for this specific application.

| Model | P (%) | R (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Params (M) | GFLOPs |
|---|---|---|---|---|---|---|
| YOLOv5n | 78.7 | 78.9 | 86.3 | 58.7 | 1.90 | 4.5 |
| YOLOv6n | 81.1 | 80.7 | 85.9 | 61.5 | 4.70 | 5.2 |
| YOLOv7-Tiny | 80.9 | 80.1 | 87.1 | 61.1 | 6.01 | 13.0 |
| YOLOv8n (Baseline) | 82.7 | 81.2 | 88.3 | 60.8 | 3.01 | 8.7 |
| DCA-YOLO (Ours) | 87.6 | 87.4 | 93.1 | 67.6 | 2.34 | 6.4 |

DCA-YOLO outperforms all compared lightweight models across key metrics—precision, recall, and mAP—while maintaining a highly competitive parameter count and computational cost. This makes it a compelling choice for real-time inference on an agricultural drone platform.

Visualization and Qualitative Analysis

Qualitative results further validate the improvements. The baseline YOLOv8n model often produces false positives on confusing elements like bare ground patches or shadows, and misses smaller or more obscured infected trees. In contrast, DCA-YOLO demonstrates markedly more robust detection. It successfully suppresses false alarms in high-exposure and sparse woodland scenes and identifies smaller, previously missed targets in dense forest canopies. The effectiveness of the DCA module is also visible in feature heatmaps; where the baseline model’s attention is scattered or misdirected towards background clutter, DCA-YOLO’s attention is more precisely concentrated on the actual areas of infected trees, indicating better feature discrimination learned from the agricultural drone imagery.

Conclusion

This work presents DCA-YOLO, an enhanced object detection model optimized for the real-time, low-altitude detection of pest-infected trees using agricultural drones. By integrating a Dynamic Channel Attention mechanism, a GhostNetV2 lightweight backbone, an improved PC-BiFPN feature fusion network, and an Inner-IoU loss function, the model achieves a superior balance between high accuracy and operational efficiency. Comprehensive experiments show that DCA-YOLO significantly outperforms the baseline and other state-of-the-art lightweight models, with notable gains in recall and mAP, while reducing model parameters and computational footprint. These improvements directly address the core challenges of missed detections and false alarms in complex forestry environments. DCA-YOLO is therefore a highly suitable and effective solution for enabling agricultural drones to perform precise, efficient, and autonomous health monitoring in forest ecosystems, contributing to more sustainable and proactive forest management practices.
