The proliferation of unmanned aerial vehicles (UAVs) has introduced significant security challenges, necessitating the development of robust and reliable anti-drone systems. Traditional methods like radar, RF signal monitoring, and acoustic sensing often fall short due to limitations in detecting small, low-flying, or radio-silent drones against cluttered backgrounds. Consequently, computer vision-based object detection has emerged as a critical component in modern anti-drone solutions. Among various frameworks, the YOLO family, particularly YOLOv8, offers an excellent balance of speed and accuracy for real-time applications. However, directly applying standard models to the anti-drone task faces inherent difficulties: drones often appear as extremely small targets in images captured from a distance, leading to missed detections, false alarms, and reduced precision. To address these critical issues in anti-drone surveillance, this article presents YOLO-DAP, a significantly improved algorithm based on YOLOv8n, designed specifically for high-accuracy, real-time UAV detection.
The core challenge in vision-based anti-drone systems lies in the “small target” problem. When a camera maintains a wide field of view to track a fast-moving drone, the target occupies a minuscule portion of the image, sometimes fewer than 10×10 pixels. Deep convolutional networks, through successive pooling and striding operations, progressively reduce feature map resolution. While this is efficient for capturing high-level semantic features for larger objects, it inherently loses fine-grained spatial details crucial for identifying tiny drones. The original YOLOv8 architecture employs three detection heads at scales of 80×80, 40×40, and 20×20. For a 640×640 input image, the 80×80 head corresponds to a stride of 8, meaning it attempts to detect objects using features that have been downsampled three times already. For a sub-10-pixel drone, this results in a severe loss of discriminative information. Therefore, the first and most pivotal modification in YOLO-DAP is the architectural redesign for small target sensitivity.
We remove the P5 large-object detection layer, which feeds the 20×20 head, because large drones are not the primary concern in typical anti-drone scenarios. More importantly, we introduce a new, higher-resolution detection pathway: the original heads are replaced with a set operating at 160×160, 80×80, and 40×40 resolutions. The new 160×160 head, with an effective stride of 4, is built by fusing shallower, higher-resolution feature maps from the backbone network. This gives the detection head richer low-level features that retain the edges and textures of small drones. The feature fusion strategy combines top-down and bottom-up aggregated features from the neck with same-scale features from the backbone, enhancing robustness and representational power for tiny targets. This shift forms the foundation for the model's improved detection capability.
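The effect of the head change on a small target can be checked with simple arithmetic. The sketch below assumes a 640×640 input and a 10-pixel-wide drone, as in the text; the helper names `head_grid` and `footprint_cells` are hypothetical and only illustrate the stride calculation.

```python
# Illustrative arithmetic only; helper names are hypothetical.

def head_grid(input_size: int, stride: int) -> int:
    """Side length of the detection grid for a given input size and stride."""
    return input_size // stride


def footprint_cells(target_px: float, stride: int) -> float:
    """How many grid cells a target of `target_px` pixels spans along one side."""
    return target_px / stride


INPUT_SIZE = 640
ORIGINAL_STRIDES = (8, 16, 32)   # YOLOv8 heads: 80x80, 40x40, 20x20
MODIFIED_STRIDES = (4, 8, 16)    # heads described above: 160x160, 80x80, 40x40

for name, strides in (("YOLOv8n", ORIGINAL_STRIDES), ("YOLO-DAP", MODIFIED_STRIDES)):
    for stride in strides:
        grid = head_grid(INPUT_SIZE, stride)
        cells = footprint_cells(10, stride)
        print(f"{name}: stride {stride:>2} -> {grid}x{grid} grid, "
              f"10 px target spans {cells:.2f} cells per side")
```

At stride 8 a 10-pixel target covers little more than one cell per side, whereas at stride 4 it covers 2.5 cells, which is the extra spatial detail the 160×160 head is meant to exploit.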

To help the network discern drones within complex sky backgrounds (e.g., against clouds, buildings, or birds), we enhance its feature extraction capability. The standard Bottleneck module in the C2f block is replaced with a Dilation-wise Residual (DWR) attention module, forming a C2f_DWR structure. The DWR module is designed to capture multi-scale contextual information efficiently, which is vital for distinguishing drones of various sizes and orientations. It operates as a two-step process within a residual framework. First, it expands the channel dimension. Then, it applies depthwise convolutions with different dilation rates (e.g., D=3, D=5) to the feature maps, enlarging the receptive field without adding parameters relative to an undilated kernel of the same size and without reducing resolution. This allows the network to gather context from a broader area around a potential small drone target. The outputs of these parallel dilated paths are aggregated, processed, and then added to the original input via a residual connection. This encourages the network to learn richer feature representations suited to anti-drone detection. We integrate this module before the SPPF layer in the backbone, where features have a moderate resolution suitable for such multi-scale processing.
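The claim that dilation widens the receptive field at no parameter cost follows from the standard convolution formulas. The sketch below assumes a 3×3 depthwise kernel, which the text does not state, and uses hypothetical helper names; it is a check of the arithmetic, not the module's implementation.

```python
# Illustrative arithmetic only; the 3x3 kernel size is an assumption.

def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


def depthwise_params(channels: int, kernel: int) -> int:
    """Weights in a bias-free depthwise convolution; independent of dilation."""
    return channels * kernel * kernel


def same_padding(kernel: int, dilation: int) -> int:
    """Padding that keeps the spatial size unchanged at stride 1 (odd kernels)."""
    return dilation * (kernel - 1) // 2


def out_size(size: int, kernel: int, dilation: int, padding: int, stride: int = 1) -> int:
    """Standard convolution output-size formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


for d in (1, 3, 5):
    pad = same_padding(3, d)
    print(f"dilation {d}: covers {effective_kernel(3, d)} px per side, "
          f"{depthwise_params(64, 3)} weights for 64 channels, "
          f"40 -> {out_size(40, 3, d, pad)} output size")
```

Under these assumptions the D=3 and D=5 branches cover 7 and 11 pixels per side with the same weight count as the undilated 3×3 kernel, and the feature map size is preserved.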
Efficient feature fusion across scales is essential for detection performance. The YOLO neck typically uses standard convolutions for downsampling in the feature pyramid; YOLO-DAP replaces them with the lightweight ADown (Asymmetric Downsampling) module. Unlike a standard 3×3 convolution with stride 2, ADown uses an asymmetric two-branch design. It first applies average pooling to reduce spatial sensitivity, then splits the feature maps along the channel dimension. One branch undergoes a 3×3 convolution with stride 2, while the other undergoes max pooling followed by a 1×1 convolution. The two outputs are then concatenated along the channel dimension. This design retains effective feature transformation for fusion while reducing computational complexity and parameter count. By integrating ADown into the feature pyramid network (FPN) path, the model better fuses multi-scale feature maps from different depths, yielding more discriminative features for the final detection heads. It also keeps the model lightweight, a key consideration for deploying anti-drone systems on edge devices.
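The parameter saving of such a two-branch downsampler can be estimated by counting convolution weights. The sketch below assumes the layout of the YOLOv9 reference ADown (each branch receives half of the input channels and produces half of the output channels, which are then joined channel-wise) and ignores batch-norm and bias terms; all function names are hypothetical.

```python
# Illustrative weight count; the branch layout is an assumption, not the authors' code.

def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a bias-free standard convolution."""
    return c_in * c_out * kernel * kernel


def standard_down_params(c_in: int, c_out: int) -> int:
    """A single 3x3 stride-2 convolution."""
    return conv_params(c_in, c_out, 3)


def adown_params(c_in: int, c_out: int) -> int:
    """3x3 conv on one channel half plus 1x1 conv on the other half."""
    half_in, half_out = c_in // 2, c_out // 2
    return conv_params(half_in, half_out, 3) + conv_params(half_in, half_out, 1)


def pooled_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a pooling or convolution window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def adown_out_size(size: int) -> int:
    """Average pool (k=2, s=1), then a k=3, s=2, p=1 window in either branch."""
    size = pooled_size(size, 2, 1, 0)
    return pooled_size(size, 3, 2, 1)


c = 128
print("standard 3x3 s2:", standard_down_params(c, c))
print("two-branch ADown:", adown_params(c, c))
print("80 ->", adown_out_size(80))
```

Under these assumptions the two-branch module needs 2.5·C² weights instead of 9·C² for equal input and output width C, while still halving the spatial resolution.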
The choice of bounding box regression loss function significantly impacts convergence speed and localization accuracy. YOLOv8 uses the CIoU loss, which considers overlap area, center point distance, and aspect ratio. However, CIoU has limitations for anti-drone applications. Its aspect ratio term becomes inactive when the predicted and target box ratios are equal, failing to penalize certain misalignments adequately. More critically, its gradient can sometimes lead to inefficient regression paths. We adopt the more powerful PIoU (Powerful IoU) loss function. PIoU introduces a target-size-adaptive penalty factor and a gradient modulation function based on anchor quality. The penalty factor \( P \) is defined based on the absolute distances between corresponding edges of the predicted and target boxes, normalized by the target’s dimensions:
$$ P = \left( \frac{|dw_1|}{w_{gt}} + \frac{|dw_2|}{w_{gt}} + \frac{|dh_1|}{h_{gt}} + \frac{|dh_2|}{h_{gt}} \right) / 4 $$
where \( w_{gt} \) and \( h_{gt} \) are the width and height of the ground-truth box, and \( dw_1, dw_2, dh_1, dh_2 \) are the absolute distances between the left, right, top, and bottom edges of the prediction and target boxes, respectively. The complete PIoU loss is then formulated as:
$$ L_{PIoU} = L_{IoU} + 1 - e^{-P^2}, \quad 0 \leq L_{PIoU} \leq 2 $$
This formulation provides more direct and effective gradient signals during training, guiding the bounding boxes to converge faster and more accurately towards small drone targets, which is essential for building a high-precision anti-drone detector.
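The two formulas above translate directly into a few lines of code. The sketch below is a minimal scalar version, assuming boxes in `(x1, y1, x2, y2)` format and \( L_{IoU} = 1 - IoU \); the function names are hypothetical, and the gradient modulation function mentioned in the text is not included.

```python
import math

# Minimal scalar sketch of the penalty P and the PIoU loss; not a training-ready loss.

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def piou_penalty(pred, gt):
    """Mean edge distance normalized by the ground-truth width and height."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    w_gt, h_gt = gx2 - gx1, gy2 - gy1
    dw1, dw2 = abs(px1 - gx1), abs(px2 - gx2)   # left and right edges
    dh1, dh2 = abs(py1 - gy1), abs(py2 - gy2)   # top and bottom edges
    return (dw1 / w_gt + dw2 / w_gt + dh1 / h_gt + dh2 / h_gt) / 4


def piou_loss(pred, gt):
    """L_PIoU = L_IoU + 1 - exp(-P^2), with L_IoU assumed to be 1 - IoU."""
    p = piou_penalty(pred, gt)
    return (1.0 - iou(pred, gt)) + 1.0 - math.exp(-p ** 2)


gt_box = (0.0, 0.0, 10.0, 10.0)
for pred_box in [(0.0, 0.0, 10.0, 10.0), (2.0, 0.0, 12.0, 10.0), (100.0, 100.0, 110.0, 110.0)]:
    print(pred_box, round(piou_loss(pred_box, gt_box), 4))
```

Because the edge distances are divided by the ground-truth size, a two-pixel offset on a 10-pixel drone is penalized as heavily as a 20-pixel offset on a 100-pixel object, which is why the penalty adapts to target size.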
To validate the effectiveness of each proposed component and of the overall YOLO-DAP algorithm, experiments were conducted on the public TIB-Net UAV dataset. The dataset contains 2850 images of various drones in diverse scenarios, including complex backgrounds and varying lighting conditions. It was split into training, validation, and test sets. We performed ablation studies, adding each improvement individually to the YOLOv8n baseline and then combining all of them. Performance was measured using standard metrics: mean Average Precision (mAP@0.5), Precision (P), Recall (R), number of parameters, and inference speed in Frames Per Second (FPS). The results are summarized in the table below.
| Model Configuration | Precision (%) | Recall (%) | mAP@0.5 (%) | Parameters | FPS |
|---|---|---|---|---|---|
| Baseline (YOLOv8n) | 89.9 | 76.4 | 84.9 | 3.16 × 10⁶ | 130.0 |
| + New Heads (No P5) | 88.8 | 90.9 | 92.0 | 1.25 × 10⁶ | 154.6 |
| + C2f_DWR Module | 87.0 | 77.3 | 85.1 | 3.12 × 10⁶ | 123.0 |
| + ADown Module | 88.1 | 81.0 | 86.0 | 3.02 × 10⁶ | 137.5 |
| + PIoU Loss | 88.2 | 80.0 | 86.9 | 3.16 × 10⁶ | 145.9 |
| YOLO-DAP (All Improvements) | 89.8 | 92.5 | 92.7 | 1.23 × 10⁶ | 135.2 |
The ablation study shows the contribution of each component. The largest gain comes from removing the P5 layer and introducing the new high-resolution detection heads. This single modification raises mAP by 7.1 percentage points (84.9% to 92.0%), raises Recall by 14.5 points (76.4% to 90.9%), reduces parameters by 1.91 million, and improves inference speed from 130.0 to 154.6 FPS, at the cost of a 1.1-point drop in Precision. This underscores the importance of tailoring the detection architecture to the small-target nature of the anti-drone problem. Added alone, the C2f_DWR module yields a small gain (+0.2 mAP, +0.9 Recall) while lowering Precision by 2.9 points and speed to 123.0 FPS. The ADown module improves mAP by 1.1 points and trims 0.14 million parameters. The PIoU loss improves mAP by 2.0 points without changing the parameter count. When combined, YOLO-DAP achieves an mAP of 92.7%, 7.8 points higher than the baseline YOLOv8n, with Precision essentially unchanged (89.8% vs. 89.9%), while reducing the parameter count by 1.93 million and maintaining a real-time inference speed of 135.2 FPS. This makes YOLO-DAP both more accurate and more compact than the baseline.
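The differences quoted in this paragraph can be reproduced from the ablation table. The short script below copies the table values verbatim and computes each configuration's change relative to the baseline; the dictionary layout and the `deltas` helper are hypothetical conveniences.

```python
# Values copied from the ablation table; parameters are in millions.
ABLATION = {
    "Baseline (YOLOv8n)":          {"P": 89.9, "R": 76.4, "mAP": 84.9, "params_m": 3.16, "fps": 130.0},
    "+ New Heads (No P5)":         {"P": 88.8, "R": 90.9, "mAP": 92.0, "params_m": 1.25, "fps": 154.6},
    "+ C2f_DWR Module":            {"P": 87.0, "R": 77.3, "mAP": 85.1, "params_m": 3.12, "fps": 123.0},
    "+ ADown Module":              {"P": 88.1, "R": 81.0, "mAP": 86.0, "params_m": 3.02, "fps": 137.5},
    "+ PIoU Loss":                 {"P": 88.2, "R": 80.0, "mAP": 86.9, "params_m": 3.16, "fps": 145.9},
    "YOLO-DAP (All Improvements)": {"P": 89.8, "R": 92.5, "mAP": 92.7, "params_m": 1.23, "fps": 135.2},
}
BASELINE = "Baseline (YOLOv8n)"


def deltas(name: str) -> dict:
    """Change of every metric relative to the baseline, rounded to two decimals."""
    base, row = ABLATION[BASELINE], ABLATION[name]
    return {metric: round(row[metric] - base[metric], 2) for metric in row}


for config in ABLATION:
    if config != BASELINE:
        print(config, deltas(config))
```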
We further compared YOLO-DAP with other mainstream YOLO-family models under identical experimental conditions on the TIB-Net anti-drone detection task. The results are presented in the following comparative table.
| Model | Precision (%) | mAP@0.5 (%) | Parameters | FPS |
|---|---|---|---|---|
| YOLOv3 | 89.2 | 85.1 | 4.20 × 10⁶ | 157.6 |
| YOLOv5 | 85.1 | 82.1 | 2.65 × 10⁶ | 148.0 |
| YOLOv6 | 84.1 | 77.7 | 4.50 × 10⁶ | 151.8 |
| YOLOv8n | 89.9 | 84.9 | 3.16 × 10⁶ | 130.0 |
| YOLO-DAP | 89.8 | 92.7 | 1.23 × 10⁶ | 135.2 |
The comparison shows that YOLO-DAP achieves the highest detection accuracy (mAP) by a substantial margin, 7.6 points above the next-best model (YOLOv3 at 85.1%), while having the fewest parameters of all models. Its Precision is on par with the best competitor (89.8% versus 89.9% for YOLOv8n). YOLOv3, YOLOv5, and YOLOv6 run faster (148.0 to 157.6 FPS versus 135.2 FPS), but their mAP is markedly lower and their parameter counts are roughly two to four times larger. The balance achieved by YOLO-DAP (high accuracy, few parameters, and real-time speed) makes it well suited to practical anti-drone system deployment. Visual comparisons on challenging test images show that YOLO-DAP reduces the missed detections and false positives that commonly affect the baseline model in complex aerial scenes, supporting its robustness and practicality for real-world anti-drone applications.
In conclusion, detecting small, fast-moving drones against cluttered skies poses a significant challenge for vision-based anti-drone systems. The proposed YOLO-DAP algorithm addresses this challenge through a targeted redesign: a new high-resolution detection pathway for tiny targets, an enhanced feature extraction module with multi-scale context (C2f_DWR), a lightweight feature fusion downsampler (ADown), and an improved bounding box regression loss (PIoU). With these changes, YOLO-DAP achieves the best accuracy among the compared YOLO-family models on the anti-drone detection task. Experimental results on the public TIB-Net dataset show a large gain in accuracy (92.7% mAP, up from 84.9%) alongside a reduction in model size from 3.16 × 10⁶ to 1.23 × 10⁶ parameters, all while maintaining real-time processing. This work demonstrates that careful, problem-specific architectural modifications can yield substantial improvements in real-time, vision-based anti-drone target detection. The YOLO-DAP framework provides an efficient and practical tool for enhancing aerial security systems.
