Target detection in low altitude drone operations faces significant challenges, including mutual occlusion, small pixel coverage, and complex backgrounds. Conventional models struggle to extract features efficiently and to run in real time under these conditions. To address these limitations, we introduce HPRS-YOLO (High Precision and Refresh Rate Small Detection), an optimized framework based on YOLOv11n. Designed specifically for low altitude UAV applications, it balances accuracy (38.4% mAP0.5 on VisDrone2019) against speed (60 FPS on Jetson AGX Orin).
Architecture Innovations
SPMCC: Multi-Scale Contextual Feature Extraction
Replacing SPPF, our Spatial Pyramid Multi-Scale Common Convolution (SPMCC) eliminates pooling-induced information loss using dilated convolutions. For an input feature map \(X\), SPMCC applies parallel convolutions with dilation rates \(r_1=1\), \(r_2=3\), \(r_3=5\) following Hybrid Dilated Convolution principles:
$$ \text{Output} = \text{Concat}\left[\text{Conv}_{3\times3}^{r=1}(X), \text{Conv}_{3\times3}^{r=3}(X), \text{Conv}_{3\times3}^{r=5}(X)\right] $$
Sharing weights across the branches reduces redundancy, while the larger dilation rates expand the effective receptive field by 186% compared to SPPF, which is crucial for detecting occluded targets in low altitude UAV imagery.
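To make the receptive-field argument concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a 1-D signal in place of a 2-D feature map, and the helper names (`effective_kernel`, `dilated_conv1d`, `spmcc_1d`) are hypothetical. One shared 3-tap kernel is applied at dilation rates 1, 3, and 5, and the branch outputs are kept side by side, as the concatenation would keep them.

```python
def effective_kernel(k, r):
    """Span in samples covered by a k-tap kernel with dilation rate r."""
    return k + (k - 1) * (r - 1)


def dilated_conv1d(x, w, r):
    """Zero-padded, same-length 1-D cross-correlation of x with kernel w at dilation r."""
    k = len(w)
    half = (k // 2) * r  # distance from the centre tap to the outermost tap
    out = []
    for i in range(len(x)):
        acc = 0.0
        for t in range(k):
            j = i - half + t * r  # input index read by tap t
            if 0 <= j < len(x):   # indices outside the signal count as zero
                acc += w[t] * x[j]
        out.append(acc)
    return out


def spmcc_1d(x, w, rates=(1, 3, 5)):
    """Apply ONE shared kernel at each dilation rate; the list stands in for Concat."""
    return [dilated_conv1d(x, w, r) for r in rates]


# Effective spans of a 3-tap kernel at the three rates: 3, 7 and 11 samples.
spans = [effective_kernel(3, r) for r in (1, 3, 5)]

# An impulse shows where each branch looks: the taps land r samples apart.
impulse = [0.0] * 11
impulse[5] = 1.0
branches = spmcc_1d(impulse, w=[1.0, 2.0, 3.0])
```

Because the three branches reuse the same kernel, the parameter count stays that of a single 3-tap kernel while the widest branch spans 11 samples. The sketch does not attempt to reproduce the 186% figure, which depends on the full 2-D module.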

Metaformer-Enhanced C3K2 Modules
We replace the standard bottlenecks with a Transformer block (C3K2_TF) and a Conformer block (C3K2_CF). For computational efficiency, C3K2_CF employs depthwise separable convolution:
$$ \text{Conv}_{\text{DS}}(X) = \text{Conv}_{\text{PW2}}\left(\text{Conv}_{\text{DW}}\left(\sigma(\text{Conv}_{\text{PW1}}(X))\right)\right) $$
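where \(\sigma\) denotes an activation function, \(\text{Conv}_{\text{PW1}}\) and \(\text{Conv}_{\text{PW2}}\) are pointwise (1×1) convolutions, and \(\text{Conv}_{\text{DW}}\) is a depthwise convolution. This decomposition reduces parameters by 17.8% while enhancing texture features for small targets in low altitude drone footage, evidenced by 34% higher activation intensity in small-object clusters.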
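The parameter saving of the pointwise, depthwise, pointwise decomposition can be checked with simple arithmetic. The sketch below is illustrative only: the channel widths are assumptions rather than values from the paper, biases and normalization layers are ignored, and the function names are hypothetical. Its ratio is therefore not expected to match the model-level 17.8% reported above.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of a dense k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ds_conv_params(c_in, c_mid, c_out, k=3):
    """Weights of PW1 (1x1) -> DW (k x k, one filter per channel) -> PW2 (1x1)."""
    pw1 = c_in * c_mid    # pointwise projection into the middle width
    dw = c_mid * k * k    # depthwise: each channel has its own k x k filter
    pw2 = c_mid * c_out   # pointwise projection to the output width
    return pw1 + dw + pw2


# Assumed widths for illustration: 64 channels in, middle, and out.
dense = standard_conv_params(64, 64)
separable = ds_conv_params(64, 64, 64)
ratio = separable / dense  # fraction of the dense layer's weights that remain
```

With these assumed widths the dense layer has 36,864 weights and the separable version 8,768.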
Content-Aware Dynamic Upsampling
Replacing nearest-neighbor interpolation, Dysample generates an offset \(O\) via a linear projection, rearranges it with pixel shuffle, and adds it to the base sampling grid \(G\) to obtain the sampling grid \(S\):
$$ O = 0.25 \cdot \text{Linear}_{C \rightarrow 2g s^2}(X) $$
$$ S = G + \text{PixelShuffle}(O) $$
The resampled feature \(X'\) is computed as:
$$ X' = \text{GridSample}(X, S) $$
This dynamic adjustment suppresses boundary artifacts (Figure 1c), improving edge contrast by 40% for low altitude UAV targets against complex backgrounds.
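The effect of adding offsets to a base sampling grid can be illustrated with a 1-D analogue. This is a sketch under stated assumptions, not the Dysample implementation: the offsets are hand-picked instead of being predicted by a learned linear projection, linear interpolation stands in for the bilinear grid sampling, and the function names are hypothetical.

```python
import math


def sample_linear(x, pos):
    """Linearly interpolate the 1-D signal x at fractional index pos (border-clamped)."""
    pos = min(max(pos, 0.0), len(x) - 1.0)
    i0 = int(math.floor(pos))
    i1 = min(i0 + 1, len(x) - 1)
    t = pos - i0
    return (1.0 - t) * x[i0] + t * x[i1]


def dynamic_upsample_1d(x, offsets, scale=2):
    """Upsample x by `scale`, reading each output at base position + offset (S = G + O)."""
    out = []
    for j in range(len(x) * scale):
        base = (j + 0.5) / scale - 0.5  # centre of output sample j in input coordinates
        out.append(sample_linear(x, base + offsets[j]))
    return out


edge = [0.0, 10.0]  # a step edge between two input samples

# Zero offsets reduce to plain linear upsampling, which smears the edge.
plain = dynamic_upsample_1d(edge, offsets=[0.0, 0.0, 0.0, 0.0])

# Hand-picked offsets pull the two middle samples toward their own side of the edge.
sharp = dynamic_upsample_1d(edge, offsets=[0.0, -0.25, 0.25, 0.0])
```

With zero offsets the result is `[0.0, 2.5, 7.5, 10.0]`; with the hand-picked offsets it is `[0.0, 0.0, 10.0, 10.0]`, which keeps the step sharp. In the actual module the offsets depend on the feature content, which is what makes the upsampling content-aware.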
Shallow Detail Focus Module (SDFM)
SDFM enables cross-scale calibration between neck input (\(F_{\text{in}}\)) and output (\(F_{\text{out}}\)) features:
$$ \alpha_i = \delta\left(\text{Pw-Conv}_n\left(\text{GAP}\left(\text{Concat}(F_{\text{in}}, F_{\text{out}})\right)\right)\right) $$
$$ F_{\text{fused}} = (\alpha_i \otimes F_{\text{in}}) \oplus ((1 - \alpha_i) \otimes F_{\text{out}}) $$
where \(\delta\) is the sigmoid activation, GAP is global average pooling, and \(\otimes\) and \(\oplus\) denote element-wise multiplication and addition, respectively. This recovers 22% of missing spatial information for sub-20px targets in low altitude UAV datasets.
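The gating in the two equations above can be sketched in a few lines of pure Python. This is a simplified illustration, not the authors' code: it assumes one gate per channel, treats the point-wise convolution on the pooled descriptor as a single linear map with a hypothetical `weight` matrix and `bias` vector, and represents each feature map as a list of channels holding flat lists of pixel values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def global_avg_pool(feat):
    """Per-channel mean of a feature map given as a list of flat channels."""
    return [sum(ch) / len(ch) for ch in feat]


def sdfm_fuse(f_in, f_out, weight, bias):
    """Gate between f_in and f_out with alpha = sigmoid(W * GAP(Concat(f_in, f_out)) + b).

    weight is a C x 2C matrix (assumed stand-in for the point-wise convolution).
    """
    pooled = global_avg_pool(f_in + f_out)  # list concatenation = channel Concat
    alpha = [
        sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
        for row, b in zip(weight, bias)
    ]
    fused = [
        [a * xi + (1.0 - a) * xo for xi, xo in zip(ch_in, ch_out)]
        for a, ch_in, ch_out in zip(alpha, f_in, f_out)
    ]
    return alpha, fused


# Two channels of four pixels each; zero weights give alpha = 0.5 (a plain average).
f_in = [[1.0] * 4, [2.0] * 4]
f_out = [[3.0] * 4, [4.0] * 4]
zero_w = [[0.0] * 4 for _ in range(2)]
alpha, fused = sdfm_fuse(f_in, f_out, zero_w, bias=[0.0, 0.0])
```

A large positive bias drives \(\alpha_i\) toward 1 and the output toward \(F_{\text{in}}\), while a large negative bias favors \(F_{\text{out}}\); the learned weights let the module make that choice per channel from the pooled statistics.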
Experimental Validation
Setup and Metrics
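We evaluate on VisDrone2019 and DOTA-v1.0 using mean average precision (mAP) and frames per second (FPS):
$$ \text{mAP} = \frac{1}{N}\sum_{i=1}^{N} \int_0^1 p_i(r_i)\,dr_i, \quad \text{FPS} = \frac{1000}{\text{Pre} + \text{Infer} + \text{Post}} $$
where \(N\) is the number of classes, \(p_i(r_i)\) is the precision of class \(i\) as a function of its recall \(r_i\), and Pre, Infer, and Post are the per-image preprocessing, inference, and postprocessing times in milliseconds.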
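Both metrics are straightforward to compute. The sketch below is a hedged example: it evaluates the integral with all-point interpolation over a monotone precision envelope, which is one common convention (the source does not state which one the authors used), and the timing and precision-recall values are made up for illustration.

```python
def fps(pre_ms, infer_ms, post_ms):
    """Frames per second from per-image stage times in milliseconds."""
    return 1000.0 / (pre_ms + infer_ms + post_ms)


def average_precision(recalls, precisions):
    """Area under the precision-recall curve using all-point interpolation.

    recalls must be sorted in increasing order, with matching precisions.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision non-increasing from right to left (the precision envelope).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas between consecutive recall points.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(per_class_ap):
    """Mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Hypothetical stage times: 2 ms + 10 ms + 4 ms = 16 ms per image.
speed = fps(2.0, 10.0, 4.0)

# Hypothetical two-point precision-recall curve for one class.
ap = average_precision(recalls=[0.5, 1.0], precisions=[1.0, 0.5])
```

Here `speed` is 62.5 FPS and `ap` is 0.75. Note that the FPS definition includes preprocessing and postprocessing, so it is lower than a figure computed from inference time alone.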
| Component | mAP0.5 (%)↑ | mAP0.5:0.95 (%)↑ | Params (M)↓ | FLOPs (G)↓ |
|---|---|---|---|---|
| Baseline (YOLOv11n) | 33.3 | 19.4 | 2.58 | 6.3 |
| +SPMCC | 34.0 | 19.9 | 2.73 | 6.3 |
| +C3K2_TF/CF | 35.8 | 20.9 | 2.42 | 6.5 |
| +Dysample | 36.1 | 21.2 | 2.58 | 6.5 |
| +SDFM (HPRS-YOLO) | 38.4 | 22.7 | 3.30 | 9.3 |
Performance Comparison
| Model | mAP0.5 (%) | mAP0.5:0.95 (%) | FPS (Desktop) | FPS (Orin) |
|---|---|---|---|---|
| YOLOv8n | 31.9 | 18.4 | 232 | 32 |
| YOLOv10n | 31.7 | 18.4 | 241 | 35 |
| Drone-YOLO | 38.1 | 22.7 | – | – |
| HPRS-YOLO | 38.4 | 22.7 | 212 | 60 |
Deployed on Jetson AGX Orin at FP16 precision, HPRS-YOLO reaches 60 FPS, demonstrating real-time capability for low altitude UAV operations. The 2.0% mAP0.5 improvement on DOTA-v1.0 indicates robustness across aerial datasets.
Conclusion
HPRS-YOLO addresses critical challenges in low altitude drone target detection through four components: 1) context-aware feature extraction (SPMCC), 2) parameter-efficient feature enhancement (Metaformer-C3K2), 3) boundary-preserving upsampling (Dysample), and 4) cross-scale feature recovery (SDFM). The framework achieves a state-of-the-art accuracy-speed balance for embedded low altitude UAV platforms and reduces the small-target miss rate by 37% in occlusion scenarios. Future work will optimize sub-10px detection for urban drone operations.