Real-Time Target Detection for Low Altitude UAVs Using Enhanced HPRS-YOLO Algorithm

Target detection from low altitude drones is hampered by mutual occlusion, targets that cover only a few pixels, and cluttered backgrounds. Under these conditions, conventional models extract features inefficiently and struggle to run in real time. To address these limitations, we introduce HPRS-YOLO (High Precision and Refresh Rate Small Detection), an optimized framework based on YOLOv11n. It is designed specifically for low altitude UAV applications and balances accuracy (38.4% mAP0.5 on VisDrone2019) against speed (60 FPS on Jetson AGX Orin).

Architecture Innovations

SPMCC: Multi-Scale Contextual Feature Extraction

Replacing SPPF, our Spatial Pyramid Multi-Scale Common Convolution (SPMCC) eliminates pooling-induced information loss using dilated convolutions. For an input feature map \(X\), SPMCC applies parallel convolutions with dilation rates \(r_1=1\), \(r_2=3\), \(r_3=5\) following Hybrid Dilated Convolution principles:

$$ \text{Output} = \text{Concat}\left[\text{Conv}_{3\times3}^{r=1}(X), \text{Conv}_{3\times3}^{r=3}(X), \text{Conv}_{3\times3}^{r=5}(X)\right] $$

Sharing weights across the three branches reduces redundancy while expanding the effective receptive field by 186% compared to SPPF, which is crucial for detecting occluded targets in low altitude UAV imagery.
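To make the dilation idea concrete, the sketch below computes the extent covered by a 3-tap kernel at each dilation rate and applies one shared kernel at all three rates. It is a minimal 1D analog in plain Python, not the authors' implementation: the function names, the zero padding, and the toy impulse input are illustrative assumptions.

```python
def effective_kernel(k, r):
    """Extent covered by a k-tap kernel with dilation rate r."""
    return k + (k - 1) * (r - 1)


def dilated_conv1d(x, w, r):
    """Same-size 1D dilated convolution with zero padding (no bias)."""
    k = len(w)
    half = (k - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = i + (j - half) * r  # taps are spaced r positions apart
            if 0 <= idx < len(x):
                acc += w[j] * x[idx]
        out.append(acc)
    return out


def spmcc_1d(x, shared_w, rates=(1, 3, 5)):
    """Apply the SAME kernel at several dilation rates and return the
    branch outputs as a list (a stand-in for channel concatenation)."""
    return [dilated_conv1d(x, shared_w, r) for r in rates]


# Toy input: a single impulse in the middle of an 11-sample signal.
signal = [0.0] * 11
signal[5] = 1.0
kernel = [1.0, 2.0, 3.0]  # one kernel shared by all three branches
branches = spmcc_1d(signal, kernel)
extents = [effective_kernel(3, r) for r in (1, 3, 5)]
```

With rates 1, 3, and 5 the same three weights cover 3, 7, and 11 positions, so the widest branch sees context far from the center without any pooling and without adding parameters.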

Metaformer-Enhanced C3K2 Modules

We replace the standard bottlenecks with a Transformer block (C3K2_TF) and a Conformer block (C3K2_CF). For computational efficiency, C3K2_CF employs depthwise separable convolution:

$$ \text{Conv}_{\text{DS}}(X) = \text{Conv}_{\text{PW2}}\left(\text{Conv}_{\text{DW}}\left(\sigma(\text{Conv}_{\text{PW1}}(X))\right)\right) $$

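where \(\sigma\) denotes the activation function, \(\text{Conv}_{\text{PW1}}\) and \(\text{Conv}_{\text{PW2}}\) are pointwise (1×1) convolutions, and \(\text{Conv}_{\text{DW}}\) is a depthwise convolution. This decomposition reduces parameters by 17.8% while enhancing texture features for small targets in low altitude drone footage, as evidenced by 34% higher activation intensity in small-object clusters.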
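The parameter saving of a pointwise, depthwise, pointwise stack over a dense convolution follows from simple counting. The sketch below shows that count for a single layer; the channel widths and kernel size are hypothetical, biases are ignored, and the result is not meant to reproduce the 17.8% model-level figure, which depends on the full architecture.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ds_conv_params(c_in, c_mid, c_out, k):
    """Weights in PW1 -> DW -> PW2 (biases ignored)."""
    pw1 = c_in * c_mid   # 1x1 convolution mixing channels
    dw = c_mid * k * k   # one k x k filter per channel
    pw2 = c_mid * c_out  # 1x1 convolution mixing channels
    return pw1 + dw + pw2


# Hypothetical layer: 64 channels throughout, 3x3 depthwise kernel.
dense = standard_conv_params(64, 64, 3)
separable = ds_conv_params(64, 64, 64, 3)
saving = 1.0 - separable / dense  # fraction of weights removed in this layer
```

The depthwise term grows with the kernel area but not with the product of channel counts, which is where most of the saving comes from.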

Content-Aware Dynamic Upsampling

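Replacing nearest-neighbor interpolation, Dysample generates an offset \(O\) via a linear projection, rearranges it by pixel shuffle, and adds it to the base sampling grid \(G\) to obtain the sampling grid \(S\):

$$ O = 0.25 \cdot \text{Linear}_{C \rightarrow 2g s^2}(X) $$
$$ S = G + \text{PixelShuffle}(O) $$

Here \(C\) is the number of input channels, \(s\) the upsampling factor, \(g\) the number of offset groups, and 0.25 a scaling factor that limits the offset range. The resampled feature \(X'\) is computed as:

$$ X' = \text{GridSample}(X, S) $$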

This dynamic adjustment suppresses boundary artifacts (Figure 1c), improving edge contrast by 40% for low altitude UAV targets against complex backgrounds.
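The sampling step can be illustrated with a simplified 1D analog: a regular base grid is shifted by scaled offsets and the input is read at the shifted positions with linear interpolation. This is a sketch under stated assumptions, not the Dysample implementation: the learned linear projection and the pixel shuffle are omitted, the offsets are passed in directly, and the function names and border clamping are illustrative choices.

```python
import math


def base_grid(n, s):
    """Input-space coordinates of the n*s output samples (the grid G)."""
    return [(j + 0.5) / s - 0.5 for j in range(n * s)]


def sample_linear(x, pos):
    """Read x at a fractional position with linear interpolation,
    clamping positions that fall outside the signal."""
    p = min(max(pos, 0.0), len(x) - 1.0)
    i0 = int(math.floor(p))
    i1 = min(i0 + 1, len(x) - 1)
    t = p - i0
    return (1.0 - t) * x[i0] + t * x[i1]


def dysample_1d(x, offsets, s=2, scope=0.25):
    """Upsample x by factor s, reading at S = G + scope * offsets.
    `offsets` stands in for the output of the learned projection."""
    grid = base_grid(len(x), s)
    return [sample_linear(x, g + scope * o) for g, o in zip(grid, offsets)]


x = [0.0, 10.0]
plain = dysample_1d(x, [0.0] * 4)    # zero offsets: ordinary linear upsampling
shifted = dysample_1d(x, [1.0] * 4)  # every sample pulled 0.25 to the right
```

With zero offsets the result reduces to fixed interpolation; non-zero offsets let each output position draw from a content-dependent location instead.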

Shallow Detail Focus Module (SDFM)

SDFM enables cross-scale calibration between neck input (\(F_{\text{in}}\)) and output (\(F_{\text{out}}\)) features:

$$ \alpha_i = \delta\left(\text{Pw-Conv}_n\left(\text{GAP}\left(\text{Concat}(F_{\text{in}}, F_{\text{out}})\right)\right)\right) $$
$$ F_{\text{fused}} = (\alpha_i \otimes F_{\text{in}}) \oplus ((1 - \alpha_i) \otimes F_{\text{out}}) $$

where \(\delta\) is the sigmoid activation, GAP is global average pooling, and \(\otimes\) and \(\oplus\) denote element-wise multiplication and addition, respectively. This recovers 22% of missing spatial information for sub-20px targets in low altitude UAV datasets.
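The gating equations above can be traced with a small plain-Python sketch. It assumes that both feature maps have the same number of channels, that each channel is stored as a flat list of spatial values, and that the pointwise convolution acts on the pooled vector as a C × 2C matrix; the function names and the optional bias are illustrative, not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sdfm_fuse(f_in, f_out, weights, bias=None):
    """Blend two feature maps with per-channel gates.

    f_in, f_out: lists of C channels, each a flat list of spatial values.
    weights:     C x 2C matrix standing in for the pointwise convolution.
    Returns (fused, alphas).
    """
    c = len(f_in)
    # GAP over the channel-wise concatenation of both inputs.
    pooled = [sum(ch) / len(ch) for ch in f_in + f_out]
    alphas = []
    for i in range(c):
        z = sum(w * p for w, p in zip(weights[i], pooled))
        if bias is not None:
            z += bias[i]
        alphas.append(sigmoid(z))
    # alpha * F_in + (1 - alpha) * F_out, channel by channel.
    fused = [
        [a * u + (1.0 - a) * v for u, v in zip(ch_in, ch_out)]
        for a, ch_in, ch_out in zip(alphas, f_in, f_out)
    ]
    return fused, alphas


f_in = [[1.0, 3.0]]
f_out = [[3.0, 5.0]]
balanced, gates = sdfm_fuse(f_in, f_out, weights=[[0.0, 0.0]])
```

Because the gate lies in (0, 1), each fused channel is a convex combination of the two inputs: a gate near 1 keeps the shallow detail in \(F_{\text{in}}\), and a gate near 0 keeps \(F_{\text{out}}\).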

Experimental Validation

Setup and Metrics

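We evaluate on VisDrone2019 and DOTA-v1.0 using mean average precision (mAP) and frames per second (FPS):

$$ \text{mAP} = \frac{1}{N}\sum_{i=1}^{N} \int_0^1 p_i(r_i)dr_i, \quad \text{FPS} = \frac{1000}{\text{Pre} + \text{Infer} + \text{Post}} $$

where \(N\) is the number of classes, \(p_i(r_i)\) is the precision of class \(i\) as a function of recall, and Pre, Infer, and Post are the per-image preprocessing, inference, and postprocessing times in milliseconds.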
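Both metrics are straightforward to compute once per-class precision-recall points and per-image timings are available. The sketch below assumes all-point interpolation for the integral (precision is replaced by its monotone envelope before summing), which is a common convention but may differ from the evaluation code used for the reported numbers; the function names are illustrative.

```python
def average_precision(recalls, precisions):
    """Area under a precision-recall curve using all-point interpolation.
    `recalls` must be sorted in increasing order."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision non-increasing from right to left (monotone envelope).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))


def mean_average_precision(per_class_curves):
    """Average AP over classes; each curve is a (recalls, precisions) pair."""
    aps = [average_precision(r, p) for r, p in per_class_curves]
    return sum(aps) / len(aps)


def fps(pre_ms, infer_ms, post_ms):
    """Frames per second from per-image stage times in milliseconds."""
    return 1000.0 / (pre_ms + infer_ms + post_ms)


ap = average_precision([0.5, 1.0], [1.0, 0.5])
rate = fps(1.0, 3.0, 1.0)
```

Read in reverse, the FPS formula gives a latency budget: 60 FPS leaves roughly 16.7 ms per image for preprocessing, inference, and postprocessing combined.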

| Component | mAP0.5 | mAP0.5:0.95 | Params (M)↓ | FLOPs (G)↓ |
|---|---|---|---|---|
| Baseline (YOLOv11n) | 33.3 | 19.4 | 2.58 | 6.3 |
| +SPMCC | 34.0 | 19.9 | 2.73 | 6.3 |
| +C3K2_TF/CF | 35.8 | 20.9 | 2.42 | 6.5 |
| +Dysample | 36.1 | 21.2 | 2.58 | 6.5 |
| +SDFM (HPRS-YOLO) | 38.4 | 22.7 | 3.30 | 9.3 |
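The contribution of each module can be read off the ablation rows by differencing consecutive mAP0.5 values. The short sketch below does that arithmetic, assuming the rows are cumulative (each row adds one module to the previous configuration); the variable and function names are illustrative.

```python
# (configuration, mAP0.5) pairs copied from the ablation rows above.
ablation = [
    ("Baseline (YOLOv11n)", 33.3),
    ("+SPMCC", 34.0),
    ("+C3K2_TF/CF", 35.8),
    ("+Dysample", 36.1),
    ("+SDFM (HPRS-YOLO)", 38.4),
]


def stage_gains(rows):
    """mAP0.5 gain of each row over the row before it, in points."""
    return [
        (name, round(score - prev, 1))
        for (_, prev), (name, score) in zip(rows, rows[1:])
    ]


gains = stage_gains(ablation)
total_gain = round(ablation[-1][1] - ablation[0][1], 1)
```

Under that reading, SDFM and the Metaformer-enhanced C3K2 modules account for most of the 5.1-point gain over the baseline, with SDFM also responsible for most of the added parameters and FLOPs.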

Performance Comparison

| Model | mAP0.5 | mAP0.5:0.95 | FPS (Desktop) | FPS (Orin) |
|---|---|---|---|---|
| YOLOv8n | 31.9 | 18.4 | 232 | 32 |
| YOLOv10n | 31.7 | 18.4 | 241 | 35 |
| Drone-YOLO | 38.1 | 22.7 | - | - |
| HPRS-YOLO | 38.4 | 22.7 | 212 | 60 |

Deployment on Jetson AGX Orin (FP16 precision) demonstrates real-time capability for low altitude UAV operations. The 2.0% mAP0.5 improvement on DOTA confirms robustness across aerial datasets.

Conclusion

HPRS-YOLO addresses critical challenges in low altitude drone target detection through four components: 1) context-aware feature extraction (SPMCC), 2) parameter-efficient feature enhancement (Metaformer-C3K2), 3) boundary-preserving upsampling (Dysample), and 4) cross-scale feature recovery (SDFM). The framework achieves a state-of-the-art accuracy-speed balance for embedded low altitude UAV platforms and reduces the small-target miss rate by 37% in occlusion scenarios. Future work will optimize sub-10px detection for urban drone operations.
