Real-Time Target Detection for Low-Altitude UAVs Using the Enhanced HPRS-YOLO Algorithm

Target detection in low-altitude drone operations is hampered by mutual occlusion, small pixel coverage, and complex backgrounds. Under these conditions, conventional models struggle to extract features efficiently while running in real time. To address these limitations, we introduce HPRS-YOLO (High Precision and Refresh Rate Small Detection), an optimized framework based on YOLOv11n and designed specifically for low-altitude UAV applications. It balances accuracy (38.4% mAP_0.5 on VisDrone2019) with speed (60 FPS on Jetson AGX Orin).

Architecture Innovations

SPMCC: Multi-Scale Contextual Feature Extraction

Replacing SPPF, our Spatial Pyramid Multi-Scale Common Convolution (SPMCC) eliminates pooling-induced information loss using dilated convolutions. For an input feature map $X$, SPMCC applies parallel convolutions with dilation rates $r_1=1$, $r_2=3$, $r_3=5$ following Hybrid Dilated Convolution principles:

$$ \text{Output} = \text{Concat}\left[\text{Conv}_{3\times3}^{r=1}(X), \text{Conv}_{3\times3}^{r=3}(X), \text{Conv}_{3\times3}^{r=5}(X)\right] $$

Sharing weights across the three branches reduces redundancy, and the dilated kernels expand the effective receptive field by 186% compared to SPPF, which is crucial for detecting occluded targets in low-altitude UAV imagery.
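As a rough illustration of the shared-weight, multi-rate idea, the following pure-Python sketch works in 1D rather than 2D. The function names (`effective_kernel`, `dilated_conv1d`, `spmcc_1d`) and the example kernel are assumptions made for illustration and are not taken from the HPRS-YOLO implementation.

```python
def effective_kernel(k, r):
    """Number of input positions spanned by a k-tap kernel with dilation r."""
    return k + (k - 1) * (r - 1)


def dilated_conv1d(x, w, r):
    """Zero-padded, same-length 1D cross-correlation with dilation r."""
    k = len(w)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = i + (j - k // 2) * r  # taps are spaced r samples apart
            if 0 <= idx < len(x):
                acc += w[j] * x[idx]
        out.append(acc)
    return out


def spmcc_1d(x, w, rates=(1, 3, 5)):
    """Apply ONE shared kernel at several dilation rates.

    Returns one output per rate; stacking them plays the role of Concat.
    """
    return [dilated_conv1d(x, w, r) for r in rates]


# Hypothetical input: a unit impulse, so each branch shows where its taps land.
signal = [0.0] * 11
signal[5] = 1.0
branches = spmcc_1d(signal, w=[1.0, 2.0, 3.0])
spans = [effective_kernel(3, r) for r in (1, 3, 5)]
```

In this sketch a 3-tap kernel at rates 1, 3, and 5 spans 3, 7, and 11 input positions, while the number of learned weights stays at three because all branches reuse the same kernel.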

Metaformer-Enhanced C3K2 Modules

We integrate a Transformer block (C3K2_TF) and a Conformer block (C3K2_CF) to replace the standard bottlenecks. For computational efficiency, C3K2_CF employs depthwise separable convolution:

$$ \text{Conv}_{\text{DS}}(X) = \text{Conv}_{\text{PW2}}\left(\text{Conv}_{\text{DW}}\left(\sigma(\text{Conv}_{\text{PW1}}(X))\right)\right) $$

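where $\sigma$ denotes an activation function, $\text{Conv}_{\text{PW1}}$ and $\text{Conv}_{\text{PW2}}$ are pointwise ($1\times1$) convolutions, and $\text{Conv}_{\text{DW}}$ is a depthwise convolution. This decomposition reduces parameters by 17.8% while enhancing texture features for small targets in low-altitude drone footage, as evidenced by 34% higher activation intensity in small-object clusters.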
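To see why the pointwise/depthwise decomposition is cheaper, the weight counts can be compared directly. The sketch below assumes a $3\times3$ depthwise kernel, no bias terms, and 64 channels throughout; these values are illustrative and are not stated in the source.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a dense k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def ds_conv_params(c_in, c_mid, c_out, k=3):
    """Weights in the PW1 -> DW -> PW2 chain (bias omitted)."""
    pw1 = c_in * c_mid    # 1x1 pointwise projection
    dw = c_mid * k * k    # one k x k filter per channel
    pw2 = c_mid * c_out   # 1x1 pointwise projection
    return pw1 + dw + pw2


# Assumed channel widths, chosen only for the example.
dense = standard_conv_params(64, 64)      # 36864
separable = ds_conv_params(64, 64, 64)    # 8768
```

The per-layer saving in this toy setting is not comparable to the 17.8% model-level figure, which depends on how many layers are replaced and on the actual channel widths.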

Content-Aware Dynamic Upsampling

Replacing nearest-neighbor interpolation, Dysample generates an offset $O$ via a linear projection, rearranges it with pixel shuffle, and adds it to the grid $G$ to form the sampling set $S$:

$$ O = 0.25 \cdot \text{Linear}_{C \rightarrow 2g s^2}(X) $$
$$ S = G + \text{PixelShuffle}(O) $$

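Here $G$ is the regular base sampling grid, $s$ is the upsampling factor, and $g$ is the number of offset groups. The resampled feature $X'$ is computed as:

$$ X' = \text{GridSample}(X, S) $$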

This dynamic adjustment suppresses boundary artifacts (Figure 1c), improving edge contrast by 40% for low-altitude UAV targets against complex backgrounds.
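The grid-plus-offset sampling can be sketched in 1D with plain Python. This is a simplified illustration under stated assumptions: the offsets are passed in directly rather than predicted by a linear layer, linear interpolation stands in for GridSample, and the names `linear_sample` and `dynamic_upsample_1d` are hypothetical.

```python
def linear_sample(x, pos):
    """Linearly interpolate the 1D signal x at a fractional position.

    Positions outside the signal are clamped to the nearest border.
    """
    pos = min(max(pos, 0.0), len(x) - 1.0)
    lo = int(pos)
    hi = min(lo + 1, len(x) - 1)
    t = pos - lo
    return (1.0 - t) * x[lo] + t * x[hi]


def dynamic_upsample_1d(x, offsets, s=2):
    """Upsample x by factor s, sampling at base grid G plus offsets O."""
    n_out = len(x) * s
    if len(offsets) != n_out:
        raise ValueError("need one offset per output position")
    out = []
    for i in range(n_out):
        g = (i + 0.5) / s - 0.5  # base grid position in input coordinates
        out.append(linear_sample(x, g + offsets[i]))
    return out


x = [0.0, 10.0]
plain = dynamic_upsample_1d(x, [0.0, 0.0, 0.0, 0.0])     # zero offsets
shifted = dynamic_upsample_1d(x, [0.0, 0.25, 0.0, 0.0])  # nudge one sample
```

With zero offsets the sketch reduces to ordinary linear upsampling; a nonzero offset moves an individual sampling point, which is the degree of freedom the content-aware upsampler learns to use.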

Shallow Detail Focus Module (SDFM)

SDFM enables cross-scale calibration between neck input ($F_{\text{in}}$) and output ($F_{\text{out}}$) features:

$$ \alpha_i = \delta\left(\text{Pw-Conv}_n\left(\text{GAP}\left(\text{Concat}(F_{\text{in}}, F_{\text{out}})\right)\right)\right) $$
$$ F_{\text{fused}} = (\alpha_i \otimes F_{\text{in}}) \oplus ((1 - \alpha_i) \otimes F_{\text{out}}) $$

where $\delta$ is the sigmoid activation, GAP is global average pooling, and $\otimes$ and $\oplus$ denote element-wise multiplication and addition, respectively. This recovers 22% of missing spatial information for sub-20px targets in low-altitude UAV datasets.
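The two fusion equations amount to a per-channel gate followed by a convex blend. The sketch below assumes that $F_{\text{in}}$ and $F_{\text{out}}$ already share the same shape and that the pointwise convolution maps $2C$ pooled values to $C$ gates; both are assumptions, and `sdfm_fuse` is a hypothetical name.

```python
import math


def sdfm_fuse(f_in, f_out, weight, bias):
    """Gate and blend two feature maps with per-channel weights alpha.

    f_in, f_out: lists of C channels, each a flat list of spatial values.
    weight: C x 2C matrix standing in for the pointwise convolution.
    bias: list of C values.
    """
    # Global average pooling over the concatenated 2C channels.
    pooled = [sum(ch) / len(ch) for ch in f_in + f_out]
    # On a 1x1 map a pointwise conv is a matrix-vector product.
    alpha = []
    for row, b in zip(weight, bias):
        z = sum(w * p for w, p in zip(row, pooled)) + b
        alpha.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid
    # Blend channel by channel: alpha * F_in + (1 - alpha) * F_out.
    fused = [
        [a * u + (1.0 - a) * v for u, v in zip(ch_in, ch_out)]
        for a, ch_in, ch_out in zip(alpha, f_in, f_out)
    ]
    return alpha, fused


# One channel, two spatial positions, zero weights -> alpha = 0.5.
alpha, fused = sdfm_fuse([[1.0, 3.0]], [[3.0, 5.0]], [[0.0, 0.0]], [0.0])
```

Because the gate lies in (0, 1), each fused value stays between the corresponding input and output features, so the module can only reweight the two sources rather than amplify either one.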

Experimental Validation

Setup and Metrics

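Models are evaluated on VisDrone2019 and DOTA-v1.0 using mean average precision (mAP) over $N$ classes, where $p_i(r_i)$ is the precision of class $i$ as a function of recall, and frames per second (FPS), computed from the per-image preprocessing, inference, and postprocessing times in milliseconds: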

$$ \text{mAP} = \frac{1}{N}\sum_{i=1}^{N} \int_0^1 p_i(r_i)dr_i, \quad \text{FPS} = \frac{1000}{\text{Pre} + \text{Infer} + \text{Post}} $$
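Both metrics are straightforward to compute once precision-recall points and stage timings are available. The sketch below approximates the integral with the monotone precision envelope that is common in detection benchmarks; the choice of interpolation and all numeric inputs are assumptions for illustration, not values from the experiments.

```python
def fps(pre_ms, infer_ms, post_ms):
    """Frames per second from per-image stage times in milliseconds."""
    return 1000.0 / (pre_ms + infer_ms + post_ms)


def average_precision(recalls, precisions):
    """Area under the precision-recall curve for one class.

    recalls must be sorted in ascending order. Precision is first made
    non-increasing (envelope), then integrated over recall.
    """
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1))


def mean_ap(per_class_ap):
    """Mean of the per-class average precisions."""
    return sum(per_class_ap) / len(per_class_ap)


# Hypothetical timings: 2 ms + 12 ms + 2 ms = 16 ms per frame.
rate = fps(2.0, 12.0, 2.0)
ap = average_precision([0.5, 1.0], [1.0, 0.5])
```

Note that FPS defined this way includes preprocessing and postprocessing, so it is lower than a figure based on inference time alone.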

Component	mAP_0.5↑	mAP_0.5:0.95↑	Params (M)↓	FLOPs (G)↓
Baseline (YOLOv11n)	33.3	19.4	2.58	6.3
+SPMCC	34.0	19.9	2.73	6.3
+C3K2_TF/CF	35.8	20.9	2.42	6.5
+Dysample	36.1	21.2	2.58	6.5
+SDFM (HPRS-YOLO)	38.4	22.7	3.30	9.3
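
The per-component contributions can be read off the table with a few subtractions. The sketch below assumes the rows are cumulative, that is, each row adds one component to the configuration above it; this reading is inferred from the row labels.

```python
# (label, mAP_0.5, params in M, FLOPs in G), copied from the table above.
ablation = [
    ("Baseline (YOLOv11n)", 33.3, 2.58, 6.3),
    ("+SPMCC", 34.0, 2.73, 6.3),
    ("+C3K2_TF/CF", 35.8, 2.42, 6.5),
    ("+Dysample", 36.1, 2.58, 6.5),
    ("+SDFM (HPRS-YOLO)", 38.4, 3.30, 9.3),
]


def step_gains(rows):
    """mAP_0.5 gain of each row over the previous one, in points."""
    return [(cur[0], round(cur[1] - prev[1], 1))
            for prev, cur in zip(rows, rows[1:])]


gains = step_gains(ablation)
total_gain = round(ablation[-1][1] - ablation[0][1], 1)
param_growth = round(100 * (ablation[-1][2] / ablation[0][2] - 1), 1)
flops_growth = round(100 * (ablation[-1][3] / ablation[0][3] - 1), 1)
```

Under that reading, SDFM contributes the largest single step (2.3 points) and also accounts for most of the added parameters and FLOPs, while the full model gains 5.1 points of mAP_0.5 over the baseline.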

Performance Comparison

Model	mAP_0.5	mAP_0.5:0.95	FPS (Desktop)	FPS (Orin)
YOLOv8n	31.9	18.4	232	32
YOLOv10n	31.7	18.4	241	35
Drone-YOLO	38.1	22.7	–	–
HPRS-YOLO	38.4	22.7	212	60

Deployment on Jetson AGX Orin at FP16 precision reaches 60 FPS, demonstrating real-time capability for low-altitude UAV operations. The 2.0% mAP_0.5 improvement on DOTA-v1.0 indicates that the gains carry over to other aerial datasets.

Conclusion

HPRS-YOLO addresses critical challenges in low-altitude drone target detection through four components: (1) context-aware feature extraction (SPMCC), (2) parameter-efficient feature enhancement (Metaformer-C3K2), (3) boundary-preserving upsampling (Dysample), and (4) cross-scale feature recovery (SDFM). The framework achieves a state-of-the-art accuracy-speed balance for embedded low-altitude UAV platforms and reduces the small-target miss rate by 37% in occlusion scenarios. Future work will optimize sub-10px detection for urban drone operations.