Improved YOLOv11-Based Algorithm for Small Target Detection in Maritime UAV Applications

Maritime search and rescue operations are hampered by complex sea backgrounds, small target sizes, and the limited computational resources of mobile platforms. We propose an enhanced YOLOv11 architecture optimized for small-target detection in UAV imagery. The approach addresses three limitations of existing models through structural improvements, each validated on the SeaDronesSee dataset.

Architectural Innovations

Wavelet Transform Effect Convolution (WTEConv)

Enlarging the kernel of a standard convolution increases its parameter count and computational cost quadratically with kernel size. Our WTEConv integrates Shift-Wise operations with wavelet decomposition to maintain large receptive fields while minimizing parameters. For an input feature map $X \in \mathbb{R}^{C \times H \times W}$, we first apply channel-wise spatial shifting:

$$y(p) = \sum_{m=0}^{k_w-1} \sum_{n=0}^{k_h-1} w(m,n) \cdot x(p + \Delta p(m,n))$$
$$\Delta p(m,n) = \left( m - \left\lfloor \frac{k_w}{2} \right\rfloor, n - \left\lfloor \frac{k_h}{2} \right\rfloor \right)$$

followed by Haar wavelet decomposition:

$$[X_{LL}, X_{LH}, X_{HL}, X_{HH}] = f_{WT}(X)$$
$$f_{LL} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, f_{LH} = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}, f_{HL} = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}, f_{HH} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$$

This dual approach expands receptive fields by 41% while reducing 5×5 convolution parameters by 30.8% compared to conventional implementations.
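The Haar decomposition step can be illustrated on a single channel. The following is a minimal sketch, not the authors' implementation: it assumes the four 2×2 filters are applied as stride-2 correlations over non-overlapping blocks, uses the unnormalised filters exactly as written above (many references add a factor of 1/2), and the function name `haar_decompose` is hypothetical.

```python
# Haar analysis filters as written in this section (unnormalised).
# Assumption: many references scale each filter by 1/2; omitted here.
HAAR_FILTERS = {
    "LL": [[1, 1], [1, 1]],
    "LH": [[1, 1], [-1, -1]],
    "HL": [[1, -1], [1, -1]],
    "HH": [[1, -1], [-1, 1]],
}


def haar_decompose(x):
    """Split one H x W channel (H, W even) into four H/2 x W/2 sub-bands.

    Each sub-band is a 2x2 correlation with stride 2, so every output
    value summarises one non-overlapping 2x2 block of the input.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands = {}
    for name, f in HAAR_FILTERS.items():
        bands[name] = [
            [
                sum(f[a][b] * x[i + a][j + b] for a in range(2) for b in range(2))
                for j in range(0, w, 2)
            ]
            for i in range(0, h, 2)
        ]
    return bands


if __name__ == "__main__":
    # A flat patch with one bright pixel, standing in for a small target.
    patch = [
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 9, 1],
        [1, 1, 1, 1],
    ]
    for name, band in haar_decompose(patch).items():
        print(name, band)
    # LL [[4, 4], [4, 12]]
    # LH [[0, 0], [0, 8]]
    # HL [[0, 0], [0, 8]]
    # HH [[0, 0], [0, 8]]
```

Only the block containing the bright pixel produces non-zero high-frequency responses, which is the property that makes the sub-bands useful for isolating small targets against a smooth sea surface.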

Multi-Branch Upsampling (MUpsample)

Standard upsampling suffers from feature quality degradation. Our MUpsample preserves spatial details through triple-branch processing:

$$X_{up} = \text{Up}(X_{l-1})$$
$$X_{cur} = \text{Conv}_{1 \times 1}(X_l)$$
$$X_{next} = \text{WTConv}(X_{l+1})$$
$$X_{\text{MUpsample}} = \text{Concat}(X_{up}, X_{cur}, X_{next})$$

The structure maintains the original feature dimensions while improving the capture of high-frequency information, which is critical for small targets in UAV imagery.
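The three-branch concatenation can be sketched with plain nested lists in C×H×W layout. This is an illustrative sketch under several assumptions that the text does not state: level $l-1$ is taken to be at half the spatial resolution of level $l$, $\text{Up}$ is nearest-neighbour upsampling by 2, the WTConv branch is treated as an opaque callable (identity by default) whose output already matches the target resolution, and all function names and weights are hypothetical.

```python
def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a C x H x W nested list."""
    return [
        [[row[j // factor] for j in range(len(row) * factor)]
         for row in ch for _ in range(factor)]
        for ch in x
    ]


def conv1x1(x, weight):
    """1x1 convolution: weight[o][c] mixes input channels at every pixel."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wo[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
         for i in range(h)]
        for wo in weight
    ]


def shape(x):
    """Return (channels, height, width) of a nested-list tensor."""
    return (len(x), len(x[0]), len(x[0][0]))


def mupsample(x_prev, x_cur, x_next, weight, wtconv=lambda t: t):
    """Concatenate the three branches along the channel axis.

    wtconv is a placeholder for the WTConv branch (assumption: identity).
    """
    branches = [upsample_nearest(x_prev), conv1x1(x_cur, weight), wtconv(x_next)]
    sizes = {shape(b)[1:] for b in branches}
    if len(sizes) != 1:
        raise ValueError(f"branch spatial sizes differ: {sizes}")
    return [ch for b in branches for ch in b]


if __name__ == "__main__":
    x_prev = [[[5]]]                                      # 1 x 1 x 1
    x_cur = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]      # 2 x 2 x 2
    x_next = [[[0, 1], [1, 0]]]                           # 1 x 2 x 2
    out = mupsample(x_prev, x_cur, x_next, weight=[[1, 1]])
    print(shape(out))  # (3, 2, 2)
```

The explicit size check reflects the one hard constraint of the design: concatenation only works if all three branches arrive at the same spatial resolution.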

Lightweight Dynamic Head (LDy Head)

We extend detection to 160×160 feature maps and apply attention along three dimensions of the feature tensor $F$: level ($\pi_L$), spatial ($\pi_S$), and channel ($\pi_C$):

$$W = \pi_C(\pi_S(\pi_L(F) \cdot F) \cdot F) \cdot F$$
$$\pi_L(F) = \sigma(f(\text{AvgPool}(F)))$$
$$\pi_S(F) = \frac{1}{1 + e^{-\mathcal{G}(F)}}$$
$$\pi_C(F) = \text{Norm}(\mathcal{F}(\delta(\mathcal{F}(\text{AvgPool}(F)))))$$

This configuration improves small-target sensitivity by 18.7% while adding only 0.2M parameters.
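To see why a 160×160 map helps with small targets, it is useful to relate grid size to stride. The sketch below assumes a 640×640 network input and detection strides of 4, 8, 16, and 32; neither value is stated in this section, and both function names are hypothetical. Under those assumptions the 160×160 map corresponds to stride 4.

```python
def grid_sizes(input_size, strides):
    """Feature-map side length for each detection stride."""
    return {s: input_size // s for s in strides}


def cells_covered(target_px, stride):
    """Number of grid cells a target of target_px pixels spans along one axis."""
    return target_px / stride


if __name__ == "__main__":
    # Assumption: 640 x 640 input with strides 4, 8, 16, 32.
    print(grid_sizes(640, (4, 8, 16, 32)))  # {4: 160, 8: 80, 16: 40, 32: 20}

    # A 12-pixel-wide target spans 3 cells at stride 4,
    # but less than half a cell at stride 32.
    for stride in (4, 8, 16, 32):
        print(stride, cells_covered(12, stride))
```

A target that covers only a fraction of one cell on the coarser maps still spans several cells on the stride-4 map, which is consistent with the added head improving small-target sensitivity at the cost of extra computation.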

Experimental Validation

Performance Metrics

Evaluation on the SeaDronesSee dataset shows consistent improvements over the YOLOv11n baseline:

Model	P(%)	R(%)	mAP50(%)	mAP50-90(%)	Params(M)	FPS	GFLOPs
YOLOv11n	77.8	57.8	62.8	37.3	2.58	70.3	6.3
+WTEConv	81.1	60.8	67.6	39.9	2.7	71.7	6.6
+MUpsample	78.9	66.6	71.0	38.6	2.5	79.7	7.8
+LDy Head	79.1	62.4	69.9	40.1	2.8	70.4	10.9
Full Model	85.5	69.5	75.2	42.7	2.2	89.8	9.7
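The gains in the table can be checked with a few lines of arithmetic. The sketch below copies the baseline and full-model rows from the table and distinguishes absolute differences (percentage points) from relative changes (percent); the helper names are hypothetical.

```python
# Rows copied from the ablation table above.
ROWS = {
    "YOLOv11n": {"P": 77.8, "R": 57.8, "mAP50": 62.8, "mAP50-90": 37.3,
                 "params_M": 2.58, "fps": 70.3},
    "Full Model": {"P": 85.5, "R": 69.5, "mAP50": 75.2, "mAP50-90": 42.7,
                   "params_M": 2.2, "fps": 89.8},
}
BASELINE = "YOLOv11n"


def points_gain(model, metric):
    """Absolute difference from the baseline, rounded to one decimal."""
    return round(ROWS[model][metric] - ROWS[BASELINE][metric], 1)


def percent_change(model, metric):
    """Relative change from the baseline in percent, rounded to one decimal."""
    base = ROWS[BASELINE][metric]
    return round((ROWS[model][metric] - base) / base * 100, 1)


if __name__ == "__main__":
    print(points_gain("Full Model", "mAP50"))        # 12.4 (percentage points)
    print(percent_change("Full Model", "mAP50"))     # 19.7 (relative %)
    print(percent_change("Full Model", "params_M"))  # -14.7 (relative %)
    print(points_gain("Full Model", "fps"))          # 19.5 (frames per second)
```

The mAP50 gain of 12.4 is therefore an absolute difference in percentage points, whereas the 14.7% parameter reduction is a relative change.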

Comparative Analysis

Benchmarked against recent detectors, our model achieves the highest mAP50 and mAP50-90 with the fewest parameters; only YOLOv8n runs at a higher frame rate:

Method	mAP50(%)	mAP50-90(%)	Params(M)	FPS
YOLOv8n	63.3	37.6	3.0	95.8
YOLOv10n	64.1	38.2	2.7	63.7
RT-DETR	71.3	38.4	19.8	48.8
Drone-YOLO	70.7	39.7	3.0	79.4
Ours	75.2	42.7	2.2	89.8

The precision-recall curve shows consistent outperformance across confidence thresholds, where precision and recall are defined as:

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}$$

with our model achieving 85.5% precision at 69.5% recall, a combination that matters in life-saving UAV operations where both missed detections and false alarms are costly.
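The two definitions translate directly into code. In the sketch below the detection counts are hypothetical, chosen only so that they reproduce the reported operating point; the F1 score is a derived quantity that is not reported in the source.

```python
def precision(tp, fp):
    """Fraction of predicted detections that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of ground-truth targets that are detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 171 true positives, 29 false positives,
    # 75 missed targets.
    p = precision(171, 29)
    r = recall(171, 75)
    print(round(p, 3), round(r, 3))      # 0.855 0.695

    # F1 at the reported operating point (85.5% precision, 69.5% recall).
    print(round(f1(0.855, 0.695), 3))    # 0.767
```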

Mathematical Formulation

The complete enhancement workflow integrates all components:

$$\mathcal{F}_{\text{enhanced}} = \Phi_{\text{LDy}} \left( \Gamma_{\text{MUpsample}} \left( \Psi_{\text{WTEConv}} (I_{\text{input}}) \right) \right)$$
$$\Psi_{\text{WTEConv}} = \text{IWT} \left( \sum_{k=1}^{K} \text{WTConv} \left( \text{Shift-Wise}_k (X) \right) \right)$$
$$\Gamma_{\text{MUpsample}} = \mathcal{U} \left( \left[ \mathcal{C}_1(X_{l-1}), \mathcal{C}_2(X_l), \text{WTConv}(X_{l+1}) \right] \right)$$

where $\mathcal{U}$ denotes upsampling, $\mathcal{C}_1$ and $\mathcal{C}_2$ are convolutional transformations, and IWT is the inverse wavelet transform.

Conclusion

Our enhanced YOLOv11 architecture advances small-target detection for UAVs operating in maritime environments. Together, the WTEConv, MUpsample, and LDy Head modules raise mAP50 by 12.4 percentage points (from 62.8% to 75.2%) while reducing parameters by 14.7% (from 2.58M to 2.2M) compared to the baseline YOLOv11n. This balance enables real-time performance (89.8 FPS) on resource-constrained UAV platforms, addressing key challenges in maritime search and rescue. Future work will explore thermal imaging integration and multi-sensor fusion for all-weather deployment.