Surveying drones produce rapidly growing volumes of aerial imagery, which makes time-sensitive target detection in complex environments increasingly difficult. Current deep learning-based detectors suffer from catastrophic forgetting when they learn new target categories, and they overfit when adapting to environmental interference such as occlusion and lighting variations. This work introduces a multi-stage distillation framework for incremental detection of time-sensitive targets in surveying UAV imagery; it outperforms the compared incremental detection methods on the SIMD and MAR20 datasets.

Our architecture employs a teacher-student framework with three complementary modules. Wasserstein-based Inter-Class Distillation (WICD) mitigates catastrophic forgetting by quantifying distributional differences between categories. Prototype-Guided Intra-Class Consistency Distillation (PGICD) preserves intra-class invariance against environmental interference. Cross-head Adaptive Distillation (CAD) dynamically balances knowledge transfer between the classification and regression heads. For surveying drone applications, this approach enables continuous adaptation to new targets while maintaining high precision on previously learned military vehicles, aircraft, and ships under challenging conditions.
Technical Framework
The system processes surveying UAV images through a transformer-based backbone network. Multi-scale features from the Feature Pyramid Network (FPN) feed into three specialized distillation modules:
1. Wasserstein Inter-Class Distillation (WICD)
WICD addresses catastrophic forgetting by modeling feature distributions using Gaussian mixtures and measuring inter-class divergence through the Wasserstein distance. For the class \(i\) features \(F_i\), the mean and covariance are computed as follows (the semantic queries \(Q_i\) are treated analogously):
$$F_{\mu} = \frac{1}{C \times H \times W} \sum_{k=1}^{C \times H \times W} F_i[k]$$
$$F_{\Sigma} = \frac{1}{C \times H \times W} \sum_{k=1}^{C \times H \times W} (F_i[k] - F_{\mu})(F_i[k] - F_{\mu})^T$$
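Writing \(\mu_T, \Sigma_T\) and \(\mu_S, \Sigma_S\) for the mean \(F_{\mu}\) and covariance \(F_{\Sigma}\) computed from the teacher (\(T\)) and student (\(S\)) features, the Wasserstein distance between the two distributions becomes:
$$D_{W}^F = \underbrace{||\mu_T - \mu_S||^2}_{\text{Mean term}} + \underbrace{\text{tr}(\Sigma_T + \Sigma_S - 2(\Sigma_T^{1/2}\Sigma_S\Sigma_T^{1/2})^{1/2})}_{\text{Covariance term}}$$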
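The expression above has the form of the closed-form squared 2-Wasserstein distance between two Gaussians, and it can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`gaussian_stats`, `psd_sqrt`, `gaussian_w2`) are hypothetical, features are assumed to be flattened into an `(N, D)` array of vectors, and the matrix square root is taken by eigendecomposition.

```python
import numpy as np


def gaussian_stats(features):
    """Mean and biased (1/N) covariance of an (N, D) array of feature vectors."""
    mu = features.mean(axis=0)
    centered = features - mu
    cov = centered.T @ centered / features.shape[0]
    return mu, cov


def psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    sym = (mat + mat.T) / 2.0            # remove numerical asymmetry
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)      # clamp tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_w2(mu_t, cov_t, mu_s, cov_s):
    """Mean term plus covariance term, following the formula above."""
    mean_term = float(np.sum((mu_t - mu_s) ** 2))
    root_t = psd_sqrt(cov_t)
    cross = psd_sqrt(root_t @ cov_s @ root_t)
    cov_term = float(np.trace(cov_t + cov_s - 2.0 * cross))
    return mean_term + cov_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(256, 8))              # stand-in teacher features
    student = teacher + 0.1 * rng.normal(size=(256, 8))
    d = gaussian_w2(*gaussian_stats(teacher), *gaussian_stats(student))
    print(f"D_W between teacher and student statistics: {d:.4f}")
```

In a training loop the same computation would need a differentiable matrix square root; the eigendecomposition used here is one common choice, but the source does not say which one the authors use.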
Table 1 compares WICD with conventional distance metrics on the SIMD and MAR20 surveying UAV datasets; the Wasserstein distance yields the highest AP on both:
| Distance Metric | SIMD AP (%) | MAR20 AP (%) |
|---|---|---|
| Euclidean | 65.3 | 56.8 |
| Cosine | 56.4 | 48.1 |
| KL Divergence | 62.1 | 53.2 |
| Manhattan | 58.2 | 50.4 |
| Wasserstein (Ours) | 69.2 | 58.3 |
2. Prototype-Guided Intra-Class Consistency (PGICD)
PGICD combats overfitting by aligning class prototypes using Gaussian kernel similarity. For feature prototypes \(p^F\) and semantic prototypes \(p^Q\):
$$d_i^F = 1 - \exp\left(-\frac{||p_i^{F,T} - p_i^{F,S}||^2}{2\sigma \cdot (p_i^{F,T} \cdot p_i^{F,S})}\right)$$
With \(d_i^Q\) defined analogously from the semantic prototypes \(p^Q\), the consistency loss enforces the intra-class stability that surveying drones need when operating in variable conditions:
$$\mathcal{L}_{PGICD} = \lambda_1 \sum_{i=1}^k (d_i^F)^2 + \lambda_2 \sum_{i=1}^k (d_i^Q)^2$$
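The two PGICD formulas can be illustrated with a short pure-Python sketch. The function names and default values are hypothetical, prototypes are assumed to be plain lists of floats, and the `eps` guard on the dot product is an addition for numerical safety that does not appear in the source (the formula implicitly assumes a positive teacher-student dot product).

```python
import math


def prototype_distance(p_teacher, p_student, sigma=1.0, eps=1e-8):
    """d_i = 1 - exp(-||p_T - p_S||^2 / (2 * sigma * (p_T . p_S)))."""
    sq_dist = sum((t - s) ** 2 for t, s in zip(p_teacher, p_student))
    dot = sum(t * s for t, s in zip(p_teacher, p_student))
    scale = 2.0 * sigma * max(dot, eps)   # eps guard is an assumption, not in the source
    return 1.0 - math.exp(-sq_dist / scale)


def pgicd_loss(feat_t, feat_s, query_t, query_s,
               lambda_1=1.0, lambda_2=1.0, sigma=1.0):
    """lambda_1 * sum_i (d_i^F)^2 + lambda_2 * sum_i (d_i^Q)^2 over k classes."""
    d_f = [prototype_distance(t, s, sigma) for t, s in zip(feat_t, feat_s)]
    d_q = [prototype_distance(t, s, sigma) for t, s in zip(query_t, query_s)]
    return (lambda_1 * sum(d ** 2 for d in d_f)
            + lambda_2 * sum(d ** 2 for d in d_q))


if __name__ == "__main__":
    # Two classes; each prototype is a 2-D vector for illustration only.
    feat_teacher = [[2.0, 0.0], [1.0, 1.0]]
    feat_student = [[1.0, 0.0], [1.0, 1.0]]
    query_teacher = [[1.0, 0.0], [0.5, 0.5]]
    query_student = [[1.0, 0.0], [0.5, 0.5]]
    print(pgicd_loss(feat_teacher, feat_student, query_teacher, query_student))
```

Identical teacher and student prototypes give a distance of zero, so only classes whose prototypes have drifted contribute to the loss.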
3. Cross-head Adaptive Distillation (CAD)
CAD dynamically balances knowledge transfer between the classification and regression heads, using the current WICD and PGICD loss values to set the weight of each head:
$$w_{cls} = \frac{1}{1 + \exp(-\alpha_{WICD} \cdot \mathcal{L}_{WICD} + \alpha_{PGICD} \cdot \mathcal{L}_{PGICD})}$$
$$w_{reg} = \frac{1}{1 + \exp(\beta_{WICD} \cdot \mathcal{L}_{WICD} - \beta_{PGICD} \cdot \mathcal{L}_{PGICD})}$$
The adaptive distillation loss becomes:
$$\mathcal{L}_{CAD} = \frac{w_{cls}}{|\mathcal{R}|} \sum_{r \in \mathcal{R}} \phi(r) \cdot \delta(p_{cls}^S(r), p_{cls}^T(r)) + \frac{w_{reg}}{|\mathcal{R}|} \sum_{r \in \mathcal{R}} \phi(r) \cdot \delta(p_{reg}^S(r), p_{reg}^T(r))$$
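The weighting scheme can be sketched as follows. This is an illustrative reading of the three equations, not the authors' code: the function names are hypothetical, all coefficients default to 1.0 for the example, \(\phi(r)\) is treated as a given per-region weight, and \(\delta\) is treated as a precomputed per-region discrepancy between student and teacher outputs, since the text defines neither.

```python
import math


def cad_weights(l_wicd, l_pgicd,
                alpha_wicd=1.0, alpha_pgicd=1.0,
                beta_wicd=1.0, beta_pgicd=1.0):
    """Sigmoid-shaped head weights computed from the two distillation losses."""
    w_cls = 1.0 / (1.0 + math.exp(-alpha_wicd * l_wicd + alpha_pgicd * l_pgicd))
    w_reg = 1.0 / (1.0 + math.exp(beta_wicd * l_wicd - beta_pgicd * l_pgicd))
    return w_cls, w_reg


def cad_loss(w_cls, w_reg, phi, delta_cls, delta_reg):
    """Weighted mean over regions of the per-region discrepancies.

    phi, delta_cls and delta_reg are equal-length lists, one entry per region r.
    """
    n_regions = len(phi)
    cls_term = sum(p * d for p, d in zip(phi, delta_cls)) / n_regions
    reg_term = sum(p * d for p, d in zip(phi, delta_reg)) / n_regions
    return w_cls * cls_term + w_reg * reg_term


if __name__ == "__main__":
    w_cls, w_reg = cad_weights(l_wicd=2.0, l_pgicd=0.5)
    loss = cad_loss(w_cls, w_reg,
                    phi=[1.0, 1.0],
                    delta_cls=[0.2, 0.4],
                    delta_reg=[0.1, 0.3])
    print(w_cls, w_reg, loss)
```

With these signs, a larger WICD loss shifts weight toward the classification head and a larger PGICD loss shifts it toward the regression head; when the \(\alpha\) and \(\beta\) coefficients coincide, the two weights sum to one.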
Experimental Validation
We evaluate on the SIMD (15 categories) and MAR20 (20 aircraft categories) surveying UAV datasets under incremental scenarios, where a setting such as 8+7 denotes 8 base categories followed by 7 newly added ones:
| Method | SIMD 8+7 AP (%) | MAR20 10+10 AP (%) |
|---|---|---|
| Faster ILOD | 54.2 | 40.2 |
| ERD | 58.1 | 46.8 |
| ABR-IOD | 57.7 | 47.6 |
| Efficient-IOD | 66.7 | 53.7 |
| Ours | 70.8 | 60.2 |
| Upper Bound | 72.5 | 62.5 |
Our method comes within 1.7 AP points of the upper bound on SIMD and within 2.3 points on MAR20. Ablation studies confirm each module’s contribution:
| Components | Step 3 AP (%) |
|---|---|
| Baseline | 56.2 |
| +WICD | 61.6 |
| +WICD+PGICD | 64.8 |
| +WICD+CAD | 65.3 |
| Full Model | 68.9 |
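The gaps and gains implied by the two tables above can be rechecked with a few lines of arithmetic. The sketch below only restates numbers already reported in the tables; the dictionary layout and function names are illustrative.

```python
# AP values (%) copied from the comparison and ablation tables above.
incremental_ap = {
    "SIMD 8+7": {"Efficient-IOD": 66.7, "Ours": 70.8, "Upper Bound": 72.5},
    "MAR20 10+10": {"Efficient-IOD": 53.7, "Ours": 60.2, "Upper Bound": 62.5},
}
ablation_ap = {
    "Baseline": 56.2,
    "+WICD": 61.6,
    "+WICD+PGICD": 64.8,
    "+WICD+CAD": 65.3,
    "Full Model": 68.9,
}


def gap_to_upper_bound(scores):
    """AP points separating the proposed method from the upper bound."""
    return round(scores["Upper Bound"] - scores["Ours"], 1)


def margin_over(scores, baseline="Efficient-IOD"):
    """AP points gained over the strongest listed baseline."""
    return round(scores["Ours"] - scores[baseline], 1)


def ablation_gain(variant, reference="Baseline"):
    """AP points gained by one ablation variant over a reference variant."""
    return round(ablation_ap[variant] - ablation_ap[reference], 1)


if __name__ == "__main__":
    for setting, scores in incremental_ap.items():
        print(setting, gap_to_upper_bound(scores), margin_over(scores))
    for variant in ablation_ap:
        print(variant, ablation_gain(variant))
```

Read this way, the full model improves on the ablation baseline by 12.7 AP points, with WICD alone contributing 5.4 of them, and the method leads Efficient-IOD by 4.1 points on SIMD and 6.5 points on MAR20.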
Conclusion
Our multi-stage distillation framework improves time-sensitive target detection for surveying drones by addressing two problems at once: it mitigates catastrophic forgetting through Wasserstein-based distribution alignment and reduces overfitting through prototype-guided consistency. The adaptive knowledge transfer mechanism keeps performance robust across complex operational scenarios involving occlusion, scale variation, and environmental interference. Future work will optimize deployment efficiency for real-time surveying UAV applications and extend the approach to video-based incremental detection.
