Surveying UAVs face significant challenges in low-altitude environments, where obstacles such as buildings, trees, and infrastructure occlude targets. We propose an integrated framework that combines lightweight vision algorithms with multi-UAV coordination to overcome these limitations. Our approach enables real-time detection of heavily occluded targets and dynamic multi-angle observation for persistent tracking.
For occluded target detection, we designed a pure-encoder Transformer model that eliminates redundant decoder layers. The architecture comprises three components: a RepVGG backbone for feature extraction, Transformer encoders for attention-based feature enhancement, and a fully convolutional prediction head. The backbone processes input images (template: 128×128, search area: 5× bounding box size) using re-parameterized VGG blocks:
$$ \text{RepVGG}(I) = \text{Conv}_{3\times 3}(\text{BN}(\text{ReLU}(\text{Conv}_{1\times 1}(I)))) $$
Features are flattened into $[HW, B, C]$ and fed into Transformer encoders, where multi-head attention computes target-context correlations ($\sigma$ denotes the softmax function):
$$ \text{Attention}(Q,K,V) = \sigma\left(\frac{QK^T}{\sqrt{d_k}}\right)V $$
$$ \text{MultiHead} = \text{Concat}(\text{head}_1,\ldots,\text{head}_h)W^O $$
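The flattening step and the two attention equations above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the projection matrices `w_q`, `w_k`, `w_v`, `w_o`, the function names, and the single-layer structure without bias, normalization, or residual connections are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax (the sigma in the attention equation).
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def flatten_features(feat):
    """Reshape a backbone feature map [B, C, H, W] into tokens [HW, B, C]."""
    b, c, h, w = feat.shape
    return feat.reshape(b, c, h * w).transpose(2, 0, 1)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over tokens x of shape [L, B, C].

    The weight matrices (all [C, C]) are hypothetical stand-ins for
    learned parameters. Returns the output [L, B, C] and the attention
    weights [B, num_heads, L, L].
    """
    L, B, C = x.shape
    d_k = C // num_heads

    def split(t):
        # [L, B, C] -> [B, heads, L, d_k]
        return t.reshape(L, B, num_heads, d_k).transpose(1, 2, 0, 3)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_k)  # QK^T / sqrt(d_k)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # [B, heads, L, d_k]
    concat = heads.transpose(2, 0, 1, 3).reshape(L, B, C) # Concat(head_1..head_h)
    return concat @ w_o, weights
```

Each row of the returned attention weights sums to one, so every output token is a convex combination of the value vectors within its head.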
The prediction head outputs corner probability maps for the bounding box through four Conv-BN-ReLU layers. Corner coordinates are computed as expectations over these maps; for the top-left corner map $P_{tl}$:
$$ (\hat{x}_{tl}, \hat{y}_{tl}) = \left( \sum_{y=0}^H \sum_{x=0}^W x \cdot P_{tl}(x,y), \sum_{y=0}^H \sum_{x=0}^W y \cdot P_{tl}(x,y) \right) $$
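The corner expectation above is a soft-argmax over a probability map. The sketch below assumes the head emits an unnormalized score map that is turned into $P_{tl}$ by a softmax over all pixels, and that coordinates are zero-based pixel indices (so the sums run to $W-1$ and $H-1$); both are assumptions of this example, and `expected_corner` is a hypothetical name.

```python
import numpy as np

def expected_corner(scores):
    """Return (x_hat, y_hat) as the expectation over a softmax-normalized map.

    scores: [H, W] array of unnormalized corner scores (assumed input).
    """
    h, w = scores.shape
    p = np.exp(scores - scores.max())  # stable softmax over the whole map
    p /= p.sum()
    xs = np.arange(w)
    ys = np.arange(h)
    x_hat = (p.sum(axis=0) * xs).sum()  # sum_x x * sum_y P(x, y)
    y_hat = (p.sum(axis=1) * ys).sum()  # sum_y y * sum_x P(x, y)
    return float(x_hat), float(y_hat)
```

Because the result is an expectation rather than a hard argmax, it is differentiable and can fall between pixels when the probability mass is spread over neighbouring cells.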

For multi-UAV cooperative tracking, we introduce a visibility-aware observation planning system. First, target positions are estimated via ray intersection from multiple surveying drones:
$$ \min_{p'_t} \sum_{i=1}^N D(p'_t, l_i) $$
where $D$ denotes the distance from the estimated position $p'_t$ to observation ray $l_i$. Non-occluded regions $\Omega_i$ around the target are identified through radial scanning at angular intervals $\beta$:
$$ \Omega_i = \{\theta \mid \text{line}(p_t, p_{\theta}) \cap \text{obstacles} = \emptyset \} $$
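Both steps above, the ray-intersection estimate and the radial visibility scan, can be illustrated with a short NumPy sketch. Several simplifications here are assumptions rather than details from the text: $D$ is taken as the squared point-to-line distance (which gives a closed-form least-squares solution), rays are treated as infinite lines, the scan is done in 2D, and obstacles are modelled as circles given by hypothetical `(center, radius)` pairs.

```python
import numpy as np

def triangulate(origins, directions):
    """Least-squares point closest to N observation rays.

    Solves sum_i (I - d_i d_i^T) p = sum_i (I - d_i d_i^T) o_i.
    The system is singular if all rays are parallel.
    """
    origins = np.asarray(origins, dtype=float)
    directions = np.asarray(directions, dtype=float)
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    dim = origins.shape[1]
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for o, d in zip(origins, directions):
        P = np.eye(dim) - np.outer(d, d)  # projector orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

def segment_hits_circle(a, b, center, radius):
    """True if segment a-b passes within `radius` of `center` (2D)."""
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, center))
    ab = b - a
    t = np.clip(np.dot(c - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return bool(np.linalg.norm(a + t * ab - c) <= radius)

def visible_angles(target, obstacles, scan_radius, beta_deg):
    """Angles (degrees) whose scan line from the target hits no obstacle."""
    target = np.asarray(target, dtype=float)
    free = []
    for theta in np.arange(0.0, 360.0, beta_deg):
        rad = np.deg2rad(theta)
        end = target + scan_radius * np.array([np.cos(rad), np.sin(rad)])
        if not any(segment_hits_circle(target, end, c, r) for c, r in obstacles):
            free.append(float(theta))
    return free
```

For instance, two rays from `(0, 0)` along `(1, 1)` and from `(2, 0)` along `(-1, 1)` meet at `(1, 1)`, and a single circular obstacle placed on the positive x-axis blocks only the scan directions that pass through it.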
Optimal observation points $P_{\text{obv}}$ are generated from the occlusion-free sectors. When the number of UAVs exceeds the number of visible regions, particle swarm optimization minimizes the maximum observation angle:
$$ \text{minimize} \quad \alpha \sum_{i=1}^n \sum_{j=1}^n c_{ij}x_{ij} + \beta z $$
$$ \text{subject to} \quad \sum_{j=1}^n x_{ij} = 1, \sum_{i=1}^n x_{ij} = 1, z \geq c_{ij}x_{ij} $$
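To make the objective concrete, the sketch below evaluates it by exhaustive search over all one-to-one assignments, which is only practical for a handful of UAVs. This is a swapped-in technique for illustration, not the particle swarm optimization described above. It reads $x_{ij}$ as a binary assignment of UAV $i$ to observation point $j$ and assumes non-negative costs $c_{ij}$, so that the tightest feasible $z$ equals the largest assigned cost; the weight $\beta$ here is the objective weight, not the scan interval. The name `assign_bruteforce` is hypothetical.

```python
from itertools import permutations

import numpy as np

def assign_bruteforce(cost, alpha=1.0, beta=1.0):
    """Minimize alpha * (sum of assigned costs) + beta * (max assigned cost).

    cost: [n, n] array, cost[i, j] = cost of sending UAV i to point j
    (assumed non-negative). Returns (assignment tuple, objective value),
    where assignment[i] is the point given to UAV i.
    """
    cost = np.asarray(cost, dtype=float)
    n = cost.shape[0]
    rows = np.arange(n)
    best_perm, best_val = None, float("inf")
    for perm in permutations(range(n)):  # each perm satisfies both sum constraints
        chosen = cost[rows, list(perm)]
        val = alpha * chosen.sum() + beta * chosen.max()  # z = max c_ij x_ij
        if val < best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val
```

Setting `alpha=0` reduces the problem to a pure min-max (bottleneck) assignment, while `beta=0` recovers the classic linear assignment problem.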
Paths are planned using hybrid A* with occlusion cost $C_{\text{occ}}$ and safety cost $S_d$:
$$ h^k(x) = D(p_k, p_{k+1}) + C_{\text{occ}} + S_d $$
Yaw control maintains target-centered perspectives for all surveying UAVs:
$$ \psi = \text{atan2}\left( e_y^T (p_i - p'_{ti}), e_x^T (p_i - p'_{ti}) \right) $$
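The yaw equation can be evaluated directly, as in the sketch below. It assumes $e_x$ and $e_y$ are the world-frame unit axes, so the projections reduce to the x and y components of the difference vector; the function names are hypothetical. The first function follows the formula as written (angle of the vector from the target estimate to the UAV). Whether that or the reversed vector points the camera at the target depends on the heading and camera-mounting convention, which the text does not specify, so the second function is included only for comparison.

```python
import math

def yaw_from_formula(p_i, p_t):
    """psi = atan2(e_y^T (p_i - p_t), e_x^T (p_i - p_t)), as written above."""
    return math.atan2(p_i[1] - p_t[1], p_i[0] - p_t[0])

def yaw_toward_target(p_i, p_t):
    """Heading of the vector from the UAV to the target (reversed sign)."""
    return math.atan2(p_t[1] - p_i[1], p_t[0] - p_i[0])
```

The two results always differ by $\pi$ (modulo $2\pi$); altitude does not enter either computation because only the horizontal components are projected.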
Our occlusion detection method achieves the highest frame rate among the compared trackers while remaining competitive in accuracy. Testing on the LaSOT dataset and in Gazebo simulations shows consistent tracking under up to 90% occlusion:
| Method | FPS (RTX 2060) | Accuracy | Max Occlusion Tolerance |
|---|---|---|---|
| Proposed | 80 | 54.3% | 90% |
| E.T.Tracker | 40 | 59.0% | 90% |
| DiMP18 | 55 | 53.1% | 90% |
| KCF | 75 | 17.8% | 50% |
Onboard performance demonstrates real-time capability for surveying UAV applications:
| Hardware | FPS | Resource Utilization |
|---|---|---|
| Jetson Xavier NX | 36 | GPU: 62%, CPU: 79.6% |
| de next-TGU8 | 26 | CPU: 407.3% (multi-core) |
Multi-UAV cooperative tracking reduces localization error by 62% compared to single-drone operations. In dense jungle flight tests, three surveying drones maintained continuous target coverage through coordinated viewpoint adjustment. The framework’s key innovations include:
- Computationally efficient occlusion handling (36 FPS on edge devices)
- Visibility-optimized formation control for surveying drones
- Integrated perception-planning for cluttered environments
Future work will focus on occlusion prediction and communication-efficient coordination for large surveying UAV swarms. The proposed system significantly advances persistent target monitoring capabilities for surveying drones operating in urban canyons, forested terrain, and industrial complexes where traditional approaches fail.
