The proliferation of Unmanned Aerial Vehicle (UAV) drone swarms in modern conflicts has fundamentally altered reconnaissance and strike paradigms. To operate effectively in future contested maritime domains, these UAV drone collectives must possess advanced autonomy to navigate dense, dynamic obstacle fields, cooperatively track and circumnavigate non-cooperative high-value targets like warships, and rapidly generate actionable three-dimensional intelligence. This work presents an integrated methodology addressing these core challenges: intelligent cooperative trajectory planning fused with real-time dynamic 3D reconstruction. We propose a leader-follower architectural framework where a designated leader UAV drone locks onto and pursues a dynamic target, while follower drones maintain formation integrity. The swarm employs a context-aware formation strategy, switching between a line formation for efficient long-range pursuit and a circular formation for optimal multi-perspective observation upon target approach. A key innovation is our Replanning-Targeted Rapidly-exploring Random Tree Connect (RT-RRTC) algorithm, which enables the UAV drone swarm to dynamically replan paths around moving obstacles while continuously tracking the moving target, ensuring both mission success and survivability. Following successful approach, the UAV drone team executes a coordinated circumnavigation, with cameras fixated on the target. The imagery captured from these multiple, time-synchronized viewpoints is then processed using a 4D Gaussian Splatting (4D-GS) neural rendering framework, which explicitly models temporal dynamics to reconstruct a high-fidelity, dynamic 3D model of the non-cooperative vessel. Comprehensive simulation experiments in a realistic Gazebo-based maritime environment, populated with adversarial UAV drone obstacles and a maneuvering warship target, validate our approach. Results demonstrate high rates of successful obstacle avoidance and target tracking, efficient trajectory planning, and the generation of detailed 3D reconstructions with superior perceptual quality metrics compared to static reconstruction methods.

The operational landscape for UAV drone swarms is characterized by high dynamism and threat density. Traditional methods often treat path planning, target tracking, and reconstruction as sequential or isolated problems. For instance, standard path planners like RRT* may find paths to static goals but struggle when both the target and obstacles are in motion. Similarly, 3D reconstruction techniques like Structure from Motion (SfM) or Neural Radiance Fields (NeRF) typically assume static scenes, leading to severe artifacts when applied to moving objects. Our work bridges this gap by creating a tightly coupled loop: the planning module ensures the UAV drone team reaches and maintains an optimal observation posture around a moving target, while the reconstruction module consumes the resultant multi-view, temporal video stream to build a dynamic model. The leader-follower paradigm with mode-switching formations provides the necessary robustness and flexibility. The leader’s primary directive is target adherence, simplifying the coordination problem. Followers, in turn, focus on formation keeping relative to the leader and local obstacle avoidance, distributing the computational and sensing burden. The RT-RRTC planner enhances this by making the leader’s path planning reactive and efficient in cluttered, dynamic spaces. Finally, by employing 4D-GS for reconstruction, we leverage the spatio-temporal consistency of the UAV drone-captured data to overcome the limitations of static 3D modeling, enabling the creation of accurate 4D (3D + time) models crucial for post-mission analysis and decision-making.
System Overview and Problem Formulation
The mission objective for the cooperative UAV drone swarm is to autonomously detect, track, approach, and generate a 3D model of a non-cooperative maritime vessel while evading a field of mobile adversarial UAV drone obstacles. The high-level architecture integrates perception, decision-making, planning, control, and reconstruction modules. We assume the swarm has access to shared situational awareness, including the positions and velocities of the primary target and detected obstacles, as well as the states of all friendly UAV drones. The core challenge is to synthesize these inputs into safe, efficient, and task-effective multi-agent trajectories.
The dynamics of each quadrotor UAV drone are described by a state vector $\mathbf{x}$ and control input $\mathbf{u}$. The state encompasses position, velocity, attitude, and angular velocity:
$$
\mathbf{x} = [\mathbf{p}^T, \mathbf{v}^T, \boldsymbol{\eta}^T, \boldsymbol{\omega}^T]^T, \quad \mathbf{p}, \mathbf{v} \in \mathbb{R}^3, \quad \boldsymbol{\eta} = [\phi, \theta, \psi]^T, \quad \boldsymbol{\omega} \in \mathbb{R}^3
$$
where $\mathbf{p}=(x,y,z)$ is the position, $\mathbf{v}$ is the velocity, $\boldsymbol{\eta}$ contains the roll ($\phi$), pitch ($\theta$), and yaw ($\psi$) angles, and $\boldsymbol{\omega}$ is the body-frame angular velocity. The control input is the thrust $f$ and body torque $\boldsymbol{\tau}$: $\mathbf{u} = [f, \boldsymbol{\tau}^T]^T$. The nonlinear dynamics are given by:
$$
\begin{aligned}
\dot{\mathbf{p}} &= \mathbf{v} \\
\dot{\mathbf{v}} &= g\mathbf{e}_3 - (f/m)\mathbf{R}(\boldsymbol{\eta})\mathbf{e}_3 \\
\dot{\boldsymbol{\eta}} &= \mathbf{J}(\boldsymbol{\eta})\boldsymbol{\omega} \\
\mathbf{I}\dot{\boldsymbol{\omega}} &= \boldsymbol{\tau} – \boldsymbol{\omega} \times \mathbf{I}\boldsymbol{\omega}
\end{aligned}
$$
where $g$ is gravity, $m$ is mass, $\mathbf{R}$ is the rotation matrix from body to world frame, $\mathbf{e}_3=[0,0,1]^T$, $\mathbf{J}$ is the transformation matrix relating $\dot{\boldsymbol{\eta}}$ to $\boldsymbol{\omega}$, and $\mathbf{I}$ is the inertia tensor. A Nonlinear Model Predictive Controller (NMPC) is used for low-level flight control, solving a receding-horizon optimization problem to track reference trajectories $\mathbf{x}_{r,k}, \mathbf{u}_{r,k}$:
$$
\begin{aligned}
\min_{\mathbf{x}_{0:N}, \mathbf{u}_{0:N-1}} & \sum_{k=0}^{N-1} \left( \|\mathbf{x}_k - \mathbf{x}_{r,k}\|_{\mathbf{Q}}^2 + \|\mathbf{u}_k - \mathbf{u}_{r,k}\|_{\mathbf{R}}^2 \right) \\
\text{s.t.} \quad & \mathbf{x}_0 = \mathbf{x}(t_0), \\
& \mathbf{x}_{k+1} = f_{\text{discrete}}(\mathbf{x}_k, \mathbf{u}_k), \quad k=0,\dots,N-1,\\
& \mathbf{u}_{\min} \leq \mathbf{u}_k \leq \mathbf{u}_{\max}.
\end{aligned}
$$
The planning modules detailed in the following sections generate the reference trajectories $\mathbf{x}_{r,k}$ and $\mathbf{u}_{r,k}$ for this controller.
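As a concrete illustration, the continuous-time model and the NMPC stage cost above can be written in a few lines. The NumPy sketch below assumes a ZYX Euler-angle convention and illustrative mass and inertia values, not the exact vehicle parameters used in our simulation; in practice the derivative would be discretized (e.g., with a Runge-Kutta step) to obtain $f_{\text{discrete}}$ for the NMPC constraints.

```python
import numpy as np

def quadrotor_dynamics(x, u, m=1.5, I=np.diag([0.02, 0.02, 0.04]), g=9.81):
    """Continuous-time derivative x_dot = f(x, u) for the model above.
    State x = [p(3), v(3), eta(3) = roll/pitch/yaw, omega(3)]; input u = [f, tau(3)].
    Mass, inertia, and the ZYX Euler convention are illustrative assumptions."""
    v, eta, w = x[3:6], x[6:9], x[9:12]
    f, tau = u[0], u[1:4]
    phi, th, psi = eta
    cr, sr = np.cos(phi), np.sin(phi)
    cp, sp = np.cos(th), np.sin(th)
    cy, sy = np.cos(psi), np.sin(psi)

    # Rotation matrix from body to world frame, R(eta).
    R = np.array([[cy*cp, cy*sp*sr - sy*cr, cy*sp*cr + sy*sr],
                  [sy*cp, sy*sp*sr + cy*cr, sy*sp*cr - cy*sr],
                  [-sp,   cp*sr,            cp*cr]])
    # Euler-rate transformation J(eta) relating omega to eta_dot.
    J = np.array([[1, sr*sp/cp, cr*sp/cp],
                  [0, cr,       -sr],
                  [0, sr/cp,    cr/cp]])
    e3 = np.array([0.0, 0.0, 1.0])

    p_dot = v
    v_dot = g*e3 - (f/m) * R @ e3                         # translational dynamics (z-down convention)
    eta_dot = J @ w                                       # attitude kinematics
    w_dot = np.linalg.solve(I, tau - np.cross(w, I @ w))  # rotational dynamics
    return np.concatenate([p_dot, v_dot, eta_dot, w_dot])

def nmpc_stage_cost(x, u, x_ref, u_ref, Q, R):
    """Quadratic tracking term of the NMPC objective: ||x - x_r||_Q^2 + ||u - u_r||_R^2."""
    dx, du = x - x_ref, u - u_ref
    return dx @ Q @ dx + du @ R @ du
```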
Context-Aware Cooperative Formation Design
Effective coordination of the UAV drone swarm necessitates formations tailored to distinct mission phases. We design two primary formation patterns and a logic for autonomous switching between them based on environmental context.
1. Line Formation ($F_a$): Used for rapid, long-range pursuit of the target when no immediate obstacles are present. The UAV drones align behind the leader along the direction of travel. This minimizes frontal cross-section and allows efficient traversal. The desired position for a follower UAV drone $i$ is:
$$
\mathbf{p}_{i,\text{des}}^{F_a} = \mathbf{p}_L - \mathbf{d}_i
$$
where $\mathbf{p}_L$ is the leader’s position and $\mathbf{d}_i$ is a fixed offset vector defining the follower’s position in the line relative to the leader.
2. Circular Formation ($F_c$): Activated when the swarm is in close proximity to the target vessel. The UAV drones distribute themselves evenly on a circle (or sphere) centered on the target’s estimated position. This maximizes spatial coverage and provides simultaneous multi-view observation, which is critical for high-quality 3D reconstruction. The desired position for UAV drone $i$ on a horizontal circle is:
$$
\mathbf{p}_{i,\text{des}}^{F_c}(t) = \mathbf{p}_{\text{target}}(t) +
\begin{bmatrix}
r \cos(\psi_0^i + \omega t)\\
r \sin(\psi_0^i + \omega t)\\
z_{\text{offset}}
\end{bmatrix}
$$
where $r$ is the orbit radius, $\psi_0^i$ is the initial phase angle for drone $i$, $\omega$ is the orbit angular rate, and $z_{\text{offset}}$ is a fixed altitude offset.
3. Formation Switching Logic: The swarm autonomously selects its formation based on distances to the target $d_{\text{goal}}$ and the nearest obstacle $d_{\text{obs}}$:
$$
F = \begin{cases}
F_b, & \text{if } d_{\text{obs}} < R_{\text{obs}} \quad \text{(Obstacle Avoidance Mode)}\\
F_c, & \text{else if } d_{\text{goal}} < R_{\text{orbit}}\\
F_a, & \text{otherwise}
\end{cases}
$$
Here, $R_{\text{obs}}$ is a threat radius triggering obstacle avoidance, and $R_{\text{orbit}}$ is the distance threshold for initiating the circular observation orbit. Mode $F_b$ is a transient, obstacle-avoidance formation where strict formation keeping is relaxed to prioritize collision avoidance; followers independently navigate around obstacles before reforming relative to the leader.
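For clarity, the two formation geometries and the switching rule reduce to a few lines of vector arithmetic. The sketch below uses illustrative values for the spacing, orbit radius, angular rate, and thresholds $R_{\text{obs}}$ and $R_{\text{orbit}}$; the per-follower line offset $\mathbf{d}_i$ is assumed to be a multiple of the leader's heading.

```python
import numpy as np

def line_position(p_leader, heading, i, spacing=4.0):
    """Desired follower position in the line formation F_a: offset d_i behind the leader.
    heading is the leader's unit direction of travel; spacing is an assumed per-follower gap."""
    return p_leader - i * spacing * heading

def circle_position(p_target, t, i, n_uav, r=15.0, omega=0.2, z_offset=10.0):
    """Desired position on the circular observation orbit F_c centred on the target."""
    psi0 = 2.0 * np.pi * i / n_uav           # evenly spaced initial phase angles
    ang = psi0 + omega * t
    return p_target + np.array([r * np.cos(ang), r * np.sin(ang), z_offset])

def select_formation(d_obs, d_goal, R_obs=8.0, R_orbit=30.0):
    """Formation switching logic: avoidance mode F_b dominates, then orbit F_c, else line F_a."""
    if d_obs < R_obs:
        return "F_b"    # relax formation keeping, avoid obstacles independently
    if d_goal < R_orbit:
        return "F_c"    # circular observation orbit around the target
    return "F_a"        # line formation for long-range pursuit
```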
RT-RRTC: Dynamic Trajectory Planning for Tracking and Avoidance
The core planning challenge is for the leader UAV drone to generate a trajectory that continuously tracks the moving target while reactively avoiding multiple moving obstacles. Standard sampling-based planners like RRT or RRT* are inadequate as they typically plan to a static goal. We propose the Replanning-Targeted RRT-Connect (RT-RRTC) algorithm, an online, goal-directed planner that efficiently handles dual dynamics.
The algorithm operates in a receding-horizon fashion. At each planning cycle, it takes as input the leader’s current state $\mathbf{x}_L$, the predicted target state $\mathbf{\hat{x}}_T$, and the states of all obstacles $\{\mathbf{x}_{O,j}\}$. The core innovation is the use of a dynamic “local target point” $\mathbf{p}_{\text{local}}$ for the RRT-Connect tree expansion, instead of the actual target position. This local target is computed along the line connecting the target and the swarm’s home base, at a fixed look-ahead distance $L$ beyond the current target position projection. This biases exploration away from the obstacle cluster and towards a region that facilitates eventual re-formation.
$$
\mathbf{p}_{\text{local}} = \mathbf{p}_T + L \cdot \frac{\mathbf{p}_T - \mathbf{p}_{\text{base}}}{\|\mathbf{p}_T - \mathbf{p}_{\text{base}}\|}
$$
The RRT-Connect algorithm then grows two trees, $T_a$ from the UAV drone’s current position and $T_b$ from $\mathbf{p}_{\text{local}}$, attempting to connect them. To enhance efficiency in dynamic environments, the sampling range and step size are adaptive. The sampling range radius $r_s$ scales with the distance to the local target $d_{\text{local}}$:
$$
r_s = \min(r_{\text{max}}, \alpha \cdot d_{\text{local}})
$$
The connection step size $s$ is similarly adapted: $s = \beta \cdot d_{\text{local}}$. This allows large, exploratory steps when far from the goal and finer, more precise motions when near. The planner also incorporates velocity obstacles or a simple dynamic collision check to reject samples that would lead to a future collision with moving obstacles. Once a feasible path $\mathcal{P}$ is found, only the first segment (from the root to the first node) is committed as the immediate waypoint $\mathbf{p}_{b}$ for the leader’s NMPC. The root of $T_a$ is then reset to this new node, and the planning cycle repeats, enabling continuous adaptation. This “step-by-step” commitment is crucial for maintaining reactivity. The complete algorithm pseudo-code is summarized below.
Algorithm 1: RT-RRTC for Leader UAV Drone
1: Input: Current state $\mathbf{x}_L$, Target state $\mathbf{\hat{x}}_T$, Obstacle states $\{\mathbf{x}_{O,j}\}$.
2: Output: Next waypoint $\mathbf{p}_{b}$.
3: Compute local target point $\mathbf{p}_{\text{local}}$ from the local-target equation above.
4: Initialize trees $T_a$ with root $\mathbf{x}_L$, $T_b$ with root $\mathbf{p}_{\text{local}}$.
5: while not connected and within computation budget do
6: Adapt sampling range $r_s$ and step size $s$.
7: Generate random sample $\mathbf{p}_{\text{rand}}$ in sphere around $T_a$ root.
8: Find nearest node $\mathbf{p}_{\text{near}}$ in $T_a$ to $\mathbf{p}_{\text{rand}}$.
9: Steer from $\mathbf{p}_{\text{near}}$ towards $\mathbf{p}_{\text{rand}}$ by $s$ to get $\mathbf{p}_{\text{new}}$.
10: if edge $(\mathbf{p}_{\text{near}}, \mathbf{p}_{\text{new}})$ is collision-free then
11: Add $\mathbf{p}_{\text{new}}$ and edge to $T_a$.
12: Try to connect $T_a$ to $T_b$ from $\mathbf{p}_{\text{new}}$.
13: end if
14: end while
15: if path $\mathcal{P}$ found then
16: Extract first node $\mathbf{p}_{\text{first}}$ from $\mathcal{P}$ after root.
17: Return $\mathbf{p}_{b} = \mathbf{p}_{\text{first}}$ as the next waypoint.
18: else
19: Execute a safety maneuver (e.g., hover or retreat).
20: end if
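The following Python sketch renders one RT-RRTC planning cycle under simplifying assumptions: tree bookkeeping is reduced to flat lists, tree $T_b$ is kept as its root $\mathbf{p}_{\text{local}}$ only (so the connect step is a straight free segment), and the dynamic collision check extrapolates obstacles at constant velocity over the edge traversal time. Parameter values are illustrative, not those used in our experiments.

```python
import numpy as np

def local_target(p_T, p_base, L=20.0):
    """Dynamic local target: a fixed distance L beyond the target along the base-to-target line."""
    d = p_T - p_base
    return p_T + L * d / np.linalg.norm(d)

def edge_is_safe(p0, p1, obstacles, v_uav=5.0, r_safe=3.0):
    """Dynamic collision check (assumed form): sample the edge and compare against obstacles
    extrapolated at constant velocity over the traversal time."""
    seg = p1 - p0
    t_edge = np.linalg.norm(seg) / v_uav
    for s in np.linspace(0.0, 1.0, 10):
        p, t = p0 + s * seg, s * t_edge
        for p_obs, v_obs in obstacles:
            if np.linalg.norm(p - (p_obs + v_obs * t)) < r_safe:
                return False
    return True

def rt_rrtc_cycle(p_L, p_T, p_base, obstacles, r_max=30.0, alpha=0.5, beta=0.3,
                  max_iter=300, rng=np.random.default_rng()):
    """One receding-horizon planning cycle: grow T_a from the leader position and try to
    connect to the local target (T_b kept as its root only, a simplification of RRT-Connect)."""
    p_local = local_target(p_T, p_base)
    nodes, parents = [np.asarray(p_L, float)], [-1]      # tree T_a
    for _ in range(max_iter):
        d_local = np.linalg.norm(p_local - nodes[0])
        r_s = min(r_max, alpha * d_local)                 # adaptive sampling radius
        step = beta * d_local                             # adaptive step size
        u = rng.normal(size=3); u /= np.linalg.norm(u)
        p_rand = nodes[0] + r_s * rng.uniform() * u       # sample inside a sphere around the root
        i_near = int(np.argmin([np.linalg.norm(p_rand - q) for q in nodes]))
        p_near = nodes[i_near]
        v = p_rand - p_near
        p_new = p_near + step * v / (np.linalg.norm(v) + 1e-9)
        if not edge_is_safe(p_near, p_new, obstacles):
            continue
        nodes.append(p_new); parents.append(i_near)
        if edge_is_safe(p_new, p_local, obstacles):       # connect attempt towards T_b
            i = len(nodes) - 1                            # commit only the first node after the root
            while parents[i] != 0:
                i = parents[i]
            return nodes[i]
    return None                                           # caller falls back to a safety maneuver
```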
For the target tracking mode ($F_a$) when far from obstacles, the leader’s reference is simply a point ahead of the target: $\mathbf{p}_{a} = \mathbf{p}_T + \Delta \mathbf{v}_T$, where $\Delta$ is a constant look-ahead time. For the circular observation mode ($F_c$), the reference is given directly by the orbit expression $\mathbf{p}_{i,\text{des}}^{F_c}(t)$ defined earlier. The yaw reference $\psi_{\text{des}}$ is always set to point the UAV drone’s camera at the target: $\psi_{\text{des}} = \operatorname{atan2}(y_T - y, x_T - x)$.
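The per-mode leader references above amount to the following short helpers (a sketch; the look-ahead time $\Delta$ is an illustrative value).

```python
import numpy as np

def pursuit_reference(p_T, v_T, delta=2.0):
    """Line-formation mode F_a: track a point a constant look-ahead time ahead of the target."""
    return p_T + delta * v_T

def camera_yaw(p_uav, p_T):
    """Yaw reference keeping the onboard camera pointed at the target."""
    return np.arctan2(p_T[1] - p_uav[1], p_T[0] - p_uav[0])
```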
Dynamic 3D Reconstruction via 4D Gaussian Splatting
Once the UAV drone swarm establishes a stable circular formation around the moving vessel, the multi-view video streams are fused to create a dynamic 3D model. Traditional 3D reconstruction techniques like COLMAP (SfM + MVS) or static NeRF models fail on dynamic scenes. We employ 4D Gaussian Splatting (4D-GS), which extends the recent, highly efficient 3D-GS framework to model time-varying scenes.
3D-GS represents a scene as a set of anisotropic 3D Gaussians. Each Gaussian $G_i$ is defined by a mean position $\boldsymbol{\mu}_i \in \mathbb{R}^3$, a covariance matrix $\boldsymbol{\Sigma}_i$ (modeled by scaling and rotation), an opacity $\alpha_i$, and spherical harmonics coefficients $\mathbf{SH}_i$ for view-dependent color. Rendering is performed via tile-based rasterization, alpha-blending the projected splats in depth order.
$$
C(\mathbf{p}) = \sum_{i \in \mathcal{N}} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)
$$
where $\mathcal{N}$ is the set of Gaussians sorted by depth, and $c_i$ is the color from $\mathbf{SH}_i$ evaluated at the viewing direction.
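The alpha-blending in the rendering equation can be illustrated per pixel; the sketch below composites contributions from Gaussians already sorted by depth, with splat projection and spherical-harmonics evaluation abstracted away as inputs.

```python
import numpy as np

def composite_pixel(colors, alphas):
    """Front-to-back alpha blending over Gaussians sorted by depth for one pixel.
    colors: (N, 3) view-dependent colors c_i evaluated from the SH coefficients.
    alphas: (N,) effective opacities alpha_i of each splat at this pixel."""
    C = np.zeros(3)
    T = 1.0                        # accumulated transmittance prod_j (1 - alpha_j)
    for c_i, a_i in zip(colors, alphas):
        C += c_i * a_i * T
        T *= (1.0 - a_i)
        if T < 1e-4:               # early termination once the pixel is nearly opaque
            break
    return C
```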
4D-GS introduces a temporal dimension by modeling the motion of each Gaussian. The position $\boldsymbol{\mu}_i(t)$ and potentially other attributes like rotation or scale become functions of time. A common parameterization is to use a compact neural deformation field $D(\boldsymbol{\mu}_i, t; \Theta)$ that outputs a displacement $\Delta \boldsymbol{\mu}_i(t)$:
$$
\boldsymbol{\mu}_i(t) = \boldsymbol{\mu}_i^0 + D(\boldsymbol{\mu}_i^0, t; \Theta)
$$
where $\boldsymbol{\mu}_i^0$ is a canonical position and $\Theta$ are learned network weights. During training, we optimize the parameters of the Gaussians in canonical space ($\boldsymbol{\mu}_i^0, \boldsymbol{\Sigma}_i^0, \alpha_i^0, \mathbf{SH}_i^0$) alongside the deformation network parameters $\Theta$. The loss function combines photometric loss and a regularization term:
$$
\mathcal{L} = \sum_{\mathbf{r}, t} \|\hat{C}(\mathbf{r}, t) - C(\mathbf{r}, t)\|_1 + \lambda_{tv} \mathcal{L}_{tv}(\Delta \boldsymbol{\mu})
$$
where $\hat{C}$ and $C$ are the rendered and ground-truth images for ray $\mathbf{r}$ at time $t$, and $\mathcal{L}_{tv}$ is a temporal smoothness (total variation) loss on the displacements to encourage coherent motion. The input to this pipeline is the set of synchronized, calibrated videos from all observing UAV drones, along with their camera poses estimated via onboard visual-inertial odometry or relative pose estimation. The output is a 4D model that can be rendered from any viewpoint at any captured time, providing a comprehensive dynamic digital twin of the target vessel.
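A minimal PyTorch sketch of the deformation-field parameterization and the training loss is given below; the network width, input encoding, and $\lambda_{tv}$ value are illustrative assumptions rather than the configuration used in our experiments.

```python
import torch
import torch.nn as nn

class DeformationField(nn.Module):
    """Compact MLP D(mu0, t) -> displacement, used as mu(t) = mu0 + D(mu0, t)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, mu0, t):
        # mu0: (N, 3) canonical Gaussian centers; t: (N, 1) normalized timestamps.
        return self.mlp(torch.cat([mu0, t], dim=-1))

def training_loss(rendered, gt, displacements, lambda_tv=0.01):
    """Photometric L1 term plus a temporal total-variation penalty on the displacements.
    displacements: (T, N, 3) predicted offsets at consecutive timestamps."""
    photometric = (rendered - gt).abs().mean()
    tv = (displacements[1:] - displacements[:-1]).abs().mean()
    return photometric + lambda_tv * tv
```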
Simulation Experiments and Results
We validate our integrated approach in a high-fidelity simulation environment built on ROS and Gazebo. The scene features a moving warship target and 11 adversarial UAV drones patrolling as dynamic obstacles. Our friendly swarm consists of 3 UAV drones equipped with simulated cameras and depth sensors.
1. Trajectory Planning and Formation Performance: We compare our RT-RRTC planner against two baselines: a standard Dynamic-RRT (planning to the moving target’s instantaneous position) and an improved RRT (I-RRT) with adaptive sampling but no local targeting or connect heuristic. Performance is evaluated over 500 trials in environments with varying obstacle densities (3, 6, and 12 dynamic obstacles). Key metrics are success rate (reaching the target orbit without collision) and average single-planning-cycle computation time.
| Planner | 3 Obstacles: Success (%) / Time (ms) | 6 Obstacles: Success (%) / Time (ms) | 12 Obstacles: Success (%) / Time (ms) |
|---|---|---|---|
| Dynamic-RRT | 39.8 / 650 | 43.6 / 670 | 40.6 / 720 |
| I-RRT | 93.2 / 80 | 85.4 / 100 | 80.0 / 140 |
| RT-RRTC (Ours) | 94.0 / 30 | 87.2 / 40 | 81.2 / 70 |
The results demonstrate that RT-RRTC consistently achieves the highest success rates and, most notably, reduces planning time by approximately 50-60% compared to I-RRT, due to the efficiency of the bidirectional RRT-Connect search and the guiding local target. The formation switching logic performed robustly, with the swarm transitioning from line to dispersed avoidance and finally to a stable circular orbit around the target vessel. The target remained within the field of view of the lead UAV drone’s camera throughout the approach, with the pixel coordinates of the target center showing low variance during the orbit phase.
2. Dynamic 3D Reconstruction Quality: We assess the quality of the 4D reconstruction from the multi-UAV drone imagery. For comparison, we also attempt reconstruction using standard 3D-GS (which assumes static scenes) on the same data. Quality is measured using standard image metrics comparing novel view renders to held-out ground truth frames: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS).
| View Source / Method | PSNR (dB) ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|
| UAV1 (3D-GS) | — | — | — |
| UAV1 (4D-GS) | 36.60 | 0.9552 | 0.0530 |
| UAV2 (3D-GS) | 30.14 | 0.9241 | 0.1204 |
| UAV2 (4D-GS) | 33.24 | 0.9382 | 0.0767 |
| UAV3 (3D-GS) | 27.51 | 0.8895 | 0.1604 |
| UAV3 (4D-GS) | 35.48 | 0.9546 | 0.0392 |
| Multi-View Fusion (4D-GS) | 31.06 | 0.9240 | 0.1405 |
3D-GS fails catastrophically on the UAV1 sequence (indicated by '—') due to motion artifacts, producing an unusable model, and degrades markedly on the remaining views. 4D-GS successfully reconstructs from all views, with PSNR above 33 dB and SSIM above 0.93 for every individual drone. The multi-view fused 4D-GS model provides a comprehensive and consistent model of the entire vessel, though its aggregate metrics are slightly lower than the best single view due to the challenge of perfectly harmonizing all data streams. Importantly, the multi-view model provides complete coverage of the vessel, whereas any single UAV drone’s view is inherently limited. The 4D model can be queried to render the ship from novel angles at different times during its maneuver.
Conclusion
This work presented a unified framework enabling a UAV drone swarm to autonomously navigate a dense, dynamic adversarial environment, persistently track and circumnavigate a non-cooperative maritime target, and reconstruct a high-fidelity dynamic 3D model of it. The context-aware formation strategy allows the swarm to adapt its configuration for efficient travel, obstacle negotiation, and optimal sensing. The RT-RRTC planning algorithm provides a robust and efficient solution for the coupled problem of dynamic target tracking and dynamic obstacle avoidance, a significant advancement over conventional planners. Finally, by integrating the planning loop with a state-of-the-art 4D Gaussian Splatting reconstruction pipeline, we close the perception-action cycle, transforming multi-drone observations into actionable spatio-temporal intelligence. Simulation results confirm the effectiveness of each component and the integrated system, demonstrating high success rates in cluttered environments and the generation of perceptually high-quality dynamic reconstructions. Future work will focus on enhancing robustness under extreme perception uncertainties, integrating real-time model updates for re-planning, and testing the system on physical UAV drone platforms in more complex, open-water scenarios.
