Autonomous Vision-Based Landing of VTOL Drones on Moving Ships

The autonomous recovery of Vertical Take-Off and Landing (VTOL) drones onto maritime vessels, particularly under dynamic sea conditions, represents a critical capability for expanding naval and commercial operations. Traditional landing aids, such as radar command guidance or laser beam-riding systems, often rely on substantial, expensive ship-based infrastructure. In contrast, vision-based guidance offers a promising path toward a more compact, cost-effective, and self-contained solution suitable for smaller platforms. This article presents a comprehensive vision-based guidance strategy for enabling a VTOL drone to land autonomously on a moving ship deck, addressing the unique challenges posed by deck motion and the stringent requirements for safe, precise touchdown.

The core challenge lies in the vessel's wave-induced motion in all six degrees of freedom (surge, sway, heave, roll, pitch, and yaw), of which heave, roll, and pitch most directly affect touchdown. This motion invalidates the assumption of a static landing target, making traditional position-servo approaches insufficient. A robust system must not only accurately estimate the relative state between the VTOL drone and the landing pad but also generate a guidance trajectory that guarantees a soft landing with near-zero vertical velocity at contact, all in real time. The proposed methodology decomposes this problem into two synergistic phases: a terminal guidance phase from initial detection to a defined safe hover point above the deck, and a decision-based landing phase that dynamically adjusts the final approach based on real-time assessment of deck motion.

The overall architecture of the vision-based guidance system for the VTOL drone is modular. The perception module, relying solely on a monocular camera, is responsible for detecting a known visual marker on the deck and estimating the full 3D relative state. Crucially, this includes estimating the ship’s translational velocity vector. This estimated state is fed into the guidance module, which performs trajectory planning and tracking. The output is a 3D velocity command vector for the VTOL drone. This command is sent to the vehicle’s inner-loop flight controller, which is assumed to be stable and capable of accurately tracking velocity setpoints. This hierarchical separation between high-level guidance and low-level control is essential for system design and analysis.
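To make the data flow concrete, the following minimal Python sketch frames the loop as three modules exchanging the quantities described above. It is a structural skeleton rather than the actual implementation; the class and method names (`VisionLandingPipeline`, `estimate`, `velocity_command`, `send_velocity_setpoint`) are illustrative placeholders.

```python
class VisionLandingPipeline:
    """Minimal skeleton of the perception -> guidance -> control loop."""

    def __init__(self, perception, guidance, autopilot):
        self.perception = perception  # monocular marker detection + relative state
        self.guidance = guidance      # tau-based planning + RHC tracking
        self.autopilot = autopilot    # stable inner loop tracking velocity setpoints

    def step(self, image, attitude, omega):
        # Perception: relative position of the deck marker and ship velocity v_T
        rel_pos, v_ship = self.perception.estimate(image, attitude, omega)
        # Guidance: 3D velocity command that compensates for the ship's motion
        v_cmd = self.guidance.velocity_command(rel_pos, v_ship)
        # Hand the command to the inner-loop flight controller (e.g., PX4)
        self.autopilot.send_velocity_setpoint(v_cmd)
```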

Perception: 3D Velocity Estimation via Spherical Optical Flow

Accurate relative velocity estimation is paramount for landing on a moving target. Traditional methods using 2D image optical flow on a planar projection are susceptible to noise and conflate translational and rotational motion, especially under the small rotational perturbations expected from a ship. To overcome this, a spherical imaging model is employed. This model, analogous to a biological retina, projects image points onto a unit sphere, providing a more geometrically faithful representation of visual motion.

Let $O_I X_I Y_I Z_I$ denote the inertial world frame, $O_c X_c Y_c Z_c$ the camera frame attached to the VTOL drone, and $O_b X_b Y_b Z_b$ the body frame of the VTOL drone. The position of a feature point $s_i$ in the inertial frame is related to its representation in the camera frame $p_i = [x_i, y_i, z_i]^T$ by:

$$ p_i = R^T (s_i - \xi) $$

where $R$ is the rotation matrix from the camera frame to the inertial frame (so $R^T$ maps inertial vectors into the camera frame), and $\xi$ is the position of the camera center in the inertial frame. Its projection onto the unit sphere is:

$$ p_{s_i} = \frac{p_i}{\| p_i \|} $$
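In practice, with a calibrated pinhole camera, the spherical projection of a tracked pixel is obtained by back-projecting through the intrinsic matrix and normalizing. A minimal sketch, assuming a standard 3x3 intrinsic matrix $K$ (the helper name is illustrative):

```python
import numpy as np

def pixel_to_sphere(uv, K):
    """Back-project pixel (u, v) through the camera intrinsics and
    normalize, giving the unit-sphere point p_s = p / ||p||."""
    ray = np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))
    return ray / np.linalg.norm(ray)
```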

The spherical optical flow $\dot{p}_{s_i}$, which is the temporal derivative of the point on the sphere, is derived from the kinematics of relative motion. Let $v_c$ and $v_T$ be the translational velocities of the camera (VTOL drone) and target (ship deck) in the inertial frame, respectively. Let $\Omega_x$ be the skew-symmetric matrix of the camera’s angular velocity $[\Omega_1, \Omega_2, \Omega_3]^T$ expressed in the camera frame. The relationship is given by:

$$ \dot{p}_{s_i} = -\Omega_x p_{s_i} - \pi_{p_{s_i}} \frac{1}{\| p_i \|} (V_c - V_T) $$

Here, $V_c = R^T v_c$ and $V_T = R^T v_T$ are the velocities expressed in the camera frame, and $\pi_{p_{s_i}} = I_3 – p_{s_i} p_{s_i}^T$ is the projection operator onto the plane tangent to the sphere at $p_{s_i}$.

Defining the 3D optical flow vector $w$ and the aggregated vector $q$ over $n$ tracked features:

$$ w = \frac{V_c - V_T}{d}, \quad q = \sum_{i=1}^{n} p_{s_i} $$

where $d$ is the estimated distance to the target plane. The 3D flow can be estimated as:

$$ w = -Q^{-1}(\dot{q} + \Omega_x q), \quad \text{where} \quad Q = \sum_{i=1}^{n} \cos(\lambda_i) \pi_{p_{s_i}} $$

and $\lambda_i$ is the angle between the line of sight to feature $i$ and the normal of the target plane, so that $\| p_i \| = d / \cos(\lambda_i)$ for features on the deck plane. Finally, the target ship’s inertial velocity is recovered by:

$$ v_T = v_c – d R w $$
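The discrete estimator can be assembled directly from these equations. The sketch below computes $q$, $Q$, and $w$ from tracked spherical features and recovers $v_T$; names are illustrative, and the target-plane normal in the camera frame (`eta_c`) is assumed available, e.g., from the marker's estimated pose, so that $\cos(\lambda_i)$ can be evaluated.

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix such that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def estimate_ship_velocity(p_s, p_s_dot, omega, d, R, v_c, eta_c):
    """Discrete spherical-flow estimate of the target's inertial velocity.

    p_s     : (n, 3) unit-sphere feature projections (camera frame)
    p_s_dot : (n, 3) their temporal derivatives (spherical optical flow)
    omega   : (3,)   camera angular velocity in the camera frame
    d       : float  estimated distance to the target plane
    R       : (3, 3) rotation from camera frame to inertial frame
    v_c     : (3,)   drone (camera) velocity in the inertial frame
    eta_c   : (3,)   target-plane normal in the camera frame
    """
    Omega_x = skew(omega)
    q = p_s.sum(axis=0)
    q_dot = p_s_dot.sum(axis=0)
    Q = np.zeros((3, 3))
    for ps in p_s:
        cos_lam = float(ps @ eta_c)                    # cos(lambda_i)
        Q += cos_lam * (np.eye(3) - np.outer(ps, ps))  # cos(lambda_i) * pi_{p_si}
    w = -np.linalg.solve(Q, q_dot + Omega_x @ q)       # w = -Q^{-1}(q_dot + Omega_x q)
    return v_c - d * (R @ w)                           # v_T = v_c - d R w
```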

While effective, this discrete feature-based flow can be noisy. To robustly estimate the dominant translational velocity in the presence of small deck rotations treated as noise, a spherical integral method is applied. Integrating the optical flow over a region $S^2$ on the lower hemisphere (excluding the area immediately around the optical axis, which carries little translational information) yields a smoother, more robust estimate $\bar{w}$:

$$ \phi = \int_{S^2} \dot{p}_s \, ds = -\frac{\pi}{2}(\cos 2\beta_0 - \cos 2\beta_1)\Omega_x \eta + R^T \Lambda R w $$
$$ \bar{w} = R^T \Lambda^{-1} R \left( \phi + \frac{\pi}{2}(\cos 2\beta_0 - \cos 2\beta_1)\Omega_x \eta \right) $$

where $\beta_0$ and $\beta_1$ define the integrated spherical annulus, $\eta$ is the normal to the target plane, and $\Lambda$ is a diagonal matrix dependent on $\beta_0, \beta_1$. This $\bar{w}$ is then used in the velocity recovery equation, providing the stable velocity estimate $v_T$ crucial for the VTOL drone’s guidance law.
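Once the integrated flow $\phi$ and the diagonal matrix $\Lambda$ are available, the recovery step is a direct evaluation of the two equations above. The sketch below assumes $\phi$ has been numerically integrated from the flow field and that $\Lambda(\beta_0, \beta_1)$ is precomputed from the derivation (its closed form is not reproduced here).

```python
import numpy as np

def integral_flow_ship_velocity(phi, Lam, R, omega, eta, beta0, beta1, d, v_c):
    """Recover v_T from the integrated spherical optical flow phi.

    Lam : (3, 3) diagonal matrix Lambda(beta0, beta1), assumed precomputed
    eta : (3,)   normal of the target plane
    """
    Omega_x = np.array([[0.0, -omega[2], omega[1]],
                        [omega[2], 0.0, -omega[0]],
                        [-omega[1], omega[0], 0.0]])
    c = 0.5 * np.pi * (np.cos(2.0 * beta0) - np.cos(2.0 * beta1))
    w_bar = R.T @ np.linalg.inv(Lam) @ R @ (phi + c * (Omega_x @ eta))
    return v_c - d * (R @ w_bar)   # velocity recovery with the smoothed estimate
```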

Guidance: Tau Theory-Based Trajectory Planning and Tracking

Having estimated the relative position $r$ and the ship’s velocity $v_T$, the next step is to generate a feasible landing trajectory. Simple pursuit and proportional navigation are inadequate, as they do not enforce soft-landing constraints. Polynomial trajectory optimization, while precise, is computationally expensive for real-time replanning. Inspired by biological systems, the proposed method utilizes Tau theory. Tau ($\tau$) is defined as the time-to-contact at the current rate of closure: $\tau_x = x / \dot{x}$ for any closing variable $x$.

Tau coupling theory states that for a smooth, controlled closure, the $\tau$ of one variable (e.g., altitude) is kept in constant proportion to the $\tau$ of another guiding variable: $\tau_y = k \tau_x$. For the VTOL drone landing, we couple the line-of-sight angles (azimuth $\chi$ and elevation $\lambda$) and the range $r$ to a guidance motion gap $x_a$ that prescribes a smooth deceleration profile.

First, a parabolic guidance motion gap $x_a(t)$ with desired dynamics is defined:

$$ v_a(t) = \dot{x}_a = 0.5a(t_d^2 - t^2) + v_{a0}(t_d - t) $$
$$ \dot{v}_a(t) = -v_{a0} - a t $$

where $t_d$ is the time-to-go, $v_{a0}$ is the initial gap closure rate, and $a$ is a constant. We then couple the range $r$ to this gap:

$$ \tau_r = k \tau_{a}, \quad \text{with} \quad \tau_{a} = \frac{v_a}{\dot{v}_a} $$

Solving this differential equation with initial conditions $r(0)=r_0$, $\dot{r}(0)=v_{r0}$ yields the planned range profile:

$$ r(t) = r_0 \left( \frac{r_0}{k} \right)^{-1/k} \left( v_{r0} t - 0.5 a t^2 + \frac{r_0}{k} \right)^{1/k} $$

The constant $a$ and time-to-go $t_d$ are determined by solving a time-and-energy optimal control problem with terminal constraints $r(t_d)=0$ and $\dot{r}(t_d)=0$. This yields:

$$ t_d = \frac{2 v_{r0}}{b - 2}, \quad a = \frac{2r_0/k + 2v_{r0}t_d}{t_d^2} $$

where $b$ is a root of a quartic equation derived from the optimality conditions. The coupling constant $k$ must satisfy $0 < k < 0.5$ to ensure zero velocity and bounded acceleration at touchdown. Subsequently, the line-of-sight angles are coupled to the range: $\tau_\lambda = k \tau_r$ and $\tau_\chi = k \tau_r$, producing the desired angular profiles $\lambda(t)$ and $\chi(t)$. The complete 3D desired trajectory for the VTOL drone relative to the ship deck center is then:

$$ r_{des}(t) = \begin{bmatrix} x_{des}(t) \\ y_{des}(t) \\ z_{des}(t) \end{bmatrix} = \begin{bmatrix} -r(t) \sin(\lambda(t)) \cos(\chi(t)) \\ -r(t) \sin(\lambda(t)) \sin(\chi(t)) \\ -r(t) \cos(\lambda(t)) \end{bmatrix} $$
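A compact planner following these formulas is sketched below. It takes the quartic root $b$ as given (solving the quartic is not shown), computes $t_d$ and $a$ from the optimality conditions, samples the range profile, and closes the angular gaps toward their terminal values in proportion to the range, which is one plausible reading of $\tau_\lambda = k \tau_r$ and $\tau_\chi = k \tau_r$; all function and argument names are illustrative.

```python
import numpy as np

def tau_landing_trajectory(r0, vr0, k, b, lam0, chi0, lam_f, chi_f, n=200):
    """Sample the tau-coupled desired trajectory r_des(t).

    r0, vr0 : initial range and range rate (vr0 < 0 for a closing range)
    k       : tau coupling constant, 0 < k < 0.5
    b       : precomputed root of the quartic from the optimality conditions
    lam/chi : initial and terminal elevation and azimuth angles
    """
    t_d = 2.0 * vr0 / (b - 2.0)                       # time-to-go
    a = (2.0 * r0 / k + 2.0 * vr0 * t_d) / t_d ** 2   # enforces r(t_d) = 0
    t = np.linspace(0.0, t_d, n)
    gap = vr0 * t - 0.5 * a * t ** 2 + r0 / k
    r = r0 * (r0 / k) ** (-1.0 / k) * np.clip(gap, 0.0, None) ** (1.0 / k)
    s = r / r0                                        # normalized range: 1 -> 0
    lam = lam_f + (lam0 - lam_f) * s ** (1.0 / k)     # tau_lambda = k tau_r
    chi = chi_f + (chi0 - chi_f) * s ** (1.0 / k)     # tau_chi = k tau_r
    r_des = np.stack([-r * np.sin(lam) * np.cos(chi),
                      -r * np.sin(lam) * np.sin(chi),
                      -r * np.cos(lam)], axis=1)
    return t, r_des
```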

To track this trajectory, a Receding Horizon Control (RHC) or Model Predictive Control (MPC) scheme is implemented. Using a simple kinematic model where the state is the relative position $p = [x, y, z]^T$ and the control input is the relative velocity $v_r$, the state-space model is:

$$ \dot{p} = v_r, \quad y = C p $$

The RHC controller solves, at each time step, a finite-horizon optimization problem to minimize the tracking error and control effort:

$$ \min_{\Delta U} J = (R_s - Y)^T (R_s - Y) + \Delta U^T R \Delta U $$

where $R_s$ is the sequence of future reference points from $r_{des}(t)$, $Y$ is the predicted output, and $\Delta U$ is the change in the control sequence $U$ which contains future relative velocity commands. The first element of the optimized control sequence $U^*(k)$ is used. The final inertial velocity command for the VTOL drone’s flight controller is:

$$ v_{cmd} = v_T + U^*(k) $$

This ensures the VTOL drone tracks the Tau-generated path while compensating for the ship’s motion.
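Because the kinematic model decouples across axes, the finite-horizon problem can be solved per axis with a scalar predictive controller in the incremental ($\Delta U$) form. The sketch below follows that textbook formulation; the horizon lengths, sample time, and weight are illustrative values rather than the system's actual tuning.

```python
import numpy as np

def rhc_velocity_command(ref, p_rel, dp_rel, u_prev, T=0.05, Np=20, Nc=5, r_w=0.1):
    """One receding-horizon step for the kinematic model p_dot = v_r.

    ref    : (Np, 3) future samples of r_des along the horizon (R_s)
    p_rel  : (3,)    current relative position
    dp_rel : (3,)    change in relative position since the last step
    u_prev : (3,)    previous relative-velocity command
    """
    # Augmented scalar model z = [dx, y] for the discretized integrator x+ = x + T u
    A = np.array([[1.0, 0.0], [1.0, 1.0]])
    B = np.array([[T], [T]])
    C = np.array([[0.0, 1.0]])
    # Prediction matrices: Y = F z + Phi dU
    F = np.vstack([C @ np.linalg.matrix_power(A, i + 1) for i in range(Np)])
    Phi = np.zeros((Np, Nc))
    for i in range(Np):
        for j in range(min(i + 1, Nc)):
            Phi[i, j] = (C @ np.linalg.matrix_power(A, i - j) @ B)[0, 0]
    H_inv = np.linalg.inv(Phi.T @ Phi + r_w * np.eye(Nc))
    u = np.zeros(3)
    for ax in range(3):
        z = np.array([dp_rel[ax], p_rel[ax]])
        dU = H_inv @ Phi.T @ (ref[:, ax] - F @ z)   # minimizer of the quadratic cost
        u[ax] = u_prev[ax] + dU[0]                  # receding horizon: first move only
    return u   # relative command; the inertial command is v_T + u
```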

Landing Decision Strategy with Variable Safe Height

The final stage of landing is the most critical. A fixed final approach trajectory is dangerous if the deck is experiencing severe heave motion at the intended touchdown moment. Therefore, a decision strategy manages the transition from the terminal guidance phase to actual touchdown. The VTOL drone is first commanded to reach and hold a “safe height” $h_{safe}$ above the deck. This height is not static but is dynamically modulated based on real-time assessment of deck motion, particularly its vertical velocity and acceleration.

A Tau-based formulation is also used for this safe height profile. The drone plans to close the height gap $h$ to zero according to a Tau-coupled law, but the coupling constant or the time-to-go $t_d$ is continuously adjusted based on a “landing window” analysis. The safe height profile is given by:

$$ h_{safe}(t) = h_0 \frac{(t_d^2 - t^2)^{1/k}}{t_d^{2/k}} $$

The decision logic monitors the deck’s motion. If the deck’s vertical position and velocity are within a predefined tolerance band (simulating a momentary “quiet” period in its heave cycle), the strategy commits to landing, effectively setting $t_d$ to a small value to initiate a rapid, controlled descent. If the deck motion is too severe, the strategy increases $t_d$, causing the VTOL drone to effectively hold or even increase its safe height, waiting for a more stable moment. This creates an adaptive, event-triggered landing sequence that significantly enhances safety for the VTOL drone.
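A minimal sketch of this logic is given below: a Tau-based safe-height profile plus a tolerance-band test on the deck's vertical motion that stretches or shortens the time-to-go. The numerical thresholds are placeholders, since the text does not specify values.

```python
import numpy as np

def safe_height(h0, t, t_d, k=0.4):
    """Tau-based profile closing the height gap from h0 toward zero at t_d."""
    return h0 * np.clip((t_d ** 2 - t ** 2) / t_d ** 2, 0.0, None) ** (1.0 / k)

def adjust_time_to_go(t_d, deck_vz, deck_az, vz_tol=0.15, az_tol=0.5,
                      t_d_commit=1.5, t_d_hold=8.0):
    """Event-triggered update of the time-to-go t_d (thresholds are placeholders).

    Commits to a rapid descent when the deck's vertical velocity and
    acceleration sit inside the tolerance band (a momentary 'quiet'
    phase of the heave cycle); otherwise stretches t_d so the drone
    holds or raises its safe height and waits for a calmer moment.
    """
    if abs(deck_vz) < vz_tol and abs(deck_az) < az_tol:
        return min(t_d, t_d_commit)   # commit: initiate the controlled descent
    return max(t_d, t_d_hold)         # hold: keep waiting at the safe height
```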

System Integration and Simulation Validation

The performance of the complete vision-based guidance system for the VTOL drone was validated through a semi-physical (hardware-in-the-loop style) simulation. The environment consisted of three subsystems: 1) a perception and guidance subsystem running on a Raspberry Pi (ARM Cortex-A53); 2) a flight control subsystem using the PX4 open-source autopilot; and 3) a high-fidelity visual simulation environment built in Unity3D. Unity3D simulated the dynamics of the ship in various sea states, rendered the camera view for the VTOL drone, and provided ground-truth data. The other two subsystems processed the images and generated control commands in real time, closing the loop.

The visual target was a high-contrast marker on the ship deck. The perception module successfully estimated the ship’s velocity. The table below summarizes key performance metrics from simulations under two different sea states (calm and rough), demonstrating the system’s robustness.

| Sea State | Avg. Range Tracking Error (m) | Time to Safe Height (s) | Touchdown Vertical Speed (m/s) | Success Rate* |
|---|---|---|---|---|
| Calm (Sea State 3) | 0.124 | 104.86 | 0.3 | 95.4% |
| Rough (Sea State 6) | 0.183 | 113.9 | N/A (held) | 25.4% |

*Success rate here refers to the probability of the deck motion being within the acceptable landing window, based on a statistical “zero-crossing” model of wave motion.

In calm seas, the VTOL drone completed the landing smoothly with a soft touchdown. In rough seas, the decision strategy correctly took precedence: the VTOL drone held at the safe height for extended periods and initiated landing only during brief windows of relative deck stability, showcasing the critical role of the decision module. The Tau-guided trajectory provided a naturally smooth deceleration profile without requiring complex online optimization.

Conclusion and Future Directions

This article presented an integrated vision-based solution for the autonomous deck landing of VTOL drones. The method combines robust 3D motion estimation via spherical optical flow with a bio-inspired, Tau theory-based trajectory planning and tracking law, culminating in a decision strategy for safe engagement under dynamic deck motion. The system operates primarily with an onboard camera, enhancing the autonomy and reducing the cost of VTOL drone recovery systems. Simulation results validate the approach’s effectiveness in meeting soft-landing constraints and its adaptive behavior in different sea conditions.

Future work will focus on several enhancements. First, integrating the VTOL drone’s full dynamics and actuator constraints into the guidance loop, potentially through a more sophisticated nonlinear MPC, would allow for optimal maneuvering. Second, exploring direct visual servoing methods that use the optical flow or image features directly in the guidance law could simplify the pipeline, though challenges in shaping the resulting trajectory remain. Finally, making the perception system robust to extreme lighting and occlusions (e.g., sea spray), and ensuring reliable estimates during the final moments before touchdown, will be crucial for real-world deployment of VTOL drones in maritime environments.
