In recent years, the rapid advancement of unmanned aerial vehicle (UAV) technology has enabled widespread applications in surveillance, reconnaissance, and disaster response. Among the various platforms, vertical take-off and landing (VTOL) UAVs, particularly quadrotors, have attracted significant attention for their agility, hovering capability, and low cost. However, many traditional VTOL UAV systems rely on sensors such as GPS for position, magnetometers for heading, and optical flow sensors for velocity, which add both cost and payload. To address these limitations, my research focuses on a robust image-based visual servoing (IBVS) control scheme for VTOL UAVs that uses only an inertial measurement unit (IMU) and a monocular camera, eliminating the need for external position, velocity, or heading measurements. The approach relies on visual feedback to track maneuvering targets, making it suitable for resource-constrained environments. The core challenge is to design a control system that handles the underactuated dynamics of a VTOL UAV while compensating for unknown target motion and measurement uncertainty. In this article, I present a complete framework for image-based target tracking, covering system modeling, controller design based on super-twisting sliding mode techniques, and simulation validation. By integrating image moments and a virtual image plane, the proposed method achieves stable and accurate tracking without relying on additional sensors.

The motivation for this research stems from the growing demand for autonomous VTOL UAVs that can operate in GPS-denied environments, such as indoor spaces or areas with magnetic interference. By using vision-based control, VTOL UAVs can perform tasks like following moving targets or inspecting structures without external aids. However, IBVS for VTOL UAVs is non-trivial due to the coupling between translational and rotational motions. When a VTOL UAV changes its attitude to achieve translation, the image dynamics become complex, requiring careful controller design. Previous works have addressed this using spherical image moments or perspective moments, but they often assume availability of heading or velocity measurements. My contribution is a novel control strategy that overcomes these limitations through a combination of image feature selection, virtual image plane transformation, and robust sliding mode control. This approach not only reduces sensor dependency but also enhances robustness against target maneuvers. The following sections detail the mathematical foundations, control architecture, and performance evaluation, with extensive use of formulas and tables to summarize key concepts. As VTOL UAVs continue to evolve, such vision-based methods will play a crucial role in enabling fully autonomous operations.
To begin, I define the mathematical model of a quadrotor VTOL UAV. The dynamics are derived from the Newton–Euler equations, treating the vehicle as a rigid body with mass m and inertia matrix J. Let the inertial frame be denoted as W = {Ow, w1, w2, w3} and the body-fixed frame as B = {Ob, b1, b2, b3}, where b1 points forward. The position of the center of mass in W is r ∈ ℝ³, and its velocity is v ∈ ℝ³. The rotation matrix R ∈ SO(3) maps vectors from B to W, and the angular velocity in B is Ω = [ω1 ω2 ω3]ᵀ. The control inputs are the total thrust f ∈ ℝ and the moment vector M ∈ ℝ³. The translational and rotational dynamics are given by:
$$ \dot{\mathbf{r}} = \mathbf{v}, $$
$$ \dot{\mathbf{v}} = g\mathbf{w}_3 - \frac{f}{m}\mathbf{R}\mathbf{b}_3, $$
$$ \dot{\mathbf{R}} = \mathbf{R} \text{sk}(\mathbf{\Omega}), $$
$$ \mathbf{J}\dot{\mathbf{\Omega}} + \mathbf{\Omega} \times \mathbf{J}\mathbf{\Omega} = \mathbf{M}, $$
where w3 = [0 0 1]ᵀ, b3 = [0 0 1]ᵀ, g is the gravitational acceleration, and sk(·) is the skew-symmetric operator. Because the thrust is fixed along b3, the vehicle is underactuated: translational motion requires attitude changes. This coupling is the key difficulty in controlling VTOL UAVs. To simplify the design, I decompose R into yaw and tilt components, R = RψRt, where Rψ is the yaw rotation about w3 and Rt captures the tilt (roll and pitch). The tilt matrix can be further factored as Rt = RθRϕ, with pitch angle θ and roll angle ϕ. The yaw kinematics are:
$$ \dot{\mathbf{R}}_{\psi} = \mathbf{R}_{\psi} \text{sk}(\dot{\psi}\mathbf{b}_3), $$
$$ \dot{\psi} = (\omega_2 \sin\phi + \omega_3 \cos\phi)/\cos\theta. $$
In practice, VTOL UAVs often lack heading measurements, so my control design avoids relying on ψ. Instead, I use the tilt matrix Rt, which can be estimated from IMU data. This is a critical adaptation for VTOL UAVs operating without magnetometers.
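To make the model concrete, here is a minimal numpy sketch of the rigid-body equations above; the function names and state layout are my own choices, and the default mass and inertia values are those used later in the simulation section.

```python
import numpy as np

def sk(w):
    """Skew-symmetric matrix: sk(w) @ x equals np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def quadrotor_rhs(v, R, Omega, f, M, m=0.455,
                  J=np.diag([0.43e-2, 0.43e-2, 1.02e-2]), g=9.81):
    """Right-hand side of the Newton-Euler model. The inertial third axis
    points downward, so gravity enters as +g*w3, matching the equations."""
    w3 = np.array([0.0, 0.0, 1.0])
    r_dot = v                                      # position kinematics
    v_dot = g * w3 - (f / m) * (R @ w3)            # R @ w3 = R b3, thrust axis in W
    R_dot = R @ sk(Omega)                          # attitude kinematics
    Omega_dot = np.linalg.solve(J, M - np.cross(Omega, J @ Omega))
    return r_dot, v_dot, R_dot, Omega_dot
```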
Next, I describe the image dynamics for the VTOL UAV’s camera. Assume a downward-facing camera mounted at the VTOL UAV’s center, with focal length λ. The camera frame C = {Oc, xc, yc, zc} has its origin at the lens center. To simplify analysis, I introduce a virtual image plane V = {Ov, xv, yv, zv} that is fixed to the VTOL UAV and horizontal (parallel to w1–w2 plane), sharing the same yaw as the camera. This virtual plane decouples image dynamics from attitude changes, a common technique for VTOL UAV visual servoing. Let a point P on the target have coordinates Pv in V. The relationship is:
$$ \mathbf{P}_v = \mathbf{R}_{\psi}^T (\mathbf{P} - \mathbf{O}_v). $$
The time derivative yields:
$$ \dot{\mathbf{P}}_v = -\text{sk}(\dot{\psi}\mathbf{b}_3)\mathbf{P}_v + \mathbf{v}_p - \mathbf{v}_v, $$
where vp is the target’s velocity and vv the camera’s velocity, both expressed in V. The image coordinates (uv, vv) in the virtual plane are obtained from the camera measurements (u, v) via:
$$ \begin{bmatrix} u_v \\ v_v \end{bmatrix} = \frac{1}{\bar{\mathbf{R}}_3 \mathbf{p}} \begin{bmatrix} \bar{\mathbf{R}}_1 \mathbf{p} \\ \bar{\mathbf{R}}_2 \mathbf{p} \end{bmatrix}, $$
with p = [u v 1]ᵀ and R̄1, R̄2, R̄3 the rows of Rt. Differentiating gives the image velocity:
$$ \begin{bmatrix} \dot{u}_v \\ \dot{v}_v \end{bmatrix} = \begin{bmatrix} -\lambda & 0 & u_v \\ 0 & -\lambda & v_v \end{bmatrix} \frac{(\mathbf{v}_v - \mathbf{v}_p)}{z_v} + \begin{bmatrix} v_v \\ -u_v \end{bmatrix} \dot{\psi}. $$
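As an illustrative sketch of this projection (the function name and conventions are assumed, following the equation above):

```python
import numpy as np

def to_virtual_plane(u, v, R_t):
    """Map a measured image point (u, v) onto the horizontal virtual plane
    using the tilt matrix R_t, following the projection equation above."""
    p = np.array([u, v, 1.0])
    return (R_t[:2, :] @ p) / (R_t[2, :] @ p)   # returns (u_v, v_v)
```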
For feature selection, I use perspective image moments to represent the target. Given N feature points pk = [uk vk 1]ᵀ, define the moments $m_{ij} = \sum_k u_k^i v_k^j$ and the central moments $\mu_{ij} = \sum_k (u_k - u_g)^i (v_k - v_g)^j$, where (ug, vg) = (m10/m00, m01/m00) is the centroid. The image area is a = µ20 + µ02. I choose the image feature vector q = [qx qy qz]ᵀ for VTOL UAV control:
$$ q_x = q_z u_g, \quad q_y = q_z v_g, \quad q_z = z_d \sqrt{a_d / a}, $$
where zd is the desired height and ad is the desired area. This formulation encodes relative position information, crucial for VTOL UAV tracking. The dynamics of q are:
$$ \dot{\mathbf{q}} = -\text{sk}(\dot{\psi}\mathbf{b}_3) \begin{bmatrix} q_x \\ q_y \\ q_{Dz} \end{bmatrix} - \mathbf{v}_v + \mathbf{v}_p, $$
where qDz is arbitrary, since the third column of sk(ψ̇b3) is zero and qDz therefore never enters the dynamics. Notably, zv = zd √(ad/a), which allows the height to be estimated without a direct measurement. Define the error δ = q − qd, where qd = [0 0 qdz]ᵀ for centered tracking. The error dynamics become:
$$ \dot{\mathbf{\delta}} = -\text{sk}(\dot{\psi}\mathbf{b}_3)\mathbf{\delta} - \mathbf{v}_v + \mathbf{v}_p. $$
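A compact sketch of this feature construction (unit point weights, per the moment definitions above; the function name and array layout are illustrative):

```python
import numpy as np

def image_features(pts, z_d, a_d):
    """Feature vector q from N virtual-plane points pts (an N x 2 array)."""
    u_g, v_g = pts.mean(axis=0)                   # centroid (m10/m00, m01/m00)
    du, dv = pts[:, 0] - u_g, pts[:, 1] - v_g
    a = np.sum(du**2) + np.sum(dv**2)             # image area a = mu20 + mu02
    q_z = z_d * np.sqrt(a_d / a)                  # doubles as the height estimate z_v
    return np.array([q_z * u_g, q_z * v_g, q_z])
```

The tracking error δ is then obtained by subtracting the reference qd = [0 0 qdz]ᵀ from this vector.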
Combining with VTOL UAV translational dynamics in V:
$$ \dot{\mathbf{v}}_v = -\text{sk}(\dot{\psi}\mathbf{b}_3)\mathbf{v}_v + \mathbf{u}_v, \quad \mathbf{u}_v = \mathbf{R}_t \mathbf{u}_b, $$
where ub = g Rtᵀb3 − (f/m)b3. After differentiation, I obtain the second-order error system:
$$ \ddot{\mathbf{\delta}} = -2\mathbf{M}_1 \dot{\mathbf{\delta}} - \mathbf{M}_2 \mathbf{\delta} - \mathbf{M}_1 \mathbf{v}_p - \dot{\mathbf{v}}_p + \mathbf{u}_v, $$
with M1 = sk(ψ̇b3) and M2 = sk(ψ̈b3) + sk(ψ̇b3)sk(ψ̇b3). Substituting uv yields:
$$ \ddot{\mathbf{\delta}} = -2\mathbf{M}_1 \dot{\mathbf{\delta}} - \mathbf{M}_2 \mathbf{\delta} - \mathbf{M}_1 \mathbf{v}_p - \dot{\mathbf{v}}_p + g\mathbf{b}_3 - \frac{f}{m} \mathbf{r}_3, $$
where r3 is the third column of Rt. This equation forms the basis for controller design for the VTOL UAV.
To summarize the system parameters and variables, I provide Table 1, which lists key symbols used in modeling the VTOL UAV and image dynamics.
| Symbol | Description | Unit |
|---|---|---|
| m | Mass of VTOL UAV | kg |
| J | Inertia matrix | kg·m² |
| r | Position in inertial frame | m |
| v | Velocity in inertial frame | m/s |
| R | Rotation matrix | — |
| Ω | Angular velocity | rad/s |
| f | Total thrust | N |
| M | Control moment | N·m |
| λ | Focal length | pixel |
| q | Image feature vector | — |
| δ | Image error vector | — |
The controller design for the VTOL UAV consists of two main parts: an IBVS position controller and an attitude controller. The position controller generates thrust and attitude commands from image features, while the attitude controller tracks these commands. This decoupled structure is common for underactuated VTOL UAVs. A block diagram of the control system is shown in Figure 1, illustrating the flow from image processing to actuator inputs for the VTOL UAV.
Since the image velocity δ̇ is not measured, I design a high-order sliding mode observer (HOSMO) to estimate it. Let x̂1, x̂2, and x̂3 be estimates of δ, δ̇, and the lumped disturbance, respectively, and define the estimation error e1 = δ − x̂1. The HOSMO is:
$$ \dot{\hat{\mathbf{x}}}_1 = \hat{\mathbf{x}}_2 + k_1 |\mathbf{e}_1|^{2/3} \text{sgn}(\mathbf{e}_1), $$
$$ \dot{\hat{\mathbf{x}}}_2 = \hat{\mathbf{x}}_3 + \mathbf{u} + \mathbf{f}(\mathbf{\delta}, \hat{\mathbf{x}}_2) + k_2 |\mathbf{e}_1|^{1/3} \text{sgn}(\mathbf{e}_1), $$
$$ \dot{\hat{\mathbf{x}}}_3 = k_3 \text{sgn}(\mathbf{e}_1), $$
where f(δ, x̂2) = −2M1x̂2 − M2δ + gb3, and u is the virtual control input. Gains k1, k2, k3 > 0 ensure finite-time convergence of x̂2 to δ̇. This observer is robust to uncertainties, which is vital for VTOL UAV operation in dynamic environments.
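A discrete-time sketch of the observer follows (forward-Euler, with |e1|^p and sgn applied elementwise, an interpretation I assume for the vector case; the gains are those used in the simulation section):

```python
import numpy as np

def hosmo_step(x1, x2, x3, delta, u, M1, M2, dt,
               k1=20.0, k2=10.0, k3=10.0, g=9.81):
    """One forward-Euler step of the HOSMO defined above."""
    e1 = delta - x1                                # measured error minus estimate
    s1 = np.sign(e1)
    f_val = -2.0 * (M1 @ x2) - M2 @ delta + g * np.array([0.0, 0.0, 1.0])
    x1_next = x1 + dt * (x2 + k1 * np.abs(e1) ** (2.0 / 3.0) * s1)
    x2_next = x2 + dt * (x3 + u + f_val + k2 * np.abs(e1) ** (1.0 / 3.0) * s1)
    x3_next = x3 + dt * k3 * s1                    # disturbance estimate dynamics
    return x1_next, x2_next, x3_next
```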
For the position controller, I consider the error dynamics with the disturbance D = −(M1vp + v̇p), which represents the unknown target maneuver. Assume D is bounded, ‖D‖ < σd. The control objective is to drive δ to zero. Define the sliding surface s = c1δ + x̂2, with c1 > 0. I propose a super-twisting sliding mode controller:
$$ \mathbf{u} = -c_1 \hat{\mathbf{x}}_2 - \int_0^t k_3 \text{sgn}(\mathbf{e}_1) d\tau - k_2 |\mathbf{e}_1|^{1/3} \text{sgn}(\mathbf{e}_1) - \lambda_1 |\mathbf{s}|^{1/2} \text{sgn}(\mathbf{s}) - \int_0^t \lambda_2 \text{sgn}(\mathbf{s}) d\tau - \mathbf{f}(\mathbf{\delta}, \hat{\mathbf{x}}_2), $$
where λ1, λ2 > 0 are gains. The control signal is continuous, which mitigates the chattering typical of first-order sliding mode designs. Under the assumption that the observer and attitude dynamics converge faster than the position loop, the closed-loop system reaches s = 0 in finite time, implying exponential convergence of δ → 0; the proof uses Lyapunov analysis, as detailed in my prior work. The controller output u is then converted to thrust and attitude commands. Let u = −(fd/m)rd, where rd is the desired tilt direction and fd the desired thrust. Then:
$$ \mathbf{r}_d = -\frac{\mathbf{u}}{\|\mathbf{u}\|}, \quad f_d = m \|\mathbf{u}\|. $$
From rd = [rd1 rd2 rd3]ᵀ, the desired pitch and roll angles are computed as:
$$ \theta_d = \arctan\left(\frac{r_{d1}}{r_{d3}}\right), \quad \phi_d = \arctan\left(\frac{-\cos(\theta_d) r_{d2}}{r_{d3}}\right). $$
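The following sketch combines the control law with the command extraction (the two integral terms are carried in z1 and z2 via forward-Euler; the gains come from the simulation section, and the arctan form assumes rd3 > 0, i.e. near-hover operation):

```python
import numpy as np

def stsmc_step(delta, x2_hat, e1, z1, z2, M1, M2, dt,
               c1=1.0, k2=10.0, k3=10.0, lam1=3.0, lam2=6.0,
               m=0.455, g=9.81):
    """Super-twisting position control plus thrust/attitude command extraction."""
    s = c1 * delta + x2_hat                        # sliding surface
    z1 = z1 + dt * k3 * np.sign(e1)                # integral of k3*sgn(e1)
    z2 = z2 + dt * lam2 * np.sign(s)               # integral of lam2*sgn(s)
    f_val = -2.0 * (M1 @ x2_hat) - M2 @ delta + g * np.array([0.0, 0.0, 1.0])
    u = (-c1 * x2_hat - z1
         - k2 * np.abs(e1) ** (1.0 / 3.0) * np.sign(e1)
         - lam1 * np.abs(s) ** 0.5 * np.sign(s)
         - z2 - f_val)
    f_d = m * np.linalg.norm(u)                    # desired thrust, assumes u != 0
    r_d = -u / np.linalg.norm(u)                   # desired tilt direction
    theta_d = np.arctan(r_d[0] / r_d[2])           # desired pitch
    phi_d = np.arctan(-np.cos(theta_d) * r_d[1] / r_d[2])  # desired roll
    return f_d, theta_d, phi_d, z1, z2
```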
The desired tilt matrix is Rt,d = RθdRϕd. For yaw control, since heading measurement is absent, I either suppress yaw rotation (i.e., set ω3 = 0) or use visual information to align with the target. The relative yaw angle α can be computed from image moments:
$$ \alpha = \frac{1}{2} \arctan\left(\frac{2\mu_{11}}{\mu_{20} – \mu_{02}}\right). $$
If the target yaw ψt is known, the desired yaw is ψd = ψt − α; otherwise, keeping α constant suffices for many VTOL UAV tasks.
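A small sketch of this moment-based yaw computation (I use arctan2 for quadrant disambiguation, a common variant of the formula above):

```python
import numpy as np

def relative_yaw(pts):
    """Relative yaw alpha from the second-order central image moments."""
    du = pts[:, 0] - pts[:, 0].mean()
    dv = pts[:, 1] - pts[:, 1].mean()
    mu11, mu20, mu02 = np.sum(du * dv), np.sum(du**2), np.sum(dv**2)
    return 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
```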
The attitude controller tracks Rt,d. Define the attitude error as Re = Rt Rt,dᵀ. A proportional-derivative (PD) controller on SO(3) is used:
$$ \mathbf{M} = \mathbf{\Omega} \times \mathbf{J}\mathbf{\Omega} - \mathbf{K}_p \text{sk}^{-1}(\log(\mathbf{R}_e)) - \mathbf{K}_d \mathbf{\Omega}, $$
where Kp, Kd are positive definite matrices, and log(·) is the logarithmic map on SO(3); the term Ω × JΩ cancels the gyroscopic term in the rotational dynamics. This controller ensures exponential convergence of Re to the identity, provided the initial attitude error is less than 180°. The attitude loop is fast and accurate, which is essential for VTOL UAV stability.
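A sketch of the attitude law, including a standard closed-form SO(3) logarithm (valid away from the 180° singularity; function names are illustrative):

```python
import numpy as np

def so3_log(R):
    """Axis-angle vector sk^{-1}(log(R)) for R in SO(3), below 180 degrees."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-9:
        return np.zeros(3)                          # identity rotation
    w = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]])               # vee map of R - R^T
    return angle / (2.0 * np.sin(angle)) * w

def attitude_pd(R_t, R_td, Omega, J, Kp, Kd):
    """PD attitude moment on SO(3) tracking the desired tilt matrix R_td."""
    R_e = R_t @ R_td.T
    return np.cross(Omega, J @ Omega) - Kp @ so3_log(R_e) - Kd @ Omega
```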
To evaluate the performance of the proposed control scheme for VTOL UAVs, I conducted numerical simulations in MATLAB/Simulink. The VTOL UAV parameters are based on a typical quadrotor: m = 0.455 kg, J = diag(0.43, 0.43, 1.02) × 10⁻² kg·m². The target is a planar object with four feature points. The desired height is zd = 5 m. The controller gains are tuned as follows: k1 = 20, k2 = 10, k3 = 10, λ1 = 3, λ2 = 6, c1 = 1. The simulation scenarios include target linear motion and S-shaped maneuvers, with a focus on tracking accuracy and control effort for the VTOL UAV.
Table 2 summarizes the simulation parameters for the VTOL UAV and target.
| Parameter | Value | Description |
|---|---|---|
| m | 0.455 kg | VTOL UAV mass |
| J1, J2 | 0.43 × 10⁻² kg·m² | Roll/pitch inertia |
| J3 | 1.02 × 10⁻² kg·m² | Yaw inertia |
| zd | 5 m | Desired height |
| λ | 500 pixels | Focal length |
| Target speed | 3 m/s | Maximum velocity |
| Simulation time | 40 s | Duration |
In the first scenario, the target moves linearly at 3 m/s, with 90° turns at 9 s, 19 s, and 29 s; a sketch of this motion profile is given below. The VTOL UAV starts at [6 5 −10]ᵀ m. Figure 2 shows the trajectories of the VTOL UAV and target. The VTOL UAV successfully tracks the target, with position errors shown in Figure 3. The errors peak during turns (up to 1.5 m) but quickly recover, demonstrating the robustness of the controller. The control inputs (thrust f and moments M) are smooth (Figures 4–5), avoiding excessive actuation. The Euler angles and angular velocities (Figures 6–7) remain within practical limits, confirming that the VTOL UAV adjusts tilt rather than yaw for translation, as intended.
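For reference, the target velocity profile of this scenario can be sketched as follows (the turn directions are my assumption; the text specifies only the turn times and the 90° magnitude):

```python
import numpy as np

def target_velocity(t, speed=3.0):
    """Piecewise-constant target velocity with 90-degree turns at 9, 19, 29 s."""
    n_turns = sum(t >= tc for tc in (9.0, 19.0, 29.0))
    heading = n_turns * np.pi / 2.0        # assumed: each turn adds +90 degrees
    return speed * np.array([np.cos(heading), np.sin(heading), 0.0])
```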
For the S-shaped maneuver, the target follows a sinusoidal path at 3 m/s. The VTOL UAV tracks accurately, with errors below 1 m (Figure 8). The control inputs (Figures 9–10) are again chattering-free, thanks to the super-twisting design. These results validate the effectiveness of the IBVS approach for VTOL UAVs tracking maneuvering targets.
I also tested the impact of velocity estimation delay on VTOL UAV performance. Delays of 50 ms and 100 ms were introduced in the observer. As expected, larger delays increase tracking error and cause slight oscillations (Figures 11–12), but the VTOL UAV remains stable. This highlights the importance of fast computation for real-time VTOL UAV control. To assess computational efficiency, I generated C code from the controller using Simulink Coder and deployed it on an STM32F407 microcontroller (168 MHz). The average execution time per cycle was 1.2 ms, well within typical VTOL UAV control rates (10–100 Hz). This confirms the feasibility of implementing the proposed algorithm on embedded VTOL UAV platforms.
Furthermore, I performed a virtual simulation in V-REP with a realistic camera model to validate the vision system. The VTOL UAV successfully hovered over a target using image moments, as shown in Figure 13. This demonstrates the practicality of the method for VTOL UAV applications in simulated environments.
In conclusion, I have presented a comprehensive image-based target tracking control method for VTOL UAVs that requires only an IMU and camera. By using perspective moments and a virtual image plane, the scheme estimates relative position without direct measurements. The super-twisting sliding mode controller and observer provide robustness against target maneuvers and measurement uncertainties, while ensuring smooth control actions. Simulations confirm stable tracking performance for various target motions, with the VTOL UAV adjusting tilt attitude for translation instead of slow yaw changes. The approach is computationally efficient and suitable for low-cost VTOL UAVs. Future work will involve real-world flight tests and extending the method to multi-VTOL UAV scenarios. As VTOL UAV technology advances, such vision-based controls will enable more autonomous and resilient operations in complex environments.
