In recent years, quadrotor unmanned aerial vehicles (UAVs) have gained significant attention due to their versatility in applications such as surveillance, agriculture, and disaster response. However, single quadrotor systems often fall short in complex missions, leading to increased interest in multi-quadrotor formations. This article addresses the critical challenges of path planning and cooperative control for quadrotor UAV formations in unknown 3D environments. By integrating reinforcement learning, artificial potential fields, and multi-agent consensus theory, we propose a comprehensive framework that ensures safe and efficient formation flight. The core innovations include a Q-learning-based path planning algorithm for a virtual leader, a novel repulsion function for obstacle and collision avoidance, an extended state observer for disturbance rejection, and a distributed cooperative control protocol that accounts for the full six-degree-of-freedom (6-DOF) dynamics of quadrotors. Through rigorous stability analysis and simulations, we demonstrate the effectiveness of our approach in achieving precise trajectory tracking and robust performance under uncertainties.
The problem involves a formation of one virtual leader and N follower quadrotors operating in an unknown 3D environment with obstacles. The communication topology among followers is represented by an undirected graph, where each node corresponds to a quadrotor, and edges denote information exchange. The dynamics of each quadrotor are modeled using Newton-Euler equations, capturing the coupled translational and rotational motions. The control objectives are to ensure that all followers track the leader’s planned path while avoiding obstacles and inter-agent collisions, with all signals remaining globally uniformly ultimately bounded. Key assumptions include the smoothness of the leader’s trajectory and bounded disturbances.
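To make the graph-theoretic setup concrete, the minimal sketch below constructs the quantities used in the later analysis, assuming a hypothetical four-follower ring topology in which two followers also receive the virtual leader's state; the topology itself is an illustrative assumption, not the configuration used in the paper.

```python
# Sketch of the communication-graph quantities (illustrative topology only).
import numpy as np

# Undirected adjacency among followers: a_ij = 1 if i and j exchange information.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

L = np.diag(A.sum(axis=1)) - A          # graph Laplacian of the follower graph
B = np.diag([1.0, 0.0, 1.0, 0.0])       # a_i0 = 1 if follower i sees the leader

# (L + B) kron I_3 is the weighting that appears in the positional analysis.
H = np.kron(L + B, np.eye(3))
```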

The quadrotor dynamics are divided into positional and attitude subsystems. The positional dynamics for the i-th quadrotor are given by:
$$ \dot{P}_i = V_i, \quad \dot{V}_i = U^P_i, $$
where \( P_i = [x_i, y_i, z_i]^T \) is the position, \( V_i \) is the velocity, and \( U^P_i \) is the control input. The attitude dynamics are:
$$ \dot{\Theta}_i = \Omega_i, \quad \dot{\Omega}_i = F(\Theta_i, \Omega_i) + G U^\Theta_i + D_i(t), $$
where \( \Theta_i = [\phi_i, \theta_i, \psi_i]^T \) represents the Euler angles, \( \Omega_i \) is the angular velocity, \( U^\Theta_i \) is the torque input, and \( D_i(t) \) denotes lumped disturbances. The function \( F(\Theta_i, \Omega_i) \) captures the nonlinear coupling effects.
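As a rough illustration of how these two subsystems evolve, the sketch below propagates them with a forward-Euler step; the callable F and the matrix G stand in for the model-specific coupling term and input matrix (assumed constant and invertible here), and the step size is arbitrary.

```python
# Illustrative forward-Euler propagation of the positional and attitude subsystems.
import numpy as np

def step_dynamics(P, V, Theta, Omega, U_p, U_theta, D, F, G, dt=0.01):
    """One Euler step of the positional and attitude dynamics of quadrotor i."""
    P_next = P + dt * V                                     # dot{P} = V
    V_next = V + dt * U_p                                   # dot{V} = U^P
    Theta_next = Theta + dt * Omega                         # dot{Theta} = Omega
    Omega_next = Omega + dt * (F(Theta, Omega) + G @ U_theta + D)
    return P_next, V_next, Theta_next, Omega_next
```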
For path planning, we employ a Q-learning algorithm combined with B-spline curve fitting to generate smooth and collision-free trajectories for the virtual leader. The environment is discretized into a grid, and the action space includes 26 possible directions to enhance exploration. The reward function is designed to penalize large movements and obstacle proximity while rewarding goal attainment. The Q-table is updated iteratively using the Bellman equation:
$$ Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \zeta \max_{a'} Q(s', a') - Q(s, a) \right], $$
where \( \alpha \) is the learning rate, \( \zeta \) is the discount factor, and \( r \) is the immediate reward. The planned discrete path points are then fitted with a B-spline curve to ensure continuity:
$$ P_0(t) = \sum_{m=0}^{M} P^m_0 \Re_{m,l}(t), $$
where \( \Re_{m,l}(t) \) are the B-spline basis functions.
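As a rough sketch of these two planning ingredients, the code below performs one tabular Q-learning update and then fits the planned waypoints with a cubic B-spline via SciPy; the 26-move action set, the defaultdict Q-table, and the sampling density are illustrative assumptions consistent with the description, not the authors' exact implementation.

```python
# Tabular Q-update and B-spline smoothing of the planned leader path (sketch).
import itertools
from collections import defaultdict
import numpy as np
from scipy.interpolate import splprep, splev

ALPHA, ZETA = 0.1, 0.8                      # learning rate and discount factor (Table 2)
ACTIONS = [a for a in itertools.product((-1, 0, 1), repeat=3) if a != (0, 0, 0)]  # 26 grid moves
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})   # Q-table over grid states

def q_update(s, a, r, s_next):
    """Bellman update: Q(s,a) += alpha * (r + zeta * max_a' Q(s',a') - Q(s,a))."""
    Q[s][a] += ALPHA * (r + ZETA * max(Q[s_next].values()) - Q[s][a])

def fit_bspline(waypoints, n_samples=200):
    """Fit a cubic B-spline through the planned waypoints (array of shape M x 3)."""
    tck, _ = splprep(waypoints.T, s=0, k=3)  # parametric spline in x, y, z
    u = np.linspace(0.0, 1.0, n_samples)
    return np.column_stack(splev(u, tck))    # sampled smooth leader path P_0(t)
```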
The distributed cooperative control protocol is designed using back-stepping and multi-agent consensus theory. For the positional layer, the virtual control law for the i-th follower is:
$$ U^P_i = -\sum_{j=1}^{N} a_{ij} \left[ \gamma_{1i} (\tilde{P}_i - \tilde{P}_j) + \gamma_{2i} (\tilde{V}_i - \tilde{V}_j) \right] - a_{i0} (\gamma_{1i} \tilde{P}_i + \gamma_{2i} \tilde{V}_i) + F_i, $$
where \( \tilde{P}_i = P_i - P^d_i \) and \( \tilde{V}_i = V_i - \dot{P}^d_i \) are the tracking errors with respect to the desired position \( P^d_i \) of follower i, \( a_{ij} \) are the adjacency weights of the follower graph, \( a_{i0} > 0 \) if follower i receives the leader's state, \( \gamma_{1i} \) and \( \gamma_{2i} \) are positive gains, and \( F_i \) is the repulsive force for obstacle avoidance. The repulsive force is derived from a novel artificial potential field:
$$ F_i = \sum_{c \in \mathcal{N}^o_i} F_{i,c} \frac{R_{i,c}}{||R_{i,c}||}, $$
with
$$ F_{i,c} = \begin{cases}
-\kappa_{ci} \left( \frac{1}{R^o_{i,c} - R_s} - \frac{1}{R^\ell_p} \right) \left( \frac{1}{R^o_{i,c} - R_s} \right)^2, & \text{if } R_s < R^o_{i,c} < R^\ell_p, \\
0, & \text{otherwise},
\end{cases} $$
where \( \kappa_{ci} > 0 \) is a repulsion coefficient, \( R_{i,c} \) is the relative position vector between quadrotor i and obstacle c with \( R^o_{i,c} = \|R_{i,c}\| \) the corresponding distance, \( R_s \) is the safety threshold, and \( R^\ell_p \) is the influence range of the potential field.
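A minimal sketch of the repulsion term and the positional protocol is given below; the direction convention for \( R_{i,c} \) (pointing from quadrotor i toward obstacle c), the uniform gains, and the obstacle list are illustrative assumptions.

```python
# Repulsive force F_i and positional consensus law U^P_i (sketch).
import numpy as np

def repulsive_force(p_i, obstacles, kappa=1.0, R_s=0.5, R_p=2.0):
    """Sum repulsive contributions from obstacles inside the influence range R_p."""
    F = np.zeros(3)
    for p_c in obstacles:
        r_vec = p_c - p_i                  # R_{i,c}: quadrotor -> obstacle (assumed convention)
        d = np.linalg.norm(r_vec)          # distance R^o_{i,c}
        if R_s < d < R_p:
            mag = -kappa * (1.0 / (d - R_s) - 1.0 / R_p) / (d - R_s) ** 2
            F += mag * r_vec / d           # negative magnitude => push away from the obstacle
    return F

def positional_control(i, P_err, V_err, A, a0, g1, g2, F_i):
    """U^P_i: neighbour consensus terms, leader term, and the repulsive force."""
    U = np.zeros(3)
    for j in range(A.shape[0]):
        U -= A[i, j] * (g1 * (P_err[i] - P_err[j]) + g2 * (V_err[i] - V_err[j]))
    U -= a0[i] * (g1 * P_err[i] + g2 * V_err[i])
    return U + F_i
```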
For the attitude layer, an extended state observer (ESO) is designed to estimate and compensate for lumped disturbances:
$$ \dot{\hat{\Omega}}_i = F(\Theta_i, \Omega_i) + G U^\Theta_i + \hat{D}_i – \gamma_{3i} \tilde{\Omega}_i, \quad \dot{\hat{D}}_i = -\gamma_{4i} \tilde{\Omega}_i, $$
where \( \hat{D}_i \) is the disturbance estimate, \( \tilde{\Omega}_i = \hat{\Omega}_i - \Omega_i \) is the observation error, and \( \gamma_{3i}, \gamma_{4i} > 0 \) are observer gains. The control law for the attitude subsystem is:
$$ U^\Theta_i = G^{-1} \left( -\kappa_{\Omega i} E_{\Omega i} – E_{\Theta i} – F(\Theta_i, \Omega_i) – \hat{D}_i + \dot{\Omega}^d_i \right), $$
where \( E_{\Theta i} = \Theta_i – \Theta^d_i \) and \( E_{\Omega i} = \Omega_i – \Omega^d_i \) are tracking errors, and \( \kappa_{\Omega i} > 0 \) is a control gain.
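The following sketch shows how the ESO and the attitude law could be stepped in discrete time; the error convention \( \tilde{\Omega}_i = \hat{\Omega}_i - \Omega_i \), a constant invertible G, and the Euler discretization are assumptions rather than the exact implementation.

```python
# One discrete-time step of the extended state observer and the attitude law (sketch).
import numpy as np

def eso_step(Omega_hat, D_hat, Theta, Omega, U_theta, F, G, g3, g4, dt=0.002):
    """Propagate the extended state observer; returns updated (Omega_hat, D_hat)."""
    Omega_tilde = Omega_hat - Omega                          # observation error
    Omega_hat = Omega_hat + dt * (F(Theta, Omega) + G @ U_theta + D_hat - g3 * Omega_tilde)
    D_hat = D_hat + dt * (-g4 * Omega_tilde)
    return Omega_hat, D_hat

def attitude_control(Theta, Omega, Theta_d, Omega_d, Omega_d_dot, D_hat, F, G, k_omega):
    """U^Theta: back-stepping law with the estimated disturbance compensated."""
    E_theta = Theta - Theta_d
    E_omega = Omega - Omega_d
    return np.linalg.solve(G, -k_omega * E_omega - E_theta
                           - F(Theta, Omega) - D_hat + Omega_d_dot)
```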
Stability analysis is conducted using Lyapunov theory. Consider the Lyapunov function for the positional subsystem:
$$ L_p = \frac{1}{2} \tilde{P}^T \gamma_1 \left[ (L + B) \otimes I_3 \right] \tilde{P} + \frac{1}{2} \tilde{V}^T \tilde{V} + U_o(P), $$
where \( \tilde{P} \) and \( \tilde{V} \) are the stacked position and velocity tracking errors of all followers, \( L \) is the Laplacian of the follower graph, \( B = \mathrm{diag}(a_{10}, \dots, a_{N0}) \), and \( U_o(P) \) is the artificial potential field evaluated at the agent positions. Its derivative satisfies \( \dot{L}_p \leq 0 \), ensuring convergence of the positional subsystem. For the attitude subsystem, the Lyapunov function is:
$$ L_\Theta = \frac{1}{2} \sum_{i=1}^{N} E_{\Theta i}^T E_{\Theta i} + \frac{1}{2} \sum_{i=1}^{N} E_{\Omega i}^T E_{\Omega i} + \frac{1}{2} \sum_{i=1}^{N} \Upsilon_i^T P_i \Upsilon_i, $$
where \( \Upsilon_i = [\tilde{\Omega}_i^T, \tilde{D}_i^T/\eta_i]^T \) collects the observer errors, \( \tilde{D}_i = \hat{D}_i - D_i \), \( \eta_i > 0 \) is a scaling constant, and \( P_i \) here denotes a positive-definite weighting matrix (not to be confused with the position). Using Young's inequality and the boundedness of the disturbances, we show that \( \dot{L}_\Theta \leq -\Gamma L_\Theta + \Xi \) for positive constants \( \Gamma \) and \( \Xi \), proving ultimate boundedness.
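By the comparison lemma, this differential inequality yields an explicit ultimate bound, which makes the boundedness claim concrete:
$$ L_\Theta(t) \leq L_\Theta(0)\, e^{-\Gamma t} + \frac{\Xi}{\Gamma} \left( 1 - e^{-\Gamma t} \right), \qquad \limsup_{t \to \infty} L_\Theta(t) \leq \frac{\Xi}{\Gamma}, $$
so the attitude tracking and observer errors converge exponentially to a residual set whose size is governed by \( \Xi/\Gamma \).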
Simulations are performed in MATLAB to validate the approach. The environment includes static obstacles, and the quadrotor parameters are listed in Table 1.
| Parameter | Value |
|---|---|
| Mass (m) | 1.3 kg |
| Moment of Inertia (J_x, J_y, J_z) | 6.23 × 10⁻³, 6.23 × 10⁻³, 1.12 × 10⁻³ N·m·s²/rad |
| Gravity (g) | 9.81 m/s² |
The path planning algorithm parameters are summarized in Table 2.
| Parameter | Value |
|---|---|
| Learning Rate (α) | 0.1 |
| Discount Factor (ζ) | 0.8 |
| Exploration Rate (ε) | 0.1 |
| Grid Dimensions (nx, ny, nz) | 30, 40, 15 |
The simulation results demonstrate that the virtual leader’s path is smoothly planned and avoids all obstacles. Followers accurately track the leader while maintaining formation and avoiding collisions. The ESO effectively estimates disturbances, and the control protocols ensure robust performance. Comparative studies with simplified models highlight the superiority of the full 6-DOF approach.
In conclusion, this work presents an integrated framework for quadrotor UAV formation flight that addresses path planning, obstacle avoidance, and distributed control under uncertainties. Future research will focus on dynamic environments, communication constraints, and real-time implementation. The proposed methods significantly advance the capabilities of multi-quadrotor systems in complex scenarios.
