Rolling Horizon Control: A Practical Framework for Drone Formation Flight

The control of multiple unmanned aerial vehicles (UAVs), or drone formation, presents a significantly more complex challenge than controlling a single vehicle. The potential benefits, however, are substantial, offering enhanced mission capability through distributed sensing, increased operational flexibility, and improved system-level robustness and fault tolerance. These advantages make drone formation highly attractive for a wide range of applications, from coordinated search and environmental monitoring to sophisticated cooperative missions.

Among the various approaches to drone formation control, Receding Horizon Control (RHC), also known as Model Predictive Control (MPC), has emerged as a particularly compelling framework. Its appeal lies in its inherent ability to systematically handle state and input constraints—a fundamental requirement for safe drone formation flight. Furthermore, RHC naturally accommodates multi-objective cost functions and adapts to changing conditions through its online, rolling-window optimization. This article provides an in-depth exploration of applying RHC to the problem of fixed, loose drone formation control, focusing on the prevalent leader-follower description. We will derive a Quadratic Programming (QP) based problem formulation, design both a standard and a stability-guaranteeing dual-mode RHC controller, and analyze key design parameters through simulation.

1. Core Challenges in Drone Formation Control

Designing an effective drone formation control system involves addressing several interconnected sub-problems:

Formation Description: Defining the desired spatial relationship between vehicles. Common methods include the leader-follower approach, virtual structure, and behavioral or consensus-based schemes for swarming.
Aerodynamic Coupling: In tight formations, the wake vortex of a leading UAV can significantly affect the aerodynamics of a trailing vehicle, offering potential drag reduction but requiring highly precise sensing and control.
Relative State Description: Choosing the coordinate frames and error metrics (e.g., inertial frame errors, body-frame relative positions) for defining the control objective.
Control Law Synthesis: The algorithm that computes actuator commands to achieve and maintain the desired formation despite disturbances and constraints.

This work focuses on loose drone formation control using the leader-follower paradigm. Here, the aerodynamic coupling is considered negligible, simplifying the model but still requiring robust control to maintain fixed relative positions. The leader tracks a predefined trajectory, while each follower’s objective is to maintain a specified offset relative to the leader’s position.

2. Mathematical Modeling for RHC

A critical step in applying RHC is developing a suitable prediction model. For real-time implementation in drone formation, complexity must be balanced with accuracy. We adopt a widely used discrete-time linear model, often resulting from feedback linearization of inner-loop attitude dynamics, representing the outer-loop translational kinematics and simplified dynamics.

The state and control vectors for a single UAV are defined as:
$$ x_k = [p_x(k),\ p_y(k),\ v_x(k),\ v_y(k)]^T $$
$$ u_k = [a_x(k),\ a_y(k)]^T $$
where $p$, $v$, and $a$ denote position, velocity, and acceleration (control input) components in a 2D plane (the formulation extends naturally to 3D). The discrete-time linear time-invariant model is:
$$ x_{k+1} = A x_k + B u_k $$
For a double-integrator system with sampling time $T_s$, the matrices are:
$$ A = \begin{bmatrix} I_2 & T_s I_2 \\ 0 & I_2 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ T_s I_2 \end{bmatrix} $$
where $I_2$ is the 2×2 identity matrix.

2.1 Prediction Model Derivation

The power of RHC lies in predicting future system behavior over a finite horizon $N$. We derive a compact prediction equation. Let the current state at time $k$ be $x_{k|k} = x_k$. The predicted state at step $k+i$ is:
$$ x_{k+i|k} = A^i x_k + \sum_{j=0}^{i-1} A^{i-1-j} B u_{k+j|k} $$
By stacking predictions for $i=1$ to $N$ and corresponding control inputs, we define:
$$ \mathbf{x} = [x_{k+1|k}^T, x_{k+2|k}^T, …, x_{k+N|k}^T]^T $$
$$ \mathbf{u} = [u_{k|k}^T, u_{k+1|k}^T, …, u_{k+N-1|k}^T]^T $$
This allows us to write the entire prediction horizon as a linear function of the current state and the future input sequence:
$$ \mathbf{x} = \mathbf{H}_x x_k + \mathbf{H}_u \mathbf{u} $$
where the matrices $\mathbf{H}_x$ and $\mathbf{H}_u$ are constructed from $A$ and $B$:
$$ \mathbf{H}_x = \begin{bmatrix} A \\ A^2 \\ \vdots \\ A^N \end{bmatrix}, \quad \mathbf{H}_u = \begin{bmatrix} B & 0 & \dots & 0 \\ A B & B & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A^{N-1}B & A^{N-2}B & \dots & B \end{bmatrix} $$
This compact form $\mathbf{x} = \mathbf{H}_x x_k + \mathbf{H}_u \mathbf{u}$ is fundamental for the QP formulation, as it expresses all future states linearly in terms of the single optimization variable $\mathbf{u}$.

3. Quadratic Programming Formulation for Drone Formation

The core of the RHC controller is a constrained optimization problem solved at each time step. We formulate the drone formation tracking problem as a Quadratic Program.

3.1 Cost Function Design

The control objective for a follower drone is twofold: 1) minimize tracking error relative to its desired position (defined by the leader’s trajectory plus an offset), and 2) minimize control effort. We define the desired state for the follower at time $k+i$ as $x_{ref, k+i}$. For a leader state $x_L$, the desired follower state is $x_{ref} = x_L + \Delta$, where $\Delta$ is the formation offset. The cost over the prediction horizon $N$ is:
$$ J = \sum_{i=0}^{N-1} \left( (x_{k+i+1|k} – x_{ref, k+i+1})^T Q (x_{k+i+1|k} – x_{ref, k+i+1}) + u_{k+i|k}^T R u_{k+i|k} \right) $$
where $Q \succeq 0$ and $R \succ 0$ are symmetric weighting matrices. Note the control reference is typically set to zero to penalize effort.

Substituting the prediction model $\mathbf{x} = \mathbf{H}_x x_k + \mathbf{H}_u \mathbf{u}$ and defining block-diagonal matrices $\mathbf{Q} = \text{diag}(Q,…,Q)$ and $\mathbf{R} = \text{diag}(R,…,R)$, the cost simplifies to the standard QP form:
$$ J_{QP} = \mathbf{u}^T (\mathbf{H}_u^T \mathbf{Q} \mathbf{H}_u + \mathbf{R}) \mathbf{u} + 2 (\mathbf{H}_x x_k – \mathbf{x}_{ref})^T \mathbf{Q} \mathbf{H}_u \mathbf{u} + \text{constant} $$
where $\mathbf{x}_{ref} = [x_{ref,k+1}^T, …, x_{ref,k+N}^T]^T$. The constant term can be omitted during optimization. The cost is thus a quadratic function of the decision variable $\mathbf{u}$: $J_{QP} = \frac{1}{2} \mathbf{u}^T \mathbf{M} \mathbf{u} + \mathbf{c}^T \mathbf{u}$.

3.2 Constraint Formulation

Practical drone formation flight mandates respecting physical limits. The two primary constraints are:

Input Constraints: Acceleration/control effort is bounded: $|u_k| \le u_{max}$.
State Constraints: Velocity is bounded for safety and performance: $|v_k| \le v_{max}$.

These constraints must be enforced over the entire prediction horizon. The input constraint in vector form is straightforward:
$$ \mathbf{U}_{min} \le \mathbf{u} \le \mathbf{U}_{max} $$
The velocity constraint requires selecting only the velocity states from the predicted state vector $\mathbf{x}$. This is done with a selection matrix $C_v$ (e.g., $C_v x = [v_x, v_y]^T$). The constraint is:
$$ |C_v \mathbf{x}| \le \mathbf{V}_{max} $$
Substituting the prediction equation $\mathbf{x} = \mathbf{H}_x x_k + \mathbf{H}_u \mathbf{u}$ transforms this into a linear inequality constraint on $\mathbf{u}$:
$$ C_v \mathbf{H}_u \mathbf{u} \le \mathbf{V}_{max} – C_v \mathbf{H}_x x_k $$
$$ -C_v \mathbf{H}_u \mathbf{u} \le \mathbf{V}_{max} + C_v \mathbf{H}_x x_k $$

3.3 The Complete QP Problem

At each sampling instant $k$, the RHC controller for a follower in the drone formation solves:
$$ \begin{aligned}
& \min_{\mathbf{u}} \quad \mathbf{u}^T (\mathbf{H}_u^T \mathbf{Q} \mathbf{H}_u + \mathbf{R}) \mathbf{u} + 2 (\mathbf{H}_x x_k – \mathbf{x}_{ref})^T \mathbf{Q} \mathbf{H}_u \mathbf{u} \\
& \text{subject to:} \\
& \quad \mathbf{u}_{min} \le \mathbf{u} \le \mathbf{u}_{max} \\
& \quad C_v \mathbf{H}_u \mathbf{u} \le \mathbf{V}_{max} – C_v \mathbf{H}_x x_k \\
& \quad -C_v \mathbf{H}_u \mathbf{u} \le \mathbf{V}_{max} + C_v \mathbf{H}_x x_k
\end{aligned} $$
The solution is an optimal control sequence $\mathbf{u}^* = [u_{k|k}^*, …, u_{k+N-1|k}^*]^T$. Only the first element $u_{k|k}^*$ is applied to the follower drone. At the next time step $k+1$, the state $x_{k+1}$ is measured (or estimated), the prediction horizon recedes, and the QP is solved again with the updated initial condition. This is the standard RHC algorithm.

Table 1: Summary of QP Problem Elements for Drone Formation RHC
Symbol	Description	Dimension
$x_k$	Current state of follower drone (position, velocity)	$n_x \times 1$
$\mathbf{u}$	Optimization variable: control sequence over horizon $N$	$(N \cdot n_u) \times 1$
$\mathbf{x}_{ref}$	Desired state sequence (from leader trajectory + offset)	$(N \cdot n_x) \times 1$
$Q, R$	State and control weighting matrices	$n_x \times n_x$, $n_u \times n_u$
$\mathbf{H}_x, \mathbf{H}_u$	Prediction matrices derived from $(A, B, N)$	$(N n_x) \times n_x$, $(N n_x) \times (N n_u)$
$C_v$	Selection matrix to extract velocity states	$(N \cdot n_v) \times (N \cdot n_x)$
$u_{max}, v_{max}$	Bounds on control input and velocity	Scalars/Vectors

4. Enhancing Stability: Dual-Mode RHC Controller

A known challenge with standard RHC is that feasibility and stability are not automatically guaranteed for arbitrary choices of horizon $N$ and weights $Q, R$. The dual-mode RHC strategy is a classic method to ensure closed-loop stability for the drone formation system. The core idea is to combine the global optimization of RHC with a stabilizing local controller inside a terminal region.

4.1 Conceptual Framework

The design involves two key components:

Terminal Constraint Set $\Omega_f$: A positively invariant set around the target state (zero tracking error). If the state enters $\Omega_f$, a local controller $u = h(x)$ can keep it inside forever.
Terminal Cost $P$: A weight matrix for the terminal state, often chosen as the cost-to-go of the local controller, making the finite-horizon cost approximate an infinite-horizon one.

The modified RHC algorithm works as follows: At time $k$, solve the QP with the additional terminal constraint $x_{k+N|k} \in \Omega_f$. Apply the first control input. If the subsequent state enters $\Omega_f$, switch to and remain with the local controller $h(x)$. This guarantees stability.

4.2 Design for Drone Formation Tracking

We assume the leader is in steady flight (or its future trajectory is known). The tracking error $e = x – x_{ref}$ dynamics become autonomous: $e_{k+1} = A e_k + B u_k$. The target is to drive $e$ to zero.

1. Local Controller: We choose a simple unconstrained Linear Quadratic Regulator (LQR) as the local controller:
$$ u_k = -K e_k $$
where $K$ is the LQR gain minimizing $\sum_{k=0}^{\infty} (e_k^T Q e_k + u_k^T R u_k)$.

2. Terminal Invariant Set: We design an ellipsoidal invariant set $\Omega_f = \{ e : e^T W e \le 1 \}$, where $W \succ 0$. For invariance under $u=-Ke$, we require:
$$ (A – BK)^T W (A – BK) – W \preceq 0 $$
This ensures if $e_k \in \Omega_f$, then $e_{k+1} \in \Omega_f$. The largest such ellipsoid under input constraints $|Ke| \le u_{max}$ can be found by solving a convex optimization problem (e.g., a Linear Matrix Inequality (LMI) problem):
$$ \begin{aligned}
& \max_{Y, \gamma} \log \det(Y) \\
& \text{s.t.} \quad \begin{bmatrix} Y & Y(A-BK)^T \\ (A-BK)Y & Y \end{bmatrix} \succeq 0, \\
& \quad \begin{bmatrix} \gamma & K_i Y \\ Y K_i^T & Y \end{bmatrix} \succeq 0, \quad \gamma \le u_{max,i}^2, \quad \forall i
\end{aligned} $$
where $Y = W^{-1}$ and $K_i$ is the $i$-th row of $K$.

3. Terminal Cost: The terminal cost is chosen as $e^T P e$, where $P$ is the solution to the discrete-time algebraic Riccati equation associated with the LQR, or a matrix satisfying the Lyapunov inequality for the closed-loop system:
$$ (A-BK)^T P (A-BK) – P \preceq -Q – K^T R K $$
This ensures the terminal cost is a control Lyapunov function.

4.3 The Dual-Mode RHC QP

The online optimization problem for the dual-mode drone formation controller becomes:
$$ \begin{aligned}
& \min_{\mathbf{u}} \quad \sum_{i=0}^{N-1} \left( e_{k+i+1|k}^T Q e_{k+i+1|k} + u_{k+i|k}^T R u_{k+i|k} \right) + e_{k+N|k}^T P e_{k+N|k} \\
& \text{s.t.} \\
& \quad \text{Dynamics: } e_{k+i+1|k} = A e_{k+i|k} + B u_{k+i|k} \\
& \quad \text{Constraints: } |u_{k+i|k}| \le u_{max},\ |v_{k+i|k}| \le v_{max} \quad \forall i \\
& \quad \text{Terminal Constraint: } e_{k+N|k}^T W e_{k+N|k} \le 1
\end{aligned} $$
This is still a QP (if the terminal set is ellipsoidal, the constraint is quadratic convex). While this guarantees stability, it introduces conservatism and can limit the feasible region, especially for a maneuvering drone formation. Its main practical value is in guiding the selection of a sufficiently long prediction horizon $N$ and appropriate terminal weight $P$ for the standard RHC, which often yields stable performance without the explicit terminal constraint.

Table 2: Comparison of Standard vs. Dual-Mode RHC for Drone Formation
Aspect	Standard RHC	Dual-Mode RHC
Stability Guarantee	Not automatic; depends on $N$, $Q$, $R$.	Guaranteed by design (terminal set & cost).
Online Complexity	QP with linear/box constraints.	QP with one additional convex quadratic constraint.
Feasibility Region	Generally larger for a given $N$.	Reduced by the terminal set constraint.
Design Complexity	Low. Tune $N$, $Q$, $R$, $P$.	High. Requires off-line design of $K$, $W$, $P$.
Practical Use Case	Common in practice when $N$ is chosen adequately.	Theoretical benchmark; guides standard RHC parameter selection.

5. Simulation Analysis and Parameter Impact

To validate the RHC framework for drone formation, we consider a simulation scenario with a leader and a follower. The leader executes a maneuver. The follower, modeled with the double-integrator dynamics ($T_s=0.2s$), must maintain a fixed offset. We impose constraints: $|a_{max}| = 0.6 \, m/s^2$, $|v_{max}| = 2.0 \, m/s$. The baseline weights are $Q = \text{diag}(1, 0.2)$ (penalizing position error more than velocity error) and $R=0.5$.

5.1 Effect of Prediction Horizon $N$

The prediction horizon is the most critical tuning parameter. A short horizon leads to myopic, unstable control, while a long horizon improves stability but increases computational load.

Case 1: Short Horizon ($N=5$). The standard RHC controller becomes unstable. The follower’s position error oscillates with growing amplitude. The optimizer only “sees” a short future and cannot plan adequate corrective maneuvers, failing to achieve stable drone formation tracking.

Case 2: Adequate Horizon ($N=10$). With a longer horizon, the controller achieves stable and accurate tracking. The follower smoothly converges to and maintains the desired offset, even during leader maneuvers. Control inputs are anticipatory and remain within limits. This demonstrates that a well-tuned standard RHC is sufficient for practical drone formation control.

5.2 Effect of Terminal Weight $P$

In standard RHC (without a terminal set), the terminal weight $P$ can be used to approximate an infinite-horizon cost and improve stability. For the unstable case with $N=5$, we can recover stability by increasing the terminal state penalty. Let $P = \alpha \cdot P_{lqr}$, where $P_{lqr}$ is the LQR Riccati solution. By setting $\alpha$ to a large value (e.g., 60), the controller effectively penalizes any terminal error severely, forcing the finite-horizon plan to steer the state closer to zero at the end of the horizon. This mimics the effect of a longer horizon and stabilizes the system, though convergence is typically slower than with a genuinely long horizon.

5.3 Performance of Dual-Mode RHC

Designing a dual-mode controller per Section 4, we obtain an LQR gain $K$, an invariant ellipsoid matrix $W$, and a terminal cost $P$. With $N=10$, the terminal state constraint $e_{k+N|k}^T W e_{k+N|k} \le 1$ is active only during the initial convergence. The controller performance is nearly identical to the standard RHC with $N=10$. If we reduce $N$ to the minimum length that still allows the optimizer to find a feasible path into the terminal set (e.g., $N=9$), the controller remains stable but the feasible region shrinks. This illustrates the conservatism: standard RHC with $N=9$ might work, but dual-mode requires $N \ge 9$ for a guarantee.

Table 3: Simulation Results for Different Controller Parameters
Controller Type	Prediction Horizon $N$	Terminal Weight $P$	Stability	Remarks
Standard RHC	5	Low ($Q$)	Unstable	Myopic control, oscillations.
Standard RHC	10	Low ($Q$)	Stable	Good tracking, anticipatory control.
Standard RHC	5	High ($60 \cdot P_{lqr}$)	Stable	Recovered stability, slower convergence.
Dual-Mode RHC	10	$P_{lqr}$	Guaranteed Stable	Performance equal to Standard RHC (N=10).
Dual-Mode RHC	9	$P_{lqr}$	Guaranteed Stable*	Feasible region smaller. May be infeasible for large initial errors.

*Stability is guaranteed only if the QP is feasible at each step.

6. Conclusion and Future Directions

Receding Horizon Control provides a powerful and practical framework for drone formation flight control. Its formulation as a Quadratic Programming problem directly addresses the core requirements: multi-objective optimization (tracking accuracy vs. control effort), explicit handling of state and input constraints, and adaptability through online re-planning. The standard RHC algorithm, when implemented with a sufficiently long prediction horizon and appropriate weighting matrices, delivers robust performance for loose drone formation tracking.

The dual-mode RHC approach offers a rigorous method to guarantee closed-loop stability by incorporating a terminal invariant set and cost. While it can be conservative and computationally more intensive for the drone formation problem, its design principles provide valuable guidance for tuning the standard RHC, particularly in selecting the prediction horizon $N$ and the terminal weight matrix $P$.

Future research directions for RHC in drone formation are abundant. Key areas include:

Robust and Distributed RHC: Developing formulations that account for uncertainties in dynamics, communication delays, and disturbances across the drone formation, often leading to distributed optimization schemes.
Dynamic Environment Adaptation: Integrating obstacle and collision avoidance constraints directly into the QP, enabling safe navigation in complex, cluttered environments.
Non-linear Models: Employing more accurate non-linear drone dynamics within the RHC framework using successive linearization or non-linear programming solvers, balancing real-time performance needs.
Learning-based Enhancements: Using machine learning to adapt cost functions, predict other agents’ behavior, or warm-start the QP solver to improve computational speed and performance for large-scale drone formation.

The continued advancement of optimization algorithms and computational hardware will further solidify RHC as a cornerstone technology for autonomous, coordinated multi-drone systems.