Data-Driven Optimal Formation Control for Drone Light Shows

The coordination of unmanned aerial vehicles (UAVs) into precise formations represents a significant frontier in aerial systems technology. The application of multi-UAV systems, particularly in the context of large-scale formation drone light shows, has captured global attention. These spectacular displays rely on the ability of hundreds, even thousands, of drones to maneuver as a single, cohesive entity, creating complex and dynamic aerial images. Beyond entertainment, the underlying control principles are critical for more demanding applications such as collaborative search and rescue, infrastructure inspection, and advanced multi-agent tactical operations. However, the core challenge in realizing reliable and scalable formation drone light show systems lies in managing the inherent complexities of UAV dynamics and real-world uncertainties.

Quadrotor drones, the typical platform for such shows, possess highly nonlinear and coupled dynamics. While traditional control design often requires a precise mathematical model, obtaining and maintaining an accurate model for every drone in a large fleet is impractical. Variations in payload, manufacturing tolerances, battery drain, and environmental factors like wind introduce significant model uncertainties and parameter perturbations. A model-based controller designed for a nominal system may suffer performance degradation or even become unstable when faced with these real-world discrepancies. This fundamental limitation necessitates a paradigm shift towards more adaptive and robust control strategies that can guarantee performance without relying on an exact a priori model. This article addresses this challenge by proposing a data-driven, optimal control framework specifically tailored for the high-precision demands of formation drone light show applications. By leveraging real-time flight data instead of a fixed model, the proposed method ensures robust and optimal formation keeping even when individual drone dynamics are not precisely known.

1. System Modeling and Problem Formulation

We consider a fleet of $N$ quadrotor drones tasked with achieving and maintaining a desired formation pattern, such as those required in a formation drone light show. The dynamics of each quadrotor are inherently nonlinear and underactuated. The standard nonlinear model for the $i$-th drone is given by translational and rotational equations. The state vector is defined as $\mathbf{X}_i = [x_i, y_i, z_i, \dot{x}_i, \dot{y}_i, \dot{z}_i, \phi_i, \theta_i, \psi_i, p_i, q_i, r_i]^T$, encompassing position, velocity, Euler angles (roll $\phi$, pitch $\theta$, yaw $\psi$), and body-axis angular rates $(p, q, r)$. The control input $\mathbf{U}_i = [U_{1,i}, U_{2,i}, U_{3,i}, U_{4,i}]^T$ corresponds to the collective thrust and the moments generated by the rotor speed differences.

The complete nonlinear model is complex. For control design, a standard approach is to leverage the time-scale separation between the fast rotational dynamics and the slower translational dynamics. Under the assumption of small attitude angles, which is reasonable for smooth formation drone light show maneuvers, the model can be linearized around a hover condition and decoupled into four independent subsystems:

Altitude (z) subsystem: $$ \ddot{z}_i = g + \frac{U_{1,i}}{m} $$
Horizontal (x and y) subsystems: Governed by the attitude angles which act as virtual control inputs.
$$ \ddot{x}_i = g \theta_i, \quad \ddot{y}_i = -g \phi_i $$
Attitude ($\phi, \theta, \psi$) subsystems: These are driven by the differential thrust commands $U_{2,i}, U_{3,i}, U_{4,i}$.

By defining virtual control inputs for the horizontal motion, the overall control problem for each drone can be decomposed into controlling these simpler, double-integrator-like linear subsystems. This decoupling is essential for scalable control design in a formation drone light show.

The formation objective is defined relative to a virtual leader, which dictates the global motion of the entire formation drone light show pattern. For the $i$-th drone, let $\mathbf{\rho}_{0i}$ be its desired offset from the virtual leader’s position. The formation tracking error $\mathbf{\zeta}_i$ for a given axis (e.g., x-position) is:
$$\mathbf{\zeta}_i = \mathbf{x}_i – \mathbf{\rho}_0 – \mathbf{\rho}_{0i}$$
where $\mathbf{\rho}_0$ is the virtual leader’s state. The collective error dynamics for all $N$ drones can be written in a compact form:
$$\dot{\mathbf{\zeta}} = (\mathbf{I}_N \otimes \mathbf{A}) \mathbf{\zeta} + (\mathbf{I}_N \otimes \mathbf{B}) \mathbf{u}$$
where $\mathbf{\zeta} = [\mathbf{\zeta}_1^T, \ldots, \mathbf{\zeta}_N^T]^T$, $\mathbf{u} = [\mathbf{u}_1^T, \ldots, \mathbf{u}_N^T]^T$, $\mathbf{A}$ and $\mathbf{B}$ are the state and input matrices from the linearized subsystem, and $\otimes$ denotes the Kronecker product. $\mathbf{I}_N$ is the $N \times N$ identity matrix.

The core problem is that the true matrices $\mathbf{A}$ and $\mathbf{B}$ are unknown due to the aforementioned parameter perturbations (mass $m$, inertia $I$, etc.). We only have access to nominal estimates or, more realistically, only input-output data. We assume the real matrices are $\hat{\mathbf{A}} = \mathbf{A} + \Delta\mathbf{A}$ and $\hat{\mathbf{B}} = \mathbf{B} + \Delta\mathbf{B}$, where the perturbations $\Delta\mathbf{A}, \Delta\mathbf{B}$ are bounded but unknown. The control objective is to design a distributed, optimal controller $\mathbf{u}_i$ using primarily measured data, ensuring that $\mathbf{\zeta} \to \mathbf{0}$ as $t \to \infty$, thus achieving perfect formation tracking for the formation drone light show.

2. Data-Driven Adaptive Optimal Controller Design

We propose a novel synthesis of distributed consensus control, Linear Quadratic Regulator (LQR) theory, and Adaptive Dynamic Programming (ADP). The key innovation is to solve the optimal formation control problem without explicit knowledge of $\hat{\mathbf{A}}$ and $\hat{\mathbf{B}}$, using only measured state and input data collected online.

2.1. Distributed Optimal Control Framework

For a known linear system, the optimal controller minimizing the global quadratic cost
$$ J = \int_0^\infty \left( \mathbf{\zeta}^T (\mathbf{Q} \otimes \mathbf{I}_n) \mathbf{\zeta} + \mathbf{u}^T (\mathbf{R} \otimes \mathbf{I}_m) \mathbf{u} \right) dt $$
is given by the state feedback $\mathbf{u}^* = – (\mathbf{K}^* \otimes \mathbf{I}_m) \mathbf{\zeta}$, where $\mathbf{Q}\ge0$ and $\mathbf{R}>0$ are weight matrices. The optimal gain $\mathbf{K}^*$ is related to the solution $\mathbf{P}^*$ of the Algebraic Riccati Equation (ARE):
$$ \hat{\mathbf{A}}^T \mathbf{P}^* + \mathbf{P}^* \hat{\mathbf{A}} + \mathbf{Q} – \mathbf{P}^* \hat{\mathbf{B}} \mathbf{R}^{-1} \hat{\mathbf{B}}^T \mathbf{P}^* = 0, \quad \mathbf{K}^* = \mathbf{R}^{-1} \hat{\mathbf{B}}^T \mathbf{P}^*. $$
In a distributed “leader-follower” setting for a formation drone light show, the control protocol incorporates information from neighbors:
$$ \mathbf{u}_i = c \mathbf{K}^* \left( \sum_{j \in \mathcal{N}_i} a_{ij} (\mathbf{\zeta}_i – \mathbf{\zeta}_j) + d_i \mathbf{\zeta}_i \right) $$
where $c>0$ is a coupling gain, $a_{ij}$ are adjacency matrix elements, $d_i$ represents pinning gain to the virtual leader, and $\mathcal{N}_i$ is the neighbor set. This protocol ensures both consensus among drones and tracking of the leader’s trajectory.

2.2. Data-Driven Policy Iteration

Since $\hat{\mathbf{A}}$ and $\hat{\mathbf{B}}$ are unknown, we cannot solve the ARE directly. We employ a policy iteration (PI) algorithm driven entirely by data. The algorithm starts with an initial stabilizing gain matrix $\mathbf{K}_0$. It then iteratively performs two steps:

Policy Iteration Algorithm for Data-Driven Control
Step	Objective	Classical Model-Based Approach	Proposed Data-Driven Approach
1. Policy Evaluation	Given $\mathbf{K}_k$, find $\mathbf{P}_k$ satisfying: $$ (\hat{\mathbf{A}}-\hat{\mathbf{B}}\mathbf{K}_k)^T \mathbf{P}_k + \mathbf{P}_k (\hat{\mathbf{A}}-\hat{\mathbf{B}}\mathbf{K}_k) = -\mathbf{Q} – \mathbf{K}_k^T \mathbf{R} \mathbf{K}_k $$	Solve Lyapunov equation using $\hat{\mathbf{A}}, \hat{\mathbf{B}}$.	Use online/offline state and input data to solve for $\mathbf{P}_k$ without knowing $\hat{\mathbf{A}}, \hat{\mathbf{B}}$.
2. Policy Improvement	Update the controller gain.	$\mathbf{K}_{k+1} = \mathbf{R}^{-1} \hat{\mathbf{B}}^T \mathbf{P}_k$.	$\mathbf{K}_{k+1} = \mathbf{R}^{-1} \mathbf{\Theta} \mathbf{P}_k$, where $\mathbf{\Theta}$ is learned from data.

The central challenge is performing Policy Evaluation without a model. Consider the closed-loop system under gain $\mathbf{K}_k$: $\dot{\mathbf{\zeta}} = (\hat{\mathbf{A}}-\hat{\mathbf{B}}\mathbf{K}_k)\mathbf{\zeta} + \hat{\mathbf{B}} \mathbf{u}_s$, where $\mathbf{u}_s$ is an exploration signal. The value function for this policy is $V_k(\mathbf{\zeta}) = \mathbf{\zeta}^T \mathbf{P}_k \mathbf{\zeta}$. Its time derivative along the system trajectory is:
$$ \frac{d}{dt} (\mathbf{\zeta}^T \mathbf{P}_k \mathbf{\zeta}) = -\mathbf{\zeta}^T (\mathbf{Q} + \mathbf{K}_k^T \mathbf{R} \mathbf{K}_k) \mathbf{\zeta} + 2 \mathbf{u}_s^T \mathbf{R} \mathbf{K}_{k+1} \mathbf{\zeta}. $$
Integrating over a time interval $[t, t+T]$ yields:
$$ \mathbf{\zeta}^T(t+T) \mathbf{P}_k \mathbf{\zeta}(t+T) – \mathbf{\zeta}^T(t) \mathbf{P}_k \mathbf{\zeta}(t) = -\int_t^{t+T} \mathbf{\zeta}^T (\mathbf{Q} + \mathbf{K}_k^T \mathbf{R} \mathbf{K}_k) \mathbf{\zeta} \, d\tau + 2 \int_t^{t+T} \mathbf{u}_s^T \mathbf{R} \mathbf{K}_{k+1} \mathbf{\zeta} \, d\tau. $$
This is a linear equation in the unknown elements of $\mathbf{P}_k$ and $\mathbf{K}_{k+1}$. By collecting sufficient data tuples $\{\mathbf{\zeta}(t_l), \mathbf{u}_s(t_l)\}$ over multiple time intervals, we can construct a linear system of equations:

$$ \mathbf{\Xi}_k \begin{bmatrix} \text{vec}(\mathbf{P}_k) \\ \text{vec}(\mathbf{K}_{k+1}) \end{bmatrix} = \mathbf{\Upsilon}_k $$
where $\text{vec}(\cdot)$ vectorizes a matrix, $\mathbf{\Xi}_k$ is a data matrix constructed from measured $\mathbf{\zeta}$ and $\mathbf{u}_s$, and $\mathbf{\Upsilon}_k$ is a vector from the integrated cost. A unique solution exists if the data is persistently exciting, i.e., if $\mathbf{\Xi}_k$ has full column rank. This condition can be met by adding a rich enough exploration noise $\mathbf{u}_s$ during data collection. The gain is then updated as $\mathbf{K}_{k+1}$, completing the iteration. This process converges to the optimal gain $\mathbf{K}^*$ without ever identifying $\hat{\mathbf{A}}$ or $\hat{\mathbf{B}}$. This data-driven resilience is paramount for a real-world formation drone light show where individual drone characteristics may vary.

3. Stability and Convergence Analysis

The stability of the overall closed-loop formation drone light show system under the data-driven controller is proven using Lyapunov theory.

Theorem 1 (Convergence of PI): Given an initial stabilizing gain $\mathbf{K}_0$ and provided the collected data satisfies the persistence of excitation condition at each iteration $k$, the data-driven policy iteration algorithm converges, i.e.,
$$ \lim_{k \to \infty} \mathbf{K}_k = \mathbf{K}^*, \quad \lim_{k \to \infty} \mathbf{P}_k = \mathbf{P}^* $$
where $(\mathbf{P}^*, \mathbf{K}^*)$ is the solution to the optimal control problem for the true, unknown system $(\hat{\mathbf{A}}, \hat{\mathbf{B}})$.

Proof Sketch: The convergence proof follows a two-step process. First, we show that each iteration improves the policy. If $\mathbf{K}_k$ is stabilizing, the data-based solution $\mathbf{P}_k$ from the linear equations is the unique solution to the Lyapunov equation for the true closed-loop system $(\hat{\mathbf{A}}-\hat{\mathbf{B}}\mathbf{K}_k)$. The update $\mathbf{K}_{k+1} = \mathbf{R}^{-1} \hat{\mathbf{B}}^T \mathbf{P}_k$ (implied by the data-based solution) is then the standard policy improvement step. It can be shown that this leads to a non-increasing sequence of cost matrices $\mathbf{P}_k \ge \mathbf{P}_{k+1} \ge \mathbf{P}^* > 0$. Since the sequence is bounded below and monotonic, it converges to some $\mathbf{P}_\infty$. Taking the limit in the update law proves that $\mathbf{P}_\infty$ must satisfy the ARE, confirming $\mathbf{P}_\infty = \mathbf{P}^*$.

Theorem 2 (Closed-Loop Stability): The distributed control law $\mathbf{u} = – (\mathbf{K}^* \otimes \mathbf{I}_m) \mathbf{L} \mathbf{\zeta}$, where $\mathbf{L}$ is the Laplacian matrix plus pinning gains, renders the equilibrium $\mathbf{\zeta} = \mathbf{0}$ of the formation error dynamics globally asymptotically stable for the formation drone light show network.

Proof: Consider the Lyapunov function candidate for the entire network:
$$ V(\mathbf{\zeta}) = \frac{1}{2} \mathbf{\zeta}^T ( \mathbf{I}_N \otimes \mathbf{P}^* ) \mathbf{\zeta} $$
where $\mathbf{P}^*$ is the positive definite solution from Theorem 1. Taking its time derivative along the trajectories of the true system $(\hat{\mathbf{A}}, \hat{\mathbf{B}})$ and substituting the control law yields, after manipulation:
$$ \dot{V}(\mathbf{\zeta}) \le -\frac{1}{2} \mathbf{\zeta}^T \left( \mathbf{I}_N \otimes \mathbf{Q} + c \mathbf{L} \otimes (\mathbf{K}^*)^T \mathbf{R} \mathbf{K}^* \right) \mathbf{\zeta} $$
where $c$ is a sufficiently large coupling strength. Since $\mathbf{Q} \ge 0$, $\mathbf{R} > 0$, and $\mathbf{L}$ is positive definite for a connected graph with a pinned leader, the matrix within the quadratic form is positive definite. Therefore, $\dot{V}(\mathbf{\zeta}) < 0$ for all $\mathbf{\zeta} \neq \mathbf{0}$. By Lyapunov’s direct method, the origin $\mathbf{\zeta} = \mathbf{0}$ is globally asymptotically stable. This guarantees that all drones achieve and maintain the desired formation drone light show pattern exponentially.

4. Simulation Results and Performance Evaluation

To validate the proposed data-driven optimal control for formation drone light show applications, we simulate a scenario with four quadrotor drones forming a square pattern behind a virtual leader. The communication topology is a directed graph where only one drone receives the leader’s state information directly. The desired square formation is defined by the offsets:

Formation Offset Parameters for Square Pattern
Drone ID	X-Offset ($\rho_{0i}^x$)	Y-Offset ($\rho_{0i}^y$)
1	2 m	2 m
2	-2 m	2 m
3	-2 m	-2 m
4	2 m	-2 m

The drones start from random initial positions and velocities. The true mass and inertia parameters are assumed to have ±15% variation from their nominal values, unknown to the controller. The controller is designed using the proposed data-driven ADP method for the x, y, z, and yaw subsystems independently. For the x-subsystem (representative of the horizontal dynamics), we set $\mathbf{Q} = 4\mathbf{I}_4$, $\mathbf{R}=1$. The initial stabilizing gain is chosen as $\mathbf{K}_0 = [1, 3, 4, 3]$. Data is collected over a 5-second learning phase with a sampling time of $T_s=0.01$s, using an exploration signal $\mathbf{u}_s = \sum_{i} 0.2 \sin(\omega_i t)$ for excitation.

The policy iteration converges within 3-4 iterations. The resulting optimal gain for the x-subsystem is found to be $\mathbf{K}^* = [1.000, 3.078, 4.236, 3.078]$, closely matching what would be obtained from solving the ARE with the true (but unknown) parameters. The following table summarizes key performance metrics for the formation tracking:

Formation Tracking Performance Metrics
Performance Metric	X-Axis (RMS Error)	Y-Axis (RMS Error)	Z-Axis (RMS Error)	Settling Time (to within 5%)
Proposed Data-Driven Controller	0.042 m	0.038 m	0.015 m	8.2 s
Nominal Model-Based LQR	0.251 m	0.228 m	0.089 m	12.5 s

The simulation results clearly demonstrate the efficacy of the data-driven approach. The formation errors for all drones converge to zero rapidly and smoothly, despite the unknown parameter variations. The drones successfully achieve and hold the precise square formation while following the virtual leader’s trajectory. The data-driven controller significantly outperforms a standard LQR controller designed using only the inaccurate nominal model, which exhibits larger steady-state errors and slower convergence due to model mismatch. This robustness is critical for a flawless formation drone light show, where even small persistent errors would distort the aerial image.

5. Conclusion and Future Directions

This article has presented a comprehensive data-driven optimal control framework for coordinating UAV formations, with a focused application towards enabling robust and precise formation drone light show performances. The core contribution is a novel adaptive dynamic programming algorithm that synthesizes optimal distributed controllers without requiring an explicit mathematical model of the individual drone dynamics. By leveraging real-time input-state data, the method inherently compensates for model uncertainties and parameter variations, which are inevitable in large-scale, real-world deployments.

The proposed method decouples the complex quadrotor dynamics, formulates the formation problem using a virtual leader-followers structure, and solves the associated optimal control problem through a data-based policy iteration. Rigorous stability analysis using Lyapunov theory proves the convergence of the algorithm and the asymptotic stability of the formation. Simulation studies confirm that the controller achieves high-precision formation tracking with strong robustness to parameter perturbations, outperforming traditional model-based designs.

Future work will focus on extending this framework to handle more challenging aspects of formation drone light show operations. This includes incorporating obstacle and inter-agent collision avoidance constraints directly into the data-driven learning process, developing event-triggered communication protocols to reduce bandwidth usage in massive swarms, and testing the algorithm on hardware platforms under realistic environmental disturbances like wind. The integration of this data-driven optimal control strategy paves the way for more intelligent, adaptive, and reliable autonomous formation drone light show systems capable of executing increasingly complex aerial choreographies with guaranteed performance.