Vision-Based Indoor Formation Drone Light Show System

In recent years, unmanned aerial vehicles (UAVs), commonly known as drones, have garnered significant attention due to their structural simplicity, high mobility, low cost, and adaptability. These characteristics have led to widespread applications in fields such as aerial photography, geological exploration, agricultural plant protection, and disaster rescue. However, the reliance on GPS for navigation and positioning has limited their use in indoor environments where GPS signals are weak or unavailable. This limitation is particularly critical for applications like indoor formation drone light shows, which require precise coordination and positioning without external signals. To address this challenge, our research team has developed an innovative vision-based system for indoor formation flight of small drones, leveraging AprilTags as fiducial markers for localization. This system enables robust trajectory tracking, formation maintenance, and dynamic shape transformations, paving the way for intricate indoor formation drone light shows that captivate audiences with synchronized maneuvers.

The core of our system lies in replacing GPS-dependent navigation with computer vision techniques. By using downward-facing cameras on drones to detect AprilTags placed on the ground, we compute real-time position and velocity estimates, facilitating autonomous control. We implemented this approach using four Ardrone 2.0 quadrotors as flight platforms and four graphics workstations for image processing. Through extensive experimentation, we demonstrated that our system can effectively perform trajectory tracking and formation control indoors, validating its feasibility and effectiveness for applications such as formation drone light shows. This breakthrough not only expands the operational domain of drones but also opens new avenues for entertainment and industrial displays where GPS is unreliable.

In this article, we describe our system in detail, covering the hardware and software design, algorithmic foundations, experimental validation, and implications for future formation drone light shows. We use tables and mathematical formulas to summarize key concepts and results. Throughout the discussion, we emphasize the potential of this technology for formation drone light shows, highlighting how visual localization can enable aerial performances in confined spaces.

Our work builds upon existing research in UAV swarm robotics and visual fiducial systems. Traditionally, drone formations rely on differential GPS and INS (Inertial Navigation System) combinations, which are ineffective indoors. Alternative methods include motion capture systems, but these are costly and require controlled environments. In contrast, AprilTags offer a low-cost, robust solution for visual localization. By integrating AprilTags with onboard sensors, we achieve high-frequency position updates through sensor fusion, crucial for real-time formation drone light shows that demand smooth and synchronized movements.

The system hardware comprises four key components: the drones, cameras, AprilTags ground markers, and a ground control station. Each Ardrone 2.0 is equipped with a bottom-mounted camera that captures images of the ground. Wi-Fi modules on the drones are configured in AP-Client mode to bridge to a wireless router, allowing four graphics workstations connected via Ethernet to receive image data. These workstations process the images to compute position and velocity, while a master workstation runs formation control algorithms. The ground markers are arranged in a grid pattern using AprilTags from the tag36h11 family, with a spacing of 0.4 meters between rows and columns. This setup creates a predictable reference map for localization, essential for coordinating a formation drone light show where drones must maintain precise relative positions.
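As a rough illustration of such a reference map, the sketch below generates the world coordinates of every tag corner on a regular grid with 0.4 m spacing. The 12 x 15 layout, the 0.16 m tag edge length, the row-major ID ordering, and the function name `build_tag_map` are all assumptions made for the example; only the spacing, the tag count of 180, and the origin at the center of Tag ID=0 come from our setup.

```python
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]


def build_tag_map(rows: int, cols: int, spacing: float = 0.4,
                  tag_size: float = 0.16) -> Dict[int, List[Point3]]:
    """Return the world coordinates of the four corners of every tag.

    Tag ID 0 is centered on the world origin. IDs are assumed to increase
    along a row (x axis) and then row by row (y axis). All tags lie on the
    floor plane z = 0.
    """
    half = tag_size / 2.0
    tag_map: Dict[int, List[Point3]] = {}
    for tag_id in range(rows * cols):
        row, col = divmod(tag_id, cols)
        cx, cy = col * spacing, row * spacing  # tag center in meters
        tag_map[tag_id] = [
            (cx - half, cy - half, 0.0),
            (cx + half, cy - half, 0.0),
            (cx + half, cy + half, 0.0),
            (cx - half, cy + half, 0.0),
        ]
    return tag_map


# Hypothetical 12 x 15 layout giving 180 tags.
tag_map = build_tag_map(rows=12, cols=15)
print(len(tag_map), tag_map[16][0])
```

A lookup table of this kind is what lets each detected tag ID be turned into known 3D points for pose estimation.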

To illustrate the hardware specifications, we present Table 1, which summarizes the key components and their parameters.

Table 1: Hardware Components and Specifications
Component	Model/Specification	Role in System
Drone	Ardrone 2.0 Quadrotor	Flight platform with IMU, camera, and Wi-Fi
Camera	Bottom-mounted, 720p resolution	Captures ground images for visual localization
Ground Markers	AprilTags (tag36h11), 180 units	Fiducial markers for position calculation
Workstation	Graphics workstation with GPU	Processes images and runs control algorithms
Router	Wireless router with Ethernet ports	Facilitates communication between drones and workstations

The software architecture is designed to handle image processing, localization, and multi-agent control. As shown in Figure 4 of the original paper (not reproduced here), the system integrates several modules: AprilTags detection, position estimation, sensor fusion, trajectory planning, and formation control. The master workstation receives IMU data and visual position estimates from all drones, computes control commands based on formation algorithms, and sends them to individual drones. This closed-loop control enables autonomous flight and formation maintenance, which is vital for a seamless formation drone light show where drones must adapt to dynamic patterns.

We now delve into the algorithmic details, starting with the AprilTags indoor localization algorithm. The goal is to compute the drone’s position and orientation relative to a world coordinate system defined by the AprilTags grid. Let $O_w$ be the world coordinate system with its origin at the center of the Tag ID=0. Let $O_c$ be the camera coordinate system centered at the camera’s optical center, and $O_p$ be the drone body coordinate system at its center of gravity. The AprilTags library provides the pixel coordinates of Tag corners in the image and their corresponding IDs. Given the known world coordinates of Tag corners from the map, we use the Perspective-n-Point (PnP) algorithm to compute the transformation matrix from $O_c$ to $O_w$.
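To make the last step concrete: PnP solvers commonly return a rotation $R$ and translation $t$ that map world points into the camera frame, so the camera pose in the world is obtained by inverting that transform. The sketch below shows this inversion in plain Python; the function name `camera_pose_in_world` and the example numbers are hypothetical, and the world-to-camera convention is an assumption that should be checked against the solver actually used.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]


def camera_pose_in_world(R: Matrix3, t: Vector3) -> Tuple[List[List[float]], List[float]]:
    """Invert a world-to-camera transform [R | t].

    Returns (R_wc, c_w): the camera-to-world rotation R^T and the camera
    center -R^T t expressed in world coordinates.
    """
    R_wc = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    c_w = [-sum(R_wc[i][k] * t[k] for k in range(3)) for i in range(3)]
    return R_wc, c_w


# Illustrative values: a downward-facing camera hovering 1.5 m above the
# world point (1.0, 2.0, 0.0). Camera x = world x, camera y = world -y,
# camera z (optical axis) = world -z.
R = [[1.0, 0.0, 0.0],
     [0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0]]
t = [-1.0, 2.0, 1.5]

R_wc, c_w = camera_pose_in_world(R, t)
print(c_w)
```

The drone body position then follows by applying the fixed camera-to-body offset, which is omitted here.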

The PnP algorithm solves for the rotation matrix $R$ and translation vector $t$ that minimize the reprojection error. Formally, for a set of 3D points $P_i$ in world coordinates and their 2D projections $p_i$ in image coordinates, both written in homogeneous form, we find $R$ and $t$ such that:

$$ s_i p_i = K [R | t] P_i $$

where $K$ is the camera intrinsic matrix and $s_i$ is the projective depth of point $i$. Because $[R | t]$ maps world coordinates into camera coordinates, the camera position in world coordinates is $-R^T t$ and its orientation is given by $R^T$. The transformation from camera to world coordinates is therefore $[R^T | -R^T t]$.

However, visual position updates occur at only 10 Hz due to computational latency, which is insufficient for real-time control in a formation drone light show. To address this, we fuse visual data with the drone’s onboard IMU and optical flow data, which are available at 200 Hz. We implement a complementary filter that integrates optical flow velocities to interpolate position between visual updates, accounting for a measured image delay of 150 ms. The fused position $p_f(t)$ at time $t$ is computed as:

$$ p_f(t) = p_v(t - \Delta t) + \int_{t-\Delta t}^{t} v_{of}(\tau) \, d\tau $$

where $p_v$ is the visual position estimate, $v_{of}$ is the optical flow velocity, and $\Delta t$ is the delay compensation. This sensor fusion approach yields a 200 Hz position output, ensuring smooth control for formation drone light shows.
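The following is a minimal sketch of this delay compensation, not our flight code. It assumes a fixed 150 ms image delay, velocity samples arriving at exactly 200 Hz, and planar (x, y) motion; the class name `DelayCompensatedFusion` and its methods are hypothetical. Optical flow velocities are accumulated into a running displacement, and each visual fix is anchored to the displacement recorded when its image was captured, so the integral in the equation above becomes a simple difference.

```python
from collections import deque
from typing import Deque, Tuple

Vec2 = Tuple[float, float]


class DelayCompensatedFusion:
    """Propagate a delayed visual fix forward with optical-flow velocity."""

    def __init__(self, delay: float = 0.15, rate_hz: float = 200.0) -> None:
        self.dt = 1.0 / rate_hz
        self.n_delay = round(delay * rate_hz)  # 30 samples at 200 Hz
        self.odom: Vec2 = (0.0, 0.0)           # integrated flow displacement
        # Keep the last n_delay + 1 displacement values; index 0 is the
        # displacement n_delay samples ago once the buffer is full.
        self.history: Deque[Vec2] = deque([self.odom], maxlen=self.n_delay + 1)
        self.anchor_pos: Vec2 = (0.0, 0.0)     # latest visual position
        self.anchor_odom: Vec2 = (0.0, 0.0)    # displacement at image capture

    def add_velocity(self, vx: float, vy: float) -> None:
        """Integrate one optical-flow velocity sample (called at rate_hz)."""
        self.odom = (self.odom[0] + vx * self.dt, self.odom[1] + vy * self.dt)
        self.history.append(self.odom)

    def add_visual_fix(self, x: float, y: float) -> None:
        """Register a visual position that describes the state `delay` ago."""
        self.anchor_pos = (x, y)
        self.anchor_odom = self.history[0]

    def position(self) -> Vec2:
        """Visual fix plus the displacement accumulated since image capture."""
        return (self.anchor_pos[0] + self.odom[0] - self.anchor_odom[0],
                self.anchor_pos[1] + self.odom[1] - self.anchor_odom[1])


# Illustrative run: constant 1 m/s along x for 0.5 s, then a fix that
# reports where the drone was 150 ms earlier (x = 0.35 m).
fusion = DelayCompensatedFusion()
for _ in range(100):
    fusion.add_velocity(1.0, 0.0)
fusion.add_visual_fix(0.35, 0.0)
print(fusion.position())
```

Because `position()` can be called after every velocity sample, the output rate matches the optical flow rate rather than the camera rate.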

The control algorithm for individual drones follows a hierarchical structure: position loop, velocity loop, and attitude loop. The Ardrone 2.0 provides an inner attitude control loop that accepts commands for pitch, roll, yaw, and vertical speed. We design outer loops for velocity and position using PID controllers. The position controller generates desired velocity commands based on position error, and the velocity controller produces attitude commands. For a drone $i$, let $p_i^{des}(t)$ be the desired position from the formation trajectory, and $p_i(t)$ be the estimated position. The position error $e_{p,i}(t)$ is:

$$ e_{p,i}(t) = p_i^{des}(t) - p_i(t) $$

The desired velocity $v_i^{des}(t)$ is computed by a PID controller:

$$ v_i^{des}(t) = K_{p,p} e_{p,i}(t) + K_{i,p} \int_0^t e_{p,i}(\tau) \, d\tau + K_{d,p} \frac{de_{p,i}(t)}{dt} $$

Similarly, the velocity error $e_{v,i}(t) = v_i^{des}(t) - v_i(t)$, where $v_i(t)$ is the estimated velocity, is fed into a velocity PID controller to generate attitude commands. This cascaded control structure ensures stable trajectory tracking, which is fundamental for any formation drone light show requiring precise path following.

For formation control, we designate one drone as the leader (e.g., drone 1) and others as followers. The desired formation shape is defined relative to the leader’s position. For a follower drone $j$, the ideal position $p_j^{ideal}(t)$ is:

$$ p_j^{ideal}(t) = p_1(t) + d_j $$

where $d_j$ is the desired offset vector for the formation pattern. During flight, a formation compensation term is added to the velocity control loop to maintain the shape. The compensation velocity $v_{comp,j}(t)$ is proportional to the formation error:

$$ v_{comp,j}(t) = K_f (p_j^{ideal}(t) - p_j(t)) $$

where $K_f$ is a formation gain. This compensation is integrated into the velocity command, ensuring that drones adjust their motions cohesively. This algorithm enables dynamic formation transformations, such as switching from a square to a line pattern, which are common in formation drone light shows to create visually appealing sequences.
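The leader-follower rule above can be sketched in a few lines. In this example the offset tables `SQUARE` and `LINE`, the gain value `K_F = 0.5`, and the choice to add the compensation directly to the desired velocity are assumptions for illustration; the 2 m spacing matches the minimum inter-drone distance used in our formations.

```python
from typing import Dict, Tuple

Vec2 = Tuple[float, float]

K_F = 0.5  # hypothetical formation gain

# Hypothetical follower offsets d_j relative to the leader (drone 1), in meters.
SQUARE: Dict[int, Vec2] = {2: (2.0, 0.0), 3: (2.0, 2.0), 4: (0.0, 2.0)}
LINE: Dict[int, Vec2] = {2: (2.0, 0.0), 3: (4.0, 0.0), 4: (6.0, 0.0)}


def ideal_position(leader_pos: Vec2, offset: Vec2) -> Vec2:
    """p_j_ideal = p_1 + d_j."""
    return (leader_pos[0] + offset[0], leader_pos[1] + offset[1])


def compensation_velocity(ideal: Vec2, actual: Vec2, k_f: float = K_F) -> Vec2:
    """v_comp_j = K_f * (p_j_ideal - p_j)."""
    return (k_f * (ideal[0] - actual[0]), k_f * (ideal[1] - actual[1]))


def velocity_command(v_des: Vec2, v_comp: Vec2) -> Vec2:
    """Combine trajectory and formation terms (simple addition is assumed)."""
    return (v_des[0] + v_comp[0], v_des[1] + v_comp[1])


# Follower 3 in the square pattern, slightly out of place.
leader = (1.0, 1.0)
ideal = ideal_position(leader, SQUARE[3])
v_comp = compensation_velocity(ideal, actual=(2.6, 3.2))
print(ideal, v_comp)
```

Switching from the square to the line pattern in flight then amounts to replacing the offset table, after which the compensation term pulls each follower toward its new slot.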

To quantify the control parameters, we present Table 2, which lists the PID gains used in our experiments.

Table 2: PID Controller Gains for Position and Velocity Loops
Controller	Proportional Gain ($K_p$)	Integral Gain ($K_i$)	Derivative Gain ($K_d$)
Position	0.8	0.05	0.2
Velocity	1.2	0.1	0.3
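For readers who want to experiment with these gains, here is a minimal single-axis sketch of the cascaded position and velocity loops. It uses a textbook discrete PID with rectangular integration and a backward-difference derivative; the `PID` class, the `cascaded_step` helper, and the 200 Hz time step are assumptions for the example, and practical details such as output saturation and integral anti-windup are left out.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller for one axis."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0
    initialized: bool = False

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # Skip the derivative on the first call to avoid a startup spike.
        derivative = (error - self.prev_error) / dt if self.initialized else 0.0
        self.prev_error = error
        self.initialized = True
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def cascaded_step(pos_pid: PID, vel_pid: PID, p_des: float, p_est: float,
                  v_est: float, dt: float) -> float:
    """Outer position loop feeds the inner velocity loop.

    Returns the attitude command for this axis (units depend on the
    platform interface and are not modeled here).
    """
    v_des = pos_pid.update(p_des - p_est, dt)
    return vel_pid.update(v_des - v_est, dt)


# Gains from Table 2; one controller pair per horizontal axis.
position_pid = PID(kp=0.8, ki=0.05, kd=0.2)
velocity_pid = PID(kp=1.2, ki=0.1, kd=0.3)

command = cascaded_step(position_pid, velocity_pid,
                        p_des=1.0, p_est=0.0, v_est=0.0, dt=0.005)
print(command)
```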

We conducted two main experiments to validate our system: visual localization accuracy testing and four-drone formation flight. For the localization test, we moved a drone along a straight path and compared the position estimated by our algorithm with ground truth measurements. The results showed an average positioning error of 0.2 meters, an order of magnitude smaller than the minimum inter-drone spacing of 2 meters in our formations. This accuracy is sufficient for indoor formation drone light shows flown at this spacing.

The formation flight experiment involved four drones performing predefined trajectories in square and line formations, with in-flight shape transitions. We recorded the actual positions of each drone and computed deviations from the ideal trajectory. As summarized in Table 3, the maximum trajectory error was less than 0.4 meters, and the formation maintained coherence throughout the 2-minute flight. These results demonstrate the system’s capability to support complex formation drone light shows indoors.

Table 3: Formation Flight Performance Metrics
Metric	Value	Description
Positioning Accuracy	0.2 m	Average error in visual localization
Trajectory Tracking Error	< 0.4 m	Maximum deviation from desired path
Formation Maintenance Error	< 0.3 m	Average offset from ideal formation position
Flight Duration	120 seconds	Total time of formation experiment

The success of these experiments underscores the potential of our vision-based system for enabling formation drone light shows in GPS-denied environments. By leveraging AprilTags and sensor fusion, we achieve reliable localization and control, allowing drones to execute synchronized maneuvers with high precision. This technology can be adapted to various indoor venues, such as theaters, stadiums, or exhibition halls, where traditional GPS-based systems fail. Moreover, the system can in principle be scaled by adding more drones or optimizing AprilTags placement, which could make it suitable for large-scale formation drone light shows involving dozens or hundreds of drones, although our experiments so far cover only four.

However, there are limitations to address. The reliance on pre-placed AprilTags restricts the operational area to prepared environments. Future work could explore simultaneous localization and mapping (SLAM) techniques to enable autonomous navigation without fiducial markers. Additionally, enhancing the robustness to lighting changes and occlusions would improve reliability for real-world formation drone light shows. We also plan to integrate LED lights on drones to create visual effects, further aligning with the concept of a formation drone light show that combines motion and illumination.

In conclusion, our vision-based indoor formation system provides a viable solution for drone swarms operating without GPS. The integration of AprilTags for localization, coupled with hierarchical control algorithms, enables precise trajectory tracking and formation maintenance. Experimental results confirm the system’s accuracy and effectiveness, paving the way for innovative applications like indoor formation drone light shows. As drone technology advances, such systems will play a crucial role in expanding the boundaries of aerial entertainment and automation. We envision a future where formation drone light shows become commonplace indoors, captivating audiences with intricate patterns and seamless coordination, all made possible by robust visual localization and control.

To further illustrate the mathematical foundations, we can express the overall system dynamics. Let the state of drone $i$ be $x_i = [p_i, v_i, \theta_i]^T$, where $p_i$ is position, $v_i$ is velocity, and $\theta_i$ is attitude. The control input $u_i$ includes attitude commands. The system model can be linearized for control design:

$$ \dot{x}_i = A x_i + B u_i $$

where $A$ and $B$ are matrices derived from quadrotor dynamics. The formation control law can be formulated as a consensus problem, minimizing the error between drones’ states. For $N$ drones, the global formation error $E$ is:

$$ E = \sum_{i=1}^N \| p_i - p_i^{ideal} \|^2 $$

By applying gradient descent, we derive control inputs that reduce $E$, ensuring cohesive movement. This approach aligns with strategies used in formation drone light shows to achieve synchronized patterns.
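As a small worked illustration of that idea, the gradient of $E$ with respect to $p_i$ is $2 (p_i - p_i^{ideal})$, so one descent step with step size $\eta$ moves each drone a fraction $2\eta$ of the way toward its ideal position. The sketch below treats the ideal positions as fixed targets and updates positions directly, ignoring the drone dynamics; the function names, the step size, and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]


def formation_error(positions: List[Vec2], ideals: List[Vec2]) -> float:
    """E = sum_i ||p_i - p_i_ideal||^2."""
    return sum((px - ix) ** 2 + (py - iy) ** 2
               for (px, py), (ix, iy) in zip(positions, ideals))


def gradient_step(positions: List[Vec2], ideals: List[Vec2],
                  step: float = 0.1) -> List[Vec2]:
    """One gradient-descent update: p_i <- p_i - step * 2 * (p_i - p_i_ideal)."""
    return [(px - step * 2.0 * (px - ix), py - step * 2.0 * (py - iy))
            for (px, py), (ix, iy) in zip(positions, ideals)]


# Illustrative square formation with 2 m sides and perturbed drone positions.
ideals = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
positions = [(0.2, -0.1), (1.7, 0.3), (2.4, 2.0), (0.0, 1.5)]

for k in range(5):
    print(k, formation_error(positions, ideals))
    positions = gradient_step(positions, ideals)
```

With `step = 0.1`, each update scales every position error by 0.8, so $E$ shrinks by a factor of 0.64 per iteration.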

In summary, this article has presented a comprehensive overview of our indoor vision-based formation system, emphasizing its relevance to formation drone light shows. Through detailed algorithms, experimental validation, and future directions, we hope to inspire further research and development in this exciting field. The fusion of computer vision and multi-agent control holds immense promise for creating mesmerizing aerial displays that transcend the limitations of outdoor GPS-dependent systems.