From Battlefield to Sky Canvas: A Comprehensive Effectiveness Evaluation Framework for Formation Drone Light Shows

The evolution of unmanned aerial vehicle (UAV) technology has transcended its military origins, seeding revolutionary applications in the civilian sphere. Among the most visually spectacular is the formation drone light show, where hundreds or thousands of synchronized drones act as dynamic pixels in the night sky, creating complex, animated three-dimensional imagery. While aesthetically driven, the execution of a flawless formation drone light show is a problem of immense technical complexity, paralleling the challenges of military UAV swarm operations—precise coordination, robust communication, fault tolerance, and mission execution under constraints. Therefore, systematically evaluating the effectiveness of a drone swarm system, whether for strategic reconnaissance or for public entertainment, is crucial for design optimization, operational planning, and risk mitigation. In this article, I develop a comprehensive, quantitative effectiveness evaluation methodology tailored for formation drone light show systems, synthesizing established multi-criteria decision-making tools into a novel, dynamic framework.

The core challenge in evaluating a formation drone light show system lies in defining what constitutes “effectiveness.” It is a multifaceted concept encompassing not just the final visual output, but the reliability, efficiency, and robustness of the entire process from pre-flight checks to show completion. A successful evaluation must translate subjective artistic and experiential goals into objective, measurable indicators. Drawing parallels from systems engineering and operational research, I construct a hierarchical evaluation index system that breaks down the overarching goal into manageable, quantifiable components.

The primary objective of a formation drone light show is to deliver a captivating and reliable aerial performance. This objective can be deconstructed into four primary capability domains: Visual Fidelity, Operational Reliability, Command and Control Efficiency, and Logistical Efficiency. Each of these first-level indices is further decomposed into more specific, directly measurable second-level and third-level indicators, forming a complete capability hierarchy.

For instance, Visual Fidelity (A1) concerns the quality of the displayed image. Its sub-components include:
Geometric Precision (A11): Accuracy in maintaining assigned positions within the formation.
Color Uniformity & Brightness (A12): Consistency and intensity of LED lights across all drones.
Animation Smoothness (A13): The fluidity of transitions between formation shapes.
Formation Scale & Density (A14): The number of drones per unit volume, which sets the achievable image resolution.

Operational Reliability (A2) ensures the show proceeds without critical failures. Its indicators are:
Single-Point Failure Resistance (A21): The system’s ability to continue functioning after the loss of one or several drones.
Weather Tolerance (A22): Operational limits for wind speed, precipitation, and temperature.
Communication Link Robustness (A23): Resistance to interference and signal dropout.
Battery Life Safety Margin (A24): The ratio of actual flight time to required show time.

Command and Control Efficiency (A3) deals with the planning and execution layer:
Pre-show Calibration Time (A31): Time required for system initialization, GPS locking, and self-checks.
Real-time Monitoring Granularity (A32): The level of detail (e.g., individual drone status) available to the operator.
Abort & Safety Protocol Speed (A33): Time to execute an emergency landing or safe hold pattern.
Software Usability (A34): Ease of choreography design and flight path planning.

Logistical Efficiency (A4) covers the practical deployment aspects:
Deployment/Packing Time (A41): Time to set up or dismantle the ground equipment and drone arrays.
Transportation Footprint (A42): Volume and weight of the total system for transportation.
Personnel Requirement (A43): Number of trained personnel needed to operate the show.
Mean Time Between Failures (MTBF) (A44): Average operational time before a hardware failure occurs.

This hierarchical breakdown can be summarized in the following table:

| Primary Capability (Level 1) | Secondary Capability (Level 2) | Tertiary Indicators (Level 3) – Examples |
| --- | --- | --- |
| Visual Fidelity (A1) | Geometric Precision (A11) | Positioning Error (RMS), Formation Shape Distortion Metric |
| | Color & Luminance (A12) | Color Deviation Index, Minimum Visible Brightness |
| | Animation Smoothness (A13) | Frame Rate of Formation Change, Jerk (derivative of acceleration) in paths |
| | Scale & Density (A14) | Drones per Cubic Meter, Total Usable Drones in Fleet |
| Operational Reliability (A2) | Failure Resistance (A21) | Redundancy Level, Graceful Degradation Score |
| | Weather Tolerance (A22) | Max Wind Speed, IP Rating, Operating Temperature Range |
| | Communication Robustness (A23) | Packet Loss Rate, Frequency Hopping Capability, Signal-to-Noise Ratio |
| | Power Safety (A24) | Battery Redundancy, Remaining Power Post-Show |
| C2 Efficiency (A3) | Calibration Time (A31) | Time from power-on to “ready for takeoff” status |
| | Monitoring Granularity (A32) | Number of simultaneously monitored parameters per drone |
| | Safety Protocol Speed (A33) | Time to initiate and complete emergency landing |
| | Software Usability (A34) | Task completion time for standard choreography edits |
| Logistical Efficiency (A4) | Deployment Time (A41) | Man-hours to set up full system |
| | Transport Footprint (A42) | Cubic volume of packed system, Total weight |
| | Personnel Requirement (A43) | Minimum crew size for safe operation |
| | System Reliability (A44) | MTBF (hours), Mean Time To Repair (MTTR) |
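
For implementers, this hierarchy maps naturally onto a nested data structure, so the weight vectors and decision matrices in later steps can be indexed by the same A-codes. A minimal Python sketch follows; the leaf metric names are illustrative placeholders, not a fixed schema:

```python
# Capability hierarchy encoded as nested dicts keyed by the A-codes above.
# Leaf lists name tertiary metrics; all identifiers are illustrative.
HIERARCHY = {
    "A1 Visual Fidelity": {
        "A11 Geometric Precision": ["positioning_error_rms", "shape_distortion"],
        "A12 Color & Luminance": ["color_deviation_index", "min_visible_brightness"],
        "A13 Animation Smoothness": ["formation_frame_rate", "path_jerk"],
        "A14 Scale & Density": ["drones_per_m3", "usable_fleet_size"],
    },
    "A2 Operational Reliability": {
        "A21 Failure Resistance": ["redundancy_level", "graceful_degradation_score"],
        "A22 Weather Tolerance": ["max_wind_speed", "ip_rating", "temp_range"],
        "A23 Communication Robustness": ["packet_loss_rate", "freq_hopping", "snr"],
        "A24 Power Safety": ["battery_redundancy", "post_show_reserve"],
    },
    "A3 C2 Efficiency": {
        "A31 Calibration Time": ["time_to_ready_s"],
        "A32 Monitoring Granularity": ["params_per_drone"],
        "A33 Safety Protocol Speed": ["emergency_landing_time_s"],
        "A34 Software Usability": ["edit_task_time_s"],
    },
    "A4 Logistical Efficiency": {
        "A41 Deployment Time": ["setup_man_hours"],
        "A42 Transport Footprint": ["packed_volume_m3", "total_weight_kg"],
        "A43 Personnel Requirement": ["min_crew_size"],
        "A44 System Reliability": ["mtbf_hours", "mttr_hours"],
    },
}

# Flat list of terminal indicators, in a stable order for building R and W.
TERMINALS = [m for caps in HIERARCHY.values() for ms in caps.values() for m in ms]
```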

With the evaluation system established, the next step is to quantify performance. Different indicators have different units and scales (e.g., time in seconds, error in meters, a score from 1 to 10). To enable comparison and aggregation, all raw data must be normalized. I classify indicators into three types and apply a corresponding normalization to each. Let \( a_{ij} \) be the actual measured value of indicator \( j \) for drone model or show \( i \), and \( a'_{j} \) be the desired target or requirement for that indicator.

Benefit-type Indicators: Higher is better (e.g., Battery Safety Margin, MTBF).
$$ r_{ij} = \frac{a_{ij}}{a'_{j}} \quad \text{(capped at 1 if applicable)} $$

Cost-type Indicators: Lower is better (e.g., Deployment Time, Positioning Error).
$$ r_{ij} = \frac{a'_{j}}{a_{ij}} \quad \text{(capped at 1 if applicable)} $$

Boolean-type Indicators: Presence or absence of a feature (e.g., Frequency Hopping: Yes/No).
$$ r_{ij} = \begin{cases}
1, & \text{if capability is present} \\
0, & \text{if capability is absent}
\end{cases} $$

This process yields a normalized decision matrix \( \mathbf{R} = [r_{ij}]_{m \times n} \), where \( m \) is the number of alternative systems or shows being compared, and \( n \) is the number of terminal indicators.
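
As a concrete illustration of this normalization step, the sketch below builds \( \mathbf{R} \) with NumPy for three hypothetical fleets on three indicators. The indicator types and target values \( a'_j \) are assumptions for demonstration only:

```python
import numpy as np

# Indicator types: 'benefit' (higher is better), 'cost' (lower is better),
# 'boolean' (feature present or absent). Targets are the show requirements a'_j.
# Names and values are illustrative: e.g. A11 positioning error (m),
# A24 battery safety margin (ratio), A23 frequency hopping (yes/no).
INDICATOR_TYPES = ["cost", "benefit", "boolean"]
TARGETS = [0.5, 1.3, 1.0]

def normalize(raw, kinds, targets):
    """Build the normalized decision matrix R from raw measurements.

    raw     : (m, n) array-like, row i = system i, column j = indicator j
    kinds   : list of n indicator types
    targets : list of n target values a'_j
    """
    raw = np.asarray(raw, dtype=float)
    r = np.empty_like(raw)
    for j, (kind, target) in enumerate(zip(kinds, targets)):
        if kind == "benefit":
            r[:, j] = np.minimum(raw[:, j] / target, 1.0)  # capped at 1
        elif kind == "cost":
            r[:, j] = np.minimum(target / raw[:, j], 1.0)  # capped at 1
        else:  # boolean: already coded 0/1
            r[:, j] = raw[:, j]
    return r

# Three hypothetical fleets measured on the three indicators above.
raw = [[0.45, 1.5, 1],
       [0.60, 1.2, 1],
       [0.52, 1.4, 0]]
print(normalize(raw, INDICATOR_TYPES, TARGETS).round(3))
```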

The heart of the proposed methodology is a synthesized algorithm I term the Dynamic Hybrid Delphi-AHP Deviation Grey-Fuzzy (DHDGF) method. It integrates the strengths of several established techniques to balance expert judgment with objective data, and static weights with dynamic performance-based adjustments, specifically for the context of a formation drone light show.

Step 1: Determining Static Weights via AHP. The Analytic Hierarchy Process (AHP) is used to establish the initial importance weights of the indicators based on expert judgment; those judgments can first be collected and converged through Delphi-style anonymous rounds (the “D” in DHDGF) to reduce anchoring and groupthink. Experts compare pairs of indicators at each level of the hierarchy using a standard 1-9 scale of relative importance, yielding pairwise comparison matrices. For example, for the primary capabilities affecting a formation drone light show, an expert might judge that Visual Fidelity (A1) is moderately more important than Logistical Efficiency (A4). The eigenvector method is then applied to these matrices to derive local weights, which are subsequently synthesized to produce global static weights \( w_j^{static} \) for each terminal indicator \( j \). Consistency ratios (CR) are calculated to ensure the experts’ judgments are logically coherent.
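
A compact sketch of the eigenvector method and consistency check, using Saaty’s standard random-index values; the pairwise matrix is a hypothetical expert judgment consistent with the example above (A1 and A2 weighted most heavily):

```python
import numpy as np

# Saaty's random consistency index RI for matrix orders 1..9.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(pairwise):
    """Derive local weights from a pairwise comparison matrix via the
    principal eigenvector, and return the consistency ratio CR."""
    A = np.asarray(pairwise, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)            # index of the principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                           # normalize to a weight vector
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1)           # consistency index
    cr = ci / RI[n] if RI[n] > 0 else 0.0  # consistency ratio
    return w, cr

# Hypothetical judgments for the four primary capabilities
# (A1 Visual Fidelity, A2 Reliability, A3 C2 Efficiency, A4 Logistics).
A = [[1,   1,   2,   3],
     [1,   1,   2,   3],
     [1/2, 1/2, 1,   2],
     [1/3, 1/3, 1/2, 1]]
w, cr = ahp_weights(A)
print(w.round(3), f"CR = {cr:.3f}")        # CR < 0.10 is conventionally acceptable
```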

Step 2: Dynamic Weight Adjustment via Deviation Degree Model. Static weights from AHP represent general importance but do not account for the specific performance profile of a given drone fleet or show setup. In a complex system like a formation drone light show, overall effectiveness is often constrained by the weakest links (the “barrel effect”). The Deviation Degree Model dynamically adjusts weights to penalize indicators where performance is relatively poor compared to the fleet’s average, thus highlighting critical deficiencies. For a specific show configuration, we calculate the average normalized value for each indicator \( j \) across all considered drone units or historical shows:
$$ \bar{r}_j = \frac{1}{m} \sum_{i=1}^{m} r_{ij} $$
The deviation factor \( d_{ij} \) for system \( i \) on indicator \( j \) is:
$$ d_{ij} = \begin{cases}
\frac{\alpha + \bar{r}_j}{\alpha + r_{ij}}, & \text{if } r_{ij} \le \bar{r}_j \quad \text{(Penalty for below-average performance)} \\
\frac{\beta + r_{ij}}{\beta + \bar{r}_j}, & \text{if } r_{ij} > \bar{r}_j \quad \text{(Reward for above-average performance)}
\end{cases} $$
Here, \( \alpha \) and \( \beta \) are tunable parameters controlling the severity of the barrel/reverse-barrel effect. The dynamically adjusted weight \( w_{ij}^{dynamic} \) is then:
$$ w_{ij}^{dynamic} = \frac{w_j^{static} \cdot d_{ij}}{\sum_{k=1}^{n} w_k^{static} \cdot d_{ik}} $$
This results in a unique weight vector for each evaluated formation drone light show configuration, reflecting its specific strengths and weaknesses.
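
The deviation-degree adjustment reduces to a few lines of array arithmetic. The sketch below assumes \( \alpha = \beta = 0.2 \) (the worked example later only fixes \( \alpha \)) and reuses the normalized values that reappear in that example:

```python
import numpy as np

def dynamic_weights(R, w_static, alpha=0.2, beta=0.2):
    """Adjust static AHP weights per system using the deviation degree model.

    Below-average indicators get d > 1 (weight amplified, the 'barrel effect');
    above-average indicators get a mild reward. Returns an (m, n) matrix whose
    row i is the dynamic weight vector for system i.
    """
    R = np.asarray(R, dtype=float)
    r_bar = R.mean(axis=0)                       # fleet-wide average per indicator
    d = np.where(R <= r_bar,
                 (alpha + r_bar) / (alpha + R),  # penalty branch
                 (beta + R) / (beta + r_bar))    # reward branch
    wd = w_static * d                            # element-wise, broadcast over rows
    return wd / wd.sum(axis=1, keepdims=True)    # renormalize each row

# Illustrative normalized values for three fleets on three indicators
# (A11, A23, A41), matching the case-study table further below.
R = np.array([[0.95, 0.70, 0.80],   # Fleet A
              [0.85, 0.95, 0.95],   # Fleet B
              [0.99, 0.90, 0.60]])  # Fleet C
w_static = np.array([0.12, 0.18, 0.10])  # illustrative subset, not a full vector
print(dynamic_weights(R, w_static).round(3))
```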

Step 3: Building the Fuzzy Evaluation Matrix via Grey Relational Analysis. To handle the inherent subjectivity and fuzziness in judging show quality (e.g., “smooth,” “reliable”), we employ a Grey-Fuzzy approach. Experts are asked to rate a specific show or system configuration against each terminal indicator on a scale (e.g., 1-10). Let this score be \( s_{ij}^e \) for expert \( e \). We define an evaluation grade set, for example, \( V = \{\text{Poor}, \text{Fair}, \text{Good}, \text{Excellent}\} \), and assign numerical grey thresholds to each grade. Using grey relational analysis, we construct whitening weight functions for each grade. For instance, for the “Excellent” grade (\( V_4 \)), a typical whitening function might be:
$$ f_4(s) = \begin{cases}
1, & s \in [9, 10] \\
\frac{s - 7}{2}, & s \in [7, 9) \\
0, & s \in [0, 7)
\end{cases} $$
For each indicator \( j \), we calculate the grey statistic for each grade \( g \) by summing the whitening-function values across all \( E \) experts (the system index \( i \) is fixed and suppressed for clarity):
$$ \sigma_{gj} = \sum_{e=1}^{E} f_g(s_{ij}^e) $$
The total grey statistic for indicator \( j \) is \( \sigma_j = \sum_{g=1}^{4} \sigma_{gj} \). The fuzzy membership degree of indicator \( j \) to grade \( g \) is:
$$ r_{gj} = \frac{\sigma_{gj}}{\sigma_j} $$
All \( r_{gj} \) form the fuzzy evaluation matrix \( \mathbf{F} = [r_{gj}]_{4 \times n} \).
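
The grey-statistic computation can be sketched as follows. Only \( f_4 \) (Excellent) is specified above; the whitening functions for the lower grades are plausible assumptions, and different choices shift the resulting memberships:

```python
import numpy as np

# Whitening weight functions for the grade set V = {Poor, Fair, Good, Excellent}.
# f_excellent matches f_4 in the text; the other three are assumed shapes.
def f_poor(s):      return np.clip((4.0 - s) / 2.0, 0.0, 1.0)
def f_fair(s):      return np.clip(1.0 - np.abs(s - 4.5) / 2.5, 0.0, 1.0)
def f_good(s):      return np.clip(1.0 - np.abs(s - 7.0) / 2.0, 0.0, 1.0)
def f_excellent(s): return np.clip((s - 7.0) / 2.0, 0.0, 1.0)

GRADES = [f_poor, f_fair, f_good, f_excellent]

def fuzzy_memberships(scores):
    """Grey statistics and fuzzy membership degrees for one indicator.

    scores : expert ratings s^e on the 1-10 scale for indicator j (system fixed).
    Returns r_gj = sigma_gj / sigma_j for g = 1..4.
    """
    s = np.asarray(scores, dtype=float)
    sigma = np.array([f(s).sum() for f in GRADES])  # sigma_gj per grade
    return sigma / sigma.sum()                      # membership degrees

# The five expert scores used for Animation Smoothness (A13) in the case study.
print(fuzzy_memberships([9, 8, 8, 10, 7]).round(3))
```

With these particular functions, the case-study scores split roughly 0.40/0.60 between “Good” and “Excellent”; the slightly different illustrative memberships quoted in the case study correspond to a broader “Good” whitening function for the lower grades.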

Step 4: Comprehensive Fuzzy Synthesis and Final Score. The final step is to synthesize the dynamic weights with the fuzzy evaluation matrix. The comprehensive evaluation vector \( \mathbf{B} \) for a specific formation drone light show is obtained through fuzzy composition:
$$ \mathbf{B} = \mathbf{W}_{dynamic} \circ \mathbf{F} = (b_1, b_2, b_3, b_4) $$
where \( \mathbf{W}_{dynamic} = (w_{i1}^{dynamic}, w_{i2}^{dynamic}, \dots, w_{in}^{dynamic}) \) is the row vector of dynamic weights for the evaluated system \( i \), and \( \circ \) denotes an appropriate fuzzy operator (e.g., weighted average). Finally, to obtain a crisp overall effectiveness score \( E \), we assign numerical values to the evaluation grades (e.g., \( V' = (2, 5, 7, 9) \)) and calculate the weighted sum:
$$ E = \sum_{g=1}^{4} b_g \cdot V'_g $$
This score \( E \), typically between 0 and 10, provides a quantitative measure of the formation drone light show system’s effectiveness.
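
The final composition and defuzzification, assuming the weighted-average operator for \( \circ \), might look like this:

```python
import numpy as np

def crisp_score(w_dynamic, F, grade_values=(2, 5, 7, 9)):
    """Weighted-average fuzzy composition followed by defuzzification.

    w_dynamic    : length-n dynamic weight vector for the evaluated system
    F            : (4, n) fuzzy evaluation matrix, row g = grade, column j = indicator
    grade_values : crisp values V' assigned to Poor/Fair/Good/Excellent
    """
    w = np.asarray(w_dynamic, dtype=float)
    F = np.asarray(F, dtype=float)
    B = F @ w                   # b_g = sum_j w_j * r_gj  (weighted-average operator)
    B = B / B.sum()             # guard against rounding drift
    return float(np.dot(B, grade_values)), B

# Toy check with two indicators: one rated mostly Good, one mostly Excellent.
F = np.array([[0.0, 0.0],      # Poor
              [0.1, 0.0],      # Fair
              [0.7, 0.3],      # Good
              [0.2, 0.7]])     # Excellent
score, B = crisp_score([0.6, 0.4], F)
print(B.round(3), round(score, 2))
```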

To illustrate the practical application of the DHDGF framework, consider evaluating three different drone fleet configurations (Fleet A, B, C) for a large-scale public formation drone light show celebrating a national event. The show requires high visual fidelity and extreme reliability.

First, experts use AHP to determine static weights. For this high-profile event, Visual Fidelity (A1) and Operational Reliability (A2) are deemed most critical. A simplified resulting static weight vector for some key terminal indicators might be:
$$ \mathbf{W}_{static} = [0.12_{(A11)}, 0.08_{(A12)}, 0.10_{(A13)}, 0.15_{(A21)}, 0.18_{(A23)}, 0.07_{(A31)}, 0.10_{(A41)}, \dots] $$

Second, the normalized performance matrix \( \mathbf{R} \) is constructed from technical specs and test data. For example:

| Indicator | Fleet A \( (r_{Aj}) \) | Fleet B \( (r_{Bj}) \) | Fleet C \( (r_{Cj}) \) | Average \( (\bar{r}_j) \) |
| --- | --- | --- | --- | --- |
| Positioning Error (A11) | 0.95 | 0.85 | 0.99 | 0.93 |
| Comms Robustness (A23) | 0.70 | 0.95 | 0.90 | 0.85 |
| Deployment Time (A41) | 0.80 | 0.95 | 0.60 | 0.78 |

Third, dynamic weights are calculated. For Fleet C on indicator A41 (Deployment Time, \( r_{C,A41}=0.60 \), below average \( \bar{r}_{A41}=0.78 \)), with \( \alpha=0.2 \):
$$ d_{C,A41} = \frac{0.2 + 0.78}{0.2 + 0.60} = \frac{0.98}{0.80} = 1.225 $$
This increases the effective weight for this weak point for Fleet C. The dynamic weight vector \( \mathbf{W}_{C}^{dynamic} \) is then derived by normalizing \( w_j^{static} \cdot d_{Cj} \) across all \( j \).

Fourth, experts rate a test show performed by Fleet B. Their scores for “Animation Smoothness (A13)” are [9, 8, 8, 10, 7]. Using the whitening functions, we compute the fuzzy membership for A13, e.g., \( r_{Excellent, A13} = 0.45 \), \( r_{Good, A13} = 0.50 \), etc. (the exact values depend on the whitening functions chosen for the lower grades). This is repeated for all indicators to build Fleet B’s fuzzy matrix \( \mathbf{F}_B \).

Finally, we compute Fleet B’s comprehensive score. Assume:
$$ \mathbf{W}_{B}^{dynamic} = (0.13, 0.07, 0.11, 0.16, 0.20, 0.06, 0.08, \dots) $$
$$ \mathbf{B}_B = \mathbf{W}_{B}^{dynamic} \circ \mathbf{F}_B = (0.10, 0.25, 0.50, 0.15) $$
$$ E_B = 0.10 \cdot 2 + 0.25 \cdot 5 + 0.50 \cdot 7 + 0.15 \cdot 9 = 6.30 $$
A score of 6.30 places Fleet B between “Fair” and “Good” overall, with notable room for improvement in reliability to approach “Excellent.” Similar calculations for Fleets A and C would allow for a comparative selection, clearly showing that the DHDGF method provides a nuanced, multi-faceted evaluation crucial for planning a successful formation drone light show.
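
For anyone reproducing the case-study numbers, the defuzzification reduces to a one-line check:

```python
B_B = (0.10, 0.25, 0.50, 0.15)            # comprehensive evaluation vector for Fleet B
V = (2, 5, 7, 9)                          # crisp values for Poor/Fair/Good/Excellent
E_B = sum(b * v for b, v in zip(B_B, V))
print(E_B)                                # 6.30
```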

The proposed DHDGF framework offers a robust, adaptable, and scientifically grounded methodology for evaluating formation drone light show systems. By integrating AHP’s structured judgment, the dynamic correction of the deviation model, and the fuzziness handling of grey-fuzzy theory, it addresses both the subjective priorities of show designers and the objective performance data of the hardware and software. This method moves beyond simple checklists or one-dimensional metrics, providing a holistic score that can guide procurement decisions (e.g., choosing between drone models), operational planning (e.g., identifying system vulnerabilities before a major show), and post-event analysis. As the technology behind formation drone light shows continues to advance, applying such rigorous systems engineering evaluation techniques will be key to pushing the boundaries of scale, reliability, and artistic expression in this mesmerizing field.
