The evolution of unmanned aerial systems from singular platforms to coordinated collectives, known as drone swarms, represents a paradigm shift in modern aerial threats. Characterized by low radar cross-sections (RCS), high density, and intelligent cooperative behaviors, these swarms present a formidable challenge to traditional air defense radars. The limited effectiveness of current radar systems against low, slow, and small (LSS) swarm targets underscores a critical capability gap. Establishing an objective, accurate, and efficient method for evaluating radar effectiveness against such swarms is therefore paramount, since this assessment guides equipment development, procurement, and tactical deployment. The rapid advancement of UAV swarm technologies in China further intensifies the urgency of robust countermeasures and evaluation frameworks.

Traditional evaluation methodologies, such as the Analytic Hierarchy Process (AHP), entropy weighting, and Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), often suffer from inherent subjectivity, computational complexity, and a lack of verifiable accuracy benchmarks. These methods struggle to handle the high-dimensional, non-linear relationships between radar performance indicators and the final effectiveness rating in complex UAV swarm scenarios. Consequently, there is a pressing need for a more robust, data-driven approach.
This article proposes a novel intelligent evaluation model that leverages deep learning to directly learn the complex mapping from a comprehensive set of radar performance indicators to an overall effectiveness grade. The core of our approach is a hybrid neural network architecture, the CNN-BiLSTM-AT model, which combines a convolutional network, a bidirectional LSTM, and an attention mechanism (AT), and whose hyperparameters are tuned by the Whale Optimization Algorithm (WOA). This model is specifically designed to address the unique challenges posed by UAV swarms, automating the assessment process and providing a quantifiable, high-accuracy evaluation tool for radar systems.
1. Constructing the Evaluation Index System
A scientific and practical index system is the foundation for accurate effectiveness evaluation. Moving beyond metrics designed for single targets, our system is tailored to the swarm challenge, focusing on three core capabilities: detection, resolution, and precision. The complete system, comprising 3 first-level and 14 second-level indicators, is detailed in Table 1.
| First-Level Indicator | Second-Level Indicator | Description |
|---|---|---|
| Target Detection Capability (B1) | Detection Blind Zone (B11) | Areas where the radar cannot detect targets due to terrain or system limitations. |
| | Maximum Swarm Detection Range (B12) | The farthest distance at which the swarm can be detected as an entity. |
| | Swarm Recognition Probability (B13) | The probability of correctly classifying the detected entity as a swarm. |
| | Maximum Number of Detectable Swarms (B14) | The upper limit on the number of distinct swarm entities that can be tracked simultaneously. |
| | Swarm Detection Azimuth Range (B15) | The angular coverage within which the swarm can be detected. |
| | Swarm Detection Update Rate (B16) | The frequency at which the radar provides updated track information on the swarm. |
| Target Resolution Capability (B2) | Individual Range Resolution (B21) | The minimum distance separation required to distinguish two individual drones within the swarm. |
| | Individual Azimuth Resolution (B22) | The minimum angular separation in azimuth to distinguish two individuals. |
| | Individual Elevation Resolution (B23) | The minimum angular separation in elevation to distinguish two individuals. |
| | Individual Velocity Resolution (B24) | The minimum velocity difference required to resolve two individuals. |
| Target Detection Precision (B3) | Individual Target Range Precision (B31) | The accuracy of range measurement for an individual drone. |
| | Individual Target Velocity Precision (B32) | The accuracy of velocity (Doppler) measurement for an individual drone. |
| | Individual Target Azimuth Precision (B33) | The accuracy of azimuth angle measurement for an individual drone. |
| | Individual Target Elevation Precision (B34) | The accuracy of elevation angle measurement for an individual drone. |
This multi-faceted system is essential because defeating a UAV swarm requires more than just initial detection. The radar must also resolve individual members to support targeted countermeasures and provide precise tracking data to enable effective engagement, whether by kinetic or electronic means.
2. The Intelligent Evaluation Model Architecture
Our proposed model, WOA-CNN-BiLSTM-AT, is an integrated deep learning framework designed to process the structured indicator data and output an effectiveness classification (e.g., Poor, Qualified, Good, Excellent). The architecture synergistically combines several advanced components.
2.1 Core Components
Convolutional Neural Network (CNN): The CNN module is responsible for extracting spatial and local correlation features from the input indicator vector. We use 1D convolutional layers to model the interdependencies between adjacent or related performance indicators (e.g., the relationship between different resolution metrics). The convolution operation for a layer is defined as:
$$Z_{out}^{(l)}(n) = \sigma\left(\sum_{m=1}^{M} \sum_{k=0}^{K-1} W^{(l)}(m, k) \cdot Z_{in}^{(l-1)}(m, n-k) + b^{(l)}\right)$$
where \(Z_{in}^{(l-1)}\) and \(Z_{out}^{(l)}\) are the input and output feature maps of layer \(l\), \(M\) is the number of input channels, \(n\) indexes the position along the indicator sequence, \(W^{(l)}\) is the convolutional kernel weight, \(b^{(l)}\) is the bias, \(K\) is the kernel size, and \(\sigma\) is the ReLU activation function. This allows the model to automatically learn complex, non-linear combinations of raw indicators that are predictive of overall performance against a UAV swarm.
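The convolution formula can be traced by hand with a few lines of plain Python. This is a minimal sketch of one output channel, assuming a "valid" convolution (only positions where \(n-k\) stays inside the input are computed); the function name `conv1d`, the five input values, and the kernel are hypothetical and chosen only for illustration.

```python
def relu(x):
    """ReLU activation, the sigma of the convolution formula."""
    return x if x > 0.0 else 0.0


def conv1d(z_in, weights, bias):
    """One output channel of the 1D convolution.

    z_in:    M input channels, each a list of N values  -> z_in[m][n]
    weights: M kernels, each a list of K taps           -> weights[m][k]
    bias:    scalar b
    Returns N - K + 1 values (positions n = K-1 .. N-1).
    """
    n_len = len(z_in[0])
    k_len = len(weights[0])
    out = []
    for n in range(k_len - 1, n_len):
        total = bias
        for m, channel in enumerate(z_in):
            for k in range(k_len):
                # W(m, k) * Z_in(m, n - k), summed over channels and taps
                total += weights[m][k] * channel[n - k]
        out.append(relu(total))
    return out


# Hypothetical single-channel input: five normalized indicator values.
indicators = [[0.2, 0.8, 0.5, 0.9, 0.1]]
# Hypothetical kernel that responds to an increase between neighbours.
kernel = [[1.0, -1.0]]
features = conv1d(indicators, kernel, bias=0.0)
print(features)
```

With this kernel, only the positions where an indicator exceeds its left-hand neighbour produce a non-zero feature, which is the kind of local relationship the CNN module is meant to pick up.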
Bidirectional Long Short-Term Memory Network (BiLSTM): While our input is not a time series in the traditional sense, the sequence of indicators can contain hierarchical or functional dependencies that benefit from sequential modeling. More importantly, key indicators like “Maximum Number of Detectable Swarms” may have simulated temporal profiles representing performance under varying swarm density or electronic countermeasures. The BiLSTM captures these potential long-range dependencies and contextual information. A single LSTM cell’s core calculations are:
$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$
where \(f_t, i_t, o_t\) are the forget, input, and output gates; \(C_t\) is the cell state; \(h_t\) is the hidden state; and \(\odot\) denotes element-wise multiplication. The BiLSTM runs this process both forward and backward, concatenating the final hidden states to form a comprehensive feature representation that understands context from both “directions” of the indicator sequence.
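The gate equations and the bidirectional pass can be sketched in plain Python as follows. This is an illustrative sketch only: the helper names (`lstm_step`, `run_lstm`, `bilstm`, `random_params`), the hidden size of 2, and the random weights are assumptions, whereas a trained model would supply learned weights.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vec):
    """Compute W . vec + b for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the six gate equations."""
    concat = h_prev + x_t                                    # [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(p["W_f"], p["b_f"], concat)]
    i = [sigmoid(v) for v in affine(p["W_i"], p["b_i"], concat)]
    c_hat = [math.tanh(v) for v in affine(p["W_C"], p["b_C"], concat)]
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, c_hat)]
    o = [sigmoid(v) for v in affine(p["W_o"], p["b_o"], concat)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def run_lstm(sequence, p, hidden):
    """Run the cell over a sequence and return the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, p)
    return h


def bilstm(sequence, fwd, bwd, hidden):
    """Concatenate the final hidden states of forward and backward passes."""
    return run_lstm(sequence, fwd, hidden) + run_lstm(sequence[::-1], bwd, hidden)


def random_params(input_dim, hidden, rng):
    """Hypothetical small random weights standing in for trained ones."""
    params = {}
    for gate in ("f", "i", "C", "o"):
        params["W_" + gate] = [[rng.uniform(-0.5, 0.5)
                                for _ in range(hidden + input_dim)]
                               for _ in range(hidden)]
        params["b_" + gate] = [0.0] * hidden
    return params


rng = random.Random(0)
params_fwd = random_params(1, 2, rng)
params_bwd = random_params(1, 2, rng)
sequence = [[0.2], [0.8], [0.5]]      # hypothetical feature sequence
representation = bilstm(sequence, params_fwd, params_bwd, hidden=2)
print(representation)
```

The returned vector has twice the hidden size, one half per direction, which is the representation handed to the layers that follow.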
Squeeze-and-Excitation (SE) Attention Mechanism: Not all indicators contribute equally to the final evaluation decision. The SE block performs dynamic channel-wise feature recalibration. It first squeezes global spatial information into a channel descriptor using global average pooling:
$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$
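where \(u_c\) is the feature map of channel \(c\) and \(H \times W\) is its spatial size; for the 1D feature maps produced by the convolutional layers, \(W = 1\) and the average runs over the sequence length. Then, it learns a set of excitation weights through a simple gating mechanism with a Sigmoid activation: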
$$s = \sigma(W_2 \delta(W_1 z))$$
where \(\sigma\) here denotes the Sigmoid function, \(\delta\) is the ReLU function, and \(W_1\) and \(W_2\) are fully connected layer weights. The original features are then scaled by these weights: \(\tilde{x}_c = s_c \cdot u_c\). This allows the model to automatically emphasize informative features (e.g., key discriminators for identifying top-tier radars) and suppress less useful ones, which is critical when dealing with the diverse signature of a UAV swarm.
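The squeeze, excitation, and scaling steps can be illustrated on 1D feature maps with a short plain-Python sketch. The function name `se_block`, the two example channels, and the weight values are hypothetical; bias terms are omitted because the excitation formula does not include them.

```python
import math


def se_block(feature_maps, w1, w2):
    """Squeeze-and-Excitation on 1D feature maps.

    feature_maps[c][n]: value of channel c at position n
    w1: reduced x C weight matrix (list of rows)
    w2: C x reduced weight matrix (list of rows)
    Returns the rescaled feature maps and the channel weights s.
    """
    # Squeeze: global average pooling gives one descriptor z_c per channel.
    z = [sum(channel) / len(channel) for channel in feature_maps]
    # Excitation: s = sigmoid(W2 . relu(W1 . z))
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
         for row in w2]
    # Scale: every value of channel c is multiplied by its weight s_c.
    scaled = [[s_c * u for u in channel] for s_c, channel in zip(s, feature_maps)]
    return scaled, s


# Hypothetical example: two channels of length three, reduced to one unit.
features = [[0.2, 0.4, 0.6],
            [1.0, 1.0, 1.0]]
W1 = [[1.0, 1.0]]
W2 = [[2.0], [-2.0]]
scaled, s = se_block(features, W1, W2)
print(s)
```

With these illustrative weights the first channel is kept almost unchanged while the second is strongly attenuated, which is the recalibration behaviour the SE block is meant to learn from data.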
2.2 Model Integration and Whale Optimization
The integrated CNN-BiLSTM-AT model flows as follows: The input indicator vector is processed by 1D CNN layers for local feature extraction. The resulting features are passed through the SE attention block to be re-weighted. These weighted features are then fed into the BiLSTM layer to model sequential dependencies. The final BiLSTM output is passed through a dropout layer for regularization and then into a fully connected layer with a Softmax activation for classification.
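The last stage of this pipeline, the Softmax classification, can be sketched as below. The four logits are hypothetical values standing in for the output of the fully connected layer; the grade names are the four classes used throughout this article.

```python
import math

GRADES = ["Poor", "Qualified", "Good", "Excellent"]


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most probable grade and the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GRADES[best], probs


# Hypothetical logits from the fully connected layer for one radar sample.
grade, probabilities = classify([0.1, 0.3, 2.0, 1.2])
print(grade, [round(p, 3) for p in probabilities])
```

Dropout is not shown because it is active only during training and leaves the forward pass unchanged at evaluation time.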
A critical challenge in deep learning is hyperparameter tuning. We employ the Whale Optimization Algorithm (WOA) to automatically find the optimal configuration for key parameters like initial learning rate, number of BiLSTM hidden units, and attention layer dimensions. WOA is a metaheuristic inspired by the bubble-net hunting behavior of humpback whales. The position update equations for the three phases are:
Encircling Prey:
$$
\begin{aligned}
\vec{D} &= |\vec{C} \cdot \vec{X}^*(t) - \vec{X}(t)| \\
\vec{X}(t+1) &= \vec{X}^*(t) - \vec{A} \cdot \vec{D}
\end{aligned}
$$
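where \(\vec{X}^*(t)\) is the best solution found so far, \(\vec{X}(t)\) is the current whale position, and \(\vec{A}\) and \(\vec{C}\) are coefficient vectors. In the standard WOA formulation, \(\vec{A} = 2\vec{a} \cdot \vec{r} - \vec{a}\) and \(\vec{C} = 2\vec{r}\), where \(\vec{a}\) decreases linearly from 2 to 0 over the iterations and \(\vec{r}\) is a random vector in \([0,1]\).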
Bubble-net Attacking (Exploitation):
$$
\vec{X}(t+1) =
\begin{cases}
\vec{X}^*(t) - \vec{A} \cdot \vec{D} & \text{if } p < 0.5 \\
\vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^*(t) & \text{if } p \geq 0.5
\end{cases}
$$
where \(\vec{D}' = |\vec{X}^*(t) - \vec{X}(t)|\), \(b\) is a constant, \(l\) is a random number in \([-1,1]\), and \(p\) is a random probability.
Search for Prey (Exploration):
$$
\begin{aligned}
\vec{D} &= |\vec{C} \cdot \vec{X}_{\text{rand}} - \vec{X}(t)| \\
\vec{X}(t+1) &= \vec{X}_{\text{rand}} - \vec{A} \cdot \vec{D}
\end{aligned}
$$
WOA searches the hyperparameter space by treating each candidate hyperparameter set as a whale’s position. The fitness of each position (e.g., validation accuracy) is evaluated, and the population iteratively updates toward the best solution found so far. The result of this automated search is the WOA-CNN-BiLSTM-AT model; as with any metaheuristic, the search targets a good configuration rather than a guaranteed optimum.
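The three update rules can be combined into a compact search loop, sketched here in plain Python under several assumptions. The choice between encircling and exploration uses the usual WOA criterion (\(|\vec{A}| < 1\) versus \(|\vec{A}| \geq 1\)), which is not spelled out above. \(\vec{A}\) and \(\vec{C}\) are drawn as one scalar per whale for simplicity. The fitness `toy_error` is a hypothetical stand-in for "1 minus validation accuracy", since training the network is outside the scope of a short example.

```python
import math
import random


def woa_minimize(fitness, bounds, n_whales=20, n_iter=100, b=1.0, seed=0):
    """Minimise `fitness` inside the box `bounds` with a basic WOA loop."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best, best_fit = None, float("inf")

    def refresh_best():
        nonlocal best, best_fit
        for w in whales:
            fit = fitness(w)
            if fit < best_fit:
                best, best_fit = w[:], fit

    for t in range(n_iter):
        refresh_best()
        a = 2.0 - 2.0 * t / n_iter            # shrinks linearly from 2 towards 0
        for idx, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a    # scalar per whale (simplification)
            C = 2.0 * rng.random()
            p = rng.random()
            l = rng.uniform(-1.0, 1.0)
            if p < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore
                # around a randomly chosen whale.
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                # Spiral (bubble-net) update around the best whale.
                spiral = math.exp(b * l) * math.cos(2.0 * math.pi * l)
                new = [abs(bi - xi) * spiral + bi for bi, xi in zip(best, x)]
            # Keep the candidate inside the search box.
            whales[idx] = [min(max(v, lo), hi) for v, (lo, hi) in zip(new, bounds)]
    refresh_best()
    return best, best_fit


def toy_error(position):
    """Hypothetical error surface, lowest at log10(lr) = -3 and 64 hidden units."""
    log_lr, hidden_scaled = position          # hidden units divided by 100
    return (log_lr + 3.0) ** 2 + (hidden_scaled - 0.64) ** 2


BOUNDS = [(-5.0, -1.0), (0.16, 2.56)]         # hypothetical search ranges
best_position, best_error = woa_minimize(toy_error, BOUNDS)
print(best_position, best_error)
```

In an actual tuning run, `toy_error` would be replaced by a function that trains the network with the candidate hyperparameters and returns its validation error, which makes each fitness evaluation far more expensive than in this sketch.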
3. Experimental Validation and Analysis
We validated our model using a dataset compiled from measured experiments, high-fidelity simulations, and published literature. The dataset contained 430 samples, each representing a different radar type (e.g., phased array, pulse-Doppler) under various UAV swarm scenarios. Each sample consists of the 14 normalized indicator values. The ground truth labels (Poor, Qualified, Good, Excellent) were generated using a Grey Relational Analysis model based on expert-defined thresholds, providing a consistent baseline for supervised learning. The data was split into 80% for training (344 samples) and 20% for testing (86 samples).
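To illustrate how Grey Relational Analysis can turn indicator values into a grade label, the sketch below uses the common textbook form of the grey relational coefficient with distinguishing coefficient \(\rho = 0.5\), an ideal reference sequence of all ones, and equal indicator weights. All of these choices, the three-indicator samples, and the cut-off values in `THRESHOLDS` are assumptions made for illustration; the expert-defined thresholds actually used for labelling are not given in this article.

```python
def grey_relational_grades(samples, reference, rho=0.5):
    """Grey relational grade of every sample against a reference sequence."""
    deltas = [[abs(r - v) for r, v in zip(reference, row)] for row in samples]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Every sample coincides with the reference.
        return [1.0] * len(samples)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))   # equal indicator weights
    return grades


# Hypothetical cut-offs, checked from the highest grade downwards.
THRESHOLDS = [(0.85, "Excellent"), (0.65, "Good"), (0.50, "Qualified")]


def grade_to_label(grade):
    for cutoff, label in THRESHOLDS:
        if grade >= cutoff:
            return label
    return "Poor"


# Three hypothetical samples with three normalized indicators each
# (the real samples have 14 indicators).
reference = [1.0, 1.0, 1.0]
samples = [[1.0, 1.0, 1.0],
           [0.0, 0.0, 0.0],
           [0.9, 0.6, 0.8]]
grades = grey_relational_grades(samples, reference)
labels = [grade_to_label(g) for g in grades]
print(list(zip([round(g, 3) for g in grades], labels)))
```

Because the minimum and maximum deviations are taken over the whole sample set, a sample's grade depends on which other samples are evaluated with it, so labels are comparable only within one labelling run.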
3.1 Overall Accuracy and Model Comparison
The primary metric, accuracy, measures the proportion of correctly classified samples. As shown in Table 2, the WOA-optimized model achieves the highest testing accuracy of the four architectures compared.
| Model | Training Accuracy | Testing Accuracy | Generalization Gap |
|---|---|---|---|
| CNN | 96.42% | 74.42% | 22.00% |
| BiLSTM | 86.05% | 81.40% | 4.65% |
| CNN-BiLSTM-AT | 99.13% | 82.56% | 16.57% |
| WOA-CNN-BiLSTM-AT | 98.84% | 89.53% | 9.31% |
The standalone CNN model shows severe overfitting, with a 22.00% generalization gap. The standalone BiLSTM has the smallest gap (4.65%) but a lower testing accuracy of 81.40%. The CNN-BiLSTM-AT model without optimization improves feature learning but still overfits, with a 16.57% gap. After WOA optimization, the final model achieves the highest testing accuracy (89.53%) while narrowing that gap to 9.31%. This indicates better generalization, that is, the ability to accurately evaluate new, unseen radar systems, which is vital for assessing novel designs intended to counter advanced UAV swarm threats.
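The figures in Table 2 can be cross-checked with a few lines of arithmetic. The sketch below recomputes each generalization gap as training accuracy minus testing accuracy and converts each testing accuracy into a count of correct predictions, assuming the test set is exactly 20% of the 430 samples; the variable names are illustrative.

```python
TOTAL_SAMPLES = 430
TEST_FRACTION = 0.20
test_size = round(TOTAL_SAMPLES * TEST_FRACTION)

# (training accuracy %, testing accuracy %) as listed in Table 2
ACCURACY = {
    "CNN": (96.42, 74.42),
    "BiLSTM": (86.05, 81.40),
    "CNN-BiLSTM-AT": (99.13, 82.56),
    "WOA-CNN-BiLSTM-AT": (98.84, 89.53),
}

summary = {}
for name, (train_acc, test_acc) in ACCURACY.items():
    gap = round(train_acc - test_acc, 2)              # generalization gap in points
    correct = round(test_acc / 100.0 * test_size)     # correct test predictions
    summary[name] = (gap, correct)
    print(f"{name}: gap {gap:.2f} points, {correct}/{test_size} correct")
```

Under that assumption, the reported 89.53% testing accuracy corresponds to 77 of 86 test samples classified correctly.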
3.2 Confusion Matrix Analysis
The confusion matrix for the WOA-CNN-BiLSTM-AT model on the test set provides granular insight into per-class performance, as summarized in Table 3.
| Actual Class | Predicted as Poor (1) | Predicted as Qualified (2) | Predicted as Good (3) | Predicted as Excellent (4) | Class Accuracy |
|---|---|---|---|---|---|
| Poor (1) | 15 | 0 | 0 | 0 | 100.00% |
| Qualified (2) | 2 | 24 | 3 | 0 | 82.76% |
| Good (3) | 0 | 2 | 36 | 0 | 94.74% |
| Excellent (4) | 0 | 1 | 0 | 26 | 96.30% |
The model performs best on the “Poor” and “Excellent” classes: every “Poor” radar is identified correctly, as are 26 of the 27 “Excellent” radars (the Class Accuracy column is the per-class recall). This is highly valuable for procurement decisions, as it limits the risk of rejecting a capable system or, more importantly, acquiring an inadequate one that would be vulnerable to a UAV swarm attack. Seven of the eight misclassifications fall between adjacent classes (e.g., Qualified vs. Good), which is a less severe error; the exception is one “Excellent” radar predicted as “Qualified”.
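The per-class figures can be recomputed directly from the counts in Table 3. The sketch below derives recall (the Class Accuracy column) and precision for each class and counts how many errors fall between adjacent grades. The matrix is taken at face value; its counts total 109 samples, whereas a 20% split of 430 samples gives 86, and that difference is not explained by the figures reported here.

```python
LABELS = ["Poor", "Qualified", "Good", "Excellent"]

# Rows are actual classes, columns are predicted classes (Table 3).
CONFUSION = [
    [15, 0, 0, 0],
    [2, 24, 3, 0],
    [0, 2, 36, 0],
    [0, 1, 0, 26],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)} from a square confusion matrix."""
    metrics = {}
    for c, label in enumerate(labels):
        true_pos = matrix[c][c]
        actual = sum(matrix[c])                      # row total
        predicted = sum(row[c] for row in matrix)    # column total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        metrics[label] = (precision, recall)
    return metrics


metrics = per_class_metrics(CONFUSION, LABELS)
size = len(LABELS)
errors = sum(CONFUSION[i][j] for i in range(size) for j in range(size) if i != j)
adjacent_errors = sum(CONFUSION[i][j] for i in range(size) for j in range(size)
                      if abs(i - j) == 1)

for label, (precision, recall) in metrics.items():
    print(f"{label}: precision {precision:.2%}, recall {recall:.2%}")
print(f"{adjacent_errors} of {errors} errors are between adjacent classes")
```

Recall answers "how many radars of this grade were found", while precision answers "how many radars given this grade deserve it"; the two differ for the “Poor” class, where recall is 100% but two “Qualified” radars are also predicted as “Poor”.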
3.3 ROC Curve and AUC Value
The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) measure the model’s ability to distinguish between classes. A perfect classifier has an AUC of 1.0. The AUC values for our model, detailed in Table 4, are consistently high across all classes.
| Model | AUC (Class 1) | AUC (Class 2) | AUC (Class 3) | AUC (Class 4) |
|---|---|---|---|---|
| CNN | 0.344 | 0.569 | 0.539 | 0.491 |
| BiLSTM | 0.452 | 0.526 | 0.448 | 0.425 |
| CNN-BiLSTM-AT | 0.942 | 0.863 | 0.942 | 0.984 |
| WOA-CNN-BiLSTM-AT | 0.976 | 0.908 | 0.985 | 0.999 |
The AUC of 0.999 for Class 4 (“Excellent”) is particularly noteworthy. It signifies that the model can almost perfectly separate top-performing radars from all others. This discriminative power is consistent with the SE attention mechanism emphasizing the most critical high-performance indicators, and the improvement over the unoptimized CNN-BiLSTM-AT model (0.984) reflects the WOA tuning, providing a reliable tool for identifying best-in-class systems against sophisticated UAV swarms.
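For a multi-class model, a per-class AUC is commonly computed one-vs-rest: the class of interest is treated as positive and all other classes as negative. The sketch below uses the rank interpretation of AUC (the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as half). The scores and labels are hypothetical, and the one-vs-rest setup is an assumption, since the exact procedure behind Table 4 is not described.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for one class via pairwise comparison of predicted scores.

    scores: predicted probability of the `positive` class for each sample
    labels: true class of each sample
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5          # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of the "Excellent" class.
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
labels = ["Excellent", "Excellent", "Good", "Qualified", "Poor"]
auc = auc_one_vs_rest(scores, labels, positive="Excellent")
print(round(auc, 3))
```

An AUC of 0.5 corresponds to ranking positives and negatives at random, so values close to 1.0, such as those of the two attention-based models in Table 4, indicate that nearly every positive sample is scored above nearly every negative one.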
3.4 F1-Score Analysis
The F1-score is the harmonic mean of precision and recall, providing a single metric that balances the two. A high F1-score indicates that the model is both precise (few false positives) and has high recall (few false negatives). The results are presented in Table 5.
| Model | Training F1-Score | Testing F1-Score |
|---|---|---|
| CNN | 0.970 | 0.706 |
| BiLSTM | 0.840 | 0.760 |
| CNN-BiLSTM-AT | 0.980 | 0.810 |
| WOA-CNN-BiLSTM-AT | 0.995 | 0.895 |
Our optimized model achieves the highest F1-score on the test set (0.895), indicating the best overall balance between precision and recall among the four models. This balance is essential for a trustworthy evaluation system that can support high-stakes decisions in defending against UAV swarm threats.
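For reference, the per-class quantities behind the F1-score can be written as follows, where \(TP\), \(FP\), and \(FN\) are the true positives, false positives, and false negatives of one class treated against all others. How the per-class scores are averaged into the single values of Table 5 (macro or weighted) is not stated, so no averaging is assumed here.

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
$$

As a worked example using the counts in Table 3, the “Qualified” class has \(TP = 24\), \(FP = 3\), and \(FN = 5\), giving \(F_1 = 48/56 \approx 0.857\).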
4. Conclusion
This study addresses the complex problem of evaluating radar effectiveness against UAV swarms by introducing a novel, data-driven intelligent evaluation model. Confronting the unique challenges posed by low-observable, dense, and cooperative UAV swarms, we first established a targeted evaluation index system with 3 first-level and 14 second-level indicators covering detection, resolution, and precision capabilities.
The core of our solution is the WOA-CNN-BiLSTM-AT hybrid deep learning model. This architecture is not a mere assembly of algorithms but a purpose-built system: the CNN extracts spatial correlations between indicators, the BiLSTM captures potential temporal or sequential dependencies critical for understanding dynamic performance, and the SE attention mechanism dynamically highlights the most salient features for final classification. The integration of the Whale Optimization Algorithm automates the tuning process, searching for a well-performing hyperparameter configuration without manual trial and error.
Experimental validation on a composite dataset supports the model’s advantage. It achieves a test accuracy of 89.53%, the highest among the models compared. It also exhibits near-perfect discriminative ability for “Excellent” class radars (AUC=0.999) and maintains a high F1-score of 0.895, indicating robust and balanced performance. WOA tuning narrows the gap between training and testing accuracy from 16.57% to 9.31%, which indicates improved generalization to unseen radar types and swarm scenarios.
This intelligent evaluation framework provides an objective, efficient, and high-precision tool for radar assessment. It can be directly applied to simulate and predict the effectiveness of new radar designs during the R&D phase or to evaluate the operational grade of fielded systems using measured data. As UAV swarm technology continues to evolve, such data-driven assessment models will become indispensable for guiding the development, procurement, and deployment of effective counter-swarm radar defenses, ensuring robust national airspace security.
