The operational landscape for unmanned aerial systems, particularly within the critical and expanding sector of China UAV drone applications, is undergoing a transformative shift. These platforms are increasingly vital for logistics, infrastructure inspection, agricultural monitoring, and emergency response. Central to their autonomous capabilities is the reliable fusion of inertial and Global Navigation Satellite System (GNSS) data, typically managed by an Extended Kalman Filter (EKF), to provide precise positioning and velocity estimates for the flight controller. However, this reliance on publicly available and relatively weak GNSS signals presents a profound vulnerability: spoofing attacks. A malicious actor can transmit counterfeit GNSS signals to gradually deceive the drone’s receiver, causing it to believe it is somewhere it is not. Unlike jamming, which is disruptive and obvious, a sophisticated spoofing attack, especially a slowly varying one, can remain undetected while seamlessly diverting the China UAV drone from its intended path, leading to mission failure, loss of assets, or security breaches.

This paper addresses a particularly insidious form of this threat: the ramp-style slowly varying spoofing attack. In such an attack, the counterfeit position signal does not jump instantaneously but introduces a small, constant velocity offset, causing a gradual divergence between the true and perceived location. This subtlety allows the attack to bypass traditional instantaneous anomaly detectors that monitor for large, abrupt changes in the navigation solution. For a China UAV drone executing a precision task, even a slow drift can have catastrophic consequences if not detected promptly. The core challenge lies in the EKF’s inherent robustness. As the spoofed measurement is fed into the filter, the Kalman gain attempts to correct the state estimate, slowly “pulling” the drone’s believed position towards the false signal. The key to detection, therefore, lies not in the raw measurements alone but in the internal consistency checks of the filter—specifically, the innovation sequence. The innovation is the difference between the actual sensor measurement (e.g., from GNSS) and the filter’s prediction based on previous states (primarily from the Inertial Measurement Unit, IMU). Under normal conditions, the innovation is a zero-mean white noise process. A spoofing attack injects a bias into the measurement, which manifests as a persistent, structured anomaly within this innovation sequence.
The problem can be formally described within the state estimation framework of a China UAV drone’s navigation system. The measurement equation during an attack is:
$$y_k = h(x_k) + v_k + \Lambda y^a_k$$
where \(y_k\) is the sensor measurement vector, \(h(\cdot)\) is the nonlinear observation model, \(x_k\) is the system state vector (attitude, velocity, position, biases), \(v_k\) is measurement noise, \(\Lambda\) is a diagonal attack selection matrix indicating which measurement channels are under attack, and \(y^a_k\) is the additive spoofing signal. For a ramp attack targeting, for instance, the North and East position channels, the spoofing signal for GNSS position can be modeled as \(s(t) = \frac{t - \tau}{T_s} \, s_{\text{total}}\) for \(\tau \le t \le \tau + T_s\), where \(\tau\) is the attack start time, \(T_s\) is the ramp duration, and \(s_{\text{total}} = [\Delta N, \Delta E, 0]^T\) is the total desired position offset; the implied velocity offset is the constant \(s_{\text{total}} / T_s\). The attacked GNSS measurement becomes:
$$y_{k}^{\text{gnss\_spoofed}} = p_{k}^{\text{gnss\_true}} + s(t_k)$$
The corresponding position innovation within the EKF is:
$$r_{k}^{\text{pos}} = y_{k}^{\text{gnss\_spoofed}} - \hat{p}_{k|k-1} = (p_{k}^{\text{gnss\_true}} + s(t_k)) - \hat{p}_{k|k-1}$$
where \(\hat{p}_{k|k-1}\) is the predicted position from the IMU-propagated state. This innovation, which should be noisy but unbiased, now carries the slowly growing signature of the attack \(s(t_k)\). Detecting this slow drift against the background of sensor noise and dynamic motion is the primary objective.
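To make the ramp model concrete, the following is a minimal sketch of \(s(t)\) and the resulting position innovation. The function names, the example offsets, and the choice to hold the offset at \(s_{\text{total}}\) once the ramp ends are illustrative assumptions, not details taken from the paper; measurement noise is omitted.

```python
def ramp_offset(t, tau, T_s, s_total):
    """Ramp spoofing offset s(t).

    Zero before tau and linear during the ramp. Holding the offset at
    s_total after tau + T_s is an assumption made for this sketch.
    """
    frac = min(max((t - tau) / T_s, 0.0), 1.0)
    return [frac * c for c in s_total]


def position_innovation(p_true, p_pred, t, tau, T_s, s_total):
    """r = (p_true + s(t)) - p_pred, with measurement noise omitted."""
    s = ramp_offset(t, tau, T_s, s_total)
    return [pt + si - pp for pt, si, pp in zip(p_true, s, p_pred)]


# Hypothetical attack: 20 m North / 10 m East over 100 s, starting at t = 50 s.
tau, T_s, s_total = 50.0, 100.0, [20.0, 10.0, 0.0]
for t in (0.0, 75.0, 100.0, 200.0):
    print(t, ramp_offset(t, tau, T_s, s_total))
```

With these hypothetical values the implied velocity offset is 0.2 m/s North and 0.1 m/s East, so the per-sample change in the measurement is small compared with typical GNSS noise.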
Existing detection methods often fall short for this scenario. Signal-level techniques (e.g., monitoring carrier-to-noise ratio) may not react to slow changes. Residual monitoring methods can suffer from significant detection lag. While machine learning, particularly deep learning, offers promise by learning complex patterns from data, many approaches treat feature extraction and fusion suboptimally. Simple concatenation of features or static weighting does not adapt to the varying salience of temporal dynamics versus statistical anomalies present in the innovation sequence during different phases of a slow attack.
To overcome these limitations, we propose a novel detection framework based on a Gated Cross-Attention (GCA) mechanism for multi-feature fusion, applied to the innovation sequence of a China UAV drone. Our core contributions are threefold. First, we design a parallel feature extraction architecture using Long Short-Term Memory (LSTM) networks and 1D Convolutional Neural Networks (1D-CNN). The LSTM branch is tasked with capturing the long-term temporal dependencies and gradual drift induced by the slow spoofing, updating its state at each time step as:
$$h_t, c_t = \text{LSTM}(z_t, h_{t-1}, c_{t-1})$$
where \(h_t\) and \(c_t\) are the hidden and cell states capturing temporal context from the innovation sequence window \(Z = [z_{t-m}, \dots, z_t]\). Simultaneously, the CNN branch processes a derived feature: the Mahalanobis distance between the current innovation and the distribution of normal innovations. This distance, \(d_k = \sqrt{(r_k - \mu_n)^T \Sigma_n^{-1} (r_k - \mu_n)}\), where \(\mu_n\) and \(\Sigma_n\) are the mean and covariance of the normal innovations, provides a statistical measure of anomaly. The 1D-CNN applies convolutional filters to the sequence of these distances:
$$C^{(l+1)} = \text{ReLU}(\text{BN}(W^{(l)} * C^{(l)} + b^{(l)}))$$
to extract local abnormal patterns and shape features that might be indicative of the attack’s onset or specific character, complementing the LSTM’s temporal view.
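The Mahalanobis feature fed to the CNN branch can be sketched with NumPy as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the innovations are synthetic two-channel Gaussian samples, and \(\mu_n\) and \(\Sigma_n\) are estimated from a separate attack-free set.

```python
import numpy as np


def fit_normal_stats(normal_innovations):
    """Estimate mean and covariance of attack-free innovations (rows = samples)."""
    mu = normal_innovations.mean(axis=0)
    sigma = np.cov(normal_innovations, rowvar=False)
    return mu, sigma


def mahalanobis_sequence(innovations, mu, sigma):
    """d_k = sqrt((r_k - mu)^T Sigma^{-1} (r_k - mu)) for every row r_k."""
    diff = innovations - mu
    # Solve Sigma x = diff^T rather than forming the explicit inverse.
    solved = np.linalg.solve(sigma, diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, solved))


rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 2))   # synthetic attack-free innovations
mu, sigma = fit_normal_stats(normal)

# Synthetic attacked window: the same noise plus a slowly growing bias.
drift = np.linspace(0.0, 5.0, 50)[:, None] * np.array([1.0, 0.5])
attacked = rng.normal(0.0, 1.0, size=(50, 2)) + drift
d = mahalanobis_sequence(attacked, mu, sigma)   # sequence passed to the 1D-CNN
```

The resulting sequence `d` trends upward as the bias grows, which is the kind of local shape the convolutional filters are meant to pick up.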
Second, we introduce a Gated Cross-Attention fusion module to intelligently combine these two distinct feature streams. Traditional fusion methods, like concatenation followed by a fully connected layer, assign a fixed or learned-but-static importance to each feature type. Our GCA module dynamically recalibrates the contribution of each branch based on the global context of the input sample. It first computes a cross-attention matrix to align features from both branches:
$$A = \text{Softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right), \quad \text{where } Q = XW_q, \ K, V = \text{Conv1D}(Y')$$
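Here, \(X\) and \(Y'\) are the LSTM and aligned CNN features, respectively, and \(d_k\) in the scaling factor denotes the key dimension (not the Mahalanobis distance defined earlier). The attention output is \(\tilde{Y} = AV\). Crucially, a gating mechanism then generates adaptive weights \(\alpha\) and \(\beta\):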
$$g = \text{Softmax}(W_g[\text{GlobalAvgPool}(X); \text{GlobalAvgPool}(\tilde{Y})] + b_g) = [\alpha, \beta]$$
The final fused feature is \(F = \alpha \cdot X + \beta \cdot \text{Linear}(\tilde{Y})\). This allows the model to emphasize temporal dynamics (LSTM) when the drift is the primary cue, or highlight statistical-local anomalies (CNN) when they are more pronounced, leading to a more robust and adaptive representation for the downstream classifier.
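The fusion step described above can be sketched as a NumPy forward pass. Several details here are assumptions rather than the paper's specification: both branches are taken to produce features of shape `(T, d)`, the Conv1D that yields \(K\) and \(V\) is modeled with kernel size 1 (a per-step linear map), and all parameter names and random weights are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_cross_attention(X, Y, p):
    """Fuse LSTM features X (T, d) with aligned CNN features Y (T, d).

    Returns the fused feature F (T, d) and the gate weights (alpha, beta).
    """
    d = X.shape[1]
    Q = X @ p["W_q"]
    # Assumption: Conv1D with kernel size 1, i.e. a per-step linear projection.
    K, V = Y @ p["W_k"], Y @ p["W_v"]
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)    # (T, T) cross-attention matrix
    Y_tilde = A @ V
    # Global average pooling over time, then concatenation of both summaries.
    pooled = np.concatenate([X.mean(axis=0), Y_tilde.mean(axis=0)])
    alpha, beta = softmax(p["W_g"] @ pooled + p["b_g"])
    F = alpha * X + beta * (Y_tilde @ p["W_o"] + p["b_o"])
    return F, (alpha, beta)


rng = np.random.default_rng(0)
T, d = 16, 8
params = {name: rng.normal(0, 0.1, size=(d, d)) for name in ("W_q", "W_k", "W_v", "W_o")}
params.update(W_g=rng.normal(0, 0.1, size=(2, 2 * d)), b_g=np.zeros(2), b_o=np.zeros(d))
F, (alpha, beta) = gated_cross_attention(
    rng.normal(size=(T, d)), rng.normal(size=(T, d)), params
)
```

Because the gate is a two-way softmax, \(\alpha + \beta = 1\) for every input sample, so the module trades off the two branches rather than scaling them independently.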
Third, to address the common issue of class imbalance in attack detection datasets (where normal data often predominates), we employ a weighted label-smoothed cross-entropy loss function:
$$\mathcal{L} = -\frac{1}{B}\sum_{j=1}^{B}\sum_{k=1}^{K} \omega_{jk} u_{jk} \log(p_{jk})$$
where \(u_{jk}\) is the smoothed label (e.g., \(u_{jk} = 1-\epsilon\) for the true class, \(\epsilon/(K-1)\) for others), and \(\omega_{jk}\) is a weight inversely proportional to the class frequency. This combination prevents overconfidence on the majority class and improves generalization to the rarer attack types a China UAV drone might encounter.
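A minimal pure-Python sketch of this loss is given below. The helper names and the toy batch are hypothetical, and one detail is an assumption: the weight \(\omega_{jk}\) is taken from the true class of sample \(j\) and applied to all of that sample's terms, which is only one possible reading of the definition above. Predicted probabilities are assumed to be strictly positive.

```python
import math


def class_weights(labels, num_classes):
    """Weights inversely proportional to class frequency (1.0 for a balanced set)."""
    counts = [max(labels.count(k), 1) for k in range(num_classes)]
    return [len(labels) / (num_classes * c) for c in counts]


def smoothed_targets(label, num_classes, eps):
    """u = 1 - eps for the true class and eps / (K - 1) for every other class."""
    off = eps / (num_classes - 1)
    return [1.0 - eps if k == label else off for k in range(num_classes)]


def weighted_ls_cross_entropy(probs, labels, weights, eps=0.1):
    """Batch mean of -sum_k w * u_k * log(p_k), with w taken from the true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        u = smoothed_targets(y, len(p), eps)
        total -= weights[y] * sum(u_k * math.log(p_k) for u_k, p_k in zip(u, p))
    return total / len(labels)


labels = [0, 0, 0, 1]   # imbalanced toy batch: class 0 dominates
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
w = class_weights(labels, num_classes=2)
loss = weighted_ls_cross_entropy(probs, labels, w, eps=0.1)
```

Setting `eps=0` and all weights to 1 recovers the standard cross-entropy, which is a convenient sanity check when wiring the loss into a training loop.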
We validate our proposed GCA-LSTM-CNN model through extensive simulations emulating a China UAV drone platform. We compare its performance against several benchmarks: a) a Support Vector Machine (SVM) with Radial Basis Function (RBF) kernel, b) a standard Fully Connected Network (FCN), and c) a sequential CNN-LSTM model. The metrics of primary importance are Accuracy, Recall (True Positive Rate), and F1-Score, with a particular focus on Recall to minimize missed detections of dangerous spoofing attacks.

| Detection Model | Accuracy | F1-Score | Avg. Recall (Attack Classes) |
|---|---|---|---|
| SVM (RBF) | 0.9699 | 0.9709 | 0.9584 |
| FCN | 0.9675 | 0.9694 | 0.9536 |
| CNN-LSTM | 0.9748 | 0.9761 | 0.9663 |
| GCA-LSTM-CNN (Proposed) | 0.9883 | 0.9887 | 0.9823 |

The proposed network achieves the highest accuracy (0.9883) and F1-score (0.9887) among the compared models. More importantly, its average recall of 0.9823 exceeds that of the strongest baseline, CNN-LSTM (0.9663), by about 1.6 percentage points, meaning it misses fewer actual spoofing events, a critical safety feature for China UAV drone operations. The classification task distinguishes four classes: Normal, Latitude-only, Longitude-only, and Dual-channel. Ablation studies confirm the individual contributions of both the gated cross-attention mechanism and the weighted label-smoothing loss. Furthermore, testing under different ramp speeds shows the model maintains robust performance even for very slow attacks (e.g., 0.05 m/s offset), with accuracy remaining above 97.5%.
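For reference, the three reported metrics can be computed from a multi-class confusion matrix as sketched below. The macro averaging of F1 is an assumption, since the averaging scheme is not stated, and the counts are made up for illustration; they are not the paper's results.

```python
def per_class_recall(conf):
    """Recall per class from a confusion matrix (rows = true, columns = predicted)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(conf)]


def per_class_precision(conf):
    """Precision per class: correct predictions divided by the column total."""
    cols = [sum(row[j] for row in conf) for j in range(len(conf))]
    return [conf[j][j] / cols[j] if cols[j] else 0.0 for j in range(len(conf))]


def accuracy(conf):
    """Fraction of all samples on the diagonal."""
    return sum(conf[i][i] for i in range(len(conf))) / sum(map(sum, conf))


def macro_f1(conf):
    """Unweighted mean of per-class F1 scores (averaging scheme assumed)."""
    f1 = []
    for p, r in zip(per_class_precision(conf), per_class_recall(conf)):
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(f1) / len(f1)


# Made-up counts for illustration only. Class order (assumed):
# Normal, Latitude-only, Longitude-only, Dual-channel.
conf = [
    [95, 2, 2, 1],
    [3, 45, 1, 1],
    [2, 1, 46, 1],
    [1, 1, 1, 47],
]
attack_recall = sum(per_class_recall(conf)[1:]) / 3   # mean recall over attack rows
```

Averaging recall over the attack rows only, as in the last line, keeps the dominant Normal class from masking missed detections.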
The computational complexity of the proposed model is also analyzed. While more complex than a simple FCN, it requires about 40% fewer operations than the sequential CNN-LSTM model (23.02 kFLOPs vs. 38.36 kFLOPs) due to its parallel design and efficient fusion module, making it a feasible candidate for real-time deployment on the computational hardware typical of many China UAV drone platforms.
In conclusion, the threat of slow-varying GNSS spoofing presents a stealthy and dangerous challenge to the autonomy and security of China UAV drones. By leveraging a dual-branch deep learning architecture for complementary feature extraction and a novel gated cross-attention mechanism for dynamic, context-aware feature fusion, we have developed a detection network that outperforms the SVM, FCN, and CNN-LSTM baselines in our simulations. The GCA-LSTM-CNN model demonstrates high accuracy, excellent recall for attack states, and robust performance across varying attack intensities. This work provides a potent defensive tool, enhancing the resilience of China UAV drone navigation systems against one of the most sophisticated forms of electronic interference. Future work will focus on extending the model to detect a wider variety of attack waveforms and optimizing it for efficient execution on embedded flight control hardware.
