We present a comprehensive defense framework for China UAV drones against GPS spoofing attacks. The framework integrates state awareness, trajectory prediction, anomaly detection, and adaptive correction. A dual-task autoencoder reduces the dimensionality of the sensor data and classifies the flight scene. Conditioned on the identified scene, a sequence-to-sequence (Seq2Seq) model with multi-head attention predicts the future trajectory. In the detection phase, we construct a multimodal feature vector from the state reconstruction error and the trajectory prediction error, and use the Isolation Forest algorithm to detect attacks efficiently. Once an attack is detected, a correction strategy is activated: the prediction model acts as a virtual sensor that estimates its own uncertainty via Monte Carlo dropout, and this uncertainty is converted into an adaptive measurement noise covariance for an extended Kalman filter (EKF). Simulation results show that the framework achieves an F1-score of 95.9% in attack detection and, in a circular-trajectory scenario, reduces the navigation RMSE from 4.86 m to 0.80 m during correction.
1. Introduction
China UAV drones are widely used in surveying, logistics, surveillance, and agriculture due to their maneuverability and cost-effectiveness. Their autonomous navigation heavily relies on GPS. However, civilian GPS signals are vulnerable to spoofing attacks, where attackers broadcast counterfeit signals to induce incorrect position and velocity estimates, causing drones to deviate from their intended paths or crash. This poses a serious threat to mission safety and airspace security.
Existing detection methods fall into several categories. Signal-based approaches monitor quantities such as signal power, S-curve zero offsets, carrier-to-noise density ratio, and angle of arrival. State-estimation and consistency-checking approaches verify GPS validity against other sensors, e.g., against inertial measurement unit (IMU) data within a Kalman filter framework. More recently, machine learning methods have been introduced to extract deeper features from sensor data, including support vector machines (SVM), XGBoost, ensemble methods, and deep temporal networks such as TimesNet. However, traditional recurrent architectures such as LSTM carry a strong sequential inductive bias, which limits their ability to capture long-range dependencies.
For detection-triggered correction, the common practice is to discard GPS and fall back on IMU- and vision-based dead reckoning, but IMU errors accumulate rapidly. Moreover, sophisticated attackers may evade detection and continue to corrupt the state estimate. We therefore propose a closed-loop defense framework that integrates state awareness, trajectory prediction, anomaly detection, and adaptive correction. Our main contributions are:
- An integrated detection-to-correction framework that combines deep learning with an EKF to sustain safe navigation of China UAV drones when GPS cannot be trusted.
- Flight scene classification with an autoencoder that drives scene-specific Seq2Seq models, improving trajectory prediction accuracy under complex maneuvers.
- A multimodal feature space that fuses reconstruction and prediction errors to make attack detection more robust.
- An adaptive correction strategy that uses Monte Carlo dropout to quantify prediction uncertainty and translates it into the EKF's adaptive measurement noise covariance, enabling reliable state correction.
2. System Model and Problem Formulation
We consider a quadrotor China UAV drone equipped with GPS, an IMU, and a barometer. The onboard extended Kalman filter (EKF) fuses GPS measurements with IMU-driven predictions to estimate the state vector $\mathbf{x} = [x, y, z, v_x, v_y, v_z, \phi, \theta, \psi]^T$ (position, velocity, and roll/pitch/yaw attitude). During a spoofing attack, the attacker broadcasts a counterfeit signal that replaces the true GPS observation $\mathbf{z}_k$ with a fake observation $\mathbf{z}^{\mathrm{sp}}_k$, so the EKF update step becomes:
$$ \hat{\mathbf{x}}_k = \hat{\mathbf{x}}_k^- + \mathbf{K}_k (\mathbf{z}^{\mathrm{sp}}_k - h(\hat{\mathbf{x}}_k^-)) $$
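where $\hat{\mathbf{x}}_k^-$ is the predicted (prior) state, $\mathbf{K}_k$ is the Kalman gain, and $h(\cdot)$ is the measurement model. Because the innovation is computed from the spoofed observation, the corruption passes directly into the state estimate and causes navigation errors. Our goal is to detect such attacks and correct the state estimate using predictions from a deep learning model trained on normal flight data.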
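To make the effect of the corrupted update concrete, the following minimal sketch uses a one-dimensional linear Kalman filter (a simplification of the EKF above, with $h$ taken as the identity and a random-walk process model) and a hypothetical spoofer that adds a slowly growing offset to the GPS reading. The noise levels, ramp slope, and the helper `kalman_track` are illustrative assumptions, not values from this work.

```python
import numpy as np

def kalman_track(measurements, q=0.01, r=1.0):
    """1-D random-walk Kalman filter with h(x) = x, so the innovation is z - x_pred."""
    x_est, p = float(measurements[0]), 1.0
    history = []
    for z in measurements:
        x_pred, p_pred = x_est, p + q        # predict: identity process model
        k = p_pred / (p_pred + r)            # Kalman gain
        x_est = x_pred + k * (z - x_pred)    # update with the (possibly spoofed) observation
        p = (1.0 - k) * p_pred
        history.append(x_est)
    return np.array(history)

rng = np.random.default_rng(0)
steps = 200
true_pos = np.zeros(steps)                               # the vehicle actually stays at 0 m
gps = true_pos + rng.normal(0.0, 1.0, steps)             # honest GPS with 1 m noise
offset = np.clip(np.arange(steps) - 100, 0, None) * 0.3  # hypothetical spoof: +0.3 m per step after step 100
z_sp = gps + offset                                      # spoofed observations z^sp_k

clean = kalman_track(gps)
spoofed = kalman_track(z_sp)
print(f"final error, honest GPS:  {abs(clean[-1] - true_pos[-1]):.2f} m")
print(f"final error, spoofed GPS: {abs(spoofed[-1] - true_pos[-1]):.2f} m")
```

Because the filter cannot distinguish a slow drift from genuine motion, the spoofed estimate follows the injected offset with only a small lag, which is why a separate detection mechanism is needed.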
3. Proposed Defense Framework
3.1 Data Dimensionality Reduction and Flight Scene Classification
We extract a temporal feature vector from multi-source sensor data, as listed in Table 1. The sensor streams are synchronized to a common time base by interpolation.
| Category | Symbols | Physical meaning | Dimension |
|---|---|---|---|
| Position | px, py, pz | 3D position (NED) | 3 |
| Velocity | vx, vy, vz | 3-axis body velocity | 3 |
| Acceleration | ax, ay, az | 3-axis acceleration (IMU) | 3 |
| Attitude | φ, θ, ψ | Roll, pitch, yaw | 3 |
| Control input | u1…u4 | Normalized PWM of motors | 4 |
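The synchronization step can be sketched as linear interpolation of every sensor channel onto one shared time grid. The sampling rates, the stream names, and the helper `synchronize` below are hypothetical, and the data are random placeholders; only the per-category dimensions follow Table 1. Note that `np.interp` holds the edge value outside a stream's time span, so the grid is kept inside every stream's range.

```python
import numpy as np

def synchronize(streams, t_common):
    """Linearly interpolate each (timestamps, values) stream onto the common grid t_common.

    streams: dict mapping a name to (t, v), with t increasing of shape (n,) and v of shape (n, d).
    Returns an array of shape (len(t_common), total d), columns in dict insertion order.
    """
    columns = []
    for t, v in streams.values():
        v = np.asarray(v, dtype=float).reshape(len(t), -1)
        for j in range(v.shape[1]):
            columns.append(np.interp(t_common, t, v[:, j]))
    return np.stack(columns, axis=1)

# Hypothetical sampling rates and random placeholder data; only the dimensions follow Table 1.
rng = np.random.default_rng(1)
t_gps = np.arange(0.0, 10.0, 0.1)     # 10 Hz position/velocity
t_imu = np.arange(0.0, 10.0, 0.01)    # 100 Hz acceleration/attitude
t_pwm = np.arange(0.0, 10.0, 0.02)    # 50 Hz motor PWM
streams = {
    "position": (t_gps, rng.normal(size=(len(t_gps), 3))),
    "velocity": (t_gps, rng.normal(size=(len(t_gps), 3))),
    "acceleration": (t_imu, rng.normal(size=(len(t_imu), 3))),
    "attitude": (t_imu, rng.normal(size=(len(t_imu), 3))),
    "control": (t_pwm, rng.uniform(0.0, 1.0, size=(len(t_pwm), 4))),
}
t_common = np.linspace(0.0, 9.8, 50)  # 50 grid points inside every stream's time span
X = synchronize(streams, t_common)
print(X.shape)                        # one 16-dimensional feature vector per grid point
```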
We build an autoencoder with an encoder $E$ and a decoder $D$. For a given input sample $x \in \mathbb{R}^{D_{in}}$, the encoder maps it to a latent representation $z$:
$$ z = E(x; \theta_e) $$
The decoder reconstructs the input:
$$ \hat{x} = D(z; \theta_d) $$
We use tied weights, $W_d^{(l)} = (W_e^{(l)})^T$. On top of the latent space, we add a scene classifier that average-pools the latent vectors over time, $\bar{z} = \frac{1}{T}\sum_{t=1}^{T} z_t$, and outputs class probabilities through a softmax layer:
$$ \hat{y} = \mathrm{softmax}(W_{cls} \bar{z} + b_{cls}) $$
The loss function combines reconstruction loss, classification loss, and weight decay:
$$ \min_{\theta} \frac{1}{N} \sum_{n} \left[ \alpha \|x^{(n)} - \hat{x}^{(n)}\|^2 + (1-\alpha) \ell_{cls}(y^{(n)},\hat{y}^{(n)}) \right] + \lambda \Omega(\theta) $$
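where $\alpha \in [0, 1]$ balances the reconstruction and classification terms and is gradually increased during training, $\ell_{cls}$ is the cross-entropy classification loss, and $\lambda \Omega(\theta)$ is the weight-decay penalty. After training, only the encoder is retained for feature extraction, and the latent sequence $\{z_t\}$ serves as the input to the trajectory prediction network.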
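A forward pass of this dual-task objective can be sketched in NumPy as follows. This is a simplified illustration, not the trained model: it assumes a single encoder layer, a linear decoder output (the configuration in Table 2 uses Linear+ReLU), hypothetical sizes, random weights, and an illustrative α schedule. Actual training would use an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(2)
D_in, D_lat, n_scenes, T = 16, 8, 4, 50     # hypothetical sizes

W_e = rng.normal(0.0, 0.1, (D_lat, D_in))   # encoder weights; decoder reuses W_e.T (tied)
b_e, b_d = np.zeros(D_lat), np.zeros(D_in)
W_cls, b_cls = rng.normal(0.0, 0.1, (n_scenes, D_lat)), np.zeros(n_scenes)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def dual_task_loss(x_seq, y, alpha, lam=1e-4):
    """Combined objective for one window x_seq of shape (T, D_in) with scene label y."""
    z = np.maximum(x_seq @ W_e.T + b_e, 0.0)  # encoder E (Linear + ReLU), per time step
    x_hat = z @ W_e + b_d                      # decoder D with tied weights W_d = W_e^T (linear output)
    z_bar = z.mean(axis=0)                     # average pooling over time
    y_hat = softmax(W_cls @ z_bar + b_cls)     # scene probabilities
    recon = np.mean(np.sum((x_seq - x_hat) ** 2, axis=1))  # squared error, averaged over steps
    cls = -np.log(y_hat[y])                    # cross-entropy for the true scene
    reg = lam * (np.sum(W_e ** 2) + np.sum(W_cls ** 2))    # weight decay Omega(theta)
    return alpha * recon + (1.0 - alpha) * cls + reg, z

x_seq = rng.normal(size=(T, D_in))
for alpha in (0.2, 0.5, 0.8):   # hypothetical schedule: alpha grows as training progresses
    loss, z = dual_task_loss(x_seq, y=1, alpha=alpha)
    print(f"alpha={alpha:.1f}  loss={loss:.3f}  latent shape={z.shape}")
```

The tied decoder needs no weight matrix of its own, which roughly halves the autoencoder's weight count.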
3.2 Seq2Seq Trajectory Prediction Network
The prediction network takes the latent sequence $Z = (z_1, \ldots, z_T)$ and autoregressively generates future displacement predictions $\Delta\hat{P} = (\Delta\hat{p}_1, \ldots, \Delta\hat{p}_M)$. We adopt a bidirectional LSTM (BiLSTM) encoder whose hidden state at step $t$ concatenates the forward and backward states:
$$ h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t] $$
The decoder uses multi-head attention to focus dynamically on different parts of the encoder outputs $H_{enc} = (h_1, \ldots, h_T)$. At decoder step $t$, the query, key, and value for head $j$ are computed as:
$$ Q_j = h_{t-1}^{dec} W_j^Q,\quad K_j = H_{enc} W_j^K,\quad V_j = H_{enc} W_j^V $$
The scaled dot-product attention for each head is:
$$ \mathrm{head}_j = \mathrm{softmax}\left( \frac{Q_j K_j^T}{\sqrt{d_k}} \right) V_j $$
Concatenating all heads and applying a linear projection yields the context vector $c_t$:
$$ c_t = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_{N_h}) W^O $$
The decoder updates its hidden state:
$$ (h_t^{dec}, s_t^{dec}) = \mathrm{LSTM}_{dec} ( \mathrm{concat}(\Delta p_{t-1}, c_t), (h_{t-1}^{dec}, s_{t-1}^{dec}) ) $$
Finally, the displacement prediction is:
$$ \Delta \hat{p}_t = \mathrm{Linear}( \mathrm{concat}(h_t^{dec}, c_t) ) $$
and the absolute position is recovered as $\hat{p}_t = \hat{p}_{t-1} + \Delta\hat{p}_t$.
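One decoder step of the attention computation, followed by the accumulation of displacements into absolute positions, can be sketched in NumPy. The dimensions, the number of heads, and the helper `multi_head_context` are hypothetical; the LSTM cells themselves are omitted, so random vectors stand in for the decoder and encoder hidden states.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def multi_head_context(h_dec_prev, H_enc, WQ, WK, WV, WO):
    """Context vector c_t from the previous decoder state and the encoder outputs.

    h_dec_prev: (d_dec,) query source; H_enc: (T, d_enc) key/value source.
    WQ, WK, WV: per-head projection matrices; WO: output projection.
    """
    heads = []
    for Wq, Wk, Wv in zip(WQ, WK, WV):
        Q = h_dec_prev @ Wq                              # (d_k,)
        K = H_enc @ Wk                                   # (T, d_k)
        V = H_enc @ Wv                                   # (T, d_k)
        weights = softmax(K @ Q / np.sqrt(K.shape[1]))   # scaled dot-product weights over T steps
        heads.append(weights @ V)                        # head_j, shape (d_k,)
    return np.concatenate(heads) @ WO                    # Concat(heads) W^O

rng = np.random.default_rng(3)
T, d_enc, d_dec, n_heads, d_k = 50, 128, 64, 4, 16       # hypothetical sizes
WQ = [rng.normal(0, 0.1, (d_dec, d_k)) for _ in range(n_heads)]
WK = [rng.normal(0, 0.1, (d_enc, d_k)) for _ in range(n_heads)]
WV = [rng.normal(0, 0.1, (d_enc, d_k)) for _ in range(n_heads)]
WO = rng.normal(0, 0.1, (n_heads * d_k, d_dec))

c_t = multi_head_context(rng.normal(size=d_dec), rng.normal(size=(T, d_enc)), WQ, WK, WV, WO)
print(c_t.shape)

# Absolute positions from predicted displacements: p_t = p_{t-1} + dp_t
p0 = np.array([0.0, 0.0, -10.0])             # last known position (NED, 10 m altitude)
dp = np.tile([0.5, 0.0, 0.0], (5, 1))        # five predicted 0.5 m steps north
positions = p0 + np.cumsum(dp, axis=0)
print(positions[-1])
```

Predicting displacements rather than absolute positions keeps the network's targets small and position-independent; the cumulative sum restores the absolute trajectory.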
3.3 Attack Detection
We build a two-dimensional feature vector $q = [E_{recon}, m]$, where:
$$ E_{recon} = \frac{1}{T}\sum_{i=1}^T \| x_i - \hat{x}_i \|_2 $$
$$ m = \sum_{t=1}^T \| p_t – \hat{p}_t \|_2 $$
Here $E_{recon}$ is the state reconstruction error averaged over the window, and $m$ is the trajectory prediction error accumulated over the window. The resulting feature space is processed by an Isolation Forest model, which makes no prior assumptions about the data distribution. The anomaly score $s(q)$ is:
$$ s(q) = 2^{- \frac{E(h(q))}{c(N)}} $$
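where $E(h(q))$ is the average path length of $q$ over the isolation trees, and $c(N)$ is a normalization constant, the average path length of an unsuccessful binary-search-tree search over $N$ samples. Scores close to 1 indicate anomalies. The detection decision is: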
$$ \mathcal{D}(q) = \begin{cases} 1, & s(q) \ge s_{th} \\ 0, & \text{otherwise} \end{cases} $$
The threshold $s_{th}$ and the Isolation Forest contamination parameter are selected by grid search on a validation set to maximize the F1-score.
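The pieces of the detection stage can be sketched in NumPy: the feature vector $q$, the normalizer $c(N)$ with the usual harmonic-number approximation $H(i) \approx \ln i + \gamma$, the score $s(q)$, and a threshold grid search. The path lengths and toy scores are illustrative assumptions, and the helper names are hypothetical; building the isolation trees themselves is omitted.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def c(n):
    """Average path length of an unsuccessful BST search over n points (Isolation Forest normalizer)."""
    if n <= 1:
        return 0.0
    return 2.0 * (np.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n  # H(n-1) ~ ln(n-1) + gamma

def anomaly_score(mean_path_length, n):
    """s(q) = 2^(-E[h(q)] / c(n)); scores near 1 indicate anomalies."""
    return 2.0 ** (-mean_path_length / c(n))

def detection_features(x, x_hat, p, p_hat):
    """q = [E_recon, m]: mean state reconstruction error and summed position prediction error."""
    e_recon = np.mean(np.linalg.norm(x - x_hat, axis=1))
    m = np.sum(np.linalg.norm(p - p_hat, axis=1))
    return np.array([e_recon, m])

def best_threshold(scores, labels, candidates):
    """Grid search for the s_th that maximizes F1 under the rule D(q) = 1 if s(q) >= s_th."""
    best_th, best_f1 = None, -1.0
    for th in candidates:
        pred = scores >= th
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
        if f1 > best_f1:
            best_th, best_f1 = th, f1
    return best_th, best_f1

print(f"c(256) = {c(256):.3f}")
print(f"s when E[h] = c(n): {anomaly_score(c(256), 256):.2f}")   # exactly 0.5
scores = np.array([0.38, 0.41, 0.44, 0.71, 0.83])   # toy scores; the last two are attack windows
labels = np.array([0, 0, 0, 1, 1])
print(best_threshold(scores, labels, np.linspace(0.3, 0.9, 13)))
```

In practice, scikit-learn's `IsolationForest` implements the full algorithm; its `score_samples` method returns the negative of the score defined here, and its `contamination` parameter sets the offset used for the decision.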
3.4 Adaptive Correction Strategy
Once an attack is detected, the prediction model acts as a virtual sensor. To quantify its prediction uncertainty, we use Monte Carlo dropout: during inference the dropout layers remain active, and we perform $N_{MC}$ forward passes on the same input, yielding displacement samples $\{\Delta\hat{\mathbf{p}}^{(i)}\}_{i=1}^{N_{MC}}$. The pseudo-measurement (predicted position) is the previous position estimate plus the mean sampled displacement:
$$ \mathbf{p}'_k = \hat{\mathbf{p}}_{k-1} + \frac{1}{N_{MC}} \sum_{i=1}^{N_{MC}} \Delta \hat{\mathbf{p}}^{(i)} $$
The measurement noise covariance is the diagonal matrix of the per-axis sample variances of the $N_{MC}$ predicted positions $(x', y', z')$:
$$ \mathbf{R}'_k = \mathrm{diag}( \mathrm{Var}(x'), \mathrm{Var}(y'), \mathrm{Var}(z') ) $$
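A minimal sketch of the Monte Carlo dropout pseudo-measurement follows. A tiny one-hidden-layer network with random weights stands in for the trained Seq2Seq predictor (a loud assumption), and the dropout rate, $N_{MC}$, input summary, and helper names are hypothetical. The point is the procedure: keep dropout active, repeat the pass, and take the per-axis mean and variance.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical stand-in for the trained predictor: one hidden layer mapping a 16-D summary of
# the latent window to a 3-D displacement. Weights are random placeholders, not trained values.
W1 = rng.normal(0.0, 0.3, (16, 64))
W2 = rng.normal(0.0, 0.3, (64, 3))

def predict_displacement(z_summary, p_drop=0.2):
    """One stochastic forward pass with dropout left active (Monte Carlo dropout)."""
    h = np.maximum(z_summary @ W1, 0.0)
    keep = rng.random(h.shape) >= p_drop       # drop each hidden unit with probability p_drop
    return (h * keep / (1.0 - p_drop)) @ W2    # inverted-dropout scaling keeps the expectation

def pseudo_measurement(p_prev, z_summary, n_mc=50):
    """Return p'_k (previous estimate + mean displacement) and the diagonal R'_k."""
    samples = np.stack([predict_displacement(z_summary) for _ in range(n_mc)])  # (n_mc, 3)
    positions = p_prev + samples                  # candidate positions (x', y', z')
    p_prime = positions.mean(axis=0)
    R_prime = np.diag(positions.var(axis=0, ddof=1))
    return p_prime, R_prime

p_prime, R_prime = pseudo_measurement(np.array([10.0, 5.0, -20.0]), rng.normal(size=16))
print("p'_k =", p_prime)
print("diag(R'_k) =", np.diag(R_prime))
```

Because the $N_{MC}$ passes share the same input, they can also be run as one batch on the GPU rather than in a Python loop.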
We then perform an EKF update with this virtual measurement. The prediction step uses an identity (random-walk) process model:
$$ \hat{\mathbf{x}}_k^- = \hat{\mathbf{x}}_{k-1} $$
$$ \mathbf{P}_k^- = \mathbf{P}_{k-1} + \mathbf{Q}_k $$
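The update step uses the pseudo-measurement. Since $\mathbf{p}'_k$ observes position directly, the measurement model is the identity, and $\hat{\mathbf{x}}_k^-$ and $\mathbf{P}_k^-$ here refer to the position components of the state and their covariance: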
$$ \mathbf{r}_k = \mathbf{p}'_k - \hat{\mathbf{x}}_k^- $$
$$ \mathbf{S}_k = \mathbf{P}_k^- + \mathbf{R}'_k $$
$$ \mathbf{K}_k = \mathbf{P}_k^- \mathbf{S}_k^{-1} $$
$$ \hat{\mathbf{x}}_k = \hat{\mathbf{x}}_k^- + \mathbf{K}_k \mathbf{r}_k $$
$$ \mathbf{P}_k = (\mathbf{I} - \mathbf{K}_k) \mathbf{P}_k^- $$
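This makes the correction adaptive: because $\mathbf{R}'_k$ grows with the Monte Carlo variance, the Kalman gain $\mathbf{K}_k = \mathbf{P}_k^- (\mathbf{P}_k^- + \mathbf{R}'_k)^{-1}$ shrinks when the model is uncertain, so the filter relies less on the prediction, and it approaches the identity when the uncertainty is low, so the filter trusts the prediction more.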
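The predict/update cycle above can be sketched in NumPy. This sketch assumes a 3-D position-only state with identity process and measurement models, as the equations are written, and uses hypothetical values for $\mathbf{P}$, $\mathbf{Q}$, and the pseudo-measurement; the helper `correction_step` is illustrative. Sweeping the variance in $\mathbf{R}'_k$ shows the gain adapting.

```python
import numpy as np

def correction_step(x_prev, P_prev, p_prime, R_prime, Q):
    """One predict/update cycle with the pseudo-measurement, assuming a position-only state
    with identity process and measurement models."""
    x_pred = x_prev                                  # x_k^- = x_{k-1}
    P_pred = P_prev + Q                              # P_k^- = P_{k-1} + Q_k
    r = p_prime - x_pred                             # innovation r_k
    S = P_pred + R_prime                             # innovation covariance S_k
    K = np.linalg.solve(S.T, P_pred.T).T             # K_k = P_k^- S_k^{-1} without forming the inverse
    x_new = x_pred + K @ r
    P_new = (np.eye(len(x_prev)) - K) @ P_pred
    return x_new, P_new, K

x0, P0, Q = np.zeros(3), np.eye(3), 0.01 * np.eye(3)
p_prime = np.array([1.0, 1.0, 1.0])
for var in (0.01, 1.0, 100.0):                       # low -> high Monte Carlo variance
    x1, _, K = correction_step(x0, P0, p_prime, var * np.eye(3), Q)
    print(f"R' = {var:g} I: gain {K[0, 0]:.3f}, corrected x {x1[0]:.3f}")
```

With these numbers the diagonal gain falls from about 0.99 to about 0.01 as the variance grows, so a confident prediction nearly overrides the prior while an uncertain one barely moves it.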
4. Simulation and Experimental Results
4.1 Simulation Setup
We built the simulation environment with ROS and the Gazebo simulator, using the Iris quadcopter model. PX4 Autopilot (v1.13) provided onboard state estimation through its built-in EKF2 module. Flight control was performed in Offboard mode using QGroundControl. We collected 30 normal flights for training and 4 spoofed flights for validation. The spoofing attack gradually introduced a 30 m position offset over 100 s. The flights included ascending, level, descending, and circular patterns.
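A spoofing profile of this kind can be sketched as an offset added to the true GPS positions. The text states only the 30 m magnitude and the 100 s duration. The linear ramp shape, the fixed drift direction, the start time, the 10 Hz rate, the circular test path, and the helper name `spoofed_positions` are all assumptions for illustration.

```python
import numpy as np

def spoofed_positions(true_pos, t, t_start, ramp_duration=100.0, max_offset=30.0,
                      direction=(1.0, 0.0, 0.0)):
    """Return GPS positions (n, 3) with an offset that grows linearly from 0 to max_offset
    metres over ramp_duration seconds after t_start, then stays constant.
    The linear profile and the fixed direction are assumptions for illustration."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)                          # unit vector of the injected drift
    frac = np.clip((t - t_start) / ramp_duration, 0.0, 1.0)
    return true_pos + (max_offset * frac)[:, None] * d

t = np.arange(0.0, 300.0, 0.1)                        # hypothetical 10 Hz GPS over 300 s
circle = np.stack([20.0 * np.cos(0.05 * t), 20.0 * np.sin(0.05 * t), np.full_like(t, -15.0)], axis=1)
z_sp = spoofed_positions(circle, t, t_start=50.0)
offset = np.linalg.norm(z_sp - circle, axis=1)
print(f"offset before attack: {offset[t < 50.0].max():.1f} m, at the end: {offset[-1]:.1f} m")
```

A slow ramp like this keeps each step-to-step jump small, which is what makes gradual spoofing hard to catch with simple innovation checks.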
We compared our model (denoted AE-Seq2Seq) against baselines: PCA-FCN (Principal Component Analysis + Fully Connected Network) and PCA-LSTM. Table 2 lists the model configurations.
| Model | Autoencoder / PCA | Classifier | Predictor |
|---|---|---|---|
| AE-Seq2Seq | Encoder: Linear+ReLU (32→16); Decoder: Linear+ReLU (16→32); Classifier: Linear(16) | Integrated softmax | Encoder: BiLSTM(64,64), Linear(64); Decoder: LSTM+Attention(64,64), Linear(32); dropout=0.2, teacher forcing=0.3 |
| PCA-FCN | PCA 20→8 | SVM, one-vs-one, RBF kernel | FCN: Linear+ReLU(128,128,64,64), dropout=0.2 |
| PCA-LSTM | PCA 20→8 | SVM, one-vs-one, RBF kernel | LSTM(128,128,64), Linear(64), dropout=0.2 |
All models were trained with Adam (lr=0.001) using MSE loss for prediction. The autoencoder used a combined MSE + cross-entropy loss.
4.2 Trajectory Prediction Performance
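Figure 10 in the original work shows sample predictions. Table 3 summarizes the prediction errors at different horizons: AE-Seq2Seq achieves the lowest error at every horizon, roughly 40% below PCA-LSTM and 60% below PCA-FCN, and the original work reports that this advantage holds across all flight phases.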
| Horizon (steps) | AE-Seq2Seq | PCA-LSTM | PCA-FCN |
|---|---|---|---|
| 10 | 0.32 | 0.55 | 0.78 |
| 20 | 0.64 | 1.12 | 1.55 |
| 50 | 1.21 | 2.08 | 3.10 |
The scene classifier achieved 100% accuracy on the test set (confusion matrix in the original work, not reprinted), indicating that the integrated autoencoder extracts discriminative latent features for the different flight modes.
4.3 Attack Detection Performance
We evaluated detection using precision, recall, F1-score, and accuracy. Table 4 shows the results.
| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| AE-Seq2Seq | 92.4% | 99.6% | 95.9% | 95.4% |
| PCA-FCN | 98.2% | 81.2% | 88.9% | 89.2% |
| PCA-LSTM | 99.9% | 85.2% | 92.0% | 92.1% |
AE-Seq2Seq achieved the highest recall (99.6%) and F1-score (95.9%), so it missed very few attacks, at the cost of the lowest precision (92.4%), i.e., more false alarms than either baseline. PCA-LSTM had the highest precision (99.9%) but missed roughly 15% of attacks (recall 85.2%), and PCA-FCN performed worst overall.
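The F1-scores in Table 4 follow from the listed precision and recall as their harmonic mean, which the short check below recomputes. The helper `f1` and the dictionary are illustrative names.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

table4 = {"AE-Seq2Seq": (0.924, 0.996), "PCA-FCN": (0.982, 0.812), "PCA-LSTM": (0.999, 0.852)}
for model, (p, r) in table4.items():
    print(f"{model:<11} F1 = {100 * f1(p, r):.1f}%")
```

The recomputed values (95.9%, 88.9%, 92.0%) match the table. The harmonic mean penalizes imbalance, so PCA-LSTM's near-perfect precision cannot offset its lower recall.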
The Isolation Forest learned a nonlinear decision boundary in the 2D feature space, as visualized in the original work (not shown). The anomaly score heatmap indicates that the model effectively separates normal and attack samples.
4.4 Adaptive Correction Performance
We tested the correction strategy under two attack scenarios: a circular trajectory and a long-range trajectory. Table 5 reports the RMSE of the EKF state estimate with and without correction.
| Scenario | Uncorrected EKF RMSE (m) | Corrected EKF RMSE (m) |
|---|---|---|
| Circular trajectory | 4.86 | 0.80 |
| Long-range trajectory | 8.70 | 0.62 |
Figure 12 in the original work shows that the corrected trajectory closely follows the ground truth, whereas the uncorrected trajectory diverges. When correction activates after detection, there is a brief offset, after which the estimate quickly converges to the true path and tracks it stably for over 200 s of sustained attack.
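For reference, the relative improvement implied by Table 5 can be computed directly. The `rmse` helper assumes the metric is the root-mean-square of the 3-D Euclidean position error; the text does not state whether the RMSE is per axis or 3-D, so treat that definition as an assumption.

```python
import numpy as np

def rmse(estimate, truth):
    """Root-mean-square of the 3-D position error over all time steps (arrays of shape (n, 3))."""
    err = np.linalg.norm(estimate - truth, axis=1)
    return np.sqrt(np.mean(err ** 2))

table5 = {"Circular trajectory": (4.86, 0.80), "Long-range trajectory": (8.70, 0.62)}
for scenario, (uncorrected, corrected) in table5.items():
    reduction = 100.0 * (uncorrected - corrected) / uncorrected
    print(f"{scenario}: RMSE reduced by {reduction:.1f}%")

# Toy usage: errors of 5 m and 0 m give an RMSE of sqrt(12.5) m
print(rmse(np.zeros((2, 3)), np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]])))
```

The correction reduces the RMSE by about 83.5% on the circular trajectory and 92.9% on the long-range trajectory.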
We also conducted closed-loop software-in-the-loop (SITL) tests using Gazebo. When the defense was active, the China UAV drone continued to fly stably along its intended circular path despite the GPS spoofing attack, as shown in the original Figure 13.
4.5 Computational Overhead
Table 6 lists the parameter count and floating-point operations (FLOPs) for the AE-Seq2Seq model.
| Component | Parameters | FLOPs |
|---|---|---|
| Encoder | 144.4K | 13.5M |
| Decoder | 108.7K | 92.1M |
| Attention (included in Decoder) | 24.8K | 83.5M |
| Total | 253.1K | 105.6M |
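With input and output sequence lengths of 50, a single forward pass takes approximately 3 ms on an Intel Core i9-13900HX CPU with an NVIDIA RTX 4060 Laptop GPU. This suggests that real-time inference is feasible on edge devices such as NVIDIA Jetson, although the correction step requires $N_{MC}$ stochastic forward passes per update.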
5. Conclusion
We have presented a comprehensive defense framework for China UAV drones against GPS spoofing attacks. The framework integrates state awareness via a dual-task autoencoder, trajectory prediction via an attention-based Seq2Seq model, anomaly detection via multimodal features and Isolation Forest, and adaptive correction via a Monte Carlo dropout-driven EKF. Simulation results show that our approach achieves a 95.9% F1-score in detection and, under attack, reduces navigation RMSE from 4.86 m to 0.80 m on a circular trajectory and from 8.70 m to 0.62 m on a long-range trajectory. The closed-loop SITL tests support the feasibility of deploying this defense on board China UAV drones without additional external hardware. Future work will focus on lightweight uncertainty estimation methods and field tests on real hardware.

