In our recent work, we address the critical challenge of GPS spoofing attacks on quadrotor unmanned aerial vehicles (UAVs), particularly China UAV drone platforms, which rely heavily on civilian GPS signals. The widespread adoption of China UAV drone systems in surveying, logistics, surveillance, and agriculture makes them attractive targets for adversarial spoofing. We propose a unified defense framework that integrates state awareness, trajectory prediction, anomaly detection, and adaptive state correction to maintain robust navigation even when GPS measurements are compromised. The framework couples a deep learning architecture with a Bayesian filtering strategy to detect and mitigate spoofing attacks in real time.
To comprehensively capture the dynamic behavior of a quadrotor China UAV drone, we extract multi-source heterogeneous data from flight logs, including position, velocity, acceleration (from IMU), attitude angles, and control inputs. After temporal alignment, each time-step sample becomes a 16‑dimensional vector. The composition of this feature vector is summarized in the table below.
| Feature Category | Symbol | Physical Meaning | Dimension |
|---|---|---|---|
| Position | \(p_x, p_y, p_z\) | 3D position in NED frame | 3 |
| Velocity | \(v_x, v_y, v_z\) | Linear velocities in body frame | 3 |
| Acceleration | \(a_x, a_y, a_z\) | Accelerometer readings | 3 |
| Attitude | \(\phi, \theta, \psi\) | Roll, pitch, yaw angles | 3 |
| Control Inputs | \(u_1,u_2,u_3,u_4\) | Normalized PWM outputs to four rotors | 4 |
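The temporal alignment step can be sketched as follows. This is a minimal sketch assuming each sensor stream is logged with its own strictly increasing timestamps and that linear interpolation onto a common time grid is acceptable; the text does not specify the alignment method, and the function name `align_and_stack` and the stream keys are hypothetical.

```python
import numpy as np

# Column order follows the feature table: position (3), velocity (3),
# acceleration (3), attitude (3), control inputs (4) -> 16 features.
FEATURE_GROUPS = (("position", 3), ("velocity", 3), ("acceleration", 3),
                  ("attitude", 3), ("control", 4))


def align_and_stack(streams, t_grid):
    """Resample every stream onto t_grid and stack into a (T, 16) array.

    streams: dict mapping group name -> (t, values), where t has shape (K,)
             and is strictly increasing, and values has shape (K, d).
    t_grid:  common timestamps, shape (T,).
    """
    columns = []
    for name, dim in FEATURE_GROUPS:
        t, values = streams[name]
        values = np.asarray(values, dtype=float)
        if values.shape[1] != dim:
            raise ValueError(f"{name}: expected {dim} channels, got {values.shape[1]}")
        # Linear interpolation per channel (an assumed alignment scheme).
        for j in range(dim):
            columns.append(np.interp(t_grid, t, values[:, j]))
    return np.stack(columns, axis=1)


# Usage with synthetic multi-rate streams (e.g. slow GPS, fast IMU).
rng = np.random.default_rng(0)
t_slow = np.linspace(0.0, 1.0, 11)
t_fast = np.linspace(0.0, 1.0, 201)
streams = {
    "position": (t_slow, np.column_stack([t_slow, 2 * t_slow, np.zeros(11)])),
    "velocity": (t_slow, rng.normal(size=(11, 3))),
    "acceleration": (t_fast, rng.normal(size=(201, 3))),
    "attitude": (t_fast, rng.normal(size=(201, 3))),
    "control": (t_fast, rng.uniform(size=(201, 4))),
}
X = align_and_stack(streams, np.linspace(0.0, 1.0, 51))
print(X.shape)
```

Attitude angles such as yaw wrap around at ±π, so in practice they may need to be unwrapped (for example with `np.unwrap`) before interpolation.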
Our framework consists of three tightly coupled modules: a dual-task autoencoder for dimensionality reduction and flight‑scene classification, a Seq2Seq trajectory predictor with multi‑head attention, and an isolation‑forest‑based anomaly detector that fuses reconstruction and prediction errors. Once an attack is detected, we activate an adaptive correction strategy that treats the predictor as a virtual sensor with uncertainty quantified via Monte Carlo dropout, and feeds pseudo‑measurements into an extended Kalman filter (EKF) with dynamically adjusted measurement noise covariance. The overall architecture is depicted in the following conceptual illustration.

1. Dual-Task Autoencoder for Feature Compression and Scene Classification
The first stage of our processing pipeline is a dual‑task autoencoder that learns a compact latent representation of the input sensor data while simultaneously classifying the current flight regime (e.g., ascending, level flight, circular, descending). The encoder \(\mathcal{E}\) maps an input sample \(\mathbf{x} \in \mathbb{R}^{16}\) to a low‑dimensional latent code \(\mathbf{z} \in \mathbb{R}^{8}\):
\[
\mathbf{z} = \mathcal{E}(\mathbf{x}; \theta_e) = h_e^{(L)}( \cdots h_e^{(1)}(\mathbf{x}) \cdots )
\]
The decoder \(\mathcal{D}\) then reconstructs the input:
\[
\hat{\mathbf{x}} = \mathcal{D}(\mathbf{z}; \theta_d) = h_d^{(L)}( \cdots h_d^{(1)}(\mathbf{z}) \cdots )
\]
We impose tied weights on the encoder and decoder layers to reduce model complexity. A classification head is attached to the latent code after temporal average pooling over a window of \(T\) time steps. The pooled latent vector \(\bar{\mathbf{z}}\) is fed to a softmax classifier:
\[
\hat{\mathbf{y}} = \text{softmax}( \mathbf{W}_{cls} \bar{\mathbf{z}} + \mathbf{b}_{cls} )
\]
The combined loss function balances reconstruction accuracy and classification correctness, with a weight decay regularizer:
\[
\min_{\theta} \sum_{n=1}^{N} \alpha \|\mathbf{x}^{(n)} - \hat{\mathbf{x}}^{(n)}\|_2^2 + (1-\alpha) \ell_{CE}(\mathbf{y}^{(n)}, \hat{\mathbf{y}}^{(n)}) + \lambda \Phi(\theta)
\]
where \(\ell_{CE}\) is the cross‑entropy loss, \(\alpha\) increases linearly from 0 to 1 during training, and \(\Phi(\theta)\) is the Frobenius norm of all weight matrices. This design forces the autoencoder to learn a latent space that both compresses the input well and separates flight modes. After training, the encoder serves as a feature extractor whose outputs \(\{\mathbf{z}_t\}\) become the input to the trajectory predictor. The decoder is no longer needed for prediction, but it is retained at inference time to compute the reconstruction error used for anomaly detection in Section 3.
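The tied-weight autoencoder and its combined loss can be sketched in NumPy as follows. This is a forward-pass-only sketch (no training loop) under several assumptions not fixed by the text: tanh hidden activations, a linear output layer, illustrative layer widths 16 → 12 → 8, four flight regimes, and \(\Phi(\theta)\) taken as the sum of squared Frobenius norms (standard weight decay). All function names are hypothetical.

```python
import numpy as np


def encode(x, weights, biases_enc):
    """Encoder E: tanh layers mapping (..., 16) -> (..., 8)."""
    h = x
    for W, b in zip(weights, biases_enc):
        h = np.tanh(h @ W + b)
    return h


def decode(z, weights, biases_dec):
    """Decoder D with tied weights: encoder matrices transposed, in reverse order.
    The output layer is linear so reconstructions are not squashed by tanh."""
    h = z
    n = len(weights)
    for k, (W, b) in enumerate(zip(reversed(weights), biases_dec)):
        h = h @ W.T + b
        if k < n - 1:
            h = np.tanh(h)
    return h


def dual_task_loss(x, y, weights, biases_enc, biases_dec, W_cls, b_cls, alpha, lam):
    """Combined loss for windows x of shape (B, T, 16) with labels y of shape (B,)."""
    z = encode(x, weights, biases_enc)                      # (B, T, 8)
    x_hat = decode(z, weights, biases_dec)                  # (B, T, 16)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))      # squared L2 per sample
    z_bar = z.mean(axis=1)                                  # temporal average pooling
    logits = z_bar @ W_cls + b_cls
    logits = logits - logits.max(axis=1, keepdims=True)     # stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(y)), y])
    # Assumption: Phi is the sum of squared Frobenius norms (standard weight decay).
    reg = sum(np.sum(W ** 2) for W in weights) + np.sum(W_cls ** 2)
    return alpha * recon + (1.0 - alpha) * ce + lam * reg


def alpha_schedule(epoch, num_epochs):
    """Linear ramp of alpha from 0 at the first epoch to 1 at the last."""
    return epoch / max(num_epochs - 1, 1)


# Usage with illustrative layer widths 16 -> 12 -> 8 and four flight regimes.
rng = np.random.default_rng(0)
sizes = [16, 12, 8]
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases_enc = [np.zeros(b) for b in sizes[1:]]
biases_dec = [np.zeros(a) for a in reversed(sizes[:-1])]
W_cls, b_cls = rng.normal(scale=0.1, size=(8, 4)), np.zeros(4)
x = rng.normal(size=(5, 10, 16))          # 5 windows of T = 10 samples
y = rng.integers(0, 4, size=5)
for epoch in (0, 5, 9):
    alpha = alpha_schedule(epoch, 10)
    loss = dual_task_loss(x, y, weights, biases_enc, biases_dec, W_cls, b_cls, alpha, 1e-4)
    print(epoch, alpha, loss)
```

With this schedule, early epochs are dominated by the classification term and later epochs by reconstruction.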
2. Seq2Seq Trajectory Predictor with Multi‑Head Attention
Given a sequence of latent codes \(\{\mathbf{z}_1, \mathbf{z}_2, \dots, \mathbf{z}_T\}\) produced by the encoder, we employ a Seq2Seq model to predict the future displacement trajectory \(\{\Delta\mathbf{p}_{T+1}, \dots, \Delta\mathbf{p}_{T+M}\}\). The encoder is a bidirectional LSTM (BiLSTM) that processes the sequence in both forward and backward directions, concatenating the hidden states at each time step:
\[
\overrightarrow{\mathbf{h}}_t = \text{LSTM}_{\text{fwd}}(\mathbf{z}_t, \overrightarrow{\mathbf{h}}_{t-1}), \qquad
\overleftarrow{\mathbf{h}}_t = \text{LSTM}_{\text{bwd}}(\mathbf{z}_t, \overleftarrow{\mathbf{h}}_{t+1})
\]
\[
\mathbf{h}_t = [\overrightarrow{\mathbf{h}}_t ; \overleftarrow{\mathbf{h}}_t ]
\]
The decoder is an LSTM that, at each step \(i\), computes a context vector \(\mathbf{c}_i\) using multi‑head attention over all encoder hidden states. The attention mechanism first computes queries, keys, and values for each head \(j\):
\[
\mathbf{Q}_{j,i} = \mathbf{h}^*_{i-1} \mathbf{W}_j^Q,\quad
\mathbf{K}_j = \mathbf{H}_{\text{enc}} \mathbf{W}_j^K,\quad
\mathbf{V}_j = \mathbf{H}_{\text{enc}} \mathbf{W}_j^V
\]
where \(\mathbf{H}_{\text{enc}}\) is the matrix of all encoder hidden states and \(\mathbf{h}^*_{i-1}\) is the previous decoder hidden state. The attention output for head \(j\) is:
\[
\text{head}_j = \text{softmax}\!\left(\frac{\mathbf{Q}_{j,i} \mathbf{K}_j^\top}{\sqrt{d_k}}\right) \mathbf{V}_j
\]
Here \(d_k\) is the key dimension of each head. All heads are concatenated and linearly projected to obtain the final context vector:
\[
\mathbf{c}_i = \text{Concat}(\text{head}_1,\dots,\text{head}_{N_h}) \mathbf{W}^O
\]
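The attention equations above can be traced for a single decoder step with the following NumPy sketch. It assumes the value dimension equals \(d_k\) and uses randomly initialized projection matrices with illustrative sizes; `multi_head_context` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np


def softmax(a):
    e = np.exp(a - a.max())          # shift for numerical stability
    return e / e.sum()


def multi_head_context(h_prev, H_enc, Wq, Wk, Wv, Wo):
    """One decoder step of multi-head attention.

    h_prev: previous decoder state h*_{i-1}, shape (d_dec,)
    H_enc:  encoder states (BiLSTM outputs), shape (T, d_enc)
    Wq: (N_h, d_dec, d_k); Wk, Wv: (N_h, d_enc, d_k); Wo: (N_h * d_k, d_c)
    Returns the context vector c_i, shape (d_c,), and attention weights (N_h, T).
    """
    n_heads, _, d_k = Wq.shape
    heads, attn = [], []
    for j in range(n_heads):
        q = h_prev @ Wq[j]                   # Q_{j,i}: (d_k,)
        K = H_enc @ Wk[j]                    # K_j: (T, d_k)
        V = H_enc @ Wv[j]                    # V_j: (T, d_k)
        a = softmax(q @ K.T / np.sqrt(d_k))  # scaled dot-product weights over T steps
        heads.append(a @ V)                  # head_j: (d_k,)
        attn.append(a)
    c = np.concatenate(heads) @ Wo           # Concat(head_1, ..., head_Nh) W^O
    return c, np.stack(attn)


# Illustrative sizes: T = 10 encoder steps, BiLSTM hidden size 16 per direction.
rng = np.random.default_rng(1)
T, d_enc, d_dec, n_heads, d_k, d_c = 10, 32, 32, 4, 8, 32
H_enc = rng.normal(size=(T, d_enc))
h_prev = rng.normal(size=d_dec)
Wq = rng.normal(scale=0.1, size=(n_heads, d_dec, d_k))
Wk = rng.normal(scale=0.1, size=(n_heads, d_enc, d_k))
Wv = rng.normal(scale=0.1, size=(n_heads, d_enc, d_k))
Wo = rng.normal(scale=0.1, size=(n_heads * d_k, d_c))
c, attn = multi_head_context(h_prev, H_enc, Wq, Wk, Wv, Wo)
print(c.shape, attn.shape, attn.sum(axis=1))
```

Each row of `attn` is a probability distribution over the \(T\) encoder steps, so different heads can concentrate on different parts of the history.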
The decoder hidden state is updated using the context vector and the previous displacement prediction:
\[
(\mathbf{h}^*_i, \mathbf{s}^*_i) = \text{LSTM}_{\text{dec}}\big( [\Delta\hat{\mathbf{p}}_{i-1}; \mathbf{c}_i], (\mathbf{h}^*_{i-1}, \mathbf{s}^*_{i-1}) \big)
\]
Finally, the displacement and absolute position are predicted:
\[
\Delta\hat{\mathbf{p}}_i = \text{Linear}\big([\mathbf{h}^*_i; \mathbf{c}_i]\big),\qquad
\hat{\mathbf{p}}_i = \hat{\mathbf{p}}_{i-1} + \Delta\hat{\mathbf{p}}_i
\]
This architecture enables the model to dynamically focus on different parts of the historical input when predicting each step of the future trajectory, which is crucial for accurately predicting abrupt maneuvers such as a transition from level flight to a steep climb.
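Because the decoder predicts displacements, absolute positions follow from the recursion \(\hat{\mathbf{p}}_i = \hat{\mathbf{p}}_{i-1} + \Delta\hat{\mathbf{p}}_i\), which is a cumulative sum. The sketch below, with hypothetical helper names, shows the round trip between a position track and its displacement targets; it also makes clear that any per-step error accumulates over the prediction horizon.

```python
import numpy as np


def positions_to_displacements(positions):
    """Training targets: Delta p_t = p_t - p_{t-1} for a (T, 3) position track."""
    return np.diff(np.asarray(positions, dtype=float), axis=0)


def rollout_positions(p_last, deltas):
    """Apply p_hat_i = p_hat_{i-1} + Delta p_hat_i for i = 1..M.

    p_last: last known position, shape (3,); deltas: predicted displacements, shape (M, 3).
    Returns absolute positions, shape (M, 3).
    """
    return np.asarray(p_last, dtype=float) + np.cumsum(np.asarray(deltas, dtype=float), axis=0)


track = np.array([[0.0, 0.0, 10.0], [1.0, 0.0, 10.0], [2.0, 0.0, 10.5], [2.0, 1.0, 11.0]])
deltas = positions_to_displacements(track)
print(rollout_positions(track[0], deltas))   # reproduces track[1:]
```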
3. Multi‑Modal Anomaly Detection with Isolation Forest
To detect GPS spoofing attacks, we construct a two‑dimensional feature vector \(\mathbf{q} = (E_{\text{recon}}, m)\) that captures both instantaneous state consistency and dynamic prediction consistency. The reconstruction error \(E_{\text{recon}}\) measures the ability of the autoencoder to faithfully reconstruct the current state over a sliding window of length \(T_w\):
\[
E_{\text{recon}} = \frac{1}{T_w} \sum_{t=1}^{T_w} \| \mathbf{x}_t - \hat{\mathbf{x}}_t \|_2
\]
The trajectory prediction error \(m\) quantifies the cumulative Euclidean distance between the predicted trajectory and the actual observed positions over the same window:
\[
m = \sum_{t=1}^{T_w} \| \hat{\mathbf{p}}_t - \mathbf{p}_t \|_2
\]
Under normal conditions, both errors remain small. Under a spoofing attack, even if the attacker manages to generate a plausible instantaneous state (keeping \(E_{\text{recon}}\) low), the inconsistency in future dynamics causes \(m\) to increase significantly. Conversely, a poorly crafted attack may directly distort the instantaneous state, increasing \(E_{\text{recon}}\). Because each type of attack raises at least one of the two errors, the combined feature space remains discriminative against both subtle and crude spoofing.
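Computing \(\mathbf{q} = (E_{\text{recon}}, m)\) over sliding windows can be sketched as follows. The sketch assumes the reconstructions and predicted positions are already time-aligned with the observations; the helper names and the toy step offset (rather than the gradual drift used in the experiments) are illustrative.

```python
import numpy as np


def detection_features(x, x_hat, p_pred, p_obs):
    """q = (E_recon, m) for one window of length T_w.

    x, x_hat: observed and reconstructed feature vectors, shape (T_w, 16)
    p_pred, p_obs: predicted and observed positions, shape (T_w, 3)
    """
    e_recon = np.mean(np.linalg.norm(x - x_hat, axis=1))   # mean L2 reconstruction error
    m = np.sum(np.linalg.norm(p_pred - p_obs, axis=1))     # cumulative prediction error
    return np.array([e_recon, m])


def sliding_features(x, x_hat, p_pred, p_obs, t_w, stride=1):
    """Stack q for every window start s = 0, stride, 2*stride, ... -> shape (num_windows, 2)."""
    starts = range(0, len(x) - t_w + 1, stride)
    return np.array([detection_features(x[s:s + t_w], x_hat[s:s + t_w],
                                        p_pred[s:s + t_w], p_obs[s:s + t_w])
                     for s in starts])


# Toy example: perfect reconstruction, but the GPS-reported positions are
# shifted by a constant (3, 4, 0) m offset in the second half of the log.
T, t_w = 20, 10
x = np.zeros((T, 16))
p_pred = np.zeros((T, 3))
p_obs = p_pred.copy()
p_obs[10:] += np.array([3.0, 4.0, 0.0])
q = sliding_features(x, x.copy(), p_pred, p_obs, t_w)
print(q[0], q[-1])
```

In this toy case \(E_{\text{recon}}\) stays at zero throughout while \(m\) grows from 0 to 50 m (ten samples, each 5 m off), which is exactly the "plausible state, inconsistent dynamics" situation described above.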
We use an Isolation Forest (iForest) as the anomaly detector because of its efficiency and lack of assumptions on data distribution. The anomaly score for a sample \(\mathbf{q}\) is:
\[
s(\mathbf{q}) = 2^{- \frac{\mathbb{E}[h(\mathbf{q})]}{c(N)} }
\]
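where \(h(\mathbf{q})\) is the path length from the root to the leaf that isolates \(\mathbf{q}\) in a single tree, \(\mathbb{E}[h(\mathbf{q})]\) is its average over all trees in the forest, and \(c(N)\) is the average path length of an unsuccessful search in a binary search tree of \(N\) samples, used to normalize the score. Scores close to 1 indicate samples that are isolated after few splits, that is, likely anomalies. A sample is flagged as spoofed if \(s(\mathbf{q}) \geq \text{threshold}\). We determine the optimal threshold and the contamination hyperparameter via grid search on a validation set containing normal flights and injected spoofing episodes, maximizing the F1‑score.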
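The score normalization and the F1-driven threshold search can be sketched as follows. This assumes the average path length over the forest is already available from a trained forest, uses the harmonic-number approximation \(H(i) \approx \ln i + \gamma\) with the common conventions \(c(1) = 0\) and \(c(2) = 1\), and covers only the threshold (not the contamination grid). The helper names are hypothetical. If scikit-learn's `IsolationForest` is used instead, note that its `score_samples` method returns the negative of this score, so it must be negated before applying a threshold of this form.

```python
import numpy as np


def c_factor(n):
    """Average path length of an unsuccessful BST search among n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = np.log(n - 1) + np.euler_gamma          # H(n - 1) approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s(q) = 2^(-E[h(q)] / c(n)); values near 1 indicate easily isolated points."""
    return 2.0 ** (-mean_path_length / c_factor(n))


def best_threshold(scores, labels):
    """Grid-search the threshold on validation scores to maximize F1 (label 1 = spoofed)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_f1 = None, -1.0
    for t in np.unique(scores):                      # each observed score is a candidate
        pred = scores >= t
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1


print(anomaly_score(c_factor(256), 256))   # 0.5: an average-depth sample
print(best_threshold([0.30, 0.40, 0.70, 0.80], [0, 0, 1, 1]))   # (0.7, 1.0)
```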
4. Adaptive EKF Correction with Uncertainty‑Aware Virtual Sensor
Once an attack is detected, our framework activates a correction strategy that uses the trajectory predictor as a virtual sensor. To obtain a measure of prediction uncertainty, we employ Monte Carlo dropout during inference. For each correction step \(k\), we generate \(N_{\text{mc}}\) predictions by keeping dropout layers active and performing \(N_{\text{mc}}\) forward passes on the same input sequence. The set of displacement samples \(\{\Delta\hat{\mathbf{p}}_k^{(i)}\}_{i=1}^{N_{\text{mc}}}\) yields a distribution of absolute position estimates:
\[
\hat{\mathbf{p}}_k^{(i)} = \hat{\mathbf{p}}_{k-1} + \Delta\hat{\mathbf{p}}_k^{(i)}
\]
The pseudo‑measurement is taken as the sample mean:
\[
\bar{\mathbf{p}}_k = \frac{1}{N_{\text{mc}}} \sum_{i=1}^{N_{\text{mc}}} \hat{\mathbf{p}}_k^{(i)}
\]
and the measurement noise covariance is set to the sample variance:
\[
\mathbf{R}_k' = \text{diag}\big( \text{Var}(\{\hat{x}_k^{(i)}\}),\; \text{Var}(\{\hat{y}_k^{(i)}\}),\; \text{Var}(\{\hat{z}_k^{(i)}\}) \big)
\]
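The Monte Carlo dropout step that produces \(\bar{\mathbf{p}}_k\) and \(\mathbf{R}_k'\) can be sketched as follows. The trajectory predictor is replaced by a hypothetical `toy_predict` (a fixed linear readout of the last latent code with inverted dropout at the 0.2 rate from the configuration table), and "sample variance" is taken as the unbiased estimator (`ddof=1`); both are assumptions for illustration. Adding the constant \(\hat{\mathbf{p}}_{k-1}\) does not change the spread, so \(\mathbf{R}_k'\) equals the per-axis variance of the displacement samples.

```python
import numpy as np


def pseudo_measurement(p_prev, delta_samples):
    """Turn N_mc displacement samples into (p_bar_k, R'_k).

    p_prev: previous position estimate, shape (3,)
    delta_samples: MC-dropout displacement samples, shape (N_mc, 3)
    """
    p_samples = np.asarray(p_prev, dtype=float) + np.asarray(delta_samples, dtype=float)
    p_bar = p_samples.mean(axis=0)                   # sample mean per axis
    R = np.diag(p_samples.var(axis=0, ddof=1))       # unbiased sample variance per axis
    return p_bar, R


# Hypothetical stand-in for the trajectory predictor: a fixed linear readout of
# the last latent code with inverted dropout (rate 0.2) kept active at inference.
W_TOY = np.linspace(-0.5, 0.5, 24).reshape(8, 3)


def toy_predict(latent_seq, rng, rate=0.2):
    feat = latent_seq[-1]
    mask = rng.random(feat.shape) >= rate            # drop each unit with prob. rate
    return (feat * mask / (1.0 - rate)) @ W_TOY


def mc_dropout_samples(predict_fn, latent_seq, n_mc, rng):
    """N_mc stochastic forward passes on the same input sequence."""
    return np.stack([predict_fn(latent_seq, rng) for _ in range(n_mc)])


rng = np.random.default_rng(2)
latent_seq = rng.normal(size=(10, 8))
deltas = mc_dropout_samples(toy_predict, latent_seq, n_mc=50, rng=rng)
p_bar, R = pseudo_measurement(np.array([1.0, 2.0, -5.0]), deltas)
print(p_bar, np.diag(R))
```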
These quantities are then used in the EKF update step. The prediction step of the EKF is:
\[
\hat{\mathbf{x}}_k^- = \hat{\mathbf{x}}_{k-1},\qquad
\mathbf{P}_k^- = \mathbf{P}_{k-1} + \mathbf{Q}
\]
where \(\mathbf{Q}\) is the process noise covariance derived from IMU characteristics. The innovation is:
\[
\mathbf{r}_k = \bar{\mathbf{p}}_k - \mathbf{H} \hat{\mathbf{x}}_k^-,\qquad \mathbf{H} = [\mathbf{I}_{3\times 3} \; \mathbf{0}_{3\times 3}]
\]
The Kalman gain is computed with the dynamically adjusted \(\mathbf{R}_k'\):
\[
\mathbf{S}_k = \mathbf{H} \mathbf{P}_k^- \mathbf{H}^\top + \mathbf{R}_k',\qquad
\mathbf{K}_k = \mathbf{P}_k^- \mathbf{H}^\top \mathbf{S}_k^{-1}
\]
Finally, the state and covariance are updated:
\[
\hat{\mathbf{x}}_k = \hat{\mathbf{x}}_k^- + \mathbf{K}_k \mathbf{r}_k,\qquad
\mathbf{P}_k = (\mathbf{I} - \mathbf{K}_k \mathbf{H}) \mathbf{P}_k^-
\]
This adaptive mechanism allows the filter to automatically down‑weight predictions with high uncertainty (e.g., during aggressive maneuvers) and trust them more when they are confident, leading to robust correction even under sustained attacks.
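The down-weighting behavior can be checked with a direct NumPy transcription of the predict/update equations above, including the carried-over state in the prediction step and a six-dimensional state whose first three components are position. The function name and the numerical values are illustrative. A very small \(\mathbf{R}_k'\) pulls the position estimate almost entirely onto the pseudo-measurement, while a very large one leaves it nearly unchanged.

```python
import numpy as np

H = np.hstack([np.eye(3), np.zeros((3, 3))])     # H = [I_3  0_3]: position is observed


def ekf_correct(x_prev, P_prev, Q, p_bar, R_k):
    """One predict/update cycle using the MC-dropout pseudo-measurement."""
    # Prediction step as written in the text (state carried over, covariance inflated).
    x_pred = np.array(x_prev, dtype=float)
    P_pred = P_prev + Q
    r = p_bar - H @ x_pred                       # innovation r_k
    S = H @ P_pred @ H.T + R_k                   # innovation covariance S_k
    # K = P H^T S^{-1}; because P and S are symmetric, K^T = S^{-1} H P.
    K = np.linalg.solve(S, H @ P_pred).T
    x_new = x_pred + K @ r
    P_new = (np.eye(x_pred.size) - K @ H) @ P_pred
    return x_new, P_new


x0, P0, Q = np.zeros(6), np.eye(6), 0.01 * np.eye(6)
p_bar = np.array([1.0, 1.0, 1.0])
confident, _ = ekf_correct(x0, P0, Q, p_bar, 1e-9 * np.eye(3))
uncertain, _ = ekf_correct(x0, P0, Q, p_bar, 1e9 * np.eye(3))
print(confident[:3], uncertain[:3])
```

Solving the linear system instead of forming \(\mathbf{S}_k^{-1}\) explicitly is the usual numerically safer choice and gives the same gain.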
5. Simulation Setup and Experimental Results
We evaluated our approach using a ROS + Gazebo simulation environment with the Iris quadrotor model running the PX4 autopilot, used here as a stand‑in for a typical China UAV drone platform. The flight scenarios included a 150‑meter ascent, level flight, descent, and circular flight with an 8‑meter diameter. Spoofing attacks were simulated by gradually injecting a 30‑meter offset over 100 seconds. Training data comprised 30 normal flight logs; evaluation used four attack flights plus additional normal flights. The model configuration details are given in the table below.
| Component | Ours (AE‑Seq2Seq) | PCA‑FCN | PCA‑LSTM |
|---|---|---|---|
| Dimensionality reduction | Autoencoder (32→16→8) | PCA (20→8) | PCA (20→8) |
| Scene classifier | Integrated in AE (Linear+Softmax) | SVM (one‑vs‑one, RBF) | SVM (one‑vs‑one, RBF) |
| Predictor | BiLSTM‑Encoder + LSTM‑Decoder with Multi‑Head Attention | FCN (128‑128‑64‑64) | LSTM (128‑128‑64) |
| Dropout | 0.2 | 0.2 | 0.2 |
| Optimizer / LR | Adam / 0.001 | Adam / 0.001 | Adam / 0.001 |
The integrated scene classifier achieved 100% accuracy on the test set, as shown by the confusion matrix (no misclassifications). The trajectory prediction performance of our AE‑Seq2Seq model was compared against PCA‑FCN and PCA‑LSTM across different prediction horizons. The results are presented below.
| Metric | AE‑Seq2Seq | PCA‑FCN | PCA‑LSTM |
|---|---|---|---|
| MAE (m) | 0.82 | 1.74 | 1.23 |
| RMSE (m) | 1.05 | 2.30 | 1.61 |
Our model consistently maintained lower errors, particularly under dynamic maneuvers such as circular flight and transitions between regimes. The multi‑head attention mechanism allowed the decoder to attend to relevant past instants, improving accuracy where conventional LSTM‑based predictors suffer from an information bottleneck (a single fixed‑length hidden state must summarize the entire history).
For anomaly detection, we compared the three models using F1‑score, precision, recall, and accuracy.
| Model | Precision (%) | Recall (%) | F1‑score (%) | Accuracy (%) |
|---|---|---|---|---|
| AE‑Seq2Seq | 92.4 | 99.6 | 95.9 | 95.4 |
| PCA‑FCN | 98.2 | 81.2 | 88.9 | 89.2 |
| PCA‑LSTM | 99.9 | 85.2 | 92.0 | 92.1 |
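The AE‑Seq2Seq architecture achieved the highest recall (99.6%) and the highest F1‑score (95.9%), demonstrating its superior capability to detect attacks. Its precision (92.4%) is the lowest of the three models, so the higher detection rate comes with more false positives than the PCA‑based baselines, but the balance yields the best overall F1‑score and accuracy. The isolation‑forest decision boundary in the two‑dimensional feature space effectively separated normal and spoofed samples.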
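As a consistency check, the F1 column follows from precision \(P\) and recall \(R\) as \(F_1 = 2PR/(P+R)\). The snippet below recomputes it from the table's values; all three agree with the reported F1‑scores to one decimal place.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


reported = {                     # (precision, recall, F1) from the table, in %
    "AE-Seq2Seq": (92.4, 99.6, 95.9),
    "PCA-FCN": (98.2, 81.2, 88.9),
    "PCA-LSTM": (99.9, 85.2, 92.0),
}
for name, (p, r, f) in reported.items():
    print(f"{name}: recomputed F1 = {f1(p, r):.1f}% (reported {f}%)")
```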
The correction performance was evaluated by comparing the EKF output with and without our adaptive correction. Under a circular spoofing scenario with a 30‑meter offset, the uncorrected EKF produced a position RMSE of 4.86 m over the attack period, while our corrected estimate achieved an RMSE of only 0.80 m. In a long‑range trajectory test (200‑second attack), the uncorrected RMSE was 8.7 m, whereas the corrected RMSE dropped to 0.62 m. The initial transient after the correction was activated decayed quickly, and the estimate remained stable thereafter.
Closed‑loop real‑time simulation in Gazebo confirmed the practical feasibility. When the defense was disabled, the China UAV drone quickly deviated from its intended circular path and became uncontrollable. With our framework running as a ROS node, the same drone continued to follow the commanded trajectory accurately despite the ongoing GPS attack.
Computational overhead analysis shows that our model requires about 105.6 MFLOPs per forward pass (3 ms on an i9‑13900HX + RTX 4060), with the multi‑head attention module contributing 83.5 MFLOPs. On edge devices like NVIDIA Jetson, inference still completes in milliseconds, meeting real‑time requirements.
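The reported figures also allow a rough budget for the correction mode, where Monte Carlo dropout runs \(N_{\text{mc}}\) forward passes per step. The sketch below assumes each pass costs the same 105.6 MFLOPs and that, in the worst case, passes run sequentially at 3 ms each; the value \(N_{\text{mc}} = 20\) is purely hypothetical, since the number of passes is not stated. Batching the passes on a GPU would reduce wall‑clock time but not the FLOP count.

```python
TOTAL_MFLOPS = 105.6      # per forward pass (reported)
ATTENTION_MFLOPS = 83.5   # multi-head attention share (reported)
PASS_MS = 3.0             # per forward pass on i9-13900HX + RTX 4060 (reported)


def correction_budget(n_mc):
    """FLOPs and worst-case sequential latency for one MC-dropout correction step."""
    return n_mc * TOTAL_MFLOPS, n_mc * PASS_MS


attention_share = ATTENTION_MFLOPS / TOTAL_MFLOPS
mflops, latency_ms = correction_budget(20)   # N_mc = 20 is a hypothetical choice
print(f"attention share: {attention_share:.1%}")
print(f"N_mc = 20: {mflops:.0f} MFLOPs, <= {latency_ms:.0f} ms if run sequentially")
```

The attention module accounts for roughly 79% of the per‑pass cost, so it is the natural first target for the lightweight variants mentioned as future work.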
6. Conclusion
In this paper, we presented a comprehensive defense framework against GPS spoofing for quadrotor China UAV drone platforms. By combining a dual‑task autoencoder for scene‑aware feature extraction, an attention‑based Seq2Seq trajectory predictor, multi‑modal anomaly detection via isolation forest, and an adaptive EKF with Monte‑Carlo‑dropout uncertainty quantification, we achieved high detection accuracy (F1‑score 95.9%) and robust correction (RMSE reduced from 4.86 m to 0.80 m). The solution is computationally efficient enough for onboard deployment and has been validated in closed‑loop simulation. Future work will focus on lightweight uncertainty estimation and field tests on actual China UAV drone hardware.
