In recent years, first person view (FPV) drones have gained significant traction in fields such as military reconnaissance, racing competitions, and aerial cinematography, owing to their high speed, low latency, and cost-effectiveness. In particular, the widespread use of FPV drones in China has highlighted the need for efficient and accurate individual recognition methods to address low-altitude security and spectrum management challenges. Traditional recognition approaches, however, often suffer from high false positive rates and poor adaptability in complex electromagnetic environments, primarily because of the intricate modulation schemes of FPV drone image transmission signals and strong environmental interference. This paper proposes a novel method for FPV drone individual recognition based on multi-dimensional features, built on a two-layer architecture: rapid screening of external signal features followed by deep analysis of multi-dimensional signal features. The first layer applies threshold-based judgments on external signal characteristics to quickly detect and screen suspected FPV drone signals, while the second layer uses a Residual Network (ResNet) model for fine-grained recognition and matching, improving accuracy and reliability. Experimental results show that the rapid screening layer achieves a rejection rate of over 85%, and the deep analysis layer attains an average recognition accuracy of 94% for FPV drone signals in the 5.8 GHz band. The method provides a theoretical foundation for real-time monitoring of FPV drones and lays the groundwork for intelligent low-altitude defense systems.

The proliferation of FPV drones, particularly in China, has introduced new challenges in signal detection and identification. These drones typically operate in fixed frequency bands, such as 5.8 GHz, and employ complex modulation techniques for image transmission, making their signals susceptible to electromagnetic noise and multipath effects. Traditional single-dimension recognition methods, which rely solely on time-domain or frequency-domain features, often fail to achieve high precision in such scenarios. For instance, the amplitude and frequency modulation hybrids used in FPV drone signals exhibit nonlinear characteristics that are difficult to analyze with conventional digital signal processing techniques. Moreover, the broad spectrum distribution of these signals, covering approximately 5650 MHz to 5950 MHz, complicates real-time detection. To address these issues, this study integrates time-frequency analysis with deep learning, enabling multi-dimensional feature extraction and robust recognition. The proposed method leverages external features such as signal bandwidth and power for initial screening, followed by ResNet-based deep learning for detailed classification, ensuring efficiency and accuracy even in low signal-to-noise ratio (SNR) conditions.
The core of this approach lies in its ability to combine physical layer signal characteristics with semantic-level modeling through deep neural networks. The rapid screening layer uses parameters such as average burst interval, duration, bandwidth, and peak-to-average ratio to filter out non-FPV signals quickly. For example, signals with bandwidth between 5 MHz and 7 MHz, peak-to-average ratio ≤ 5 dB, and SNR ≥ 5 dB are flagged as suspected FPV drone transmissions. This step significantly reduces the computational load on the subsequent deep learning layer. The deep analysis layer, based on ResNet-50, processes short-time Fourier transform (STFT) generated time-frequency images to learn spatiotemporal features automatically. By incorporating residual connections, the ResNet model mitigates gradient vanishing issues in deep networks, allowing it to capture subtle signal variations effectively. This dual-layer framework not only improves recognition speed but also enhances adaptability to diverse environmental conditions, making it suitable for real-world applications involving first person view drone monitoring.
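To make the screening step concrete, the following is a minimal Python sketch of the threshold logic, assuming the per-burst statistics are estimated from captured IQ samples with a Welch periodogram for occupied bandwidth and simple envelope statistics for the peak-to-average ratio; the function names and estimation details are illustrative, not the exact GNU Radio flowgraph used by the method.

```python
import numpy as np
from scipy.signal import welch

def occupied_bandwidth(iq, fs, power_fraction=0.99):
    """Estimate the bandwidth containing `power_fraction` of the burst power (illustrative)."""
    f, pxx = welch(iq, fs=fs, nperseg=4096, return_onesided=False)
    order = np.argsort(f)
    f, pxx = f[order], pxx[order]
    cdf = np.cumsum(pxx) / np.sum(pxx)
    lo = f[np.searchsorted(cdf, (1.0 - power_fraction) / 2)]
    hi = f[np.searchsorted(cdf, 1.0 - (1.0 - power_fraction) / 2)]
    return hi - lo

def is_suspected_fpv(iq, fs, noise_power,
                     bw_range=(5e6, 7e6), par_max_db=5.0, snr_min_db=5.0):
    """Apply the screening thresholds: 5-7 MHz bandwidth, PAR <= 5 dB, SNR >= 5 dB."""
    power = np.mean(np.abs(iq) ** 2)
    par_db = 10 * np.log10(np.max(np.abs(iq) ** 2) / power)   # peak-to-average ratio
    snr_db = 10 * np.log10(power / noise_power)                # SNR against the noise-floor estimate
    bw = occupied_bandwidth(iq, fs)
    return bw_range[0] <= bw <= bw_range[1] and par_db <= par_max_db and snr_db >= snr_min_db
```

In a deployment, `noise_power` would come from a running noise-floor estimate such as the adaptive threshold mechanism described in the next section.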
To validate the method, extensive experiments were conducted using hardware including a USRP-LW B210 for signal acquisition and software such as GNU Radio for processing. The results indicate that the proposed method outperforms existing techniques, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, in both accuracy and computational efficiency. For instance, in high SNR conditions (15 dB), the ResNet model achieved up to 96.87% accuracy, while maintaining robust performance in low SNR scenarios. The integration of multi-dimensional features and deep learning represents a significant advance in FPV drone individual recognition, paving the way for more secure and efficient low-altitude surveillance systems. Future work will focus on expanding the signal database, incorporating reinforcement learning for adaptive recognition, and fusing multi-sensor data to further enhance the ability to identify FPV drones of various types.
Technical Analysis and Method Principles
The individual recognition of FPV drones presents several challenges, primarily due to the complex nature of their image transmission signals. These signals often employ hybrid modulations, such as amplitude and frequency modulation, resulting in nonlinear characteristics that are difficult to decipher using standard digital signal detection methods. Additionally, FPV drone signals are typically modulated onto carriers in the 5.8 GHz band, with a wide frequency range of 5650 MHz to 5950 MHz, making real-time analysis computationally intensive. Environmental factors, including electromagnetic interference and geographical variations, further exacerbate the recognition difficulty by distorting signal propagation. Moreover, the lack of large, annotated datasets for training models hinders the development of accurate recognition systems. To overcome these obstacles, this study proposes a multi-dimensional feature-based approach that combines external signal characteristics with deep learning for enhanced FPV drone identification.
The proposed method employs a multi-dimensional recognition technology that begins with signal acquisition using USRP-LW B210 hardware and GNU Radio software. This setup enables real-time capture and storage of FPV drone image transmission signals, providing a solid data foundation for subsequent analysis. In the signal processing phase, time-domain waveform analysis, frequency-domain energy distribution, and time-frequency joint feature extraction are performed to reveal the dynamic spectral patterns of FPV drone signals. For instance, time-domain features include signal amplitude and periodicity, which reflect the strength of image information. Frequency-domain features involve center frequency, bandwidth, and power, while time-frequency characteristics, derived from STFT, offer a comprehensive view of signal behavior over time and frequency. A threshold monitoring system, designed with GNU Radio and enhanced by adaptive Kalman filtering, dynamically adjusts the received signal strength indicator (RSSI) threshold based on environmental noise. This system includes a visual interface for real-time feedback, ensuring practical usability and integrity.
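As a rough illustration of the adaptive threshold, the sketch below tracks the noise-floor RSSI with a scalar Kalman filter and places the detection threshold a fixed margin above the estimate; the variances, initial value, and 10 dB margin are assumptions made for illustration, not values reported for the actual GNU Radio implementation.

```python
class AdaptiveRssiThreshold:
    """Scalar Kalman filter tracking the noise-floor RSSI (dBm); threshold = estimate + margin."""

    def __init__(self, initial_rssi_dbm=-90.0, process_var=0.01, meas_var=4.0, margin_db=10.0):
        self.x = initial_rssi_dbm   # current noise-floor estimate
        self.p = 1.0                # estimate variance
        self.q = process_var        # how fast the noise floor is allowed to drift
        self.r = meas_var           # measurement noise of the RSSI readings
        self.margin_db = margin_db

    def update(self, rssi_dbm):
        # Predict: noise floor assumed locally constant, so only the variance grows.
        self.p += self.q
        # Correct with the new RSSI measurement.
        k = self.p / (self.p + self.r)
        self.x += k * (rssi_dbm - self.x)
        self.p *= (1.0 - k)
        return self.x + self.margin_db   # current detection threshold in dBm

# Usage: feed RSSI readings taken on quiet channel segments.
tracker = AdaptiveRssiThreshold()
for rssi in [-92.3, -91.7, -90.9, -93.1]:
    threshold = tracker.update(rssi)
```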
The deep learning component utilizes a lightweight ResNet-50 model to identify and classify signals based on their time-frequency representations. ResNet’s residual connections facilitate the learning of deep spatiotemporal features by allowing direct information flow between layers, addressing issues like gradient vanishing in deep networks. The model processes STFT-generated time-frequency images, which are normalized to mitigate power variations, and employs convolutional layers, batch normalization, and ReLU activation functions for efficient training. The overall process can be summarized in key steps: signal monitoring and acquisition, signal discrimination based on time-frequency features, feature extraction, and ResNet-based recognition. This integration of time-frequency analysis with deep learning enables the method to model both physical and semantic layers of signals, improving robustness in low-SNR environments. The dynamic threshold mechanism further enhances performance by adapting to changing noise conditions, making it a reliable solution for first person view drone detection.
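The preprocessing that turns an IQ burst into a normalized time-frequency image can be sketched as follows; the window length, overlap, log scaling, and min-max normalization are plausible assumptions, since the paper only states that the STFT images are normalized to suppress power variations.

```python
import numpy as np
from scipy.signal import stft

def tf_image(iq, fs, nperseg=256, noverlap=192, out_size=(224, 224)):
    """Convert an IQ burst into a normalized log-magnitude time-frequency image for the network."""
    _, _, z = stft(iq, fs=fs, nperseg=nperseg, noverlap=noverlap, return_onesided=False)
    spec = 10 * np.log10(np.abs(z) ** 2 + 1e-12)                     # spectrogram in dB
    spec = (spec - spec.min()) / (spec.max() - spec.min() + 1e-12)   # min-max normalize to [0, 1]
    # Nearest-neighbor resize to the network input size (kept simple for the sketch).
    rows = np.linspace(0, spec.shape[0] - 1, out_size[0]).astype(int)
    cols = np.linspace(0, spec.shape[1] - 1, out_size[1]).astype(int)
    return spec[np.ix_(rows, cols)]
```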
The mathematical foundation of the ResNet model is rooted in residual learning, where each residual block performs a mapping function $F$ on the input $X_L$ at layer $L$, and the output $X_{L+1}$ is given by:
$$X_{L+1} = X_L + F(X_L, W_L)$$
Here, $W_L$ represents the convolutional parameters at layer $L$. This formulation allows the network to learn residual functions, making it easier to train deep architectures. For signal recognition, the input to ResNet is the time-frequency image $I(t,f)$ obtained from STFT, which can be expressed as:
$$I(t,f) = \left| \int_{-\infty}^{\infty} x(\tau) w(\tau - t) e^{-j2\pi f\tau} d\tau \right|^2$$
where $x(\tau)$ is the time-domain signal and $w(\tau - t)$ is the window function. By processing these images, ResNet can automatically extract discriminative features for classifying FPV drone signals, such as those from Chinese-made FPV models, achieving high accuracy even in challenging conditions.
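A minimal Keras residual block corresponding to the identity mapping above is sketched below, together with a toy classifier head for the seven drone classes; it uses TensorFlow, which the experiments are based on, but it is a simplified stand-in rather than the exact ResNet-50 configuration trained in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, kernel_size=3):
    """Identity-shortcut block: output = x + F(x, W), as in X_{L+1} = X_L + F(X_L, W_L)."""
    shortcut = x
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])     # residual connection
    return layers.ReLU()(y)

# Toy classifier over time-frequency images (7 drone classes), for illustration only.
inputs = tf.keras.Input(shape=(224, 224, 1))
x = layers.Conv2D(64, 7, strides=2, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
x = residual_block(x, 64)
x = residual_block(x, 64)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(7, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```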
Experimental Analysis
To evaluate the effectiveness of the proposed method, several experiments were conducted focusing on the rapid screening layer and the deep analysis layer. The hardware setup included a USRP-LW B210 device for signal acquisition, with a sampling rate of 12 MHz and a bandwidth of 6 MHz for capturing FPV drone image transmission signals. The software environment utilized GNU Radio 3.8 for signal processing and TensorFlow 2.8.0 for implementing the ResNet model. Experiments were performed on a server equipped with an NVIDIA GeForce RTX 3080 GPU, AMD Ryzen 9 5900X CPU, and 64 GB DDR4 memory, running Ubuntu 22.04. The dataset comprised signals from various drone types, including Air2S, AVATA, DIV (FPV drone analog image transmission), DJI, FRSKY, Fubuta, and Inspire2, with time-frequency images generated via STFT and split into training, validation, and test sets in a 6:2:2 ratio.
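A 6:2:2 split can be reproduced with two stratified calls to scikit-learn's train_test_split, as sketched below; the placeholder arrays merely stand in for the generated STFT images and their labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: in practice these are the STFT time-frequency images and their drone-type labels.
images = np.random.rand(700, 224, 224, 1).astype(np.float32)
labels = np.repeat(np.arange(7), 100)

# 6:2:2 split: 60% train, then the remaining 40% split evenly into validation and test.
x_train, x_rest, y_train, y_rest = train_test_split(
    images, labels, train_size=0.6, stratify=labels, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(
    x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=42)
```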
The first experiment assessed the rapid screening layer’s performance using threshold-based judgment. Signals were analyzed for external features such as bandwidth (B), average signal power (p), peak interval (n), and SNR. The thresholds were set as follows: bandwidth between 5 MHz and 7 MHz, peak-to-average ratio ≤ 5 dB, and SNR ≥ 5 dB. A visual interface in GNU Radio displayed the spectrum together with indicator lights that turned on only when all thresholds were met. This layer achieved an average rejection rate of over 85%, meaning that more than 85% of non-FPV signals were filtered out early, reducing the computational burden on the deep learning layer. For example, in a test with 1000 signals, only 150 proceeded to the ResNet model, demonstrating the efficiency of this initial screening for first person view drone detection.
The second experiment involved confusion matrix testing to evaluate the ResNet model’s classification accuracy. After 50 training epochs, the model was tested on the reserved samples, and the results are summarized in Table 1. The confusion matrix, shown in Figure 9, illustrates the recognition rates for each drone type; a sketch of how such a matrix can be computed from the test predictions follows the table. Most categories, such as Air2S, AVATA, DIV, DJI, FRSKY, and Fubuta, achieved nearly 100% accuracy, while Inspire2 had an 80% recognition rate due to misclassification as AVATA. This indicates high overall performance, with an average accuracy of 94% for FPV drone signals in the 5.8 GHz band. The ResNet model’s ability to learn deep features from time-frequency images contributed to this success, particularly for the analog image transmission signals used by FPV drones.
| Actual/Predicted | Air2S | AVATA | DIV | DJI | FRSKY | Fubuta | Inspire2 |
|---|---|---|---|---|---|---|---|
| Air2S | 100% | 0% | 0% | 0% | 0% | 0% | 0% |
| AVATA | 0% | 100% | 0% | 0% | 0% | 0% | 0% |
| DIV | 0% | 0% | 100% | 0% | 0% | 0% | 0% |
| DJI | 0% | 0% | 0% | 100% | 0% | 0% | 0% |
| FRSKY | 0% | 0% | 0% | 0% | 100% | 0% | 0% |
| Fubuta | 0% | 0% | 0% | 0% | 0% | 100% | 0% |
| Inspire2 | 0% | 20% | 0% | 0% | 0% | 0% | 80% |
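As a hedged illustration, a row-normalized matrix like Table 1 can be produced from test-set predictions with scikit-learn; the label arrays below are placeholders that merely reproduce the Inspire2/AVATA confusion pattern rather than real model outputs.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

CLASSES = ["Air2S", "AVATA", "DIV", "DJI", "FRSKY", "Fubuta", "Inspire2"]

# Placeholder labels: in practice y_true are the test-set labels and y_pred the model's argmax outputs.
y_true = np.repeat(np.arange(7), 20)
y_pred = y_true.copy()
y_pred[-4:] = 1  # emulate a few Inspire2 samples being confused with AVATA, as in Table 1

cm = confusion_matrix(y_true, y_pred, normalize="true") * 100  # row-normalized, in percent
for name, row in zip(CLASSES, cm):
    print(f"{name:9s} " + " ".join(f"{v:6.1f}" for v in row))
```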
The third experiment examined the method’s performance under different SNR conditions, ranging from -5 dB to 20 dB. The signal detection rate and classification accuracy were measured, as shown in Figure 10. At SNR ≥ 0 dB, the detection rate remained above 90%, and classification accuracy stabilized at 94% or higher. However, in low SNR conditions (e.g., < 0 dB), the detection rate dropped, leading to increased classification errors. This highlights the robustness of the proposed method in typical environments while indicating areas for improvement in highly noisy scenarios. The ResNet model’s residual learning mechanism helped maintain performance by effectively capturing relevant features despite noise interference.
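The SNR sweep can be emulated by adding complex white Gaussian noise at a controlled level to clean recordings; the helper below is an illustrative assumption of how such a sweep might be scripted, not the paper’s exact measurement procedure.

```python
import numpy as np

def add_awgn(iq, snr_db):
    """Add complex white Gaussian noise so the result has the requested SNR relative to the signal."""
    sig_power = np.mean(np.abs(iq) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (np.random.randn(len(iq)) + 1j * np.random.randn(len(iq)))
    return iq + noise

# Placeholder tone standing in for a clean, high-SNR FPV image transmission recording.
clean_iq = np.exp(2j * np.pi * 0.1 * np.arange(10000))

# Sweep from -5 dB to 20 dB in 5 dB steps; at each point the time-frequency image would be
# regenerated from `noisy` and fed to the trained model to measure detection rate and accuracy.
for snr_db in range(-5, 25, 5):
    noisy = add_awgn(clean_iq, snr_db)
```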
A comparative analysis was conducted against other deep learning models, including CNN, LSTM, and Transformer, under high SNR (15 dB) and low SNR (0 dB) conditions. The results, presented in Table 2, demonstrate that ResNet outperforms others in terms of accuracy and computational efficiency. For instance, with 100 million parameters and 50 seconds per training epoch, ResNet achieved 96.87% accuracy in high SNR, compared to 91.18% for CNN and 84.82% for LSTM. In low SNR, ResNet maintained 82.27% accuracy, while CNN, LSTM, and Transformer dropped to 68.64%, 58.57%, and 54.29%, respectively. This superiority stems from ResNet’s ability to handle deep feature learning without degradation, making it ideal for real-time FPV drone recognition applications involving first person view signals.
| Model | Parameters (Millions) | Training Time (Seconds/Epoch) | Test Accuracy (High SNR, 15 dB) | Test Accuracy (Low SNR, 0 dB) |
|---|---|---|---|---|
| ResNet | 80 | 40 | 92.12% | 78.45% |
| ResNet | 80 | 50 | 94.85% | 80.12% |
| ResNet | 80 | 60 | 95.27% | 81.33% |
| ResNet | 60 | 50 | 90.35% | 75.67% |
| ResNet | 100 | 40 | 95.36% | 81.89% |
| ResNet | 100 | 50 | 96.87% | 82.27% |
| CNN | 80 | 40 | 85.14% | 65.23% |
| CNN | 100 | 40 | 87.29% | 67.45% |
| CNN | 100 | 50 | 91.18% | 68.64% |
| LSTM | 80 | 40 | 76.46% | 55.78% |
| LSTM | 100 | 40 | 81.58% | 57.91% |
| LSTM | 100 | 50 | 84.82% | 58.57% |
| Transformer | 80 | 40 | 75.95% | 52.34% |
| Transformer | 100 | 40 | 79.34% | 53.67% |
| Transformer | 100 | 50 | 81.56% | 54.29% |
The experimental findings underscore the efficacy of the multi-dimensional feature-based approach for FPV drone individual recognition. The rapid screening layer’s high rejection rate ensures that only relevant signals are processed further, optimizing resource usage, while the ResNet-based deep analysis layer delivers superior accuracy across various SNR levels, outperforming the alternative models. This combination makes the method particularly suitable for monitoring FPV drones in dynamic environments, including the widely deployed Chinese FPV platforms. Future enhancements could involve expanding the dataset to include more drone types and incorporating advanced noise reduction techniques to improve low-SNR performance.
Conclusion
This paper has presented a comprehensive method for FPV drone individual recognition based on multi-dimensional features, addressing the limitations of existing approaches in terms of efficiency and accuracy. The two-layer architecture, comprising external feature rapid screening and signal multidimensional feature deep analysis, effectively combines threshold-based filtering with ResNet deep learning to achieve high performance in real-time scenarios. The rapid screening layer reduces computational load by rejecting over 85% of non-target signals, while the deep analysis layer attains an average recognition accuracy of 94% for FPV drone signals in the 5.8 GHz band. Experimental results demonstrate the method’s robustness across different SNR conditions and its superiority over models like CNN, LSTM, and Transformer, particularly in handling the complex characteristics of first person view drone transmissions.
The integration of time-frequency analysis and deep learning enables the method to capture both physical and semantic signal features, making it adaptable to diverse electromagnetic environments. This is especially relevant for FPV drone operations in China, where reliable identification is crucial for security and spectrum management. However, challenges remain in recognizing unknown drone signals and in improving performance under extreme noise conditions. Future work will focus on building a more extensive drone signal database, employing reinforcement learning for adaptive recognition, and integrating multi-sensor data fusion with radar and electro-optical systems. By advancing these aspects, the proposed method can evolve into a more versatile solution for low-altitude defense, contributing to safer and more efficient airspace management in an era of proliferating FPV drone usage.
