Abstract
To address the critical challenges of high computational complexity and insufficient real-time performance in deep learning-based classification of Chinese UAV frequency-hopping (FH) signals, I propose a method leveraging multi-scale time-frequency features extracted via 2D wavelet decomposition. By exploiting the physical properties of FH signals, in particular the concentration of their energy at specific time-frequency scales, the method significantly reduces data dimensionality while preserving discriminative features. Experimental validation on the DroneRFa dataset (containing 25 classes of Chinese UAV FH signals) demonstrates a recognition accuracy of 97.6%, only 1.2 percentage points below raw time-frequency features, while accelerating classification by 252×. This enables real-time identification of Chinese UAV radiation sources on resource-constrained embedded systems.

1 Introduction
The rapid proliferation of Chinese UAV platforms necessitates advanced signal identification techniques for security and surveillance. Existing methods, whether acoustic, visual, radar, or RF-based, face limitations in complex urban environments. RF passive monitoring offers superior stealth and anti-jamming capabilities but suffers from high computational loads. Chinese UAV systems, such as DJI's Phantom and Inspire series, widely adopt FH communications in their control uplinks:

f(t) = A \sum_{k=0}^{N-1} W_T(t - kT_k)\,\cos[2\pi f_k(t - kT_k) + \varphi_k]
(Equation 1: FH signal model)
where W_T is a window function, f_k denotes the hopping frequencies, and A is the modulation amplitude. Traditional deep learning approaches use five input representations: time-domain, frequency-domain, time-frequency-domain (TFD), transform-domain, and multi-domain. TFD via the Short-Time Fourier Transform (STFT) provides excellent separability but introduces prohibitive dimensionality:

\mathrm{STFT}_x(n, k) = \sum_{m=0}^{N-1} x(n+m)\,\omega(m)\,e^{-j 2\pi m k / N}
(Equation 2: Discrete STFT)
Here, x(n) is the sampled signal, ω(m) is the Hanning window, and N is the window length. The STFT generates a 1024×1024 matrix per sample on the DroneRFa dataset, demanding extensive storage and computation. To overcome this, I introduce a wavelet-based multi-scale feature extraction framework optimized for Chinese UAV FH signals.
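As a concrete illustration, Equation 2 can be sketched in a few lines of numpy. The window length N = 1024 and the non-overlapping hop are assumptions chosen to match the 1024×1024 spectrogram size reported for the dataset:

```python
import numpy as np

def stft(x, n_fft=1024, hop=1024):
    """Discrete STFT per Equation 2: Hanning-windowed frames followed by a DFT.
    hop == n_fft gives the non-overlapping framing assumed here."""
    w = np.hanning(n_fft)
    n_frames = (len(x) - n_fft) // hop + 1
    frames = np.stack([x[i * hop : i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.fft(frames, axis=1)  # one row of N frequency bins per frame

# With 2**20 samples and N = hop = 1024, this yields a 1024x1024 complex matrix,
# matching the per-sample spectrogram size quoted in the text.
x = np.random.randn(2**20)
S = stft(x)
print(S.shape)
```

A real pipeline would typically keep |STFT|² (the spectrogram magnitude) rather than the complex matrix; the framing logic is the same.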
2 Methodology
2.1 Time-Frequency Multi-Scale Feature Extraction
FH signals exhibit energy concentration at fixed time-frequency scales due to their constant-rate hopping nature. I apply the 2D Discrete Wavelet Transform (2D-DWT) to the STFT spectrograms to decompose them into approximation coefficients (LL) and detail coefficients (LH, HL, HH), using the separable 2D wavelet basis:

\varphi(x, y) = \varphi(x)\,\varphi(y)
\psi^{H}(x, y) = \psi(x)\,\varphi(y)
\psi^{V}(x, y) = \varphi(x)\,\psi(y)
\psi^{D}(x, y) = \psi(x)\,\psi(y)
(Equation 3: 2D wavelet basis functions)
The Haar wavelet is well suited to Chinese UAV FH signals due to its orthogonality, symmetry, and step-like shape:

\varphi = \frac{1}{\sqrt{2}}[1,\;1], \quad \psi = \frac{1}{\sqrt{2}}[1,\;-1]
(Equation 4: Haar wavelet)
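One level of the separable Haar 2D-DWT of Equations 3 and 4 can be written directly in numpy (the subband naming below follows the column-then-row filtering order; conventions vary between libraries):

```python
import numpy as np

def haar_dwt2(a):
    """One level of the 2D Haar DWT: pairwise scaled sums (lowpass) and
    differences (highpass) along one axis, then the other, giving the
    LL, LH, HL, HH subbands of Equation 3."""
    s = np.sqrt(2.0)
    # filter along axis 1: adjacent-column sums and differences
    L = (a[:, 0::2] + a[:, 1::2]) / s
    H = (a[:, 0::2] - a[:, 1::2]) / s
    # filter along axis 0: adjacent-row sums and differences
    LL = (L[0::2, :] + L[1::2, :]) / s
    LH = (L[0::2, :] - L[1::2, :]) / s
    HL = (H[0::2, :] + H[1::2, :]) / s
    HH = (H[0::2, :] - H[1::2, :]) / s
    return LL, LH, HL, HH

a = np.random.randn(1024, 1024)
LL, LH, HL, HH = haar_dwt2(a)
print(LL.shape)  # (512, 512): each subband halves both dimensions
```

Because the Haar basis is orthonormal, the four subbands together preserve the total energy of the input, which is why discarding the detail coefficients loses only the energy they carry.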
Multi-layer decomposition reduces dimensionality exponentially:
- Layer 1: Input (1024×1024) → LL1 (512×512)
- Layer 2: LL1 → LL2 (256×256)
- Layer 3: LL2 → LL3 (128×128)
- Layer 4: LL3 → LL4 (64×64)
Table 1: Dimensionality Reduction via Wavelet Decomposition
Decomposition Layer | Matrix Size | Data Reduction
---|---|---
Raw STFT | 1024×1024 | 1× (baseline)
Layer 1 (LL1) | 512×512 | 4×
Layer 2 (LL2) | 256×256 | 16×
Layer 3 (LL3) | 128×128 | 64×
Layer 4 (LL4) | 64×64 | 256×
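The sizes in Table 1 follow from iterating the Haar approximation step. A minimal numpy sketch (each LL entry is half the sum of a 2×2 block, which is the 2D Haar lowpass coefficient):

```python
import numpy as np

def haar_ll(a):
    """Approximation (LL) coefficients of one 2D Haar level: each output entry
    is (sum of a 2x2 block) / 2, halving both dimensions per layer."""
    return (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2]) / 2.0

a = np.random.randn(1024, 1024)
sizes = []
for _ in range(4):  # four decomposition layers, as in Table 1
    a = haar_ll(a)
    sizes.append(a.shape)
print(sizes)  # [(512, 512), (256, 256), (128, 128), (64, 64)]
```

Each layer quarters the element count, giving the 4×, 16×, 64×, and 256× reductions in the table.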
2.2 ResNet-18 Classifier with Residual Learning
I use ResNet-18 to learn hierarchical features from the approximation coefficients. Its residual blocks mitigate vanishing gradients via skip connections:

x_{m+1} = f(y_m), \quad y_m = x_m + F(x_m, W_m)
(Equation 5: Residual block)
where F is the residual function, W_m are its weights, and f is the ReLU activation. Gradient flow is preserved, since

\frac{\partial L}{\partial x_m} = \frac{\partial L}{\partial x_M}\left(1 + \frac{\partial}{\partial x_m}\sum_{i=m}^{M-1} F(x_i, W_i)\right)
(Equation 6: Gradient propagation)
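Equation 5 can be illustrated with a toy residual block; the two fully-connected layers below are a stand-in (an assumption for illustration) for the 3×3 convolutional pairs listed in Table 2:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Residual unit of Equation 5: y_m = x_m + F(x_m, W_m), x_{m+1} = f(y_m).
    F is a two-layer dense branch here, standing in for the 3x3 convs."""
    F = W2 @ relu(W1 @ x)  # residual branch F(x, W)
    return relu(x + F)     # identity skip connection, then ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
W1, W2 = rng.standard_normal((64, 64)), rng.standard_normal((64, 64))
y = residual_block(x, W1, W2)
print(y.shape)  # (64,): the skip connection requires matching in/out dimensions
```

The identity term `x +` in the skip connection is what contributes the `1` inside the parentheses of Equation 6, so the gradient never vanishes entirely even if the residual branch's gradient does.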
Table 2: ResNet-18 Configuration
Layer Type | Output Size | Parameters
---|---|---
Convolution + BN | 128×128 | 7×7, 64, stride 2
Max Pooling | 64×64 | 3×3, stride 2
Residual Block 1 | 64×64 | [3×3, 64] × 2
Residual Block 2 | 32×32 | [3×3, 128] × 2
Residual Block 3 | 16×16 | [3×3, 256] × 2
Residual Block 4 | 8×8 | [3×3, 512] × 2
Global Avg Pooling | 1×1 | 512-dimensional output
Fully Connected | 25 classes | 512×25
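The output sizes in Table 2 follow from the standard convolution size formula. The sketch below traces the first two rows, assuming the 256×256 input implied by the table's first row and the typical ResNet paddings of 3 and 1 (neither padding is stated in the text):

```python
def conv_out(n, k, s, p):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# 7x7 conv, stride 2, padding 3 (assumed): 256 -> 128
n = conv_out(256, k=7, s=2, p=3)
print(n)
# 3x3 max pool, stride 2, padding 1 (assumed): 128 -> 64
n = conv_out(n, k=3, s=2, p=1)
print(n)
```

Each subsequent residual stage then halves the spatial size via a stride-2 convolution, yielding the 32×32, 16×16, and 8×8 entries.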
3 Experiments
3.1 DroneRFa Dataset and Setup
- Dataset: 25 classes (24 Chinese UAV models + background noise).
- Signal acquisition: dual-channel RF receiver, 100 MS/s sampling rate.
- Preprocessing: segmented into non-overlapping 0.01 s frames.
- Train/validation/test split: 11,379 / 3,792 / 3,792 samples.
3.2 Training Protocol
- Optimizer: Adam (lr=1e-3).
- Batch Size: 32.
- Loss: Categorical cross-entropy.
- Stopping Criterion: Early stopping after 50 epochs of no improvement.
3.3 Results
Table 3: Performance Comparison of Wavelet Decomposition Layers
Feature Input | Accuracy (%) | Classification Time (s) | Speedup vs. STFT
---|---|---|---
Raw STFT (baseline) | 98.80 | 183.9 | 1×
Wavelet LL1 | 98.07 | 8.21 | 22.4×
Wavelet LL2 | 97.28 | 2.29 | 80.3×
Wavelet LL3 | 97.60 | 0.73 | 252×
Wavelet LL4 | 96.39 | 0.41 | 448×
The LL3 coefficients achieve the best trade-off:
- Accuracy: 97.60%, only 1.2 percentage points below raw STFT.
- Efficiency: a classification time of 0.73 s, 252× faster than STFT.
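The speedup column in Table 3 is simply the ratio of measured classification times; a quick arithmetic check (times taken from the table, ratios rounded as reported):

```python
# Verify the speedup figures against the measured classification times.
baseline = 183.9  # raw STFT classification time (s)
for name, t in [("LL1", 8.21), ("LL2", 2.29), ("LL3", 0.73), ("LL4", 0.41)]:
    print(f"{name}: {baseline / t:.1f}x")
```

The LL3 ratio of 183.9 / 0.73 ≈ 251.9 rounds to the 252× quoted in the text.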
4 Conclusion
I have developed a wavelet-optimized framework for real-time identification of Chinese UAV frequency-hopping radiation sources. By extracting multi-scale time-frequency features via the 2D-DWT and selecting the Layer 3 approximation coefficients, this method reduces dimensionality by 64× while retaining 97.6% classification accuracy. The integration with ResNet-18 ensures robust feature learning, enabling deployment on embedded platforms for field applications. This work addresses a critical gap in low-altitude Chinese UAV surveillance, providing a scalable solution for modern electronic warfare and spectrum monitoring.