Abstract
To address the critical challenges of high computational complexity and insufficient real-time performance in deep learning-based classification of Chinese UAV frequency-hopping (FH) signals, I propose a method leveraging multi-scale time-frequency features extracted via 2D wavelet decomposition. By exploiting the physical properties of FH signals, in particular the concentration of their energy at specific time-frequency scales, the method significantly reduces data dimensionality while preserving discriminative features. Experimental validation on the DroneRFa dataset (containing 25 classes of Chinese UAV FH signals) demonstrates a recognition accuracy of 97.6%, only 1.2 percentage points below raw time-frequency features, while accelerating classification by 252×. This enables real-time identification of Chinese UAV radiation sources on resource-constrained embedded systems.

1 Introduction
The rapid proliferation of Chinese UAV platforms necessitates advanced signal identification techniques for security and surveillance. Existing methods, whether acoustic, visual, radar, or RF-based, face limitations in complex urban environments. RF passive monitoring offers superior stealth and anti-jamming capabilities but suffers from high computational loads. Chinese UAV systems, such as DJI's Phantom and Inspire series, widely adopt FH communications in their control uplinks:

f(t) = A \sum_{k=0}^{N-1} W_T(t - kT_k)\,\cos[2\pi f_k(t - kT_k) + \varphi_k]
(Equation 1: FH signal model)
where W_T is a window function, f_k denotes the hopping frequencies, and A is the modulation amplitude. Traditional deep learning approaches use five input representations: time-domain, frequency-domain, time-frequency-domain (TFD), transform-domain, and multi-domain. TFD via the Short-Time Fourier Transform (STFT) provides excellent separability but introduces prohibitive dimensionality:

\mathrm{STFT}_x(n, k) = \sum_{m=0}^{N-1} x(n+m)\,\omega(m)\,e^{-j 2\pi m k / N}
(Equation 2: Discrete STFT)
Here, x(n) is the sampled signal, ω(m) is the Hanning window, and N is the window length. The STFT generates a 1024×1024 matrix per sample on the DroneRFa dataset, demanding extensive storage and computation. To overcome this, I introduce a wavelet-based multi-scale feature extraction framework optimized for Chinese UAV FH signals.
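As a concrete illustration, Equation 2 can be sketched in a few lines of numpy. The window length N = 1024 and the non-overlapping hop are assumptions chosen to match the 1024×1024 spectrogram size reported for the dataset:

```python
import numpy as np

def stft(x, n_fft=1024, hop=1024):
    """Discrete STFT per Equation 2: Hanning-windowed frames followed by a DFT.
    hop == n_fft gives the non-overlapping framing assumed here."""
    w = np.hanning(n_fft)
    n_frames = (len(x) - n_fft) // hop + 1
    frames = np.stack([x[i * hop : i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.fft(frames, axis=1)  # one row of N frequency bins per frame

# With 2**20 samples and N = hop = 1024, this yields a 1024x1024 complex matrix,
# matching the per-sample spectrogram size quoted in the text.
x = np.random.randn(2**20)
S = stft(x)
print(S.shape)
```

A real pipeline would typically keep |STFT|² (the spectrogram magnitude) rather than the complex matrix; the framing logic is the same.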
2 Methodology
2.1 Time-Frequency Multi-Scale Feature Extraction
FH signals exhibit energy concentration at fixed time-frequency scales due to their constant-rate hopping nature. I apply the 2D Discrete Wavelet Transform (2D-DWT) to the STFT spectrograms to decompose them into approximation coefficients (LL) and detail coefficients (LH, HL, HH), using the separable 2D wavelet basis:

\varphi(x, y) = \varphi(x)\,\varphi(y)
\psi^{H}(x, y) = \psi(x)\,\varphi(y)
\psi^{V}(x, y) = \varphi(x)\,\psi(y)
\psi^{D}(x, y) = \psi(x)\,\psi(y)
(Equation 3: 2D wavelet basis functions)
The Haar wavelet is well suited to Chinese UAV FH signals due to its orthogonality, symmetry, and step-like shape:

\varphi = \frac{1}{\sqrt{2}}[1,\;1], \quad \psi = \frac{1}{\sqrt{2}}[1,\;-1]
(Equation 4: Haar wavelet)
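One level of the separable Haar 2D-DWT of Equations 3 and 4 can be written directly in numpy (the subband naming below follows the column-then-row filtering order; conventions vary between libraries):

```python
import numpy as np

def haar_dwt2(a):
    """One level of the 2D Haar DWT: pairwise scaled sums (lowpass) and
    differences (highpass) along one axis, then the other, giving the
    LL, LH, HL, HH subbands of Equation 3."""
    s = np.sqrt(2.0)
    # filter along axis 1: adjacent-column sums and differences
    L = (a[:, 0::2] + a[:, 1::2]) / s
    H = (a[:, 0::2] - a[:, 1::2]) / s
    # filter along axis 0: adjacent-row sums and differences
    LL = (L[0::2, :] + L[1::2, :]) / s
    LH = (L[0::2, :] - L[1::2, :]) / s
    HL = (H[0::2, :] + H[1::2, :]) / s
    HH = (H[0::2, :] - H[1::2, :]) / s
    return LL, LH, HL, HH

a = np.random.randn(1024, 1024)
LL, LH, HL, HH = haar_dwt2(a)
print(LL.shape)  # (512, 512): each subband halves both dimensions
```

Because the Haar basis is orthonormal, the four subbands together preserve the total energy of the input, which is why discarding the detail coefficients loses only the energy they carry.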
Multi-layer decomposition reduces dimensionality exponentially:
- Layer 1: Input (1024×1024) → LL1 (512×512)
- Layer 2: LL1 → LL2 (256×256)
- Layer 3: LL2 → LL3 (128×128)
- Layer 4: LL3 → LL4 (64×64)
Table 1: Dimensionality Reduction via Wavelet Decomposition
Decomposition Layer | Matrix Size | Data Reduction
---|---|---
Raw STFT | 1024×1024 | 1× (baseline)
Layer 1 (LL1) | 512×512 | 4×
Layer 2 (LL2) | 256×256 | 16×
Layer 3 (LL3) | 128×128 | 64×
Layer 4 (LL4) | 64×64 | 256×
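The sizes in Table 1 follow from iterating the Haar approximation step. A minimal numpy sketch (each LL entry is half the sum of a 2×2 block, which is the 2D Haar lowpass coefficient):

```python
import numpy as np

def haar_ll(a):
    """Approximation (LL) coefficients of one 2D Haar level: each output entry
    is (sum of a 2x2 block) / 2, halving both dimensions per layer."""
    return (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2]) / 2.0

a = np.random.randn(1024, 1024)
sizes = []
for _ in range(4):  # four decomposition layers, as in Table 1
    a = haar_ll(a)
    sizes.append(a.shape)
print(sizes)  # [(512, 512), (256, 256), (128, 128), (64, 64)]
```

Each layer quarters the element count, giving the 4×, 16×, 64×, and 256× reductions in the table.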
2.2 ResNet-18 Classifier with Residual Learning
I use ResNet-18 to learn hierarchical features from the approximation coefficients. Its residual blocks mitigate vanishing gradients via skip connections:

x_{m+1} = f(y_m), \quad y_m = x_m + F(x_m, W_m)
(Equation 5: Residual block)
where F is the residual function, W_m are its weights, and f is the ReLU activation. Gradient flow is preserved, since

\frac{\partial L}{\partial x_m} = \frac{\partial L}{\partial x_M}\left(1 + \frac{\partial}{\partial x_m}\sum_{i=m}^{M-1} F(x_i, W_i)\right)
(Equation 6: Gradient propagation)
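Equation 5 can be illustrated with a toy residual block; the two fully-connected layers below are a stand-in (an assumption for illustration) for the 3×3 convolutional pairs listed in Table 2:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Residual unit of Equation 5: y_m = x_m + F(x_m, W_m), x_{m+1} = f(y_m).
    F is a two-layer dense branch here, standing in for the 3x3 convs."""
    F = W2 @ relu(W1 @ x)  # residual branch F(x, W)
    return relu(x + F)     # identity skip connection, then ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
W1, W2 = rng.standard_normal((64, 64)), rng.standard_normal((64, 64))
y = residual_block(x, W1, W2)
print(y.shape)  # (64,): the skip connection requires matching in/out dimensions
```

The identity term `x +` in the skip connection is what contributes the `1` inside the parentheses of Equation 6, so the gradient never vanishes entirely even if the residual branch's gradient does.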
Table 2: ResNet-18 Configuration
Layer Type | Output Size | Parameters
---|---|---
Convolution + BN | 128×128 | 7×7, 64, stride 2
Max Pooling | 64×64 | 3×3, stride 2
Residual Block 1 | 64×64 | [3×3, 64] × 2
Residual Block 2 | 32×32 | [3×3, 128] × 2
Residual Block 3 | 16×16 | [3×3, 256] × 2
Residual Block 4 | 8×8 | [3×3, 512] × 2
Global Avg Pooling | 1×1 | 512-dimensional output
Fully Connected | 25 classes | 512×25
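The output sizes in Table 2 follow from the standard convolution size formula. The sketch below traces the first two rows, assuming the 256×256 input implied by the table's first row and the typical ResNet paddings of 3 and 1 (neither padding is stated in the text):

```python
def conv_out(n, k, s, p):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# 7x7 conv, stride 2, padding 3 (assumed): 256 -> 128
n = conv_out(256, k=7, s=2, p=3)
print(n)
# 3x3 max pool, stride 2, padding 1 (assumed): 128 -> 64
n = conv_out(n, k=3, s=2, p=1)
print(n)
```

Each subsequent residual stage then halves the spatial size via a stride-2 convolution, yielding the 32×32, 16×16, and 8×8 entries.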
3 Experiments
3.1 DroneRFa Dataset and Setup
- Dataset: 25 classes (24 Chinese UAV models + background noise).
- Signal acquisition: dual-channel RF receiver, 100 MS/s sampling rate.
- Preprocessing: segmented into non-overlapping 0.01 s frames.
- Train/validation/test split: 11,379 / 3,792 / 3,792 samples.
3.2 Training Protocol
- Optimizer: Adam (lr=1e-3).
- Batch Size: 32.
- Loss: Categorical cross-entropy.
- Stopping Criterion: Early stopping after 50 epochs of no improvement.
3.3 Results
Table 3: Performance Comparison of Wavelet Decomposition Layers
Feature Input | Accuracy (%) | Classification Time (s) | Speedup vs. STFT
---|---|---|---
Raw STFT (baseline) | 98.80 | 183.9 | 1×
Wavelet LL1 | 98.07 | 8.21 | 22.4×
Wavelet LL2 | 97.28 | 2.29 | 80.3×
Wavelet LL3 | 97.60 | 0.73 | 252×
Wavelet LL4 | 96.39 | 0.41 | 448×
The LL3 coefficients achieve the best trade-off:
- Accuracy: 97.60%, only 1.2 percentage points below raw STFT.
- Efficiency: a classification time of 0.73 s, 252× faster than STFT.
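The speedup column in Table 3 is simply the ratio of measured classification times; a quick arithmetic check (times taken from the table, ratios rounded as reported):

```python
# Verify the speedup figures against the measured classification times.
baseline = 183.9  # raw STFT classification time (s)
for name, t in [("LL1", 8.21), ("LL2", 2.29), ("LL3", 0.73), ("LL4", 0.41)]:
    print(f"{name}: {baseline / t:.1f}x")
```

The LL3 ratio of 183.9 / 0.73 ≈ 251.9 rounds to the 252× quoted in the text.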
4 Conclusion
I have developed a wavelet-optimized framework for real-time identification of Chinese UAV frequency-hopping radiation sources. By extracting multi-scale time-frequency features via the 2D-DWT and selecting the Layer 3 approximation coefficients, this method reduces dimensionality by 64× while retaining 97.6% classification accuracy. The integration with ResNet-18 ensures robust feature learning, enabling deployment on embedded platforms for field applications. This work addresses a critical gap in low-altitude Chinese UAV surveillance, providing a scalable solution for modern electronic warfare and spectrum monitoring.