Current Unmanned Aerial Vehicle (UAV) identification methodologies encompass four primary approaches: visual, acoustic, radar, and radio frequency (RF) signal analysis. RF-based passive monitoring offers superior stealth and anti-jamming capabilities despite its computational complexity, making it well suited to urban and other complex environments. Frequency-hopping (FH) signals, a specialized RF category, dominate uplink control channels in commercial drone technology owing to their anti-interference robustness and implementation efficiency. These signals exhibit energy concentrated at specific time-frequency scales with quasi-periodic hopping patterns, mathematically represented as:
$$f_T(t) = A \sum_{k=0}^{N-1} W_T(t - kT_h) \cdot \cos[2\pi(f_k t + \phi_k)]$$
where \(W_T\) denotes the window function, \(T_h\) the hop duration, \(f_k\) the discrete carrier frequencies, and \(A\) the signal amplitude. Conventional classification methods apply the Short-Time Fourier Transform (STFT) to generate time-frequency representations:
$$\text{STFT}_x(t, f) = \int_{-\infty}^{\infty} x(\tau) \omega(\tau - t) e^{-j2\pi f\tau} d\tau$$
Discrete STFT implementation with Hamming windowing yields high-dimensional spectrograms (1024×1024 pixels) that preserve discriminative features but incur significant computational overhead. Direct ResNet-18 classification using STFT features achieves 98.80% accuracy on 25-class drone identification but requires 183.9 seconds for inference, limiting real-time deployment.
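As a concrete illustration, the signal model and STFT above can be sketched in a few lines of NumPy. The sampling rate, carrier set, and hop duration below are illustrative placeholders, not the dataset's actual parameters:

```python
import numpy as np

def fh_signal(fs=1_000_000, n_hops=8, hop_dur=0.00125, amp=1.0, seed=0):
    """Synthesize a toy FH signal per the model above: within each hop of
    duration T_h, one carrier f_k with phase phi_k. Parameters are
    illustrative, not DroneRFa values."""
    rng = np.random.default_rng(seed)
    samples_per_hop = int(fs * hop_dur)
    t = np.arange(samples_per_hop) / fs
    carriers = rng.choice(np.arange(50_000, 450_000, 25_000), size=n_hops)
    phases = rng.uniform(0, 1, size=n_hops)
    hops = [amp * np.cos(2 * np.pi * (f * t + p))
            for f, p in zip(carriers, phases)]
    return np.concatenate(hops), fs

def stft_mag(x, win_len=256, hop=64):
    """Discrete STFT magnitude with a Hamming window (no external deps)."""
    w = np.hamming(win_len)
    frames = [x[i:i + win_len] * w
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)

x, fs = fh_signal()
S = stft_mag(x)
print(S.shape)  # (129, 153): frequency bins x time frames
```

Each column of the resulting spectrogram shows energy concentrated near one carrier, switching at hop boundaries, which is the quasi-periodic pattern the classifier exploits.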

Multi-Scale Feature Extraction Framework
Our methodology leverages the inherent physical properties of UAV frequency-hopping signals through 2D discrete wavelet transform (DWT). The Haar wavelet basis is selected for its compact support and step-like characteristics that align with FH signal dynamics:
$$\begin{align*}
\phi_{\text{Haar}} &= \frac{1}{\sqrt{2}}[1, 1] \\
\psi_{\text{Haar}} &= \frac{1}{\sqrt{2}}[1, -1]
\end{align*}$$
Multi-level decomposition separates spectrograms into approximation (LL) and detail coefficients (LH, HL, HH). The approximation coefficients retain essential signal energy distribution while progressively reducing dimensionality:
$$\begin{cases}
\phi(x,y) = \phi(x)\phi(y) \\
\psi^H(x,y) = \psi(x)\phi(y) \\
\psi^V(x,y) = \phi(x)\psi(y) \\
\psi^D(x,y) = \psi(x)\psi(y)
\end{cases}$$
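A one-level 2D Haar decomposition following the separable filters above can be written directly in NumPy. This is a minimal sketch; production code would typically use a wavelet library such as PyWavelets:

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2D Haar DWT via the separable filters above.
    Returns (LL, LH, HL, HH), each half the size along every axis."""
    img = img.astype(float)
    # Filter + downsample along rows: phi -> average, psi -> difference
    lo = (img[0::2, :] + img[1::2, :]) / np.sqrt(2)
    hi = (img[0::2, :] - img[1::2, :]) / np.sqrt(2)
    # Then along columns, producing the four subbands
    ll = (lo[:, 0::2] + lo[:, 1::2]) / np.sqrt(2)
    lh = (lo[:, 0::2] - lo[:, 1::2]) / np.sqrt(2)
    hl = (hi[:, 0::2] + hi[:, 1::2]) / np.sqrt(2)
    hh = (hi[:, 0::2] - hi[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

def approx_at_level(img, level):
    """Keep only the approximation (LL) branch, iterated `level` times."""
    for _ in range(level):
        img = haar_dwt2(img)[0]
    return img

spec = np.random.rand(1024, 1024)      # stand-in for a 1024x1024 spectrogram
print(approx_at_level(spec, 3).shape)  # (128, 128)
```

Because the filters are orthonormal, the four subbands conserve signal energy exactly; iterating on the LL branch halves each axis per level, so a 1024×1024 spectrogram shrinks to 128×128 at level 3, retaining 1/64 of the coefficients.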
Decomposition depth critically balances information retention and computational efficiency. We systematically evaluate layer-wise performance through rigorous experimentation.
Deep Learning Architecture
ResNet-18 serves as our classification backbone, employing residual blocks that mitigate vanishing gradients through skip connections:
$$x_{m+1} = f(y_m) + h(x_m)$$
$$y_m = g(x_m, W_m)$$
where \(g\) denotes the residual transformation, \(f\) is the ReLU activation, and \(h(x_m) = x_m\) implements the identity shortcut. Absorbing the activation into the residual branch so that \(x_{m+1} = x_m + g(x_m, W_m)\), the recursion unrolls to \(x_M = x_m + \sum_{i=m}^{M-1} g(x_i, W_i)\), which yields stable gradient flow during backpropagation:
$$\frac{\partial \mathcal{L}}{\partial x_m} = \frac{\partial \mathcal{L}}{\partial x_M} \cdot \left( 1 + \frac{\partial}{\partial x_m} \sum_{i=m}^{M-1} g(x_i, W_i) \right)$$
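The "1 + ..." Jacobian structure can be checked numerically with a toy stack of residual blocks. Here each residual branch is a single small linear map (a stand-in for the conv-BN-ReLU branch, chosen so the analytic Jacobian is easy to write down); this is an illustrative sketch, not the paper's network:

```python
import numpy as np

rng = np.random.default_rng(1)
Ws = [rng.normal(scale=0.1, size=(4, 4)) for _ in range(3)]

def g(x, W):
    """Toy linear residual branch standing in for conv-BN-ReLU."""
    return W @ x

def forward(x, Ws):
    # With identity shortcuts, x_{m+1} = x_m + g(x_m, W_m), so
    # x_M = x_m + sum_i g(x_i, W_i) as in the unrolled sum above.
    for W in Ws:
        x = x + g(x, W)
    return x

x0 = rng.normal(size=4)
# Analytic Jacobian of the stack is prod_m (I + W_m): the identity term
# keeps gradients from vanishing even when each W_m is small.
J = np.eye(4)
for W in Ws:
    J = (np.eye(4) + W) @ J
# Finite-difference check of dx_M / dx_0 against the analytic Jacobian
eps = 1e-6
J_fd = np.stack([(forward(x0 + eps * e, Ws) - forward(x0 - eps * e, Ws))
                 / (2 * eps) for e in np.eye(4)], axis=1)
print(np.allclose(J, J_fd, atol=1e-5))  # True
```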
Experimental Validation
Using the DroneRFa dataset containing 25 signal classes (24 UAV models plus background noise), we adopt a 6:2:2 train/validation/test split. Each 0.01 s signal segment undergoes STFT followed by Haar wavelet decomposition. Training employs the Adam optimizer (learning rate 1e-3, batch size 32) with cross-entropy loss.
| Decomposition Level | Accuracy (%) | Inference Time (s) | Training Time |
|---|---|---|---|
| Raw STFT | 98.80 | 183.900 | – |
| Level 1 Approximation | 98.07 | 8.209 | 80m 23s |
| Level 2 Approximation | 97.28 | 2.293 | 35m 50s |
| Level 3 Approximation | 97.60 | 0.728 | 31m 38s |
| Level 4 Approximation | 96.39 | 0.411 | 29m 59s |
Level 3 approximation coefficients deliver optimal trade-offs: 252× faster inference than STFT (0.728s vs 183.9s) with minimal accuracy drop (97.60% vs 98.80%). Training convergence analysis confirms comparable learning trajectories for Levels 1-3, while Level 4 exhibits degraded performance.
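The headline trade-off figures follow directly from the table and the decomposition geometry:

```python
# Sanity-check the reported speedup and dimensionality reduction
stft_time, l3_time = 183.900, 0.728   # seconds, from the table above
speedup = stft_time / l3_time
print(speedup)                        # ~252.6, reported as 252x

full_pixels = 1024 * 1024             # raw STFT spectrogram
l3_coeffs = (1024 // 2**3) ** 2       # level-3 LL: 128 x 128
reduction = 1 - l3_coeffs / full_pixels
print(reduction)                      # ~0.984, i.e. >98% fewer inputs
```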
Conclusion
This work establishes a novel paradigm for UAV frequency-hopping signal identification through multi-scale time-frequency analysis. By exploiting the inherent energy concentration properties of FH signals in drone technology, our wavelet-based feature extraction reduces input dimensionality by >98% compared to conventional spectrograms while preserving discriminative patterns. The ResNet-18 architecture achieves 97.6% classification accuracy with 252× acceleration, enabling real-time deployment on resource-constrained platforms. This approach effectively addresses computational bottlenecks in passive RF monitoring systems for Unmanned Aerial Vehicle detection, providing critical capabilities for urban airspace security and spectrum management.
