Current Unmanned Aerial Vehicle (UAV) identification methodologies encompass four primary approaches: visual, acoustic, radar, and radio frequency (RF) signal analysis. RF-based passive monitoring offers superior stealth and anti-jamming capabilities despite its computational complexity, making it well suited to urban and other complex environments. Frequency-hopping (FH) signals, a specialized RF category, dominate uplink control channels in commercial drone technology owing to their anti-interference robustness and implementation efficiency. These signals exhibit energy concentrated at specific time-frequency scales with quasi-periodic hopping patterns, mathematically represented as:
$$f_T(t) = A \sum_{k=0}^{N-1} W_T(t - kT_h) \cdot \cos[2\pi(f_k t + \phi_k)]$$
where \(W_T\) denotes the window function, \(T_h\) the hop duration, \(f_k\) the discrete carrier frequencies, and \(A\) the signal amplitude. Conventional classification methods apply the Short-Time Fourier Transform (STFT) to generate time-frequency representations:
$$\text{STFT}_x(t, f) = \int_{-\infty}^{\infty} x(\tau) \omega(\tau - t) e^{-j2\pi f\tau} d\tau$$
Discrete STFT implementation with Hamming windowing yields high-dimensional spectrograms (1024×1024 pixels) that preserve discriminative features but incur significant computational overhead. Direct ResNet-18 classification using STFT features achieves 98.80% accuracy on 25-class drone identification but requires 183.9 seconds for inference, limiting real-time deployment.
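As a concrete illustration, the signal model and STFT above can be sketched in a few lines of NumPy. The sampling rate, carrier set, and hop duration below are illustrative placeholders, not the dataset's actual parameters:

```python
import numpy as np

def fh_signal(fs=1_000_000, n_hops=8, hop_dur=0.00125, amp=1.0, seed=0):
    """Synthesize a toy FH signal per the model above: within each hop of
    duration T_h, one carrier f_k with phase phi_k. Parameters are
    illustrative, not DroneRFa values."""
    rng = np.random.default_rng(seed)
    samples_per_hop = int(fs * hop_dur)
    t = np.arange(samples_per_hop) / fs
    carriers = rng.choice(np.arange(50_000, 450_000, 25_000), size=n_hops)
    phases = rng.uniform(0, 1, size=n_hops)
    hops = [amp * np.cos(2 * np.pi * (f * t + p))
            for f, p in zip(carriers, phases)]
    return np.concatenate(hops), fs

def stft_mag(x, win_len=256, hop=64):
    """Discrete STFT magnitude with a Hamming window (no external deps)."""
    w = np.hamming(win_len)
    frames = [x[i:i + win_len] * w
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)

x, fs = fh_signal()
S = stft_mag(x)
print(S.shape)  # (129, 153): frequency bins x time frames
```

Each column of the resulting spectrogram shows energy concentrated near one carrier, switching at hop boundaries, which is the quasi-periodic pattern the classifier exploits.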

Multi-Scale Feature Extraction Framework
Our methodology leverages the inherent physical properties of UAV frequency-hopping signals through 2D discrete wavelet transform (DWT). The Haar wavelet basis is selected for its compact support and step-like characteristics that align with FH signal dynamics:
$$\begin{align*}
\phi_{\text{Haar}} &= \frac{1}{\sqrt{2}}[1, 1] \\
\psi_{\text{Haar}} &= \frac{1}{\sqrt{2}}[1, -1]
\end{align*}$$
Multi-level decomposition separates spectrograms into approximation (LL) and detail coefficients (LH, HL, HH). The approximation coefficients retain essential signal energy distribution while progressively reducing dimensionality:
$$\begin{cases}
\phi(x,y) = \phi(x)\phi(y) \\
\psi^H(x,y) = \psi(x)\phi(y) \\
\psi^V(x,y) = \phi(x)\psi(y) \\
\psi^D(x,y) = \psi(x)\psi(y)
\end{cases}$$
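A one-level 2D Haar decomposition following the separable filters above can be written directly in NumPy. This is a minimal sketch; production code would typically use a wavelet library such as PyWavelets:

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2D Haar DWT via the separable filters above.
    Returns (LL, LH, HL, HH), each half the size along every axis."""
    img = img.astype(float)
    # Filter + downsample along rows: phi -> average, psi -> difference
    lo = (img[0::2, :] + img[1::2, :]) / np.sqrt(2)
    hi = (img[0::2, :] - img[1::2, :]) / np.sqrt(2)
    # Then along columns, producing the four subbands
    ll = (lo[:, 0::2] + lo[:, 1::2]) / np.sqrt(2)
    lh = (lo[:, 0::2] - lo[:, 1::2]) / np.sqrt(2)
    hl = (hi[:, 0::2] + hi[:, 1::2]) / np.sqrt(2)
    hh = (hi[:, 0::2] - hi[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

def approx_at_level(img, level):
    """Keep only the approximation (LL) branch, iterated `level` times."""
    for _ in range(level):
        img = haar_dwt2(img)[0]
    return img

spec = np.random.rand(1024, 1024)      # stand-in for a 1024x1024 spectrogram
print(approx_at_level(spec, 3).shape)  # (128, 128)
```

Because the filters are orthonormal, the four subbands conserve signal energy exactly; iterating on the LL branch halves each axis per level, so a 1024×1024 spectrogram shrinks to 128×128 at level 3, retaining 1/64 of the coefficients.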
Decomposition depth critically balances information retention and computational efficiency. We systematically evaluate layer-wise performance through rigorous experimentation.
Deep Learning Architecture
ResNet-18 serves as our classification backbone, employing residual blocks that mitigate vanishing gradients through skip connections:
$$x_{m+1} = f(y_m) + h(x_m)$$
$$y_m = g(x_m, W_m)$$
where \(g\) denotes the residual transformation, \(f\) is the ReLU activation, and \(h(x_m) = x_m\) implements the identity shortcut. Absorbing the activation into the residual branch so that \(x_{m+1} = x_m + g(x_m, W_m)\), the recursion unrolls to \(x_M = x_m + \sum_{i=m}^{M-1} g(x_i, W_i)\), which yields stable gradient flow during backpropagation:
$$\frac{\partial \mathcal{L}}{\partial x_m} = \frac{\partial \mathcal{L}}{\partial x_M} \cdot \left( 1 + \frac{\partial}{\partial x_m} \sum_{i=m}^{M-1} g(x_i, W_i) \right)$$
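The "1 + ..." Jacobian structure can be checked numerically with a toy stack of residual blocks. Here each residual branch is a single small linear map (a stand-in for the conv-BN-ReLU branch, chosen so the analytic Jacobian is easy to write down); this is an illustrative sketch, not the paper's network:

```python
import numpy as np

rng = np.random.default_rng(1)
Ws = [rng.normal(scale=0.1, size=(4, 4)) for _ in range(3)]

def g(x, W):
    """Toy linear residual branch standing in for conv-BN-ReLU."""
    return W @ x

def forward(x, Ws):
    # With identity shortcuts, x_{m+1} = x_m + g(x_m, W_m), so
    # x_M = x_m + sum_i g(x_i, W_i) as in the unrolled sum above.
    for W in Ws:
        x = x + g(x, W)
    return x

x0 = rng.normal(size=4)
# Analytic Jacobian of the stack is prod_m (I + W_m): the identity term
# keeps gradients from vanishing even when each W_m is small.
J = np.eye(4)
for W in Ws:
    J = (np.eye(4) + W) @ J
# Finite-difference check of dx_M / dx_0 against the analytic Jacobian
eps = 1e-6
J_fd = np.stack([(forward(x0 + eps * e, Ws) - forward(x0 - eps * e, Ws))
                 / (2 * eps) for e in np.eye(4)], axis=1)
print(np.allclose(J, J_fd, atol=1e-5))  # True
```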
Experimental Validation
Using the DroneRFa dataset containing 25 signal classes (24 UAV models plus background noise), we adopt a 6:2:2 train/validation/test split. Each 0.01 s signal segment undergoes STFT followed by Haar wavelet decomposition. Training employs the Adam optimizer (learning rate 1e-3, batch size 32) with cross-entropy loss.
| Decomposition Level | Accuracy (%) | Inference Time (s) | Training Time |
|---|---|---|---|
| Raw STFT | 98.80 | 183.900 | – |
| Level 1 Approximation | 98.07 | 8.209 | 80m 23s |
| Level 2 Approximation | 97.28 | 2.293 | 35m 50s |
| Level 3 Approximation | 97.60 | 0.728 | 31m 38s |
| Level 4 Approximation | 96.39 | 0.411 | 29m 59s |
Level 3 approximation coefficients deliver optimal trade-offs: 252× faster inference than STFT (0.728s vs 183.9s) with minimal accuracy drop (97.60% vs 98.80%). Training convergence analysis confirms comparable learning trajectories for Levels 1-3, while Level 4 exhibits degraded performance.
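The headline trade-off figures follow directly from the table and the decomposition geometry:

```python
# Sanity-check the reported speedup and dimensionality reduction
stft_time, l3_time = 183.900, 0.728   # seconds, from the table above
speedup = stft_time / l3_time
print(speedup)                        # ~252.6, reported as 252x

full_pixels = 1024 * 1024             # raw STFT spectrogram
l3_coeffs = (1024 // 2**3) ** 2       # level-3 LL: 128 x 128
reduction = 1 - l3_coeffs / full_pixels
print(reduction)                      # ~0.984, i.e. >98% fewer inputs
```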
Conclusion
This work establishes a novel paradigm for UAV frequency-hopping signal identification through multi-scale time-frequency analysis. By exploiting the inherent energy concentration properties of FH signals in drone technology, our wavelet-based feature extraction reduces input dimensionality by >98% compared to conventional spectrograms while preserving discriminative patterns. The ResNet-18 architecture achieves 97.6% classification accuracy with 252× acceleration, enabling real-time deployment on resource-constrained platforms. This approach effectively addresses computational bottlenecks in passive RF monitoring systems for Unmanned Aerial Vehicle detection, providing critical capabilities for urban airspace security and spectrum management.
