In practical applications of police drones for investigation and evidence collection, stable operation under adverse conditions such as nighttime, haze, and complex terrain is essential to ensure continuity and reliability. Equally important is seamless compatibility with other law enforcement systems for information sharing and coordinated operations. To address these requirements, we propose a novel video image feature extraction method.

Our approach begins with a detailed analysis of target objects in the HSV color space. The HSV model decomposes color information into three components: Hue (H), Saturation (S), and Value (V). This decomposition provides critical advantages for police UAV operations:
- Hue (H): Represents dominant wavelength (color type)
- Saturation (S): Measures color purity (intensity)
- Value (V): Indicates brightness (luminance)
The cylindrical HSV space is mathematically defined as:
$$ H = \begin{cases}
60^\circ \times \left( \frac{G - B}{\Delta} \bmod 6 \right) & \text{if } \max = R \\
60^\circ \times \left( \frac{B - R}{\Delta} + 2 \right) & \text{if } \max = G \\
60^\circ \times \left( \frac{R - G}{\Delta} + 4 \right) & \text{if } \max = B
\end{cases} $$
$$ S = \begin{cases}
\frac{\Delta}{\max} & \text{if } \max \neq 0 \\
0 & \text{otherwise}
\end{cases} $$
$$ V = \max(R, G, B) $$
where $\Delta = \max(R, G, B) - \min(R, G, B)$, with $H$ undefined when $\Delta = 0$ (achromatic pixels). For police drone applications, we leverage these properties to isolate targets under varying illumination. When background and target colors are similar, we apply dual-threshold filtering:
$$ \text{Target Mask} = \begin{cases}
1 & \text{if } H_{\text{low}} \leq H \leq H_{\text{high}} \\
& \text{and } S_{\text{low}} \leq S \leq S_{\text{high}} \\
0 & \text{otherwise}
\end{cases} $$
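As a concrete illustration, the following Python sketch converts a frame to HSV with OpenCV and applies the dual-threshold mask; the threshold values shown are hypothetical placeholders, and note that OpenCV scales H to [0, 179] rather than degrees:

```python
import cv2
import numpy as np

def hsv_target_mask(frame_bgr, h_low, h_high, s_low, s_high):
    """Isolate pixels whose hue and saturation fall inside the dual thresholds.

    OpenCV stores H in [0, 179] (degrees / 2) and S, V in [0, 255], so
    degree-based thresholds from the formulas above must be halved.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([h_low, s_low, 0], dtype=np.uint8)
    upper = np.array([h_high, s_high, 255], dtype=np.uint8)
    # cv2.inRange sets a pixel to 255 when every channel lies in [lower, upper].
    return cv2.inRange(hsv, lower, upper)

# Hypothetical thresholds for a reddish target; tune per scenario:
# mask = hsv_target_mask(frame, h_low=0, h_high=10, s_low=100, s_high=255)
```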
Attention-Based Feature Extraction Networks
Building upon HSV analysis, we introduce attention mechanisms through two specialized networks:
Spectral Attention Multi-scale Network (SeAMN)
This network processes spectral features with an LSTM, whose gating structure mitigates gradient vanishing, and applies attention weighting over the hidden states. The attention mechanism first scores each hidden state, normalizes the scores into weights, and then computes the weighted spectral information $u$:
$$ e = \tanh(W_1 \cdot O + b_1) $$
$$ \alpha = \text{softmax}(W_2 \cdot e + b_2) $$
$$ u = O \cdot \alpha $$
where $O = [h_1, h_2, \ldots, h_n]$ is the matrix of LSTM hidden states. The final spectral output combines the LSTM memory and the attention focus:
$$ y = h_n + u $$
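A minimal PyTorch sketch of this attention step is shown below; the batch-first layout and dimension names are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    """Attention over LSTM hidden states: e = tanh(W1*O + b1),
    alpha = softmax(W2*e + b2), u = O*alpha, y = h_n + u."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, hidden_dim)  # W1, b1
        self.weight = nn.Linear(hidden_dim, 1)          # W2, b2

    def forward(self, x):                             # x: (batch, n, input_dim)
        O, _ = self.lstm(x)                           # O: (batch, n, hidden_dim)
        e = torch.tanh(self.score(O))                 # per-step scores
        alpha = torch.softmax(self.weight(e), dim=1)  # weights over the n steps
        u = (O * alpha).sum(dim=1)                    # weighted spectral information
        return O[:, -1, :] + u                        # y = h_n + u
```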
Spatial Attention Multi-scale Network
This network reduces dimensionality with PCA and then applies convolutional attention blocks to the spatial features. Local patches $Z_{ij}$ are processed with a ConvLSTM:
$$ Z_{ij} = X\left[\frac{H(i-1)}{s}:\frac{Hi}{s},\ \frac{W(j-1)}{s}:\frac{Wj}{s}\right] $$
for $i, j = 1, \ldots, s$, where $X$ is the spatial neighborhood of size $H \times W$ and $s$ is the scaling factor that sets the patch grid. Attention weights $\beta$ for the spatial features are computed as:
$$ \beta = \sigma(f_{\text{conv}}(Z_{ij})) $$
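The patch extraction and attention gate can be sketched as follows; the ConvLSTM stage is omitted for brevity (it has no standard PyTorch module), and the kernel size is our assumption:

```python
import torch
import torch.nn as nn

def extract_patches(X, s):
    """Split X (batch, C, H, W) into an s-by-s grid of patches Z_ij,
    following the slicing in the equation above. Assumes s divides H and W."""
    _, _, H, W = X.shape
    return {(i, j): X[:, :,
                      H * (i - 1) // s : H * i // s,
                      W * (j - 1) // s : W * j // s]
            for i in range(1, s + 1) for j in range(1, s + 1)}

class SpatialAttentionGate(nn.Module):
    """beta = sigma(f_conv(Z_ij)): a per-pixel attention map in (0, 1)."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, z):
        beta = torch.sigmoid(self.conv(z))  # (batch, 1, h, w)
        return z * beta                     # attention-weighted patch features
```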
Spectral-Spatial Joint Feature Network
The spectral and spatial features are fused through score fusion:
$$ S_{\text{se}} = f_{\text{FC}}(F_{\text{se}}) $$
$$ S_{\text{sa}} = f_{\text{FC}}(F_{\text{sa}}) $$
$$ S = m \cdot S_{\text{se}} + (1 - m) \cdot S_{\text{sa}} $$
where $m$ is a trainable fusion weight learned jointly with the network. This unified architecture enables police UAVs to extract discriminative features under challenging conditions.
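A sketch of the fusion head is given below; constraining $m$ to (0, 1) with a sigmoid is our assumption, since the text only states that $m$ is trainable:

```python
import torch
import torch.nn as nn

class ScoreFusion(nn.Module):
    """S = m * S_se + (1 - m) * S_sa with a trainable fusion weight m."""

    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.fc_se = nn.Linear(feat_dim, num_classes)  # f_FC for spectral features
        self.fc_sa = nn.Linear(feat_dim, num_classes)  # f_FC for spatial features
        self.m_logit = nn.Parameter(torch.zeros(1))    # learned fusion logit

    def forward(self, f_se, f_sa):
        s_se = self.fc_se(f_se)           # spectral score S_se
        s_sa = self.fc_sa(f_sa)           # spatial score S_sa
        m = torch.sigmoid(self.m_logit)   # keep m in (0, 1)
        return m * s_se + (1 - m) * s_sa  # fused score S
```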
Experimental Validation
We validated our method on 20,000 images captured by PL1ON police drones at 1920×1080 resolution. Performance was evaluated on three metrics:
- Feature Deviation: angular difference between feature vectors, in radians (a minimal computation sketch follows this list)
- Extraction Accuracy: Correct feature identification rate
- Processing Speed: Frames processed per second (FPS)
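For reference, feature deviation can be computed as the angle between feature vectors; the arccos-of-cosine-similarity form below is our assumption, as the exact formula is not specified:

```python
import numpy as np

def feature_deviation(a, b):
    """Angle in radians between two feature vectors a and b."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding error
```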
Comparative results demonstrate our method’s superiority. The first table reports feature deviation (radians) versus the number of samples:

| Samples | Proposed Method | Comparison Method A | Comparison Method B |
|---|---|---|---|
| 2 | 0.032 | 0.058 | 0.085 |
| 4 | 0.041 | 0.061 | 0.087 |
| 6 | 0.043 | 0.069 | 0.089 |
| 8 | 0.045 | 0.078 | 0.092 |
| 10 | 0.046 | 0.082 | 0.095 |
| 12 | 0.047 | 0.085 | 0.098 |
| 14 | 0.050 | 0.090 | 0.100 |
The second table reports extraction accuracy (%) versus the number of samples:

| Samples | Proposed Method | Comparison Method A | Comparison Method B |
|---|---|---|---|
| 20 | 94 | 76 | 69 |
| 40 | 96 | 72 | 78 |
| 60 | 93 | 75 | 72 |
| 80 | 94 | 80 | 70 |
| 100 | 97 | 78 | 79 |
The third table reports processing speed (FPS) versus the number of iterations:

| Iterations | Eval Time (s) | Proposed Method (FPS) | Comparison Method A (FPS) | Comparison Method B (FPS) |
|---|---|---|---|---|
| 10 | 12 | 50.00 | 17.65 | 13.64 |
| 20 | 10 | 60.00 | 18.18 | 13.04 |
| 30 | 9 | 66.67 | 16.67 | 13.64 |
| 40 | 8 | 75.00 | 15.00 | 13.33 |
| 50 | 10 | 60.00 | 18.18 | 13.33 |
| 60 | 11 | 54.55 | 18.18 | 15.00 |
| 70 | 12 | 50.00 | 17.14 | 14.29 |
| 80 | 10 | 60.00 | 15.00 | 14.29 |
| 90 | 11 | 54.55 | 16.67 | 15.00 |
| 100 | 9 | 66.67 | 17.65 | 13.64 |
Key findings demonstrate our method’s effectiveness for police drone applications:
- Lowest feature deviation (0.032–0.050 radians) across sample sizes
- Highest extraction accuracy (93–97%) in complex scenarios
- Consistent real-time performance (50–75 FPS) enabling operational deployment
Conclusion
The proposed method demonstrates significant advantages for police UAV investigation and evidence collection. By combining HSV color analysis with attention-based spectral-spatial feature extraction, it achieves up to 97% extraction accuracy and real-time processing at up to 75 FPS under challenging conditions. The integrated network architecture enables police drones to maintain operational effectiveness during nighttime, haze, and complex-terrain operations while ensuring compatibility with existing law enforcement systems. Future work will address computational optimization for embedded deployment on police drone platforms and enhanced multi-sensor fusion.