In practical applications of police drones for investigation and evidence collection, stable operation under adverse conditions such as nighttime, haze, and complex terrain is essential to ensure continuity and reliability. Equally important is seamless compatibility with other law enforcement systems for information sharing and coordinated operations. To address these requirements, we propose a novel video image feature extraction method.

Our approach begins with a detailed analysis of target objects in the HSV color space. The HSV model decomposes color information into three components: Hue (H), Saturation (S), and Value (V). This decomposition provides critical advantages for police UAV operations:
- Hue (H): Represents dominant wavelength (color type)
- Saturation (S): Measures color purity (intensity)
- Value (V): Indicates brightness (luminance)
The cylindrical HSV space is mathematically defined as:
$$ H = \begin{cases}
60^\circ \times \left( \frac{G - B}{\Delta} \bmod 6 \right) & \text{if } \max = R \\
60^\circ \times \left( \frac{B - R}{\Delta} + 2 \right) & \text{if } \max = G \\
60^\circ \times \left( \frac{R - G}{\Delta} + 4 \right) & \text{if } \max = B
\end{cases} $$
$$ S = \begin{cases}
\frac{\Delta}{\max} & \text{if } \max \neq 0 \\
0 & \text{otherwise}
\end{cases} $$
$$ V = \max(R, G, B) $$
where $\Delta = \max(R, G, B) - \min(R, G, B)$, with $H$ undefined when $\Delta = 0$ (achromatic pixels). For police drone applications, we leverage these properties to isolate targets under varying illumination. When background and target colors are similar, we apply dual-threshold filtering:
$$ \text{Target Mask} = \begin{cases}
1 & \text{if } H_{\text{low}} \leq H \leq H_{\text{high}} \\
& \text{and } S_{\text{low}} \leq S \leq S_{\text{high}} \\
0 & \text{otherwise}
\end{cases} $$
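As a concrete illustration, the following Python sketch converts a frame to HSV with OpenCV and applies the dual-threshold mask; the threshold values shown are hypothetical placeholders, and note that OpenCV scales H to [0, 179] rather than degrees:

```python
import cv2
import numpy as np

def hsv_target_mask(frame_bgr, h_low, h_high, s_low, s_high):
    """Isolate pixels whose hue and saturation fall inside the dual thresholds.

    OpenCV stores H in [0, 179] (degrees / 2) and S, V in [0, 255], so
    degree-based thresholds from the formulas above must be halved.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([h_low, s_low, 0], dtype=np.uint8)
    upper = np.array([h_high, s_high, 255], dtype=np.uint8)
    # cv2.inRange sets a pixel to 255 when every channel lies in [lower, upper].
    return cv2.inRange(hsv, lower, upper)

# Hypothetical thresholds for a reddish target; tune per scenario:
# mask = hsv_target_mask(frame, h_low=0, h_high=10, s_low=100, s_high=255)
```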
Attention-Based Feature Extraction Networks
Building upon HSV analysis, we introduce attention mechanisms through two specialized networks:
Spectral Attention Multi-scale Network (SeAMN)
This network processes spectral features with an LSTM, whose gating structure mitigates gradient vanishing, and applies attention weighting over the hidden states. The attention mechanism first scores each hidden state, normalizes the scores into weights, and then computes the weighted spectral information $u$:
$$ e = \tanh(W_1 \cdot O + b_1) $$
$$ \alpha = \text{softmax}(W_2 \cdot e + b_2) $$
$$ u = O \cdot \alpha $$
where $O = [h_1, h_2, \ldots, h_n]$ is the matrix of LSTM hidden states. The final spectral output combines the LSTM memory and the attention focus:
$$ y = h_n + u $$
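A minimal PyTorch sketch of this attention step is shown below; the batch-first layout and dimension names are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    """Attention over LSTM hidden states: e = tanh(W1*O + b1),
    alpha = softmax(W2*e + b2), u = O*alpha, y = h_n + u."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, hidden_dim)  # W1, b1
        self.weight = nn.Linear(hidden_dim, 1)          # W2, b2

    def forward(self, x):                             # x: (batch, n, input_dim)
        O, _ = self.lstm(x)                           # O: (batch, n, hidden_dim)
        e = torch.tanh(self.score(O))                 # per-step scores
        alpha = torch.softmax(self.weight(e), dim=1)  # weights over the n steps
        u = (O * alpha).sum(dim=1)                    # weighted spectral information
        return O[:, -1, :] + u                        # y = h_n + u
```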
Spatial Attention Multi-scale Network
This network reduces dimensionality with PCA and then applies convolutional attention blocks to the spatial features. Local patches $Z_{ij}$ are processed with a ConvLSTM:
$$ Z_{ij} = X\left[\frac{H(i-1)}{s}:\frac{Hi}{s},\ \frac{W(j-1)}{s}:\frac{Wj}{s}\right] $$
for $i, j = 1, \ldots, s$, where $X$ is the spatial neighborhood of size $H \times W$ and $s$ is the scaling factor that sets the patch grid. Attention weights $\beta$ for the spatial features are computed as:
$$ \beta = \sigma(f_{\text{conv}}(Z_{ij})) $$
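The patch extraction and attention gate can be sketched as follows; the ConvLSTM stage is omitted for brevity (it has no standard PyTorch module), and the kernel size is our assumption:

```python
import torch
import torch.nn as nn

def extract_patches(X, s):
    """Split X (batch, C, H, W) into an s-by-s grid of patches Z_ij,
    following the slicing in the equation above. Assumes s divides H and W."""
    _, _, H, W = X.shape
    return {(i, j): X[:, :,
                      H * (i - 1) // s : H * i // s,
                      W * (j - 1) // s : W * j // s]
            for i in range(1, s + 1) for j in range(1, s + 1)}

class SpatialAttentionGate(nn.Module):
    """beta = sigma(f_conv(Z_ij)): a per-pixel attention map in (0, 1)."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, z):
        beta = torch.sigmoid(self.conv(z))  # (batch, 1, h, w)
        return z * beta                     # attention-weighted patch features
```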
Spectral-Spatial Joint Feature Network
The spectral and spatial features are fused through score fusion:
$$ S_{\text{se}} = f_{\text{FC}}(F_{\text{se}}) $$
$$ S_{\text{sa}} = f_{\text{FC}}(F_{\text{sa}}) $$
$$ S = m \cdot S_{\text{se}} + (1 - m) \cdot S_{\text{sa}} $$
where $m$ is a trainable fusion weight learned jointly with the network. This unified architecture enables police UAVs to extract discriminative features under challenging conditions.
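A sketch of the fusion head is given below; constraining $m$ to (0, 1) with a sigmoid is our assumption, since the text only states that $m$ is trainable:

```python
import torch
import torch.nn as nn

class ScoreFusion(nn.Module):
    """S = m * S_se + (1 - m) * S_sa with a trainable fusion weight m."""

    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.fc_se = nn.Linear(feat_dim, num_classes)  # f_FC for spectral features
        self.fc_sa = nn.Linear(feat_dim, num_classes)  # f_FC for spatial features
        self.m_logit = nn.Parameter(torch.zeros(1))    # learned fusion logit

    def forward(self, f_se, f_sa):
        s_se = self.fc_se(f_se)           # spectral score S_se
        s_sa = self.fc_sa(f_sa)           # spatial score S_sa
        m = torch.sigmoid(self.m_logit)   # keep m in (0, 1)
        return m * s_se + (1 - m) * s_sa  # fused score S
```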
Experimental Validation
We validated our method on 20,000 images captured by PL1ON police drones at 1920×1080 resolution. Performance was evaluated on three metrics:
- Feature Deviation: angular difference between feature vectors, in radians (a minimal computation sketch follows this list)
- Extraction Accuracy: Correct feature identification rate
- Processing Speed: Frames processed per second (FPS)
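For reference, feature deviation can be computed as the angle between feature vectors; the arccos-of-cosine-similarity form below is our assumption, as the exact formula is not specified:

```python
import numpy as np

def feature_deviation(a, b):
    """Angle in radians between two feature vectors a and b."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding error
```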
Comparative results demonstrate our method’s superiority. The first table reports feature deviation (radians) versus the number of samples:

| Samples | Proposed Method | Comparison Method A | Comparison Method B |
|---|---|---|---|
| 2 | 0.032 | 0.058 | 0.085 |
| 4 | 0.041 | 0.061 | 0.087 |
| 6 | 0.043 | 0.069 | 0.089 |
| 8 | 0.045 | 0.078 | 0.092 |
| 10 | 0.046 | 0.082 | 0.095 |
| 12 | 0.047 | 0.085 | 0.098 |
| 14 | 0.050 | 0.090 | 0.100 |
The second table reports extraction accuracy (%) versus the number of samples:

| Samples | Proposed Method | Comparison Method A | Comparison Method B |
|---|---|---|---|
| 20 | 94 | 76 | 69 |
| 40 | 96 | 72 | 78 |
| 60 | 93 | 75 | 72 |
| 80 | 94 | 80 | 70 |
| 100 | 97 | 78 | 79 |
The third table reports processing speed (FPS) versus the number of iterations:

| Iterations | Eval Time (s) | Proposed Method (FPS) | Comparison Method A (FPS) | Comparison Method B (FPS) |
|---|---|---|---|---|
| 10 | 12 | 50.00 | 17.65 | 13.64 |
| 20 | 10 | 60.00 | 18.18 | 13.04 |
| 30 | 9 | 66.67 | 16.67 | 13.64 |
| 40 | 8 | 75.00 | 15.00 | 13.33 |
| 50 | 10 | 60.00 | 18.18 | 13.33 |
| 60 | 11 | 54.55 | 18.18 | 15.00 |
| 70 | 12 | 50.00 | 17.14 | 14.29 |
| 80 | 10 | 60.00 | 15.00 | 14.29 |
| 90 | 11 | 54.55 | 16.67 | 15.00 |
| 100 | 9 | 66.67 | 17.65 | 13.64 |
Key findings demonstrate our method’s effectiveness for police drone applications:
- Lowest feature deviation (0.032–0.050 radians) across sample sizes
- Highest extraction accuracy (93–97%) in complex scenarios
- Consistent real-time performance (50–75 FPS) enabling operational deployment
Conclusion
The proposed method demonstrates significant advantages for police UAV investigation and evidence collection. By combining HSV color analysis with attention-based spectral-spatial feature extraction, it achieves up to 97% extraction accuracy and real-time processing at up to 75 FPS under challenging conditions. The integrated network architecture enables police drones to maintain operational effectiveness during nighttime, haze, and complex-terrain operations while ensuring compatibility with existing law enforcement systems. Future work will address computational optimization for embedded deployment on police drone platforms and enhanced multi-sensor fusion.