High-speed operation of Unmanned Aerial Vehicles (UAVs) presents unique challenges for visual perception systems. Event cameras, or Dynamic Vision Sensors (DVS), offer high temporal resolution and low latency, making them attractive for drone perception. However, their output event streams suffer from significant noise interference and edge event loss during rapid maneuvers. This paper introduces FEMF-8N (Fast Event Match Filtering for Eight-neighbor), a novel local event stream filtering algorithm designed to address these issues in drone applications.

Event cameras operate asynchronously, generating an event \( e = (x, y, t, p) \) at pixel location \((x, y)\) and timestamp \(t\) when the logarithmic brightness change exceeds a threshold \(C\). The polarity \(p\) (+1 or -1) indicates brightness increase or decrease:
$$
p = \begin{cases}
+1 & \text{if } \log L(x,y,t+\Delta t) - \log L(x,y,t) > C \\
-1 & \text{if } \log L(x,y,t+\Delta t) - \log L(x,y,t) < -C
\end{cases}
$$
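For intuition, the following is a minimal per-pixel sketch of this rule in Python; it is an illustrative assumption about how successive intensity samples might be compared, not the sensor's internal pipeline, and the function name `dvs_polarity` is hypothetical:

```python
import math

def dvs_polarity(L_prev: float, L_curr: float, C: float):
    """Illustrative DVS rule for one pixel: emit an event with polarity
    +1/-1 when the log-brightness change exceeds the contrast threshold C,
    otherwise emit nothing."""
    delta = math.log(L_curr) - math.log(L_prev)
    if delta > C:
        return +1   # ON event: brightness increased
    if delta < -C:
        return -1   # OFF event: brightness decreased
    return None     # change below threshold: no event
```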
Unlike frame-based sensors, DVS output forms a continuous stream \(E_{\text{stream}} = \{(x, y, t, p)\}\). Noise events exhibit low spatiotemporal correlation with their neighbors, while valid events show high correlation. Existing methods face limitations: deep learning approaches (e.g., EDnCNN, AEDNet) incur computational overhead too high for real-time onboard drone processing, while classical spatiotemporal filters (e.g., Delbruck's method) struggle to preserve edges during slow motion or motion initiation and do not efficiently filter non-adjacent ground events during low-altitude UAV flight.
FEMF-8N processes events within a configurable time window \(\Delta t\). It constructs a 3D Event Meta Tensor \(T\) capturing spatiotemporal information:
$$ T = \{\, T[x_i, y_j, k] = v_k \mid i = 1..W,\ j = 1..H,\ k = 1..n,\ v_k \in \{-1, 0, +1\} \,\} $$
where \(W\) and \(H\) are the sensor resolution dimensions, \(n\) is the maximum number of events stored per pixel, and \(v_k\) encodes the \(k\)-th event's polarity (-1: no event, 0: negative, +1: positive). To optimize the memory access speed required for real-time onboard processing, FEMF-8N applies Event Bar Transformation (EBT), flattening the tensor along the event (polarity) dimension into a 1D Event Bar \(EB\) via the linear mapping:
$$ m = f(x_i, y_j, k) = i \times ER \times H + j \times ER + k $$
where \(ER\) (equal to \(n\)) is the maximum number of events stored per pixel. This layout keeps all events of the same pixel contiguous in memory and makes neighborhood access efficient.
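A minimal sketch of this flattening step under the definitions above, assuming 0-based indices and NumPy storage; the helper names `ebt_index` and `build_event_bar` are illustrative, not taken from the paper:

```python
import numpy as np

def ebt_index(i: int, j: int, k: int, H: int, ER: int) -> int:
    """Linear mapping m = f(x_i, y_j, k): the ER event slots of pixel
    (i, j) occupy contiguous positions in the 1D Event Bar."""
    return i * ER * H + j * ER + k

def build_event_bar(events, W: int, H: int, ER: int) -> np.ndarray:
    """Flatten one time window of events (x, y, t, p) into a 1D Event Bar.
    Slot values follow the paper's encoding: -1 = no event,
    0 = negative polarity, +1 = positive polarity."""
    eb = np.full(W * H * ER, -1, dtype=np.int8)     # -1 marks empty slots
    fill = np.zeros((W, H), dtype=np.int32)         # events stored so far per pixel
    for x, y, t, p in sorted(events, key=lambda e: e[2]):
        k = fill[x, y]
        if k < ER:                                  # keep at most ER events per pixel
            eb[ebt_index(x, y, k, H, ER)] = 1 if p > 0 else 0
            fill[x, y] = k + 1
    return eb
```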
The core filtering uses enhanced spatiotemporal correlation assessment. For corresponding pixels in consecutive time windows \(A\) and \(B\), FEMF-8N calculates the Hamming distance \(dist_h\) over their 8-neighborhoods \(D_n\):
$$ D_n = \{(x\pm1,y\pm1), (x\pm1,y), (x,y\pm1)\} $$
$$ dist_h = \sum_{n=0}^{7} \sum_{k=1}^{ER} \left( EB_A[f(x_{D_n}, y_{D_n}, k)] \oplus EB_B[f(x_{D_n}, y_{D_n}, k)] \right) $$
Events are classified as noise and removed if \(dist_h\) meets or exceeds a threshold \(T_{\text{dist}}\):
$$ E_{\text{noise}} = \{ e(x,y,t,p) \mid dist_h \geq T_{\text{dist}}, 0 \leq x < W, 0 \leq y < H \} $$
$$ E_{\text{filtered}}^* = E_{\text{stream}} \setminus E_{\text{noise}} $$
This approach leverages multi-event temporal context within \(\Delta t\), reducing edge loss during slow motion while effectively suppressing spatially uncorrelated noise and non-adjacent ground clutter, which is critical for low-altitude drone operations.
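The following is a minimal sketch of this correlation test and removal step, assuming the Event Bars of two consecutive windows have already been built with the mapping above; the function names, the skipping of out-of-frame neighbors, and the use of inequality in place of bitwise XOR are illustrative assumptions rather than the paper's exact implementation:

```python
def neighborhood_hamming(eb_a, eb_b, x, y, W, H, ER) -> int:
    """Count slot-wise mismatches between the 8-neighborhoods of pixel
    (x, y) in two consecutive Event Bars (windows A and B)."""
    dist = 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue                        # 8-neighborhood excludes the center pixel
            nx, ny = x + dx, y + dy
            if 0 <= nx < W and 0 <= ny < H:     # assumed border handling: skip out-of-frame neighbors
                base = nx * ER * H + ny * ER    # same linear mapping as the EBT formula
                for k in range(ER):
                    dist += int(eb_a[base + k] != eb_b[base + k])  # mismatch in place of bitwise XOR
    return dist

def filter_window(events_b, eb_a, eb_b, W, H, ER, t_dist):
    """Keep only events of window B whose neighborhood mismatch with
    window A stays below the noise threshold T_dist."""
    return [e for e in events_b
            if neighborhood_hamming(eb_a, eb_b, e[0], e[1], W, H, ER) < t_dist]
```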
| Method | Signal Ratio (SR) | Noise Ratio (NR) | SNR (dB) | Processing Time (ms) |
|---|---|---|---|---|
| Raw Event Stream | – | – | -10.789 | – |
| Delbruck | 0.903 | 0.223 | 5.018 | 7.08 |
| EDnCNN | 0.921 | 0.130 | 7.512 | 1529.32 |
| FEMF-8N (Proposed) | 0.914 | 0.129 | 7.430 | 8.98 |
Experimental validation used real-world UAV datasets captured at 1200 fps, processed on a Jetson Xavier NX. FEMF-8N demonstrated superior noise suppression for drone perception, removing 94.74% of non-adjacent ground events compared to Delbruck's 86.45% (Table 2). Varying the event vector dimension \(k\) (the number of events stored per pixel) revealed FEMF-8N's edge preservation capability. While \(k=1\) suffered significant edge loss similar to Delbruck (SR ~0.285), \(k=5\) optimally balanced noise removal (NR = 0.192) with high signal retention (SR = 0.851) and SNR (3.899 dB), which is crucial for object contour clarity during slow maneuvers. \(k=10\) further increased SR (0.951) but slightly reduced SNR (2.393 dB) due to residual ground clutter.
| Event Vector Dim (k) | Signal Ratio (SR) | Noise Ratio (NR) | SNR (dB) |
|---|---|---|---|
| 1 | 0.285 | 0.174 | -3.142 |
| 5 | 0.851 | 0.192 | 3.899 |
| 10 | 0.951 | 0.103 | 2.393 |
Runtime analysis (Table 3) confirmed FEMF-8N's suitability for real-time drone deployment. Using the event polarity dimension for EBT ("Event" principal component) minimized processing time (8.98 ms per 10 ms window, variance = 0.34), closely matching Delbruck's speed (7.08 ms) while significantly outperforming the learning-based EDnCNN (1529.32 ms). Other principal component choices ("Row", "Column") were substantially slower. This efficiency stems from EBT's optimized memory access and the efficient Hamming distance computation.
FEMF-8N provides a robust solution for event-based vision on drones. It runs in real time (avg. 8.98 ms processing within 10 ms windows) on embedded hardware, which is critical for high-speed UAV applications. The algorithm effectively filters spatially uncorrelated noise and non-adjacent ground clutter, reducing event redundancy. By leveraging multi-event temporal context within the Event Meta Tensor, FEMF-8N minimizes edge event loss during slow motion or motion initiation, significantly improving object contour clarity in event-based representations. Future work includes adaptive optimization of the event vector dimension across diverse drone operational scenarios and integration into downstream UAV tasks such as event-based obstacle avoidance and target tracking.