Fast Filtering Method for Event Information in the Spatiotemporal Domain of UAV Dynamic Vision

In the rapidly evolving field of drone technology, Unmanned Aerial Vehicles (UAVs) are increasingly deployed in high-speed applications such as surveillance, inspection, and autonomous navigation. However, the dynamic vision sensors (DVS) used in these systems often output event streams plagued by significant noise and edge event loss, which can compromise the performance of subsequent algorithms like object detection and obstacle avoidance. To address these challenges, we propose a novel local event stream filtering algorithm, termed Fast Event Match Filtering for Eight-neighbor (FEMF-8N). This method leverages the spatiotemporal characteristics of event data to efficiently filter noise while preserving critical edge information, even under conditions of slow motion or motion initiation. Our approach is specifically designed for real-time operation on UAV platforms, where computational efficiency is paramount. In this paper, we detail the theoretical foundations, algorithmic design, and experimental validation of FEMF-8N, demonstrating its superiority over existing methods in terms of noise reduction, edge preservation, and processing speed.

The core of our work revolves around the unique properties of event cameras, which asynchronously capture changes in log intensity at each pixel, generating events of the form $e = (x, y, t, p)$, where $x$ and $y$ are pixel coordinates, $t$ is the timestamp, and $p \in \{+1, -1\}$ denotes the polarity (positive for increasing intensity, negative for decreasing). This event-based sensing model offers high temporal resolution and low latency, making it ideal for high-speed drone technology. However, it also introduces challenges, such as high sensitivity to noise and the loss of edge events during slow movements. Traditional frame-based filtering methods are ill-suited for this asynchronous data, necessitating specialized approaches that exploit the spatiotemporal correlations inherent in valid events. Noise events typically exhibit random spatiotemporal patterns, whereas valid events from moving objects show strong correlations in both space and time. By capitalizing on this distinction, our FEMF-8N algorithm achieves robust filtering performance.
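
For concreteness, an event stream of this form can be held in a compact structured array, as in the sketch below; the field names and data types are illustrative choices for this example rather than anything prescribed by the sensor or by FEMF-8N.

```python
import numpy as np

# Hypothetical in-memory layout for an event stream e = (x, y, t, p).
# Field names and dtypes are illustrative, not prescribed by FEMF-8N.
event_dtype = np.dtype([
    ("x", np.uint16),   # pixel column
    ("y", np.uint16),   # pixel row
    ("t", np.int64),    # timestamp in microseconds
    ("p", np.int8),     # polarity: +1 or -1
])

# A tiny example stream: two correlated edge events and one isolated event.
events = np.array(
    [(120, 64, 1_000, 1), (121, 64, 1_050, 1), (10, 200, 1_020, -1)],
    dtype=event_dtype,
)
```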

Existing filtering methods for event data can be broadly categorized into learning-based and spatiotemporal correlation-based approaches. Learning-based methods, such as those employing convolutional neural networks (CNNs), often require extensive training and substantial computational resources, limiting their practicality for real-time UAV applications. In contrast, correlation-based methods, like the one proposed by Delbruck, utilize local spatiotemporal neighborhoods to identify and remove noise. While these are computationally efficient, they can suffer from excessive edge event loss in scenarios involving slow motion or motion initiation. Our FEMF-8N algorithm builds upon these correlation-based principles but introduces key innovations to mitigate these limitations. Specifically, we construct an event meta tensor to capture spatiotemporal features within a defined time window, apply a dimensionality reduction technique to enhance data access efficiency, and implement an improved event matching strategy using Hamming distances in 8-neighborhoods. This combination allows for rapid filtering while maintaining high signal quality, making it highly suitable for integration into drone technology systems.

In the following sections, we first review the relevant theory behind event-based vision and spatiotemporal correlation. We then present the detailed design of the FEMF-8N algorithm, including the construction of the event meta tensor, the event bar transformation (EBT) for efficient data handling, and the event matching module. Subsequently, we describe our experimental setup using a real-world dataset collected from a UAV platform and compare the performance of FEMF-8N against established methods. The results highlight significant improvements in noise reduction, edge preservation, and computational efficiency, underscoring the algorithm’s potential for enhancing UAV operations. We conclude with a discussion of future work aimed at further optimizing parameter selection and expanding the method’s applicability to diverse environments.

Event cameras, inspired by biological vision systems, operate on a fundamentally different principle than conventional frame-based cameras. Instead of capturing full frames at fixed intervals, they asynchronously report changes in log intensity at each pixel. An event is triggered when the difference in log intensity between two time points exceeds a threshold $C$, as defined by:

$$ p = \begin{cases} +1 & \text{if } \log L(x,y,t+\Delta t) - \log L(x,y,t) > C \\ -1 & \text{if } \log L(x,y,t+\Delta t) - \log L(x,y,t) < -C \end{cases} $$

where $L(x,y,t)$ represents the light intensity at pixel $(x,y)$ and time $t$. This mechanism allows event cameras to achieve high dynamic range and minimal motion blur, which are critical advantages in drone technology for capturing fast-moving objects. However, the asynchronous nature also leads to sparse event streams that are susceptible to noise. Noise events often arise from random fluctuations or sensor imperfections and can be distinguished from valid events based on their lack of spatiotemporal correlation. Specifically, valid events from moving edges tend to cluster in both space and time, whereas noise events are isolated. This forms the basis for spatiotemporal filtering methods, which identify noise by checking for the absence of correlated events in local neighborhoods.
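
The triggering rule above can be illustrated with a short sketch that compares two log-intensity maps against the contrast threshold $C$; this per-frame view is a simplification, since a real event camera fires asynchronously and may emit several events per pixel between two sampling instants.

```python
import numpy as np

def generate_events(log_prev, log_curr, C=0.2):
    """Apply the log-intensity threshold rule to two brightness maps.

    log_prev / log_curr are H x W arrays of log intensity; C is the contrast
    threshold. Returns (ys, xs, polarities) for pixels that fire an event.
    """
    diff = log_curr - log_prev
    pos = diff > C
    neg = diff < -C
    ys, xs = np.nonzero(pos | neg)
    polarities = np.where(pos[ys, xs], 1, -1).astype(np.int8)
    return ys, xs, polarities
```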

The concept of spatiotemporal correlation is central to our filtering approach. For a given event $e_i = (x_i, y_i, t_i, p_i)$, we consider its 8-neighborhood in the pixel plane and a temporal window $\Delta t$. If no correlated events (i.e., events with similar timestamps and polarities) exist within this spatiotemporal region, $e_i$ is classified as noise. Mathematically, the set of noise events $E_{\text{noise}}$ can be defined as:

$$ E_{\text{noise}} = \{\, e_k = (x_k, y_k, t_k, p_k) \mid \forall\, e_n = (x_n, y_n, t_n, p_n) \text{ with } (x_n, y_n) \in D,\ |t_k - t_n| > \Delta t \,\} $$

where $D$ represents the 8-neighborhood around $(x_k, y_k)$. This principle has been implemented in various forms, such as Delbruck’s method, which uses a fixed time threshold. However, these methods can fail in scenarios with slow motion, where edge events are sparse and may be mistakenly filtered out. To address this, our FEMF-8N algorithm incorporates multiple events from adjacent time windows, enhancing robustness in such cases.
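
For reference, the fixed-threshold correlation test can be sketched as follows, using a per-pixel map of the most recent timestamp; this mirrors the baseline behaviour described above rather than the FEMF-8N matcher, and the function name and in-memory layout are illustrative.

```python
import numpy as np

def correlation_filter(events, width, height, dt):
    """Keep an event only if one of its 8-neighbors fired within dt.

    `events` is an iterable of (x, y, t, p) tuples sorted by timestamp. This
    sketches the fixed-threshold baseline, which keeps a single "most recent
    timestamp" per pixel, not the FEMF-8N matcher.
    """
    last_t = np.full((height, width), -np.inf)   # most recent event time per pixel
    kept = []
    for x, y, t, p in events:
        last_t[y, x] = -np.inf                   # exclude the center pixel from the test
        x0, x1 = max(x - 1, 0), min(x + 2, width)
        y0, y1 = max(y - 1, 0), min(y + 2, height)
        if (t - last_t[y0:y1, x0:x1]).min() <= dt:
            kept.append((x, y, t, p))
        last_t[y, x] = t                         # record this event for later neighbors
    return kept
```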

Another key concept in our method is the use of principal sequences for efficient data access. In multidimensional arrays, the principal sequence determines the order in which elements are stored and accessed in memory. For a 2D array, row-major order stores elements row-by-row, while column-major order stores them column-by-column. By optimizing the principal sequence for event data, we can significantly reduce computational overhead. In FEMF-8N, we introduce the event meta tensor, a 3D data structure that aggregates events over a time window, and then apply a dimensionality reduction to create a 1D event bar. This transformation, called Event Bar Transformation (EBT), maps the 3D tensor to a 1D sequence where events from the same pixel are stored contiguously, facilitating fast access and computation. The mapping function $f$ is defined as:

$$ m = f(x_i, y_j, v_k) = (j \times W + i) \times ER + k $$

where $W$ and $H$ are the width and height of the pixel plane ($0 \le i < W$, $0 \le j < H$), $ER$ is the length of the event polarity vector, and $v_k$ is the $k$-th polarity entry for pixel $(x_i, y_j)$. Because the event index $k$ varies fastest, all $ER$ entries for a given pixel are adjacent in memory, reducing the number of operations required for neighborhood queries and improving real-time performance on resource-constrained UAV platforms.
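
A minimal sketch of this mapping is given below, assuming a fixed event vector length $ER$ per pixel; the helper names are illustrative.

```python
def ebt_index(i, j, k, W, ER):
    """Linear index into the event bar with the event index k varying fastest,
    so the ER entries of pixel (i, j) occupy a contiguous run of length ER."""
    return (j * W + i) * ER + k

def pixel_slice(i, j, W, ER):
    """Contiguous slice of the event bar holding all ER entries of pixel (i, j)."""
    start = (j * W + i) * ER
    return slice(start, start + ER)
```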

The FEMF-8N algorithm consists of four main steps: construction of the event meta tensor, dimensionality reduction via EBT, event matching using Hamming distances, and output of the filtered event stream. First, for a given time window $\Delta t$, we accumulate events into a 3D tensor $T[x,y,v]$, where each element $v$ is a vector of event polarities for pixel $(x,y)$. The tensor captures the spatiotemporal distribution of events, with $v$ encoding the sequence of polarities over time. This allows us to retain temporal information that is crucial for distinguishing valid events from noise in slow-motion scenarios.
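
A sketch of this accumulation step is shown below; the use of 0 to mark an empty slot, and the policy of dropping events beyond the first $ER$ at a pixel, are assumptions of the sketch rather than details fixed by the algorithm description.

```python
import numpy as np

def build_event_meta_tensor(events, width, height, er):
    """Accumulate one time window of events into an H x W x ER tensor.

    Each pixel keeps up to `er` polarities in arrival order; 0 marks an empty
    slot (an assumption of this sketch). Events beyond the first `er` at a
    pixel are dropped.
    """
    tensor = np.zeros((height, width, er), dtype=np.int8)
    fill = np.zeros((height, width), dtype=np.int32)   # next free slot per pixel
    for x, y, _, p in events:                           # events restricted to the window
        k = fill[y, x]
        if k < er:
            tensor[y, x, k] = p
            fill[y, x] = k + 1
    return tensor
```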

Next, we apply EBT to reduce the 3D tensor to a 1D event bar. This not only saves memory but also optimizes data access patterns. In the event bar, events from the same pixel are stored consecutively, enabling efficient retrieval of neighborhood information during the event matching phase. For example, to access events for pixel $(x,y)$ and its 8-neighbors, we simply compute offsets based on the linear indices, avoiding costly multidimensional indexing. This is particularly beneficial for high-resolution event cameras with millions of pixels, as it minimizes latency in real-time applications for Unmanned Aerial Vehicles.
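
The access pattern can be illustrated as follows, under the same layout assumption as above: the meta tensor is flattened so that the $ER$ entries of each pixel stay contiguous, and the 8-neighborhood of a pixel is read through precomputed slices.

```python
import numpy as np

def to_event_bar(tensor):
    """Flatten the H x W x ER meta tensor into a 1D event bar; numpy's default
    C order stores the last axis fastest, so the ER entries of each pixel
    remain contiguous."""
    return tensor.reshape(-1)

def neighbor_slices(x, y, width, height, er):
    """Slices of the event bar covering the 8-neighborhood of pixel (x, y);
    border pixels simply get fewer neighbors in this sketch."""
    slices = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                start = (ny * width + nx) * er
                slices.append(slice(start, start + er))
    return slices
```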

The event matching module calculates the Hamming distance between event bars from adjacent time windows for corresponding 8-neighborhoods. The Hamming distance $\text{dist}_h$ measures the dissimilarity between two binary sequences, with lower values indicating higher correlation. For two event bars $A$ and $B$ from consecutive windows, we compute:

$$ \text{dist}_h = \sum_{n=1}^{8} \sum_{k=0}^{ER-1} \left( A\left[f(x_n, y_n, v_k)\right] \oplus B\left[f(x_n, y_n, v_k)\right] \right) $$

where $ER$ is the maximum number of events stored per pixel, $(x_n, y_n)$ ranges over the 8-neighborhood, $A[\cdot]$ and $B[\cdot]$ index the event bars of the two windows, and $\oplus$ denotes the XOR operation. To enhance computational efficiency, we count the set bits in the XOR result with the bit-level update $\text{xor} \mathbin{\&} (\text{xor} - 1)$, repeated until $\text{xor}$ becomes zero; each iteration clears the lowest set bit, so the loop runs once per differing bit rather than once per bit position, reducing the number of iterations compared to a naive scan. If $\text{dist}_h$ exceeds a threshold $T_{\text{dist}}$, the events in the neighborhood are considered noise and are filtered out; otherwise, they are retained as valid events. This matching strategy incorporates temporal information from multiple events, reducing false negatives at motion edges and improving overall filtering quality.
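
A sketch of the matching test is given below. The packing of the $ER$ polarities of each neighbor into an integer code is an assumption introduced for illustration; the bit-counting loop follows the $\text{xor} \mathbin{\&} (\text{xor} - 1)$ trick described above.

```python
def pack(polarities):
    """Pack an ER-length polarity vector into an integer, two bits per slot:
    01 = positive event, 10 = negative event, 00 = empty slot
    (an illustrative encoding, not one prescribed by the paper)."""
    code = 0
    for k, p in enumerate(polarities):
        if p == 1:
            code |= 1 << (2 * k)
        elif p == -1:
            code |= 1 << (2 * k + 1)
    return code

def popcount(v):
    """Count set bits with the xor & (xor - 1) trick: each iteration clears
    the lowest set bit, so the loop runs once per set bit."""
    count = 0
    while v:
        v &= v - 1
        count += 1
    return count

def hamming_distance(codes_a, codes_b):
    """Total bit difference between matching neighbor codes of two adjacent
    time windows (one packed code per 8-neighbor)."""
    return sum(popcount(a ^ b) for a, b in zip(codes_a, codes_b))

def is_valid_neighborhood(codes_a, codes_b, t_dist):
    """Neighborhood events are kept only when the distance stays within t_dist."""
    return hamming_distance(codes_a, codes_b) <= t_dist
```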

Finally, the filtered event stream $E^*$ is obtained by removing the identified noise events from the original stream $E$:

$$ E^* = E \setminus E_{\text{noise}} $$

This output can then be used for downstream tasks in drone technology, such as object detection or SLAM, with enhanced data quality and reduced computational load.

To validate the effectiveness of FEMF-8N, we conducted experiments using a dataset collected from a UAV platform in a complex forest environment. The setup included a Jetson Xavier NX onboard computer, a CUAV V5+ flight controller, and a high-frame-rate camera whose footage was converted to event data with the ESIM simulator. The UAV flew at an average speed of 7 m/s, and events were accumulated over time windows of $\Delta t = 10$ ms. The dataset comprised 28,800 frames, yielding over 18 million events. We compared FEMF-8N against Delbruck’s method and a learning-based approach (EDnCNN) in terms of signal ratio (SR), noise ratio (NR), signal-to-noise ratio (SNR), and processing time. The metrics are defined as:

$$ \text{SR} = \frac{E_o}{E_i}, \quad \text{NR} = \frac{E_{\text{noise},o}}{E_{\text{noise},i}}, \quad \text{SNR} = 10 \log_{10} \left( \frac{E_o}{E_{\text{noise},o}} \right) $$

where $E_o$ and $E_i$ are the counts of valid events in the filtered output and the original input, and $E_{\text{noise},o}$ and $E_{\text{noise},i}$ are the corresponding noise event counts, respectively.
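
These metrics can be computed directly from labeled event counts, as in the sketch below; it reads $E_o$ and $E_i$ as valid event counts after and before filtering, and assumes ground-truth noise labels are available so that valid and noise events can be separated.

```python
import math

def filtering_metrics(valid_in, valid_out, noise_in, noise_out):
    """SR, NR and SNR (in dB) as defined above, from counts of valid and
    noise events before and after filtering."""
    sr = valid_out / valid_in                      # fraction of valid events retained
    nr = noise_out / noise_in                      # fraction of noise events retained
    snr = 10 * math.log10(valid_out / noise_out)   # signal-to-noise ratio of the output
    return sr, nr, snr
```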

In the effectiveness experiment, FEMF-8N demonstrated superior noise reduction and edge preservation compared to Delbruck’s method. For instance, in low-altitude flights, Delbruck’s method retained numerous ground events that were unrelated to the UAV’s trajectory, leading to redundancy. In contrast, FEMF-8N effectively filtered these non-adjacent events while maintaining sharp object contours. The following table summarizes the quantitative results for a sample event stream:

Event Stream    SR       NR       SNR (dB)    Event Count
Original        n/a      n/a      -10.789     798,704
Delbruck        0.903    0.223    5.018       415,486
FEMF-8N         0.914    0.129    7.430       378,027

As shown, FEMF-8N achieved a higher SR and lower NR than Delbruck’s method, resulting in a better SNR. This indicates that our method preserves more valid events while removing more noise. Additionally, we analyzed the removal of ground area events, which are often redundant in UAV applications. FEMF-8N achieved a removal rate of 94.74%, compared to 86.45% for Delbruck’s method, further highlighting its efficiency in reducing event redundancy.

To evaluate edge event loss during slow motion, we varied the dimension $k$ of the event polarity vector in the event meta tensor. A larger $k$ incorporates more temporal information, reducing the loss of edge events in motion initiation scenarios. The results for $k=1$, $k=5$, and $k=10$ are presented below:

Event Stream    SR       NR       SNR (dB)    Event Count
Original        n/a      n/a      -5.285      87,545
Delbruck        0.286    0.170    -3.036      17,231
$k=1$           0.285    0.174    -3.142      17,448
$k=5$           0.851    0.192    3.899       23,926
$k=10$          0.951    0.103    2.393       29,949

For $k=1$, which corresponds to using only the last event from the previous time window, the performance was similar to Delbruck’s method, with low SR and high edge loss. As $k$ increased to 5 and 10, SR improved significantly, indicating better preservation of edge events. However, at $k=10$, SNR decreased slightly due to the retention of some redundant ground events. This suggests that an intermediate value like $k=5$ strikes a balance between edge preservation and noise removal, making it ideal for UAV applications where motion varies.

In terms of computational performance, we measured the average processing time per event frame for different methods and principal sequences in FEMF-8N. The results, based on an event vector length of 5, are as follows:

Method / Principal Sequence    Average Time (ms/frame)    Time Variance
Delbruck                       7.08                       0.31
EDnCNN                         1529.32                    17.58
FEMF-8N (Row)                  32.56                      18.89
FEMF-8N (Column)               29.88                      16.11
FEMF-8N (Event Polarity)       8.98                       0.34

EDnCNN, as a learning-based method, exhibited high latency, making it unsuitable for real-time drone technology. Delbruck’s method was the fastest but suffered from edge loss. FEMF-8N with event polarity as the principal sequence achieved an average time of 8.98 ms per frame, close to Delbruck’s method, with low variance, indicating stable performance. This demonstrates that our algorithm meets the real-time requirements of Unmanned Aerial Vehicles while providing superior filtering quality.

In conclusion, the FEMF-8N algorithm effectively addresses the challenges of noise filtering and edge preservation in event streams for UAV dynamic vision. By leveraging spatiotemporal correlations and an efficient data structure, it achieves high performance in real-time scenarios. Future work will focus on adaptive parameter optimization to enhance generalization across different environments and further integration with navigation tasks in drone technology. The advancements presented here contribute to the broader adoption of event-based vision in autonomous systems, paving the way for more robust and efficient Unmanned Aerial Vehicle operations.
