Heterogeneous Anti-Drone Vision System

The proliferation of unmanned aerial vehicles (UAVs) across military, civilian, and commercial sectors has introduced significant security challenges. Unauthorized “rogue flights” threaten public safety, critical infrastructure, and national security. Conventional detection systems each have pronounced limitations. Radar has a low probability of detection for slow-moving UAVs with a small radar cross-section, often below 40%, and is highly susceptible to clutter from urban structures, which raises false alarm rates. Visible-light cameras become ineffective in poor lighting such as night or heavy overcast, with effective detection ranges typically limited to 200 meters, and their performance degrades severely under strong backlight because the signal-to-noise ratio drops. Thermal imaging offers all-weather capability, but its widespread deployment is constrained by the high cost of the core infrared detector components. These limitations underscore the need for a cost-effective, high-precision, all-weather anti-drone surveillance solution.

To address these limitations, we propose a short-wave infrared (SWIR) intelligent detection system built on a heterogeneous computing architecture. At its core is a domestically developed Indium Gallium Arsenide (InGaAs) detector. Exploiting the imaging properties of this detector, the system achieves all-weather target acquisition: it penetrates haze and fog and performs reliably in low-light conditions where visible-light systems fail. The advantages of the SWIR band for this application are summarized below:

Comparison Metric	Short-Wave Infrared (SWIR) Band	Visible Light Band
Illumination Dependency	Low dependency, minimal performance degradation in darkness.	Severely compromised at night or under low light.
Atmospheric Penetration	Superior transmission through haze, smoke, and dust; robust in varied weather.	Performance significantly degraded by fog, smoke, and other particulates.
Camouflage Recognition	Capable of identifying drones with visual camouflage.	Often fails to detect visually camouflaged targets.
Dynamic Range	Wider dynamic range preserves details in scenes with high contrast (e.g., bright sky vs. dark shadow).	Limited dynamic range struggles with extreme lighting contrasts.

The proposed system is built on a multi-layered hardware architecture. A high-speed transmission circuit, designed in-house, facilitates robust image capture and data transfer. The focal plane array is integrated with a precision temperature control module to ensure imaging stability. On the software and algorithm front, we have developed and implemented on FPGA a comprehensive processing pipeline including non-uniformity correction (NUC), bad pixel detection and compensation, pseudo-color processing, image enhancement, and high-dynamic-range (HDR) imaging. These algorithms significantly enhance image quality and usable dynamic range. For the critical task of target recognition, an optimized YOLOv5s network is employed. Through channel pruning, operator fusion, and 8-bit fixed-point quantization, the network is accelerated via parallel processing on the FPGA. Deployed on a ZYNQ7100 platform, the system achieves a real-time detection performance of 17 frames per second (fps), marking a substantial improvement in both recognition speed and energy efficiency compared to conventional solutions. Experimental validation confirms an average recognition accuracy of 80% against drones in complex, cluttered backgrounds. A dual-mode human-machine interface supports both local touch control and remote monitoring, providing operational flexibility. This system offers a high-performance, cost-effective solution for low-altitude airspace security, holding significant value for both civilian and military anti-drone applications.

System Architecture and Hardware Design

The hardware foundation of our anti-drone system is a hierarchical, modular design comprising an expansion board and a main board, which together form the processing chain. The central processing platform is built around the ZYNQ System-on-Chip (SoC), which provides a heterogeneous computing environment by integrating a dual-core ARM processor (the Processing System, PS) with Field-Programmable Gate Array (FPGA) fabric (the Programmable Logic, PL).

The optical front-end on the expansion board features a domestic 640×512 pixel InGaAs focal plane array (FPA) detector with a spectral response from 0.9 μm to 1.7 μm. This SWIR detector is the cornerstone of our all-weather capability. The analog signals from the detector are digitized by two high-speed CBM96AD56-125 Analog-to-Digital Converters (ADCs). Data transfer is achieved via the JESD204B serial protocol, enabling a stable data throughput sufficient for 300 frames per second. A critical subsystem is the integrated precision temperature control, driven by an ADN8835 chip managing a Thermoelectric Cooler (TEC). This system maintains the FPA at a constant temperature, ensuring stable responsivity and low dark current, which is vital for consistent image quality. The core hardware modules and their specifications are detailed below:

Module	Key Component	Specification / Function
Optical Sensing	InGaAs FPA Detector	640 x 512 pixels, 0.9-1.7 μm, 8-channel analog output.
Data Acquisition	High-Speed ADCs (x2)	JESD204B interface, 8 channels @ 2.5 Gbps each. Synchronized by LMK04828 clock generator.
Temperature Control	ADN8835 TEC Driver	Precision control of FPA operating temperature (±0.1°C stability).
Core Processing	ZYNQ-7100 SoC	Dual-core ARM Cortex-A9 (PS) + Kintex-7 FPGA fabric (PL). Powered by LMZ31710 multi-rail supply.
Memory System	DDR3, QSPI Flash, eMMC, SD	4x MT41K256M16 (4 Gb each) for 64-bit DDR3 (2 GB total). Multi-tier storage for boot, cache, and data.
Communication & I/O	Dual GigE, Camera Link, LCD	RTL8211F-CG PHY chips; Industrial Camera Link output; RGB LCD for local display.
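As a rough plausibility check on the acquisition link, the sketch below compares the pixel payload at 300 frames per second with the usable capacity of eight JESD204B lanes at 2.5 Gbps. The 8b/10b line coding is part of JESD204B; padding each sample to a 16-bit word is an assumption made here for illustration, and the helper names are hypothetical. Blanking intervals, framing overhead, and the fact that the ADCs stream samples continuously are all ignored.

```python
# Rough JESD204B link budget for the acquisition path (illustrative only).

def pixel_payload_gbps(width, height, bits_per_sample, fps):
    """Raw pixel payload in Gbit/s for an uncompressed video stream."""
    return width * height * bits_per_sample * fps / 1e9


def jesd204b_payload_gbps(lanes, line_rate_gbps):
    """Usable link payload: 8b/10b coding leaves 80% of the line rate."""
    return lanes * line_rate_gbps * 8 / 10


# Figures from the text: 640 x 512 FPA, 300 fps, 8 lanes at 2.5 Gbps.
# ASSUMPTION: each sample is carried as a 16-bit word on the link.
required = pixel_payload_gbps(640, 512, 16, 300)
available = jesd204b_payload_gbps(8, 2.5)

print(f"required  : {required:.3f} Gbit/s")
print(f"available : {available:.3f} Gbit/s")
print(f"headroom  : {available / required:.1f}x")
```

Under these assumptions the link has roughly an order of magnitude of headroom over the pixel payload, which is consistent with the claim that the interface sustains 300 frames per second.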

The main board hosts the ZYNQ device and surrounding infrastructure. The memory system employs a compound structure using eMMC for the boot file system, an SD card for data logging, and QSPI Flash for configuration bitstreams. This is complemented by a 64-bit wide DDR3 memory bank for high-speed data buffering during processing. Communication interfaces are designed for versatility: dual Gigabit Ethernet ports enable simultaneous remote video streaming and command/control, a Camera Link interface allows integration with professional imaging equipment, and an RGB LCD interface provides a local real-time display. This modular hardware design ensures reliability, scalability, and ease of maintenance for field-deployable anti-drone units.

Software Algorithm and FPGA Acceleration Pipeline

The software architecture is built on a modular processing pipeline designed to efficiently handle the data flow from capture to detection. The initial stage involves a custom detector driver and a JESD204B protocol parser within the PL, which recovers the high-speed serial data into parallel image frames. A video timing generation module then formats this raw data into standard video streams for output via multiple channels: Camera Link, UDP stream via PL Ethernet, TCP stream via PS network stack, and the local RGB LCD.

The image preprocessing pipeline is fully implemented in the FPGA fabric to minimize latency and offload the processors. Each algorithm is optimized for parallel execution.

Non-Uniformity Correction (NUC): We employ a two-point correction method. The detector’s response is calibrated at two different blackbody temperatures to generate gain and offset coefficients for each pixel. The correction is applied in real-time:
$$ I_{corrected}(x,y) = G(x,y) \cdot I_{raw}(x,y) + O(x,y) $$
where $G(x,y)$ and $O(x,y)$ are the per-pixel gain and offset coefficients stored in block RAM.
Bad Pixel Detection and Compensation: An efficient median-filter based algorithm utilizing a three-line buffer architecture identifies static bad pixels. A surrounding 3×3 median filter is applied, and if the difference between the pixel value and the median exceeds a threshold, it is flagged and replaced by the median value.
$$ BP_{detected} = \left| I(x,y) - \mathrm{median}(3 \times 3\;\mathrm{window}) \right| > T_{bp} $$
Image Enhancement: An improved histogram equalization algorithm is used. A histogram is computed for each frame using a dual-port RAM for accumulation. Instead of a global equalization, a dynamic range partitioning method is used to prevent over-enhancement of background noise while improving contrast in regions of interest.
$$ CDF_{modified}(v) = \alpha \cdot CDF(v) + (1-\alpha) \cdot v_{norm} $$
Here, $\alpha$ is a weighting factor, and $v_{norm}$ is the normalized pixel value.
Pseudo-Color Mapping & HDR Fusion: For display purposes, the 14-bit monochrome SWIR data is dynamically compressed and mapped to an RGB palette. A multi-exposure HDR fusion algorithm, based on a Gaussian-Laplacian pyramid architecture, synthesizes details from both highlights and shadows when applicable.
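The first three preprocessing steps can be modeled in software as follows. This is a minimal reference sketch in plain Python, not the FPGA implementation: the function names, the choice of the frame mean as the calibration target, the edge replication at image borders, the guard for non-responsive pixels, and the value of $\alpha$ are all assumptions made for illustration.

```python
from statistics import median


def two_point_nuc_coeffs(low_frame, high_frame):
    """Per-pixel gain/offset mapping both reference frames onto their frame means."""
    n = sum(len(row) for row in low_frame)
    mean_low = sum(v for row in low_frame for v in row) / n
    mean_high = sum(v for row in high_frame for v in row) / n
    gain, offset = [], []
    for row_l, row_h in zip(low_frame, high_frame):
        # ASSUMPTION: a pixel with no response gets unity gain and is left
        # for the bad-pixel stage to handle.
        g_row = [(mean_high - mean_low) / (h - l) if h != l else 1.0
                 for l, h in zip(row_l, row_h)]
        o_row = [mean_low - g * l for g, l in zip(g_row, row_l)]
        gain.append(g_row)
        offset.append(o_row)
    return gain, offset


def apply_nuc(raw, gain, offset):
    """I_corrected = G * I_raw + O, evaluated per pixel."""
    return [[g * v + o for g, v, o in zip(g_row, r_row, o_row)]
            for g_row, r_row, o_row in zip(gain, raw, offset)]


def replace_bad_pixels(img, threshold):
    """Replace a pixel by its 3x3 median when |pixel - median| > threshold."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            # Borders are handled by replicating the nearest valid pixel.
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            med = median(window)
            if abs(img[y][x] - med) > threshold:
                out[y][x] = med
    return out


def blended_equalization_lut(img, levels, alpha):
    """Lookup table from CDF_modified(v) = alpha * CDF(v) + (1 - alpha) * v_norm."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    lut, running = [], 0
    for v in range(levels):
        running += hist[v]
        blended = alpha * (running / total) + (1 - alpha) * (v / (levels - 1))
        lut.append(round(blended * (levels - 1)))
    return lut


# Toy 3x3 frames; 14-bit data would use levels=16384.
low = [[100, 110, 90], [105, 95, 100], [100, 100, 100]]
high = [[200, 230, 170], [215, 185, 200], [200, 200, 200]]
gain, offset = two_point_nuc_coeffs(low, high)
flat_field = apply_nuc(high, gain, offset)      # uniform after correction

frame = [[10, 11, 10], [12, 4000, 11], [10, 12, 11]]
cleaned = replace_bad_pixels(frame, threshold=100)
lut = blended_equalization_lut(cleaned, levels=16, alpha=0.5)
```

With $\alpha = 1$ the lookup table reduces to plain histogram equalization, and with $\alpha = 0$ it is the identity mapping, so $\alpha$ controls how aggressively contrast is stretched.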

The cornerstone of our intelligent anti-drone capability is the deep learning-based target detector. We selected YOLOv5s for its favorable balance between speed and accuracy and subjected it to extensive optimization for embedded deployment on the ZYNQ platform.

Model Optimization: The SiLU activation functions were replaced with Leaky ReLU to reduce computational complexity. The model underwent pruning to remove redundant channels and was subsequently quantized to 8-bit fixed-point precision, dramatically reducing its memory footprint and computational load without significant accuracy loss.
Heterogeneous FPGA Acceleration: We designed a custom computing architecture in the PL. The core of this accelerator is an array of 1024 parallel Multiply-Accumulate (MAC) units organized to exploit the spatial parallelism in convolutional layers. A five-stage pipeline ensures high throughput. The weights and activation data are streamed through these processing elements (PEs). The PS side of the ZYNQ is responsible for orchestrating the network execution, managing data movement between DDR and the PL accelerator, and executing post-processing steps like Non-Maximum Suppression (NMS). A double-buffering mechanism is employed to overlap computation and data transfer, effectively hiding latency.
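To make the effect of 8-bit quantization concrete, the following sketch quantizes a small weight and activation vector, accumulates the products in integer arithmetic as a MAC unit would, and rescales the result before a Leaky ReLU. The symmetric per-tensor scaling scheme, the negative slope of 0.1, and all names and sample values are assumptions chosen for illustration; the text does not specify these details.

```python
def quantize_symmetric(values, bits=8):
    """Map floats to signed integers with one shared scale (symmetric, per tensor)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int8_mac(q_weights, q_acts):
    """Integer multiply-accumulate over one dot product."""
    acc = 0
    for w, a in zip(q_weights, q_acts):
        acc += w * a                           # accumulator is wider than 8 bits
    return acc


def leaky_relu(x, slope=0.1):
    """Leaky ReLU; ASSUMPTION: negative slope of 0.1."""
    return x if x >= 0 else slope * x


# Hypothetical sample data.
weights = [0.42, -0.17, 0.08, -0.91]
acts = [1.50, 0.25, -0.75, 0.10]

qw, sw = quantize_symmetric(weights)
qa, sa = quantize_symmetric(acts)

# Integer accumulation, then a single rescale back to the real-valued domain.
approx = leaky_relu(int8_mac(qw, qa) * sw * sa)
exact = leaky_relu(sum(w * a for w, a in zip(weights, acts)))

print(f"float: {exact:.4f}  int8: {approx:.4f}")
```

The integer path needs only narrow multipliers and one rescale per output, which is what makes this arithmetic a good fit for DSP slices.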

The complete detection process, from a preprocessed input image $I_{pre}$ to a list of bounding boxes $B_{out}$, can be summarized as a function of the accelerated network $N_{acc}$ and the post-processing $PP$:
$$ F, C = N_{acc}(I_{pre}; W_{8bit}) $$
$$ B_{out} = PP(F, C) $$
where $F$ are the feature maps, $C$ are the class/box predictions, and $W_{8bit}$ are the quantized network parameters. The intersection-over-union (IoU) based NMS is crucial for filtering overlapping detections:
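$$ B_{out} = \{ b_i \mid \forall b_j \in B_{out},\ s_j > s_i \Rightarrow \frac{area(b_i \cap b_j)}{area(b_i \cup b_j)} < T_{nms} \} $$
where $s_i$ is the confidence score of box $b_i$, so a box is discarded only when it overlaps a higher-scoring box that has itself been retained.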
This tightly coupled PL-PS co-processing scheme is the key to achieving real-time performance.
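The NMS post-processing step executed on the PS can be sketched as below. This shows the common greedy variant, which visits boxes in descending score order and keeps a box only if its IoU with every box already kept is below the threshold. The box format (x1, y1, x2, y2), the threshold value, and the sample detections are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, t_nms):
    """Greedy NMS; returns the indices of the boxes that survive."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep box i only if it does not overlap any higher-scoring kept box.
        if all(iou(boxes[i], boxes[k]) < t_nms for k in kept):
            kept.append(i)
    return kept


# Two overlapping detections of one target plus one separate detection.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.90, 0.75, 0.60]
kept = nms(boxes, scores, t_nms=0.45)
print(kept)  # [0, 2]
```

The second box is suppressed because its IoU with the first (about 0.82) exceeds the threshold, while the third box does not overlap either and is kept.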

System Performance and Comparative Analysis

The integrated system was rigorously tested to evaluate its core functionalities: SWIR image acquisition, real-time image processing, and UAV target detection. The image enhancement and pseudo-color functions were validated, providing users with adjustable parameters to improve clarity and detail based on environmental conditions.

We conducted quantitative performance evaluations under two challenging scenarios critical for anti-drone operations:

Overcast/Low-Light Environment: In tests under cloudy skies with no obstructions, the system processed 5,000 consecutive frames and achieved an average drone detection rate of 97% at an average frame rate of 17 fps.
Complex Cluttered Background: A more demanding test with partial occlusions (e.g., trees, buildings) was performed. Over 5,000 frames, the system maintained an average detection accuracy of 80% at 16 fps, demonstrating robustness in challenging visual environments.

To contextualize our system’s efficiency, we compare it against other FPGA-based implementations documented in recent literature. The evaluation metrics include accuracy, frame rate, resource utilization (DSP slices), and derived metrics like energy efficiency (Giga-Operations Per Second per Watt, GOPS/W) and DSP efficiency (percentage of theoretical peak performance achieved).

Parameter	Ref. A (ZCU104)	Ref. B (ZYNQ7020)	Ref. C (ZYNQ7045)	Our Work (ZYNQ7100)
Network Model	YOLOv5s	YOLOv3-tiny	YOLOv2-tiny	YOLOv5-Ours
Clock Freq. (MHz)	200	200	200	200
Precision (bits)	8	8	8	8
Accuracy (%)	87	72	77	82
DSP Slices Used (count, % of device)	Not Specified	Not Specified	610 (67.8%)	1024 (50.6%)
Energy Eff. (GOPS/W)	31.3	18.18	25.34	26.90
Frame Rate (fps)	220	11	43.7	18
DSP Efficiency (%)	Not Specified	Not Specified	95.2	96.29

The analysis reveals a balanced and competitive design. While Ref. A achieves a higher frame rate and accuracy on the more powerful ZCU104 platform (a newer-generation Zynq UltraScale+ MPSoC), our system on the mid-range ZYNQ7100 offers a favorable balance of cost, accuracy, and power efficiency. Our accuracy of 82% exceeds that of the lighter models (YOLOv3-tiny, YOLOv2-tiny) used in Ref. B and Ref. C by 10 and 5 percentage points respectively, while our frame rate of 18 fps meets real-time requirements for anti-drone surveillance. Our design also achieves the highest DSP efficiency among the designs that report it (96.29% versus 95.2% for Ref. C), indicating effective use of the FPGA computational resources it occupies. This efficiency, together with an energy efficiency of 26.90 GOPS/W, second only to Ref. A, supports our optimization strategies for embedded, power-conscious anti-drone deployments.
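The resource figures in the comparison can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the device DSP slice totals are taken from the vendor's product tables rather than from this text, and the peak-throughput formula assumes one MAC per unit per clock cycle, counted as two operations. Both are assumptions, as are the function names.

```python
# ASSUMPTION: DSP slice totals per device, from the vendor product tables.
DSP_TOTAL = {"ZYNQ7045": 900, "ZYNQ7100": 2020}


def dsp_utilization_pct(used, device):
    """Share of the device's DSP slices occupied by the design, in percent."""
    return 100.0 * used / DSP_TOTAL[device]


def peak_gops(mac_units, clock_mhz, ops_per_mac=2):
    """Theoretical peak: every MAC unit completes one MAC (2 ops) per cycle."""
    return mac_units * ops_per_mac * clock_mhz / 1000.0


def achieved_gops(peak, dsp_efficiency_pct):
    """Throughput implied by a DSP efficiency expressed as percent of peak."""
    return peak * dsp_efficiency_pct / 100.0


util_7045 = dsp_utilization_pct(610, "ZYNQ7045")
util_7100 = dsp_utilization_pct(1024, "ZYNQ7100")
peak = peak_gops(1024, 200)                 # 1024 MAC units at 200 MHz
achieved = achieved_gops(peak, 96.29)

print(f"Ref. C utilization  : {util_7045:.1f}%")
print(f"Our work utilization: {util_7100:.1f}%")
print(f"Peak / implied      : {peak:.1f} / {achieved:.1f} GOPS")
```

The utilization values agree with the table to within rounding. The implied throughput holds only under the stated one-MAC-per-cycle assumption and is not a measured result.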

Conclusion and Future Perspectives

This research has established a robust, all-weather drone detection system by integrating short-wave infrared imaging with a ZYNQ-based heterogeneous computing platform. The system directly addresses the triple challenge of reliable complex-environment imaging, real-time algorithm acceleration, and low-power deployment, all core requirements for practical anti-drone systems. The combination of a domestic InGaAs SWIR detector and a JESD204B high-speed interface provides day-night performance at a cost point significantly lower than thermal imaging solutions and with capabilities far exceeding visible light systems.

The key innovative breakthroughs are threefold. First, our FPGA-implemented image processing pipeline, featuring proprietary NUC and dynamic histogram equalization algorithms, substantially enhances image dynamic range and quality. Second, the FPGA-accelerated, lightweight deployment of an optimized YOLOv5 network achieves a compelling balance, maintaining an 82% detection accuracy while delivering an 18 fps inference speed suitable for real-time anti-drone response. Third, the high-precision active temperature stabilization system ensures detector performance consistency across a wide operational temperature range.

This system provides a high-performance, cost-effective solution for the growing low-altitude security sector. Looking forward, research will focus on enhancing system capabilities through multi-modal sensor fusion (e.g., combining SWIR with passive RF sensing), optimizing edge-computing collaboration within networked anti-drone units, and exploring energy-harvesting or low-power designs for deployment in remote areas lacking infrastructure. These advancements will further solidify the system’s applicability in complex electromagnetic environments and extended field operations, contributing a versatile and powerful tool to the global anti-drone technology arsenal.