An Anti-UAV System Based on Short-Wave Infrared Imaging and Heterogeneous Computing

In current operational scenarios, the proliferation of unmanned aerial vehicles (UAVs) presents a significant dual-use challenge. While drones offer immense benefits across numerous sectors, incidents involving their non-compliant or “rogue” operation occur frequently, posing substantial threats to national security, critical infrastructure, and public safety. Traditional detection systems exhibit pronounced limitations in addressing these challenges. Radar detection probabilities for small, slow, low-altitude targets such as UAVs often fall below 40%, and radar is highly susceptible to false alarms from urban clutter such as buildings and trees. Visible-light cameras are severely hampered at night or under poor visibility, with effective detection ranges often limited to 200 meters, and their performance degrades drastically under strong backlight, reducing target identification accuracy. Thermal imaging systems offer all-weather capability, but their core components, especially mid- and long-wave infrared detectors, remain prohibitively expensive, hindering large-scale, cost-effective deployment. These shortcomings underscore the pressing need for a new technological approach that combines all-weather operation, high precision, and affordability in modern anti-UAV solutions.

To address these industry-wide challenges, our research proposes a comprehensive technological framework centered on a Short-Wave Infrared (SWIR) intelligent detection system built on a heterogeneous computing architecture. The system introduces innovations at three layers. At the optical perception layer, we adopt the SWIR band (0.9–1.7 μm), which resides within an atmospheric transmission window. This band penetrates haze, smoke, and light fog far better than visible light, improving transmissivity in such conditions by approximately 60%. Furthermore, the indium gallium arsenide (InGaAs) detector employed exhibits a very low dark current density, markedly improving detection capability in low-light environments. The advantages of SWIR over visible light for UAV detection are summarized in the following table:

| Comparison Metric | Short-Wave Infrared Band | Visible Light Band |
| --- | --- | --- |
| Illumination Dependency | Low dependency on ambient light; minimally affected by lighting conditions. | Performance degrades severely at night or in low-light scenarios. |
| Penetration Capability | Can penetrate haze, smoke, and dust; performs robustly in various adverse weather conditions. | Performance is significantly degraded by fog, smoke, and other atmospheric particulates. |
| Camouflage Identification | Capable of identifying UAVs employing visual camouflage techniques. | Largely ineffective against UAVs with visual camouflage. |
| Dynamic Range | Wider dynamic range preserves details in high-contrast scenes (e.g., bright sun and deep shadows). | Limited dynamic range struggles with extreme illumination contrasts, leading to loss of detail. |

At the hardware architecture level, we constructed a multi-layer processing system. Utilizing a domestically developed InGaAs focal plane array (FPA), we ensure stable, high-quality imaging by integrating it with a precision temperature control module. A self-designed high-speed transmission circuit based on the JESD204B protocol facilitates reliable image capture and data transfer. On the software and algorithm front, we developed a suite of image preprocessing algorithms implemented on the FPGA, including non-uniformity correction (NUC), bad pixel detection and compensation, pseudo-color processing, image enhancement, and high dynamic range (HDR) imaging, collectively improving image quality and usable dynamic range. The core target recognition module employs a channel-optimized YOLOv5s network. Through operator fusion and 8-bit fixed-point quantization, the network is accelerated via parallel processing on the FPGA. Deployed on a ZYNQ7100 platform, the system achieves a real-time detection performance of 17 frames per second (FPS), representing a substantial improvement in both recognition speed and energy efficiency compared to traditional approaches. Practical field tests have validated an average recognition accuracy of 80% for UAVs against complex backgrounds. The system is complemented by a dual-mode human-machine interface supporting both local touch control and remote monitoring, facilitating flexible operation. This integrated solution provides a high-performance, cost-effective tool for low-altitude airspace security, holding significant value for both civilian and military anti-UAV applications.

System Architecture and Hardware Design

The proposed system employs a hierarchical hardware architecture designed for modularity and efficiency. The core is a heterogeneous computing platform centered on a Xilinx ZYNQ-7100 system-on-chip (SoC), which tightly couples a dual-core ARM processing system (PS) with a programmable logic (PL) fabric. The hardware is split between a carrier motherboard and a sensor expansion board that work in concert.

The expansion board is dedicated to optoelectronic signal acquisition and conditioning. It hosts a 640×512 pixel InGaAs FPA detector with a spectral response from 0.9 to 1.7 μm. The detector’s analog outputs are digitized by two high-speed CBM96AD56-125 analog-to-digital converters (ADCs), operating under the JESD204B serial interface protocol to achieve a stable data throughput supporting 300 frames per second. A critical component for imaging stability is the integrated precision temperature control system. Driven by an ADN8835 controller, it manages a Thermoelectric Cooler (TEC) attached to the FPA, maintaining its operational temperature within a tight range to minimize dark current noise and ensure consistent responsivity. The timing for the entire data acquisition chain is precisely orchestrated by an LMK04828 clock generation and distribution chip.
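In the actual system the ADN8835 closes the TEC control loop in analog hardware; purely to illustrate the control principle (not the real firmware), a discrete PID loop holding the FPA at a setpoint can be sketched as follows. The plant model and gain values are invented for the example:

```python
# Illustrative discrete PID loop for TEC temperature regulation.
# The real design uses an ADN8835 analog controller; this sketch only
# demonstrates the principle. Plant dynamics and gains are made up.

def make_pid(kp, ki, kd, dt):
    """Return a stateful PID controller as a closure."""
    state = {"integral": 0.0, "prev_err": 0.0}

    def step(setpoint, measured):
        err = setpoint - measured
        state["integral"] += err * dt
        deriv = (err - state["prev_err"]) / dt
        state["prev_err"] = err
        return kp * err + ki * state["integral"] + kd * deriv

    return step

def simulate(setpoint=20.0, ambient=35.0, steps=2000, dt=0.01):
    """First-order thermal plant: negative drive cools the FPA,
    leakage pulls it back toward ambient."""
    pid = make_pid(kp=2.0, ki=0.5, kd=0.05, dt=dt)
    temp = ambient
    for _ in range(steps):
        drive = pid(setpoint, temp)  # negative drive = cooling
        temp += dt * (0.5 * drive + 0.1 * (ambient - temp))
    return temp
```

The integral term is what lets the loop hold the setpoint exactly despite the constant heat leak from ambient.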

The carrier motherboard serves as the central processing and control hub. Its power system, based on the LMZ31710RVQR, provides clean, regulated power to the various voltage domains of the ZYNQ and peripheral circuits. The processing platform is supported by a multi-tiered memory subsystem: 4x MT41K256M16 DDR3 chips forming a 64-bit wide bus for high-bandwidth data buffering, alongside QSPI Flash for boot configuration, eMMC, and an SD card slot for application storage and data logging. A comprehensive set of communication interfaces enables system connectivity and output:

  • Dual Gigabit Ethernet: Implemented using RTL8211F-CG PHY chips, one on the PL side for high-speed, low-latency UDP video streaming, and one on the PS side for TCP-based command/control and secondary video feed.
  • Camera Link Interface: Provides a standard industrial-grade video output for professional imaging equipment.
  • RGB LCD Interface: Enables local real-time video monitoring and a touch-based interactive user interface.
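To illustrate what a receiver for the PL-side UDP video stream could look like: the actual packet layout is not documented here, so the 4-byte header (frame id plus line number, big-endian) and one-image-line-per-datagram framing below are assumptions for the sketch:

```python
# Hypothetical reassembly of a 640x512, 8-bit UDP video stream.
# The real wire format is not specified in this article; this sketch
# assumes each datagram = 2-byte frame id + 2-byte line number (both
# big-endian) followed by one 640-byte image line.

import struct

WIDTH, HEIGHT = 640, 512

def parse_packet(datagram):
    """Split one datagram into (frame_id, line_no, pixel payload)."""
    frame_id, line_no = struct.unpack(">HH", datagram[:4])
    return frame_id, line_no, datagram[4:4 + WIDTH]

def assemble_frame(datagrams):
    """Collect the lines of a single frame; lost lines stay zero-filled."""
    frame = [bytes(WIDTH)] * HEIGHT
    for dgram in datagrams:
        _, line_no, payload = parse_packet(dgram)
        if line_no < HEIGHT and len(payload) == WIDTH:
            frame[line_no] = payload
    return b"".join(frame)
```

In practice the datagrams would come from `socket.recvfrom()` on the PL-side Ethernet link; keying lines by frame id would additionally guard against packets straddling a frame boundary.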

The overall data flow begins with the SWIR detector capturing scene radiance. The analog signals are converted, serialized via JESD204B, and streamed into the PL of the ZYNQ. Here, the raw data is processed before being routed to various outputs (Ethernet, Camera Link, LCD) or passed to the target detection pipeline. This hardware architecture is designed to balance performance, power consumption, and cost, forming a robust foundation for the anti-UAV detection tasks.

Software Framework and Algorithm Implementation

The software and algorithmic pipeline is architected for real-time performance, leveraging the heterogeneous nature of the ZYNQ platform. The workflow is modular, spanning from low-level driver operations to high-level neural network inference and user interaction.

Image Acquisition and Preprocessing on FPGA (PL)

Initial processing occurs in the PL fabric for maximum speed and determinism. A custom detector driver and JESD204B protocol core manage the high-speed data ingress. The raw image data then passes through a series of optimized preprocessing stages:

  1. Non-Uniformity Correction (NUC): Compensates for pixel-to-pixel sensitivity variations in the FPA. A two-point correction method is employed. Calibration coefficients are generated at two different blackbody temperatures and stored in block RAM (BRAM). For each incoming pixel at coordinate (i,j) with raw value $V_{raw}(i,j)$, the corrected value $V_{corr}(i,j)$ is computed in real-time as:
    $$V_{corr}(i,j) = G(i,j) \cdot V_{raw}(i,j) + O(i,j)$$
    where $G(i,j)$ and $O(i,j)$ are the pre-calculated gain and offset coefficients, respectively.
  2. Bad Pixel Detection and Compensation: A real-time algorithm utilizing a three-line buffer architecture identifies defective pixels. A median filter is applied to the neighborhood of each pixel. If the deviation between the pixel value and the median exceeds a defined threshold, it is flagged as “bad” and replaced by the median value from its valid neighbors.
  3. Image Enhancement & Dynamic Range Management: An improved histogram equalization algorithm is implemented using dual-port BRAM for rapid histogram calculation and cumulative distribution function (CDF) generation. This enhances contrast without losing detail in critical areas. For scenes with extreme contrast, a High Dynamic Range (HDR) fusion algorithm is available. It constructs a Gaussian-Laplacian pyramid to blend information from multiple exposures, synthesizing an image with detail in both shadows and highlights. The Laplacian pyramid $L_l$ at level $l$ for an image $G_l$ from the Gaussian pyramid is given by:
    $$L_l = G_l - \text{UP}(G_{l+1}) \otimes g$$
    where $\text{UP}(\cdot)$ denotes upsampling and $\otimes g$ is convolution with the Gaussian smoothing kernel $g$. The final HDR image is reconstructed by blending these pyramids from different exposures.
  4. Pseudo-Color Processing: To facilitate human observation, the monochromatic SWIR image is mapped to a color palette. After dynamic range compression, the intensity values are mapped to RGB channels via a lookup table (LUT) implemented in the PS, and the resulting data is formatted for display.
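The two-point NUC and the median-based bad-pixel replacement described in steps 1 and 2 can be sketched in NumPy. The system itself runs fixed-point versions of these in the PL; the blackbody levels and deviation threshold below are illustrative:

```python
import numpy as np

def two_point_nuc_coeffs(low, high, v_low, v_high):
    """Per-pixel gain/offset from two blackbody calibration frames.

    low, high: mean raw frames at the cold/hot blackbody temperatures.
    v_low, v_high: desired output levels for those two frames."""
    gain = (v_high - v_low) / (high - low)
    offset = v_low - gain * low
    return gain, offset

def apply_nuc(raw, gain, offset):
    """V_corr = G * V_raw + O, applied elementwise."""
    return gain * raw + offset

def replace_bad_pixels(img, threshold=50.0):
    """Flag pixels deviating from their 3x3 median by more than
    `threshold` and replace them with that median."""
    padded = np.pad(img, 1, mode="edge")
    # stack the 3x3 neighbourhood of every pixel along a new axis
    windows = np.stack([padded[r:r + img.shape[0], c:c + img.shape[1]]
                        for r in range(3) for c in range(3)])
    med = np.median(windows, axis=0)
    bad = np.abs(img - med) > threshold
    return np.where(bad, med, img)
```

The FPGA implementation reads the gain/offset pair per pixel from BRAM and evaluates the multiply-add in a single pipeline stage; the three-line buffer mentioned above is the streaming equivalent of the 3×3 window built here.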

Target Detection with Optimized YOLOv5

The core of the anti-UAV intelligence is a deeply optimized YOLOv5s network deployed on the heterogeneous ZYNQ platform. The model underwent several key modifications for embedded deployment:

  • Activation Function Replacement: The SiLU (Swish) activation function was replaced with Leaky ReLU to reduce computational complexity without significant accuracy loss.
    $$\text{LeakyReLU}(x) = \begin{cases} x, & \text{if } x \geq 0 \\ \alpha x, & \text{if } x < 0 \end{cases}$$
    where $\alpha$ is a small positive constant (e.g., 0.1).
  • Quantization: The model weights and activations were quantized to 8-bit fixed-point precision, drastically reducing model size and memory bandwidth requirements while maintaining acceptable accuracy.
  • Hardware Acceleration: The computationally intensive convolutional layers are offloaded to a custom accelerator engine in the PL. This engine features 1,024 parallel multiply-accumulate (MAC) units organized in a systolic array, coupled with a five-stage pipeline to maximize throughput and DSP efficiency. Weights and feature maps are streamed through this architecture.
  • PS-PL Collaboration: A cooperative workflow is established. The PL accelerator performs the bulk of the forward pass (convolution, pooling). The PS (ARM cores) running a lightweight Linux OS handles tasks like network layer scheduling, loading parameters into the accelerator, post-processing of the detector outputs (applying anchor boxes, scaling), and running the Non-Maximum Suppression (NMS) algorithm to filter overlapping bounding boxes. A double-buffering mechanism is used for data transfer between PS and PL to hide latency.
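A minimal sketch of the symmetric 8-bit scheme described above, together with a fixed-point Leaky ReLU. This uses a single per-tensor scale; the deployed quantizer may differ in details such as per-channel scales or the exact α approximation:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization to int8.

    Returns the quantized tensor and the scale needed to dequantize.
    Assumes x contains at least one non-zero value."""
    scale = float(np.max(np.abs(x))) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def leaky_relu_int8(q, alpha_num=13, alpha_den=128):
    """LeakyReLU on quantized values using an integer approximation
    of alpha ~= 0.1 (13/128), as a fixed-point pipeline would."""
    q32 = q.astype(np.int32)
    return np.where(q32 >= 0, q32, (q32 * alpha_num) // alpha_den).astype(np.int8)
```

Choosing α as a dyadic fraction (13/128) lets the negative branch be computed with one multiply and an arithmetic shift, which is why Leaky ReLU maps so cheaply onto the PL compared with SiLU's sigmoid.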

The end-to-end detection pipeline, from quantized weight preprocessing to final bounding box output, is thus efficiently split across the two halves of the ZYNQ, achieving a balance between high performance and flexibility.
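The NMS step executed on the PS can be sketched as a plain greedy IoU-based filter; the 0.45 threshold here is a common default, not necessarily the deployed value:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # drop every remaining box that overlaps the winner too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Because candidate counts after confidence filtering are small, this O(n²) loop is cheap enough for the ARM cores and need not be accelerated in the PL.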

Human-Machine Interface (HMI)

A dual-platform HMI strategy ensures versatile system control. On the embedded hardware itself, a touch-enabled graphical interface is developed for the RGB LCD. It displays the real-time SWIR video feed, overlays detection bounding boxes and alerts, and provides toggle controls for algorithms (e.g., enhancement on/off) as well as parameter adjustment sliders. For remote operation, a PC-based client software communicates with the system via TCP/IP. This client can receive the video stream, send configuration and control commands, and record video or snapshots for evidentiary purposes. Both interfaces are built with a modular, state-machine-based design for reliability.
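The remote client's command channel can be illustrated with a simple length-prefixed framing scheme. The article does not specify the actual wire protocol, so the JSON payload and 4-byte big-endian length prefix below are assumptions for the sketch:

```python
# Hypothetical TCP command framing for the remote HMI client.
# Wire format assumed here: 4-byte big-endian length prefix + UTF-8 JSON.

import json
import struct

def frame_command(cmd, **params):
    """Encode one control command into a framed byte string."""
    body = json.dumps({"cmd": cmd, "params": params}).encode("utf-8")
    return struct.pack(">I", len(body)) + body

def parse_command(data):
    """Decode one framed command; returns (message, remaining bytes)."""
    (length,) = struct.unpack(">I", data[:4])
    body = data[4:4 + length]
    return json.loads(body.decode("utf-8")), data[4 + length:]
```

A length prefix is the usual way to delimit messages over TCP's byte stream; the state-machine design mentioned above would buffer received bytes and call `parse_command` once a full frame has arrived.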

System Testing and Performance Analysis

The integrated system was rigorously tested for its core functionalities: SWIR image acquisition, real-time processing, and UAV target detection. Performance was evaluated under various environmental conditions.

First, the image enhancement and pseudo-color functions were validated, confirming their ability to improve visual clarity and operational interpretation of the SWIR feed.

Detection performance was quantitatively assessed in two challenging scenarios:

  1. Overcast/Low-Light Condition: Testing under cloudy skies with no physical obstructions. Analysis of 5,000 consecutively processed frames yielded an average UAV detection rate of 97% at an average throughput of 17 FPS.
  2. Complex Background with Occlusion: Testing in cluttered environments with partial visual obstructions (e.g., trees, structures). Under these more demanding conditions, analysis of another 5,000-frame sequence maintained an average detection rate of 80% at 16 FPS, demonstrating robust performance.

A comparative analysis against other FPGA-based implementations reported in literature highlights the efficiency and balanced performance of our approach. Key metrics include detection accuracy (Precision/Recall), frame rate (FPS), hardware resource utilization (DSP slices), and derived metrics like energy efficiency (GOPS/W) and DSP computational efficiency.

| Benchmark Metric | Reference Design A | Reference Design B | Reference Design C | Our System |
| --- | --- | --- | --- | --- |
| FPGA Platform | ZCU104 | ZYNQ7020 | ZYNQ7045 | ZYNQ7100 |
| Network Model | YOLOv5s | YOLOv3-tiny | YOLOv2-tiny | YOLOv5-optimized |
| Clock Frequency | 200 MHz | 200 MHz | 200 MHz | 200 MHz |
| Precision (bits) | 8 | 8 | 8 | 8 |
| Detection Accuracy | 87% | 72% | 77% | 82% |
| DSP Slice Usage | N/A | N/A | 610 (67.8%) | 1024 (50.6%) |
| Energy Efficiency (GOPS/W) | 31.3 | 18.18 | 25.34 | 26.90 |
| Frame Rate (FPS) | 220 | 11 | 43.7 | 18 |
| DSP Efficiency | N/A | N/A | 95.2% | 96.29% |

The analysis reveals that our system achieves a favorable balance. While high-end platforms like the ZCU104 (with more abundant resources and dedicated NPU) achieve higher raw FPS, our implementation on the ZYNQ7100 offers a compelling cost-performance-accuracy trade-off specifically tailored for anti-UAV applications. It significantly outperforms designs on smaller platforms (ZYNQ7020/7045) in both accuracy and speed, while maintaining high hardware utilization and energy efficiency. This makes our solution particularly suitable for scalable, deployable anti-UAV systems where cost and power consumption are critical constraints.
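Assuming DSP efficiency is defined as achieved throughput over the peak of the MAC array (DSP count × 2 operations per multiply-accumulate × clock frequency), the table's figures can be cross-checked with a few lines of arithmetic; the implied power draw is a derived estimate, not a measured value:

```python
def peak_gops(dsp_count, clock_ghz, ops_per_mac=2):
    """Theoretical peak throughput of the MAC array in GOPS."""
    return dsp_count * ops_per_mac * clock_ghz

def achieved_gops(dsp_count, clock_ghz, dsp_efficiency):
    """Sustained throughput given a measured DSP efficiency."""
    return peak_gops(dsp_count, clock_ghz) * dsp_efficiency

# Our system: 1024 MAC units at 200 MHz with 96.29% DSP efficiency.
peak = peak_gops(1024, 0.2)                  # 409.6 GOPS peak
achieved = achieved_gops(1024, 0.2, 0.9629)  # ~394.4 GOPS sustained
power_est = achieved / 26.90                 # ~14.7 W implied by 26.90 GOPS/W
```

Under these assumptions the ~394 GOPS sustained figure and the roughly 15 W implied power are mutually consistent with the table's energy-efficiency entry.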

Conclusion and Future Outlook

This research has successfully developed and demonstrated a robust all-weather UAV detection system based on SWIR imaging and a ZYNQ heterogeneous computing platform. The system addresses three critical technical challenges: 1) high-quality imaging in complex environments (low-light, haze), 2) real-time execution of advanced vision algorithms, and 3) low-power, cost-effective hardware deployment. By integrating a domestic SWIR detector with a JESD204B high-speed interface and a co-designed FPGA acceleration pipeline, the system delivers reliable day-and-night detection capabilities at a total cost of ownership significantly lower than traditional radar or thermal imaging alternatives.

The key innovations include: 1) Self-developed FPGA image processing algorithms (NUC, enhanced histogram equalization) that substantially improve image dynamic range and usability; 2) A highly optimized YOLOv5 deployment strategy featuring 8-bit quantization and PL-based acceleration, achieving 18 FPS inference speed with maintained accuracy suitable for real-time anti-UAV monitoring; and 3) A high-precision integrated temperature control system that ensures detector performance stability across operational temperatures.

This system provides a viable, high-value solution for the growing anti-UAV domain. Future work will focus on enhancing system capabilities through multi-modal sensor fusion (e.g., combining SWIR with other sensing modalities), optimizing edge computing collaboration in networked deployments, and exploring energy-harvesting or low-power designs for operation in remote, infrastructure-less environments. These advancements will further solidify the system’s applicability in complex electromagnetic and logistical scenarios, contributing to more effective and adaptable solutions for countering unauthorized UAV activities.
