An Integrated DSST-KCF Algorithm for Urban Low-Altitude Anti-UAV Automated Image Tracking Systems

The proliferation of unmanned aerial vehicles (UAVs), valued for their compact size, low cost, and operational flexibility, has significantly impacted sectors like smart manufacturing, precision agriculture, and urban management. However, this widespread adoption has been accompanied by a surge in incidents involving unauthorized or malicious UAV flights. These “rogue drones” pose substantial threats to urban security, including collision risks with infrastructure, potential damage from crashes, and privacy violations. Consequently, the development of effective counter-unmanned aerial systems (C-UAS), or anti-UAV technologies, has become a critical focus for safeguarding metropolitan areas. Among various detection methodologies, optoelectronic surveillance offers distinct advantages for urban anti-UAV operations due to its passive nature (non-radiating), ability to provide visual evidence, and relatively lower cost compared to radar systems.

Current optoelectronic anti-UAV solutions primarily rely on image processing algorithms and are typically implemented in two forms: fixed and pan-tilt-zoom (PTZ) systems. Fixed systems utilize algorithms to detect and track targets within a static field of view (FOV). While effective for targets within that FOV, they inherently fail once the target moves beyond it. PTZ systems, equipped with steerable cameras, offer the capability for continuous tracking by dynamically adjusting the camera’s orientation and zoom. The core challenge lies in developing a tracking algorithm that is both highly accurate for small, fast-moving UAVs and computationally efficient for real-time operation, coupled with a robust control strategy for the PTZ mechanism.

To address the precise and rapid tracking challenges of “low, slow, and small” (LSS) UAVs in complex urban low-altitude environments, this paper presents an automatic anti-UAV tracking system. The system’s innovation stems from the integration of an advanced visual tracking algorithm with an intelligent PTZ control policy. The proposed visual core is a fusion of the Discriminative Scale Space Tracking (DSST) algorithm and the Kernelized Correlation Filter (KCF) algorithm. This fusion leverages the computational efficiency of KCF for target localization and the robust multi-scale estimation capability of DSST, creating a tracker that is both real-time capable and adaptive to the changing apparent size of a UAV as it moves relative to the camera. This DSST-KCF algorithm drives a closed-loop control strategy that calculates necessary pan, tilt, and zoom commands to keep the target centered in the image, enabling persistent tracking.

System Architecture and Operational Workflow

The proposed urban low-altitude anti-UAV automated image tracking system is architecturally divided into two primary units: the Dual-spectrum Image Acquisition Unit and the Display & Control Unit.

The Image Acquisition Unit is built around a high-performance PTZ camera assembly. It integrates two co-aligned sensors: a visible-light camera and a long-wave infrared (LWIR) thermal camera. This dual-spectrum design ensures operational capability across diverse lighting and weather conditions; the visible camera provides high-resolution detail during daytime, while the thermal camera enables detection and tracking at night, through light fog, or against cluttered backgrounds based on heat signatures. The PTZ mechanism provides the necessary degrees of freedom for agile pointing.

The Display & Control Unit is implemented on a standard computer workstation. It performs the intensive processing tasks: ingesting the real-time video stream, executing the DSST-KCF tracking algorithm, calculating the requisite PTZ control parameters, and displaying the tracking status and video feed to an operator. The two units communicate via a Gigabit Ethernet connection, ensuring low-latency transmission of high-bitrate video streams and control commands.

The system operates in two primary states: Manual Mode and Automatic Tracking Mode. In Manual Mode, an operator controls the PTZ camera directly. The critical Automatic Tracking Mode is initiated once an operator selects a target (e.g., a UAV) within the video frame. The system then enters a closed-loop tracking process, as illustrated in the following workflow summary:

| Step | Process | Output |
|---|---|---|
| 1 | Frame Capture & Parsing | Current image frame I_t |
| 2 | Target State Prediction | Predicted search region in I_t |
| 3 | DSST-KCF Processing | Updated target position p_t and scale s_t |
| 4 | Control Parameter Calculation | Pan angle (Δθ), tilt angle (Δφ), zoom factor (Z) |
| 5 | Command Dispatch | PTZ actuation commands sent via network |
| 6 | Loop Continuation | Process repeats for frame I_{t+1} |

This continuous loop allows the system to follow a moving UAV, keeping it proximate to the center of the FOV for optimal tracking stability.
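The closed loop above can be sketched as follows. This is an illustrative skeleton, not the system's actual code: `tracker` and `dispatch` are hypothetical stand-ins for the DSST-KCF update (Steps 2–3) and the network command dispatch (Step 5), and the frame-size defaults are arbitrary.

```python
def pan_tilt_offsets(cx, cy, W, H):
    """Step 4 (simplified): pixel offsets of the target centroid from the
    frame centre, which the control strategy maps to pan/tilt commands."""
    return cx - W / 2, cy - H / 2

def tracking_loop(frames, tracker, dispatch, W=1920, H=1080):
    """Steps 1-6 of the workflow: capture, DSST-KCF update, command dispatch."""
    state = None
    for frame in frames:                          # Step 1: frame capture
        state = tracker(frame, state)             # Steps 2-3: predict + localize
        cx, cy, _scale = state                    # position p_t and scale s_t
        dispatch(pan_tilt_offsets(cx, cy, W, H))  # Steps 4-5: control + dispatch
    return state                                  # Step 6: repeat for next frame
```

In the real system, `tracker` would run the fused DSST-KCF filters and `dispatch` would send PTZ commands over the Gigabit Ethernet link.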

The Fused DSST-KCF Tracking Algorithm

The effectiveness of the anti-UAV tracker hinges on its core visual algorithm. We propose a fusion of the KCF and DSST algorithms to achieve a balance between speed and scale robustness, which is crucial for tracking UAVs that change apparent size as they move in 3D space.

Algorithm Overview

The fused DSST-KCF algorithm operates via a two-stage process per frame. First, a position filter (from KCF) localizes the target within a search region. Second, a separate scale filter (from DSST) estimates the optimal bounding box size around that location. The updated scale is then fed back to guide the sample extraction for the position filter in the next frame. This synergistic integration enhances performance for small targets undergoing scale variation.

Target Localization with KCF

The KCF algorithm frames tracking as a ridge regression problem. It collects a set of training samples x_i around the current target by exploiting the properties of circulant matrices, which efficiently model all cyclic shifts of a base image patch. Each sample has a corresponding Gaussian function label y_i. The goal is to find a function f(z) = ω^T φ(z) that minimizes the regularized least squares error:
$$ \min_{\omega} \sum_i (f(\mathbf{x}_i) - y_i)^2 + \lambda \|\omega\|^2 $$
where φ(·) denotes a mapping to a high-dimensional space (kernel trick), and λ is a regularization parameter. The power of KCF lies in solving this in the Fourier domain, where the circulant structure leads to a highly efficient closed-form solution for the filter coefficients α:
$$ \boldsymbol{\alpha} = \mathcal{F}^{-1} \left( \frac{\mathcal{F}(\mathbf{y})}{\mathcal{F}(\mathbf{k}^{\mathbf{xx}}) + \lambda} \right) $$
Here, F and F^{-1} denote the Discrete Fourier Transform (DFT) and its inverse, and k^{xx} is the kernel correlation of the base sample. To detect the target in a new frame, a patch z is extracted, and its response map f(z) is computed similarly in the frequency domain:
$$ f(\mathbf{z}) = \mathcal{F}^{-1} \left( \mathcal{F}(\mathbf{k}^{\mathbf{xz}}) \odot \mathcal{F}(\boldsymbol{\alpha}) \right) $$
The position of the maximum value in f(z) indicates the new predicted location of the UAV target for this anti-UAV system.
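A minimal single-channel sketch of the two equations above, assuming a linear kernel so that the kernel correlation reduces to an elementwise product in the Fourier domain (the full KCF typically uses a Gaussian kernel over multi-channel HOG features):

```python
import numpy as np

def gaussian_labels(h, w, sigma=2.0):
    """Desired response y: a 2-D Gaussian whose peak is wrapped to (0, 0)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return np.roll(g, (-cy, -cx), axis=(0, 1))

def kernel_corr(x1, x2):
    """Linear-kernel correlation k^{x1 x2}, evaluated for all cyclic shifts."""
    return np.real(np.fft.ifft2(np.fft.fft2(x1) * np.conj(np.fft.fft2(x2))))

def train(x, y, lam=1e-4):
    """alpha = F^{-1}( F(y) / (F(k^{xx}) + lambda) )."""
    kxx = kernel_corr(x, x)
    return np.real(np.fft.ifft2(np.fft.fft2(y) / (np.fft.fft2(kxx) + lam)))

def detect(alpha, x, z):
    """f(z) = F^{-1}( F(k^{xz}) ⊙ F(alpha) ); returns the peak location (dy, dx)."""
    kxz = kernel_corr(z, x)
    resp = np.real(np.fft.ifft2(np.fft.fft2(kxz) * np.fft.fft2(alpha)))
    return np.unravel_index(np.argmax(resp), resp.shape)
```

Because every operation is an FFT or an elementwise product, both training and detection cost O(n log n) per frame, which is the source of KCF's real-time speed.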

Scale Estimation with DSST

To adapt to the changing size of the UAV, a one-dimensional discriminative scale filter is applied independently at the location found by the KCF tracker. A scale pyramid is constructed, extracting features at N_s (e.g., 33) scales centered on the target location. The scale filter h is trained to distinguish the correct scale. The optimal filter in the frequency domain, H, and its numerator A and denominator B for the t-th frame are learned and updated online:
$$ \mathbf{A}_t = (1 - \eta) \mathbf{A}_{t-1} + \eta \, \mathcal{F}(\mathbf{g}) \odot \overline{\mathcal{F}(\mathbf{x}_t)} $$
$$ \mathbf{B}_t = (1 - \eta) \mathbf{B}_{t-1} + \eta \, \mathcal{F}(\mathbf{x}_t) \odot \overline{\mathcal{F}(\mathbf{x}_t)} $$
$$ \mathbf{H}_t = \frac{\mathbf{A}_t}{\mathbf{B}_t + \lambda_s} $$
where η is a learning rate, g is the desired Gaussian-shaped response, x_t is the scale feature vector, λ_s is a scale regularization term, and the overbar denotes complex conjugation. The scale response y_s for a test sample z_s is:
$$ \mathbf{y}_s = \mathcal{F}^{-1} \left( \frac{\sum \mathbf{A}_t \odot \mathcal{F}(\mathbf{z}_s)}{\mathbf{B}_t + \lambda_s} \right) $$
The scale corresponding to the maximum of y_s is selected as the new target size, making the anti-UAV tracker robust to target distance changes.
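The online update and scale selection can be sketched as follows, assuming precomputed scale features of shape (channels × scales); the learning rate and λ_s values are illustrative. Note that A already contains the conjugated sample spectrum, so detection multiplies A by the test spectrum directly:

```python
import numpy as np

def train_scale_filter(x, g, A_prev=None, B_prev=None, eta=0.025):
    """One online update of the scale-filter numerator A_t and denominator B_t.
    x: (n_channels, n_scales) scale features; g: desired 1-D Gaussian response."""
    X = np.fft.fft(x, axis=1)
    G = np.fft.fft(g)
    A_new = G[np.newaxis, :] * np.conj(X)        # F(g) ⊙ conj(F(x)), per channel
    B_new = np.sum(X * np.conj(X), axis=0).real  # summed over feature channels
    if A_prev is None:                           # first frame: no blending
        return A_new, B_new
    return (1 - eta) * A_prev + eta * A_new, (1 - eta) * B_prev + eta * B_new

def best_scale(A, B, z, lam_s=1e-2):
    """Index of the scale with maximal response y_s for test features z."""
    Z = np.fft.fft(z, axis=1)
    y_s = np.fft.ifft(np.sum(A * Z, axis=0) / (B + lam_s)).real
    return int(np.argmax(y_s))
```

Because the scale filter is one-dimensional (one response value per scale in the pyramid), this step adds very little cost on top of the KCF position search.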

DSST-KCF-PTZ Integrated Control Strategy

Accurate tracking is only half of the anti-UAV solution; the system must also physically keep the camera pointed at the target. We develop a control strategy that translates the image-plane tracking results from DSST-KCF into precise PTZ actuation commands.

FOV Partitioning for Stable Control

To prevent jittery camera movement from frequent, minor corrections, the image FOV is partitioned into three concentric constraint zones. Different control laws are applied based on where the tracked UAV’s centroid lies, balancing reactivity with stability.

| Zone | Description | Purpose |
|---|---|---|
| A (High-Speed) | Outer perimeter of the FOV | Initiate fast slewing to re-acquire a target that is about to leave the FOV |
| B (Medium-Speed) | Intermediate annulus | Make moderate corrections to bring the target towards the center |
| C (Low-Speed) | Inner central region | Apply fine, slow adjustments to maintain precise centering |

Let the image resolution be (W, H). The zone boundaries are defined as multiples of ΔW = W/5 and ΔH = H/5. The centroid coordinates (x, y) of the UAV’s bounding box determine its zone and thus the base pan/tilt velocity V_q (where q ∈ {A, B, C}).
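A sketch of the zone classification. The exact boundary multiples below are assumptions for illustration; the text only states that the boundaries are multiples of ΔW = W/5 and ΔH = H/5:

```python
def control_zone(x, y, W, H):
    """Classify the target centroid (x, y) into constraint zone A, B, or C.
    Boundary multiples of dW = W/5, dH = H/5 are illustrative assumptions."""
    dx, dy = abs(x - W / 2), abs(y - H / 2)  # offset from the FOV centre
    dW, dH = W / 5, H / 5
    if dx <= dW / 2 and dy <= dH / 2:
        return "C"  # inner central region: fine, slow adjustments
    if dx <= 1.5 * dW and dy <= 1.5 * dH:
        return "B"  # intermediate annulus: moderate corrections
    return "A"      # outer perimeter: fast slewing
```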

Direction and Velocity Control

The control system determines the pan and tilt direction based on the quadrant of the image where the target centroid resides. For example, if the target is in the top-left quadrant, the command is to pan left and tilt up. The velocity is dynamically adjusted from the base zone velocity V_q by incorporating the target’s apparent pixel motion. If the target’s horizontal pixel displacement between frames is Δx over time ΔT, its apparent pixel velocity is V_pix = Δx / ΔT. The final commanded velocity V_cmd is:
$$ V_{cmd} = V_q + \theta \cdot f \cdot V_{pix} $$
where f is a sign factor (±1) indicating the direction of target movement relative to the FOV center, and θ is a damping factor that smooths the response and minimizes oscillation. This proportional-derivative-like control allows the PTZ to not only correct position error but also anticipate target motion.
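The velocity law translates directly into code; the default damping value is an illustrative assumption:

```python
def commanded_velocity(V_zone, dx_pixels, dT, theta=0.5):
    """V_cmd = V_q + theta * f * V_pix, per the control law above.
    theta = 0.5 is an illustrative damping value, not from the source."""
    V_pix = abs(dx_pixels) / dT        # apparent pixel speed |Δx| / ΔT
    f = 1 if dx_pixels >= 0 else -1    # sign factor: motion direction vs. centre
    return V_zone + theta * f * V_pix
```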

Zoom (Magnification) Control

Maintaining an appropriate target size in the image is vital for reliable feature extraction by the DSST-KCF anti-UAV tracker. Let σ = p_target / p_frame be the ratio of the target bounding box area (in pixels) to the total frame area. A desired operating range [σ_min, σ_max] is defined. The zoom control logic is simple yet effective:
$$ \text{Zoom Action} =
\begin{cases}
\text{Zoom In} & \text{if } \sigma < \sigma_{min} \quad \text{(Target too small)} \\
\text{Zoom Out} & \text{if } \sigma > \sigma_{max} \quad \text{(Target too large)} \\
\text{Hold} & \text{otherwise}
\end{cases} $$
This ensures the UAV consistently occupies a sufficient portion of the image for the algorithm to function accurately, without becoming so large that it risks moving out of the now-narrower FOV during rapid maneuvers.
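The zoom logic can be sketched as a simple three-way decision; the operating-range values below are illustrative assumptions, since the source does not specify [σ_min, σ_max]:

```python
def zoom_action(target_area_px, frame_area_px, sigma_min=0.001, sigma_max=0.01):
    """Decide the zoom command from sigma = target area / frame area.
    The [sigma_min, sigma_max] defaults are illustrative assumptions."""
    sigma = target_area_px / frame_area_px
    if sigma < sigma_min:
        return "zoom_in"    # target too small for reliable feature extraction
    if sigma > sigma_max:
        return "zoom_out"   # target too large; risks leaving the narrower FOV
    return "hold"
```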

Algorithmic Performance Evaluation

The performance of the fused DSST-KCF algorithm was rigorously evaluated against the baseline KCF algorithm to quantify its contribution to the anti-UAV system’s capability. Testing was conducted using the standard OTB100 benchmark dataset and a proprietary dataset of aerial UAV sequences. Evaluation employed two standard metrics: Precision and Success Rate.

Precision is defined as the percentage of frames in which the Euclidean distance between the predicted target center and the ground-truth center falls below a given threshold. Plotting precision over a range of distance thresholds yields the precision plot; the score at the conventional 20-pixel threshold is reported as the Average Precision.

Success Rate is defined as the percentage of frames where the Intersection-over-Union (IoU) ratio between the predicted bounding box and the ground-truth bounding box exceeds a threshold. Plotting success rate across all IoU thresholds (0 to 1) yields the Success Plot. The Area Under Curve (AUC) of this plot is the Average Success Rate.
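Both metrics can be computed in a few lines. The following is a simplified OTB-style evaluation sketch; the (x, y, w, h) box format and the 21-threshold sampling are assumptions for illustration:

```python
import numpy as np

def precision_at(pred_centers, gt_centers, thresh=20.0):
    """Fraction of frames whose centre-location error is below `thresh` pixels."""
    d = np.linalg.norm(np.asarray(pred_centers, float) - np.asarray(gt_centers, float), axis=1)
    return float(np.mean(d < thresh))

def iou(a, b):
    """Intersection-over-Union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_auc(pred_boxes, gt_boxes, n_thresh=21):
    """Average success rate: AUC of the success plot over IoU thresholds 0..1."""
    ious = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    thresholds = np.linspace(0.0, 1.0, n_thresh)
    return float(np.mean([np.mean(ious > t) for t in thresholds]))
```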

The comparative results clearly demonstrate the superiority of the fused approach for anti-UAV tracking tasks. The DSST-KCF algorithm showed an average improvement of 43% in Precision and 31.51% in Success Rate over the standard KCF algorithm. This significant boost is primarily attributable to the integrated scale estimation, which allows the tracker to maintain a tight fit on the target despite changes in distance. The performance gain was especially pronounced in video sequences containing significant scale variation, motion blur, and deformation, which are common when tracking agile UAVs. The following table summarizes the success rate improvement on select challenging sequences:

| Video Sequence | KCF Success Rate (%) | DSST-KCF Success Rate (%) | Improvement (pp) |
|---|---|---|---|
| Car2 | 61.61 | 91.94 | +30.33 |
| Coupon | 2.52 | 83.49 | +80.97 |
| BlurCar2 | 24.26 | 89.34 | +65.08 |
| Man | 18.69 | 87.94 | +69.25 |

System Integration and Field Testing

The complete anti-UAV tracking system, integrating the DSST-KCF algorithm with the PTZ control strategy, was deployed and tested in a campus environment. The target was a commercial quadcopter UAV (DJI Phantom 4). Tests evaluated the system’s ability to autonomously acquire and maintain a track on the UAV under various flight patterns.

In a representative test, the UAV was flown at an altitude of 150 meters with a speed of 5 m/s. The system successfully established and maintained a stable track. The control strategy effectively generated pan and tilt commands, resulting in a smooth camera trajectory that kept the UAV near the FOV center as it executed maneuvers. The system demonstrated a maximum effective tracking range of approximately 2.3 kilometers for a UAV of similar size, a substantial increase over the baseline performance of the PTZ unit alone.

The system’s real-time performance is critical for practical anti-UAV operations. Processing time was analyzed over a 1,935-frame tracking sequence. The histogram of per-frame processing times confirmed high efficiency: the vast majority of frames were processed in under 20 milliseconds (over 50 FPS), comfortably above the 30 FPS required for real-time video. This efficiency validates the practical feasibility of the fused DSST-KCF algorithm for deployment in fielded systems.

Conclusion

This paper presented a comprehensive solution to the challenge of tracking LSS UAVs in urban settings through an automated image-based anti-UAV system. The core technical contributions are threefold. First, we developed a fused DSST-KCF visual tracking algorithm that combines the high-speed correlation filtering of KCF with the robust multi-scale estimation of DSST. This fusion resulted in a tracker that is both computationally efficient and highly accurate for small targets undergoing scale changes, a common scenario in UAV tracking. Second, we designed an intelligent DSST-KCF-PTZ control strategy that translates image-plane tracking data into stable and anticipatory camera motion, enabling persistent tracking. Third, we integrated these components into a functional dual-spectrum system capable of 24/7 operation.

Experimental results from both benchmark datasets and real-world field tests confirm the system’s effectiveness. The DSST-KCF algorithm significantly outperforms the baseline in both precision and success rate. The integrated system demonstrates long-range tracking capability, high frame-rate processing, and reliable performance across different environmental backgrounds. The proposed system thus provides a viable, effective, and practical technological pathway for enhancing urban low-altitude airspace security against unauthorized UAV activities.
