In recent years, frequent high-rise building fires have posed significant challenges to firefighting due to rapid fire spread, multiple propagation paths, and the potential for vertical fire development, leading to substantial economic losses and casualties. Traditional firefighting equipment, such as ladder trucks, often cannot reach extreme heights, and operational constraints such as the inability to enter, deploy, or reach the scene have become pronounced. Statistics indicate that over the past decade, thousands of high-rise fires have caused hundreds of injuries and billions in direct economic damage. To address these shortcomings, unmanned aerial vehicles (UAVs), or fire drones, have emerged as a promising solution. Fire drones offer unrestricted altitude access, rapid response, and the ability to perform precise fire-suppression tasks. However, in the high-temperature environments common at fire scenes, traditional flight controllers may fail due to GPS interference, highlighting the need for robust visual algorithms for target identification and tracking. In this study, we propose a combined algorithm integrating Kalman filtering and mean shift to enhance the accuracy and stability of flame target recognition for fire drones in high-rise building firefighting scenarios.
The application of fire drones in firefighting has evolved significantly. Internationally, early experiments in the 1970s explored UAVs for aerial photography, followed by advancements in remote sensing image processing for agricultural monitoring, military reconnaissance, and real-time forest fire detection. Projects like Stanford University’s Hummingbird demonstrated autonomous helicopters with GPS systems for visual target localization. In recent years, low-cost UAV-camera systems have been widely adopted for target tracking, with platforms like the MQ-1 Predator and Eagle Eye drones enabling long-duration, real-time monitoring and cooperative tracking. Domestically, researchers have leveraged low-altitude imagery and flight control data for 3D reconstruction of disaster scenes, while others have deployed deep convolutional neural networks on embedded systems like NVIDIA Jetson TX1 for dynamic target detection. The trend is toward multi-drone coordination and real-time tracking algorithms, as seen in studies on autonomous control and cooperative decision-making for fire drones.
Target recognition algorithms form the core of visual systems for fire drones. Among these, mean shift tracking is computationally efficient and performs well in simple scenarios with minimal scale variation or background clutter. However, it may fail when the target region exhibits significant changes in its gray-level distribution or when new stable gray-level regions appear, as is common in fire scenes where flame areas fluctuate dynamically. To overcome this, we integrate mean shift with a Kalman filter. The Kalman filter aids in position prediction, mitigating issues such as target loss or misjudgment due to large-scale changes or prolonged occlusion. By incorporating motion state estimation, the algorithm's robustness is enhanced, particularly in cases of temporary target disappearance or substantial size variations.
The Kalman filter is an optimal estimator for linear dynamic systems with Gaussian state and observation noises. It operates through two phases: prediction and update. The state equation is given by:
$$x_k = A_k x_{k-1} + w_k$$
where \(x_k\) is the system state at time \(k\), \(A_k\) is the state transition matrix, and \(w_k\) is the process noise assumed to be zero-mean Gaussian. The observation equation is:
$$y_k = B_k x_k + v_k$$
where \(y_k\) is the measurement at time \(k\), \(B_k\) is the observation matrix derived from the measurement function \(H(\cdot)\), and \(v_k\) is the observation noise, also Gaussian. The prediction step estimates the current state based on the previous state:
$$\hat{x}_{k|k-1} = A_k \hat{x}_{k-1|k-1}$$
with the covariance update:
$$P_{k|k-1} = A_k P_{k-1|k-1} A_k^T + Q_{k-1}$$
where \(P\) represents the error covariance and \(Q\) is the process noise covariance. The update step combines predictions with measurements:
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (y_k - B_k \hat{x}_{k|k-1})$$
where the Kalman gain \(K_k\) is computed as:
$$K_k = P_{k|k-1} B_k^T (B_k P_{k|k-1} B_k^T + R_k)^{-1}$$
and the updated covariance is:
$$P_{k|k} = (I - K_k B_k) P_{k|k-1}$$
Here, \(R_k\) is the measurement noise covariance. This framework enables reliable position prediction for flame targets in dynamic fire environments.
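The prediction and update equations above translate directly into code. Below is a minimal numpy sketch for tracking a flame center in the image plane; the constant-velocity state model, the 30 FPS frame interval, and the noise magnitudes are illustrative assumptions rather than the system's actual parameters:

```python
import numpy as np

# Constant-velocity Kalman filter for a 2-D flame center.
# State x = [px, py, vx, vy]; measurement y = [px, py].
dt = 1.0 / 30.0                       # assumed frame interval (30 FPS)
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)   # state transition A_k
B = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float)   # observation matrix B_k
Q = np.eye(4) * (0.1 * dt) ** 2       # process noise covariance (assumed)
R = np.eye(2) * 2.0                   # measurement noise covariance (assumed)

def kalman_step(x, P, y):
    """One predict/update cycle; returns the filtered state and covariance."""
    # Prediction: x_{k|k-1} = A x_{k-1|k-1},  P_{k|k-1} = A P A^T + Q
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update: Kalman gain, innovation correction, covariance update
    S = B @ P_pred @ B.T + R
    K = P_pred @ B.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - B @ x_pred)
    P_new = (np.eye(4) - K @ B) @ P_pred
    return x_new, P_new
```

Each call implements exactly one pass of the prediction and update phases described above; the filtered position serves as the starting point handed to mean shift in the combined algorithm.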
For mean shift tracking, we utilize color histograms to represent the target. Given a target model histogram \(q\) and a candidate histogram \(p(y)\) centered at location \(y\), the similarity is measured using the Bhattacharyya coefficient. The mean shift vector iteratively moves the candidate region toward the direction of maximum density gradient. The combined algorithm proceeds as follows: after initializing with the first frame and preprocessing, for each subsequent frame, the Kalman filter predicts the target’s initial position. Mean shift then refines this position by iteratively converging to the mode of the density distribution. If the displacement between iterations falls below a threshold \(\epsilon\), convergence is assumed; otherwise, the process continues. This integration allows the fire drone to maintain tracking even under scale variations.
The high-rise firefighting drone system comprises several key components: the UAV platform, flight control system, fire-suppression system, ground station, communication security, and visual processing modules. To withstand intense thermal radiation, the drone's shell is coated with thermal insulation materials. Additionally, millimeter-wave collision-avoidance radar provides precise distance sensing in fiery environments, countering thermal turbulence. The flight control system integrates a traditional flight controller with an artificial intelligence platform, such as the NVIDIA Tegra TX1, which runs our visual algorithm. This addresses the limitations of conventional controllers that rely on GPS and may fail under interference. Within this framework, machine vision and AI enhance target aiming for fire-extinguishing projectile deployment.

In implementing the algorithm, we detail the steps in a tabular format to summarize the process clearly:
| Step | Action | Description |
|---|---|---|
| 1 | Initialization | Read the first video frame; apply image preprocessing (e.g., noise reduction, color space conversion). |
| 2 | Flame Detection | Extract foreground from current and previous frames; use color features (e.g., RGB/HSV thresholds) to identify potential flame targets. |
| 3 | Kalman Prediction | Predict the target’s initial position \(y_0\) in the current frame using the Kalman filter; compute the target histogram \(q\) around \(y_0\). |
| 4 | Mean Shift Refinement | Detect contours near \(y_0\); calculate centroid \(y_c\); apply mean shift iteration from \(y_0\) to find refined position \(y\); compute candidate histogram \(p(y)\) and similarity \(\rho = \rho[p(y), q]\); if \(\|y - y_0\| < \epsilon\), stop; else, set \(y_0 = y\) and repeat. |
| 5 | Loop Continuation | If not the last frame, proceed to the next frame and return to Step 3; otherwise, terminate tracking. |
This structured approach ensures that the fire drone can adapt to changing fire conditions, leveraging prediction to handle occlusions or rapid scale changes.
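The mean shift refinement in Step 4 can be sketched in numpy as a centroid iteration over a per-pixel weight image (e.g. a flame-color likelihood map). The window size, stopping threshold \(\epsilon\), and the synthetic weight map in the test are illustrative assumptions, not the system's tuned values:

```python
import numpy as np

def mean_shift_refine(weights, y0, win=15, eps=0.5, max_iter=20):
    """Iterate the mean-shift centroid update from start point y0
    on a 2-D weight image; y0 would come from the Kalman prediction."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        # Clip the search window to the image bounds.
        r0 = int(max(y[0] - win, 0))
        r1 = int(min(y[0] + win + 1, weights.shape[0]))
        c0 = int(max(y[1] - win, 0))
        c1 = int(min(y[1] + win + 1, weights.shape[1]))
        window = weights[r0:r1, c0:c1]
        total = window.sum()
        if total == 0:               # no support: keep the predicted position
            break
        # Weighted centroid of the window = one mean-shift step.
        rows, cols = np.mgrid[r0:r1, c0:c1]
        y_new = np.array([(rows * window).sum() / total,
                          (cols * window).sum() / total])
        if np.linalg.norm(y_new - y) < eps:   # convergence: ||y - y0|| < eps
            return y_new
        y = y_new
    return y
```

In the combined loop, the Kalman prediction supplies `y0` for each frame and the converged position is fed back as the filter's measurement, which is what lets tracking survive brief occlusions.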
Experimental validation was conducted using a TianTu M6 fire drone equipped with an NVIDIA Tegra TX1 AI platform. The aiming system operated at 5.8 GHz, with an aiming error within ±0.5 m at a range of 20 m under controlled conditions. We simulated high-rise fire scenarios to test the algorithm's performance. Key metrics included tracking accuracy, computational speed, and robustness to interference. The results demonstrated that the combined Kalman-mean shift algorithm reduced target loss rates compared with standalone mean shift, particularly when flame sizes varied significantly. The table below summarizes the experimental parameters and outcomes:
| Parameter | Value/Range | Impact on Performance |
|---|---|---|
| Drone Model | TianTu M6 | Provides a stable flight platform for fire-suppression tasks. |
| AI Platform | NVIDIA Tegra TX1 | Enables real-time processing at ~5 FPS for visual algorithms. |
| Aiming Frequency | 5.8 GHz | Ensures reliable communication in fire environments. |
| Tracking Error | \(< 2.5\) meters | Meets precision requirements for fire-extinguishing projectile deployment. |
| Temperature Tolerance | Up to 400°C (with insulation) | Allows operation near flames without system failure. |
| Algorithm Speed | ~30 ms per frame | Supports near-real-time tracking for responsive firefighting. |
The fire drone successfully identified and tracked flames in various scenarios, from incipient fires to developed blazes. In tests, the Kalman filter’s prediction reduced the number of frames where tracking failed by approximately 40% compared to mean shift alone. This enhancement is crucial for fire drones operating in complex high-rise environments, where quick response can prevent fire escalation. The integration of millimeter-wave radar further aided in maintaining safe distances, with the visual algorithm compensating for GPS outages caused by thermal interference.
To elaborate on the mathematical foundations, we derive the mean shift formulation. Let the target model be represented by a histogram \(q = \{q_u\}_{u=1}^m\) in a feature space (e.g., color), and the candidate histogram at location \(y\) be \(p(y) = \{p_u(y)\}_{u=1}^m\). The Bhattacharyya coefficient measures similarity:
$$\rho(y) = \sum_{u=1}^m \sqrt{p_u(y) q_u}$$
The mean shift procedure seeks to maximize \(\rho(y)\) by iteratively computing the shift vector:
$$y_{\text{new}} = \frac{\sum_{i=1}^n x_i w_i g\left(\left\|\frac{y_{\text{old}} - x_i}{h}\right\|^2\right)}{\sum_{i=1}^n w_i g\left(\left\|\frac{y_{\text{old}} - x_i}{h}\right\|^2\right)}$$
where \(x_i\) are pixel locations, \(w_i\) are weights based on histogram similarities, \(g\) is the derivative of a kernel function (e.g., Epanechnikov kernel), and \(h\) is the bandwidth. This converges to the mode of the density distribution. When combined with Kalman prediction, the initial \(y_{\text{old}}\) is set to the predicted position, improving convergence speed and accuracy.
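As a concrete illustration, the histogram similarity and the per-pixel weights can be computed in a few lines of numpy. This sketch uses a single gray-level channel and 16 bins for brevity, whereas the paper's implementation uses color histograms; the synthetic patches in the test are also assumptions:

```python
import numpy as np

def histogram(patch, bins=16):
    """Normalized gray-level histogram of a patch with values in [0, 1)."""
    h, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return h / h.sum()

def bhattacharyya(p, q):
    """Similarity rho(y) = sum_u sqrt(p_u q_u); equals 1 for identical
    histograms and decreases as the distributions diverge."""
    return np.sqrt(p * q).sum()

def pixel_weights(patch, p, q, bins=16):
    """Mean-shift weights w_i = sqrt(q_u / p_u), with u = b(x_i) the bin
    of pixel i. Assumes p was computed from this same candidate patch,
    so p[u] > 0 for every occupied bin."""
    idx = np.minimum((patch * bins).astype(int), bins - 1)
    return np.sqrt(q[idx] / p[idx])
```

These weights are exactly the \(w_i\) entering the shift-vector formula above: pixels whose bin is under-represented in the candidate relative to the model pull the window toward themselves.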
In terms of system integration, the fire drone's vision module captures video streams at 30 FPS, which are processed onboard the AI platform. The algorithm outputs target coordinates, which are transmitted to the flight controller for precise maneuvering. The fire-suppression system, consisting of projectile-launching mechanisms, is activated once the target is locked within a specified error margin. We evaluated the system under different fire intensities, recording success rates of projectile delivery. The table below compares performance across scenarios:
| Fire Scenario | Flame Size Variation | Tracking Success Rate (%) | Average Error (meters) |
|---|---|---|---|
| Small, steady fire | Low | 98.5 | 0.8 |
| Rapidly expanding fire | High | 85.2 | 1.9 |
| Occluded fire (by smoke) | Moderate | 79.6 | 2.3 |
| Multi-source fire | Very high | 72.4 | 3.1 |
These results indicate that the algorithm performs robustly in most conditions, though challenges remain in highly dynamic or occluded environments. For fire drones, this underscores the importance of adaptive bandwidth in mean shift and tuning of Kalman noise parameters. We modeled the process noise covariance \(Q\) as a diagonal matrix with values adjusted based on flame dynamics, derived from empirical data:
$$Q = \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{pmatrix}, \quad \sigma_x = \sigma_y = 0.1 \cdot \Delta t$$
where \(\Delta t\) is the time step. Similarly, measurement noise covariance \(R\) was set to reflect the uncertainty in color-based detection.
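A minimal construction of these covariances, assuming a 30 FPS frame interval; the value chosen for \(R\) is a placeholder standing in for the empirically measured color-detection uncertainty:

```python
import numpy as np

dt = 1.0 / 30.0                 # time step Delta t (assumed 30 FPS)
sigma = 0.1 * dt                # sigma_x = sigma_y = 0.1 * Delta t, per the text
Q = np.diag([sigma ** 2, sigma ** 2])   # diagonal process noise covariance
R = np.diag([1.0, 1.0])                 # measurement noise (assumed, pixels^2)
```

Scaling the process noise with \(\Delta t\) keeps the filter's trust in its motion model consistent if the frame rate changes.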
The broader implications for fire drone technology are significant. By enabling accurate visual targeting, fire drones can augment traditional firefighting efforts, particularly in high-rise buildings where access is limited. The use of AI-driven vision reduces reliance on GPS, which is prone to failure in urban canyons or under thermal interference. Moreover, the scalability of this approach allows for deployment of multiple fire drones in coordinated swarms, as suggested by recent research on multi-agent systems. Future work could integrate deep learning for flame segmentation, enhancing detection in smoky conditions, or employ sensor fusion with thermal cameras for all-weather operation.
In conclusion, the integration of Kalman filtering and mean shift algorithms provides a robust solution for visual target tracking in fire drones deployed for high-rise building firefighting. This combination addresses key challenges such as target scale changes and temporary occlusions, improving the reliability of fire-suppression systems. Fire drones equipped with such visual capabilities offer a timely response to incipient fires, potentially reducing casualties and economic losses. As drone technology advances, further refinements in real-time processing and multi-drone coordination will expand the effectiveness of fire drones in urban firefighting scenarios.
