Research on Traffic Accident Detection Using UAVs

In recent years, the rapid development of intelligent transportation systems has highlighted the need for efficient, real-time traffic accident detection. Traditional methods often rely on manual reporting, which is slow, especially in complex urban environments. To address this, we explore the integration of unmanned aerial vehicles (UAVs) with deep learning techniques. UAVs offer an aerial perspective with wide-area coverage and flexible deployment, which makes them well suited to monitoring traffic incidents. In this study, we propose a framework that combines an improved YOLOv8 object detection model with an optimized DeepSORT multi-object tracking algorithm. Our goal is to improve the accuracy and speed of accident detection in UAV videos, thereby enabling quicker emergency responses and improving road safety. UAVs are central to this approach because they provide high-resolution video from various angles, allowing robust analysis of vehicle movements and interactions.

The growing use of UAVs in surveillance has opened new avenues for real-time traffic management. However, detecting accidents from UAV footage poses significant challenges, including varying lighting conditions, occlusions, and small object sizes. To tackle these issues, we modify YOLOv8 by integrating a Universal Inverted Bottleneck (UIB) module into its backbone network to strengthen feature extraction. We also refine the DeepSORT tracker by incorporating a ResNet-50-based feature extractor to improve appearance modeling. These modifications aim to improve both detection precision and tracking robustness in the dynamic scenes captured by UAVs.

Previous research on traffic accident detection has largely followed two paths: simulation-based modeling and vision-based recognition. Simulation methods, while theoretically sound, often cannot run in real time, which makes them less suitable for urgent scenarios. In contrast, vision-based approaches using deep learning have gained traction due to their efficiency and adaptability. For instance, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been employed to analyze video sequences for anomaly detection. However, many existing models struggle with the complexities inherent in UAV footage, such as motion blur and scale variations. Our work builds upon these foundations by introducing a hybrid detection-tracking framework tailored for UAV environments. Because UAVs are mobile, they can capture comprehensive views of accident sites, enabling more accurate analyses.

The core of our methodology is the integration of YOLOv8 and DeepSORT. YOLOv8 is an object detection model known for its speed and accuracy. It employs a backbone network with cross-stage partial connections and a path aggregation network for feature fusion. The detection head uses a decoupled design that handles classification and regression separately. The loss function combines binary cross-entropy for classification with a weighted sum of Distribution Focal Loss (DFL) and IoU loss for bounding box regression:

$$ L_{det} = L_{cls} + L_{reg} $$

where $ L_{cls} $ is the classification loss and $ L_{reg} $ is the regression loss. For regression, we have:

$$ L_{reg} = \lambda_{dfl} \cdot L_{dfl} + \lambda_{iou} \cdot L_{iou} $$

Here, $ L_{dfl} $ penalizes deviations in predicted bounding box distributions, $ L_{iou} $ optimizes the overlap between predicted and ground-truth boxes, and $ \lambda_{dfl} $ and $ \lambda_{iou} $ are weighting coefficients. This formulation improves localization accuracy, which is crucial for identifying accident-involved vehicles in UAV videos.
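
As a rough illustration of the regression term, the sketch below computes a plain IoU loss for one box pair and combines it with a given DFL value. The function names, the `(x1, y1, x2, y2)` box format, and the default weights of 1.0 are assumptions made for this example. The plain `1 - IoU` form is also a simplification and may differ from the IoU variant used in the actual detector.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def regression_loss(l_dfl, pred_box, gt_box, lambda_dfl=1.0, lambda_iou=1.0):
    """L_reg = lambda_dfl * L_dfl + lambda_iou * L_iou, with L_iou = 1 - IoU.

    l_dfl is taken as an already computed number; the weights are
    hypothetical placeholders, not values from the paper.
    """
    l_iou = 1.0 - iou(pred_box, gt_box)
    return lambda_dfl * l_dfl + lambda_iou * l_iou
```

A perfectly aligned prediction leaves only the DFL term, while a prediction with no overlap contributes the full `lambda_iou` to the loss.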

To further improve YOLOv8, we incorporate the Universal Inverted Bottleneck (UIB) module into its backbone. The UIB module is a lightweight and versatile component that combines inverted bottleneck structures, feed-forward networks, and depthwise separable convolutions. Its architecture allows for efficient feature extraction with reduced computational overhead. Mathematically, the UIB operation can be represented as:

$$ \text{Output} = \text{Conv}_{1\times1}(\sigma(\text{Conv}_{1\times1}(\text{Input}) \oplus \text{DWConv}(\text{Conv}_{1\times1}(\text{Input})))) $$

where $ \sigma $ denotes an activation function such as SiLU, $ \oplus $ represents element-wise addition, and DWConv is a depthwise convolution. This design improves gradient flow and preserves fine-grained details, such as vehicle debris or distorted shapes, which are common in accident scenes captured by UAVs. By embedding UIB into the C2f blocks of YOLOv8, we obtain a detector that is more robust to the challenges of aerial footage.
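
To make the cost of the depthwise design concrete, the following sketch counts convolution weights for one UIB-style block as written in the equation above. It assumes a single shared 1×1 expansion, a 3×3 depthwise kernel, and an expansion factor of 4, and it ignores biases and normalization layers. None of these settings are given in the text, so the numbers are illustrative only.

```python
def uib_weight_count(c_in, c_out, expansion=4, kernel=3):
    """Weights in expand (1x1) + depthwise (k x k) + project (1x1) convolutions."""
    c_mid = c_in * expansion                 # assumed expanded width
    expand = c_in * c_mid                    # 1x1 pointwise expansion
    depthwise = c_mid * kernel * kernel      # one k x k filter per channel
    project = c_mid * c_out                  # 1x1 pointwise projection
    return expand + depthwise + project


def dense_conv_weight_count(c_in, c_out, kernel=3):
    """Weights in a standard (dense) k x k convolution."""
    return c_in * c_out * kernel * kernel


if __name__ == "__main__":
    c = 64
    print("UIB block, 64 -> 64 channels:", uib_weight_count(c, c))
    print("depthwise 3x3 at width 256:", 256 * 3 * 3)
    print("dense 3x3 at width 256:", dense_conv_weight_count(256, 256))
```

Under these assumptions the spatial filtering at the expanded width costs a few thousand weights as a depthwise convolution, compared with several hundred thousand for a dense 3×3 convolution at the same width.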

For multi-object tracking, we utilize DeepSORT, which extends the SORT algorithm by integrating appearance features. DeepSORT employs a Kalman filter for motion prediction and a deep association metric for data association. The motion model is defined by the state vector $ \mathbf{x} = [x, y, w, h, \dot{x}, \dot{y}, \dot{w}, \dot{h}]^T $, representing bounding box coordinates and their velocities. The prediction step follows:

$$ \mathbf{x}_{k|k-1} = \mathbf{F} \mathbf{x}_{k-1|k-1} $$

$$ \mathbf{P}_{k|k-1} = \mathbf{F} \mathbf{P}_{k-1|k-1} \mathbf{F}^T + \mathbf{Q} $$

where $ \mathbf{F} $ is the state transition matrix, $ \mathbf{P} $ is the covariance matrix, and $ \mathbf{Q} $ is process noise. The update step uses measurements from detections to correct the state. To enhance appearance modeling, we replace the original feature extractor with ResNet-50, which provides richer feature representations. The appearance descriptor $ \mathbf{f} $ is computed as:

$$ \mathbf{f} = \phi(I; \theta) $$

where $ \phi $ is the ResNet-50 network parameterized by $ \theta $, and $ I $ is the image patch of a tracked object. The association cost between tracks and detections combines motion and appearance metrics:

$$ C = \lambda_{iou} \cdot C_{iou} + \lambda_{app} \cdot C_{app} $$

Here, $ C_{iou} $ is the IoU-based distance, and $ C_{app} = 1 - \frac{\mathbf{f}_i \cdot \mathbf{f}_j}{\|\mathbf{f}_i\| \|\mathbf{f}_j\|} $ is the cosine distance between appearance features. The Hungarian algorithm is then used to solve the assignment problem. This integration keeps vehicle tracks stable across frames, even under occlusions or appearance changes, which is vital for analyzing accident sequences in UAV footage.
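
The prediction and association steps described in this section can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that are not stated in the text: a constant-velocity transition matrix, $(x, y)$ interpreted as the box center, $C_{iou} = 1 - \text{IoU}$, equal weights of 0.5, and short hand-written vectors standing in for ResNet-50 features. The assignment uses a brute-force search over permutations as a stand-in for the Hungarian algorithm, which yields the same optimum for small square cost matrices.

```python
import math
from itertools import permutations


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def scaled_identity(n, scale=1.0):
    """n x n diagonal matrix with `scale` on the diagonal."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def constant_velocity_f(dt=1.0):
    """Assumed transition matrix F for the state [x, y, w, h, vx, vy, vw, vh]."""
    f = scaled_identity(8)
    for i in range(4):
        f[i][i + 4] = dt  # each box term advances by its velocity times dt
    return f


def kalman_predict(x, p, f, q):
    """Prediction step: returns (F x, F P F^T + Q)."""
    x_pred = [row[0] for row in matmul(f, [[v] for v in x])]
    f_t = [list(col) for col in zip(*f)]
    fpft = matmul(matmul(f, p), f_t)
    p_pred = [[fpft[i][j] + q[i][j] for j in range(len(q))]
              for i in range(len(q))]
    return x_pred, p_pred


def iou_xywh(a, b):
    """IoU of two (x, y, w, h) boxes, with (x, y) assumed to be the center."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(f_i, f_j):
    """C_app = 1 - cos(f_i, f_j); returns 1.0 if either vector is zero."""
    norm = math.sqrt(sum(u * u for u in f_i)) * math.sqrt(sum(v * v for v in f_j))
    if norm == 0:
        return 1.0
    return 1.0 - sum(u * v for u, v in zip(f_i, f_j)) / norm


def association_cost(tracks, detections, lambda_iou=0.5, lambda_app=0.5):
    """Cost matrix C[i][j] for lists of (box, feature) tracks and detections."""
    return [[lambda_iou * (1.0 - iou_xywh(t_box, d_box))
             + lambda_app * cosine_distance(t_feat, d_feat)
             for d_box, d_feat in detections]
            for t_box, t_feat in tracks]


def assign(cost):
    """Minimum-cost one-to-one matching for a square cost matrix (brute force)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(enumerate(best))
```

The brute-force search grows factorially with the number of tracks, so it only suits a toy example. A polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice, and it also handles unequal numbers of tracks and detections.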

Our overall framework processes UAV videos in real time. First, the improved YOLOv8 model detects vehicles and classifies them into categories such as “accident-involved” or “normal.” Then, DeepSORT assigns a unique ID to each vehicle and maintains its trajectory over time. By fusing detection and tracking outputs, we can identify sudden stops, collisions, or other anomalous behaviors indicative of accidents. UAVs allow continuous monitoring from favorable vantage points, which helps keep the whole scene in view. Below, we summarize the key components of our model in Table 1.

Table 1: Key Components of the Proposed Accident Detection Framework
Component	Description	Role in UAV Context
Improved YOLOv8	YOLOv8 with UIB modules in backbone	Detects vehicles and accidents from aerial views with high precision
Optimized DeepSORT	DeepSORT with ResNet-50 feature extractor	Tracks vehicles across frames, handling motion and appearance changes in drone footage
Fusion Module	Integrates detection and tracking outputs	Identifies accident events based on trajectory anomalies and visual cues
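
The fusion rule itself is not specified here. As one possible illustration of a trajectory-based cue, the sketch below flags a sudden stop when the frame-to-frame displacement of a tracked box center collapses. The function names and both thresholds (`min_speed`, `drop_ratio`) are hypothetical choices for this example, not parameters of the proposed model.

```python
import math


def speeds(centers):
    """Frame-to-frame displacement magnitudes (pixels per frame) of one track."""
    return [math.dist(p, q) for p, q in zip(centers, centers[1:])]


def find_sudden_stop(centers, min_speed=5.0, drop_ratio=0.2):
    """Return the index of the first frame at which a moving track abruptly stops.

    A stop is flagged when the track was moving at least `min_speed`
    pixels per frame and its speed then falls to `drop_ratio` of that
    value or less within one frame. Returns None if no such drop occurs.
    """
    v = speeds(centers)
    for k in range(1, len(v)):
        if v[k - 1] >= min_speed and v[k] <= drop_ratio * v[k - 1]:
            return k + 1  # v[k] spans centers[k] -> centers[k + 1]
    return None


if __name__ == "__main__":
    track = [(0, 0), (10, 0), (20, 0), (30, 0), (31, 0), (31, 0)]
    print(find_sudden_stop(track))
```

A rule of this kind would also fire on ordinary hard braking, so in practice it would need to be combined with the detector's class output or with the proximity of other tracks.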

To evaluate our approach, we conducted experiments on a custom dataset built from UAV videos. The dataset comprises 2,000 frames extracted from various accident scenarios, captured at different altitudes and angles to reflect real-world conditions. We annotated four classes: non-accident vehicles, accident-involved non-motorized vehicles, accident-involved motor vehicles, and accident-involved large vehicles. The dataset was split into training and testing sets with an 8:2 ratio. We implemented our models in PyTorch and ran them on an NVIDIA GTX 1050 Ti GPU. Training parameters are detailed in Table 2.

Table 2: Training Parameters for the Proposed Model
Parameter	Value	Explanation
Image Size	640	Input resolution for the detector
Batch Size	4	Number of samples per iteration
Epochs	100	Maximum number of passes over the training set
Learning Rate	0.01	Initial learning rate for optimization
Patience	30	Epochs without improvement before early stopping

We employed standard evaluation metrics, including precision (P), recall (R), mean average precision at IoU threshold 0.5 (mAP@0.5), mAP across IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95), frames per second (FPS), and parameter count. Precision and recall are defined as:

$$ P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN} $$

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. mAP is the mean over all classes of the per-class average precision (AP), the area under the precision-recall curve, and provides a comprehensive measure of detection accuracy. FPS indicates the inference speed, which is crucial for real-time UAV applications.
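
As a small worked example of these definitions, the sketch below computes precision and recall from raw counts and averages per-class AP values into mAP. The counts, class names, and AP values are made up for illustration and are not taken from the experiments.

```python
def precision_recall(tp, fp, fn):
    """Return (P, R) from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_average_precision(ap_per_class):
    """Mean of per-class AP values given as a {class_name: AP} mapping."""
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    # Hypothetical counts: 90 correct detections, 10 spurious, 30 missed.
    p, r = precision_recall(tp=90, fp=10, fn=30)
    print(f"P = {p:.2f}, R = {r:.2f}")
    # Hypothetical per-class AP values.
    print(mean_average_precision({"class_a": 0.9, "class_b": 0.7}))
```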

The results demonstrate the effectiveness of our improvements. As shown in Table 3, the improved YOLOv8 model achieved a mAP@0.5 of 95.2% on the test set, with precision and recall of 95.1% and 88.9%, respectively. These figures are clear gains over the baseline YOLOv8, which reached a mAP@0.5 of 93.5%, a precision of 92.4%, and a recall of 81.4%. The integration of the UIB module enhanced feature extraction, particularly for the small or partially occluded vehicles commonly seen in UAV footage. Moreover, the optimized DeepSORT tracker maintained stable identities for vehicles, reducing ID switches and improving trajectory consistency.

Table 3: Performance Comparison Between Baseline and Improved Models
Model	mAP@0.5 (%)	Precision (%)	Recall (%)	mAP@0.5:0.95 (%)	Parameters (M)	FPS
YOLOv8 + DeepSORT	93.5	92.4	81.4	86.7	6.24	75
Improved YOLOv8 + DeepSORT	95.2	95.1	88.9	88.7	7.18	72
YOLOv8 + Improved DeepSORT	93.8	92.6	82.0	87.0	6.31	82
Full Proposed Model	96.0	95.5	90.1	89.5	8.20	108

To further analyze the contribution of each component, we conducted ablation studies. The results, summarized in Table 4, show that the UIB module alone raised mAP@0.5 by 1.7 percentage points (from 93.5% to 95.2%) at the cost of a slight drop in FPS from 75 to 72, while the ResNet-50 integration in DeepSORT mainly improved speed, increasing FPS from 75 to 82 with a 0.3-point gain in mAP@0.5. When combined, the full model achieved a mAP@0.5 of 96.0% and an FPS of 108, indicating synergistic effects. This highlights the importance of tailored enhancements for UAV applications, where both accuracy and speed are paramount.

Table 4: Ablation Study Results on the Custom UAV Dataset
Experiment Configuration	mAP@0.5 (%)	Parameter Count (M)	FPS
Baseline (YOLOv8 + DeepSORT)	93.5	6.24	75
With UIB in YOLOv8 only	95.2	7.18	72
With ResNet-50 in DeepSORT only	93.8	6.31	82
Full proposed model	96.0	8.20	108

Visual analysis of the results confirms the model’s capability. In sample frames from UAV videos, our system detected and tracked accident-involved vehicles even in crowded scenes. For instance, in a collision scenario, the model identified the involved cars within a few frames and maintained their IDs throughout the sequence. This rapid detection is crucial for triggering alerts and supporting emergency responses. The robustness of our approach stems from the complementary strengths of YOLOv8 and DeepSORT, enhanced by our modifications. The mobility of UAVs makes it possible to capture such incidents from multiple perspectives, which enriches the dataset and improves model generalization.

Beyond detection, our framework can be extended to analyze accident severity and dynamics. By examining trajectory patterns and vehicle interactions, we can infer the nature of collisions, such as rear-end or side-impact accidents. Mathematical models, such as kinetic energy calculations based on bounding box velocities, could be incorporated:

$$ KE = \frac{1}{2} m v^2 $$

where $ m $ is the estimated vehicle mass (from class information) and $ v $ is the speed derived from tracking data. This would enable a more comprehensive UAV-based accident assessment system. Furthermore, additional onboard sensors, such as thermal cameras, could improve detection in low-light conditions and broaden the applicability of our method.
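
A minimal sketch of this calculation is given below. Converting a tracked displacement in pixels per frame to metres per second requires the video frame rate and a ground scale (metres per pixel), neither of which is specified in the text. The per-class masses are rough placeholder values chosen for illustration only.

```python
# Hypothetical nominal masses per annotated class, in kilograms.
ASSUMED_MASS_KG = {
    "non-motorized vehicle": 100.0,
    "motor vehicle": 1500.0,
    "large vehicle": 12000.0,
}


def speed_mps(pixels_per_frame, video_fps, metres_per_pixel):
    """Convert a per-frame pixel displacement to metres per second."""
    return pixels_per_frame * video_fps * metres_per_pixel


def kinetic_energy(mass_kg, v_mps):
    """KE = 1/2 * m * v^2, in joules."""
    return 0.5 * mass_kg * v_mps ** 2


if __name__ == "__main__":
    # Assumed example: 2 px/frame at 30 frames/s with 0.25 m per pixel.
    v = speed_mps(pixels_per_frame=2.0, video_fps=30.0, metres_per_pixel=0.25)
    ke = kinetic_energy(ASSUMED_MASS_KG["motor vehicle"], v)
    print(f"v = {v:.1f} m/s, KE = {ke:.0f} J")
```

Because the ground scale changes with flight altitude and camera angle, any such estimate would depend on calibrating the scale for each video.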

In conclusion, we have presented a traffic accident detection framework that combines UAVs with deep learning. By improving YOLOv8 with UIB modules and optimizing DeepSORT with ResNet-50, we achieved high accuracy and real-time performance on UAV videos. Our experiments show that the proposed model attains a mAP@0.5 of 96.0% and an FPS of 108, outperforming the baseline. These advancements address key challenges in aerial surveillance, such as scale variations and occlusions, making our system suitable for practical deployment. Future research could focus on multi-UAV coordination for larger coverage areas or on incorporating semantic segmentation to better understand scene context. Overall, this study contributes to the growing field of intelligent transportation systems and demonstrates the potential of UAVs to improve road safety and accident response times.