Quadrotor Drone Face Recognition Platform Based on YOLO

In recent years, the integration of unmanned aerial vehicles (UAVs) with advanced computer vision techniques has opened new frontiers in surveillance, security, and educational applications. Among these, quadrotor drones have gained significant attention due to their agility, stability, and ease of control. As a researcher in automation and control systems, I have designed an experimental platform that combines a quadrotor drone with a deep learning-based face recognition algorithm, specifically the YOLO (You Only Look Once) framework. This platform serves as a comprehensive tool for students to explore drone control, wireless communication, image processing, and artificial intelligence. In this article, I will detail the design, implementation, and testing of this quadrotor drone face recognition system, emphasizing its educational value and technical innovations.

The core of this platform is the YOLO algorithm, which reframes object detection as a single regression problem. Earlier detectors work in multiple stages, for example proposing candidate regions and then classifying each one. YOLO instead processes an image in a single forward pass, which makes it fast enough for real-time use. For our quadrotor drone system both speed and accuracy matter, because the drone delivers a live video feed that must be analyzed as it arrives. The YOLO network consists of 24 convolutional layers for feature extraction followed by 2 fully connected layers for regression. It takes a 448×448 input image and outputs a tensor of bounding box coordinates, confidence scores, and class probabilities. The image is divided into an S×S grid, and each cell predicts B bounding boxes and C conditional class probabilities. The standard configuration uses S = 7, B = 2, and C = 20 (the 20 PASCAL VOC classes). We keep S and B but set C to the number of individuals in our face dataset.
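With these settings the network's output is an S × S × (5B + C) tensor: each box contributes five numbers (x, y, w, h, and a confidence score) and each cell adds C class probabilities. The short Python sketch below (the function name is our own) makes the arithmetic concrete for the standard setting and for a six-person face dataset.

```python
def yolo_v1_output_shape(S, B, C):
    """Return the (rows, cols, depth) shape of a YOLOv1 output tensor.

    Each of the S x S grid cells predicts B boxes with five values each
    (x, y, w, h, confidence) plus C conditional class probabilities.
    """
    return (S, S, 5 * B + C)


if __name__ == "__main__":
    print(yolo_v1_output_shape(7, 2, 20))  # (7, 7, 30): PASCAL VOC setting
    print(yolo_v1_output_shape(7, 2, 6))   # (7, 7, 16): six identities
```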

The mathematical formulation of YOLO explains both its speed and the form of its output. Each bounding box prediction consists of five values: the center coordinates (x, y), given relative to the bounds of the grid cell; the width (w) and height (h), given relative to the whole image; and a confidence score P. The confidence score is defined as:

$$ P = \text{Pr}(\text{object}) \times \text{IoU}_{\text{pred}}^{\text{truth}} $$

Here, $\text{Pr}(\text{object})$ is the probability that the cell contains an object. During training its target is 1 if an object's center falls in the cell and 0 otherwise, so the target confidence equals the IoU for cells that contain an object and zero elsewhere. $\text{IoU}_{\text{pred}}^{\text{truth}}$ is the intersection over union between the predicted box and the ground truth. Each cell also predicts conditional class probabilities $\text{Pr}(\text{Class}_i | \text{object})$, the probability that the object belongs to class i given that the cell contains an object. At test time, the class-specific score of each box is computed as:

$$ \text{Pr}(\text{Class}_i) \times \text{IoU}_{\text{pred}}^{\text{truth}} = \text{Pr}(\text{Class}_i | \text{object}) \times \text{Pr}(\text{object}) \times \text{IoU}_{\text{pred}}^{\text{truth}} $$

This gives every box a score that reflects both the probability of the class and how well the box fits the object. Because several neighboring cells, or both boxes in one cell, can fire on the same face, we then apply non-maximum suppression (NMS) to remove redundant, overlapping boxes. For our quadrotor drone application, we fine-tuned YOLO on a custom face dataset so that it detects and recognizes individual faces in real time from aerial footage.
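The post-processing described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, boxes are assumed to be in corner format (x1, y1, x2, y2), and greedy NMS with an IoU threshold of 0.5 is one common variant. In practice NMS is usually run separately for each class.

```python
def class_specific_scores(box_confidence, class_probs):
    """Pr(Class_i | object) * P for every class i.

    box_confidence is a box's predicted confidence P, which estimates
    Pr(object) * IoU_pred^truth; class_probs are the cell's conditional
    class probabilities Pr(Class_i | object).
    """
    return [p * box_confidence for p in class_probs]


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop every remaining box
    that overlaps it by more than iou_threshold, and repeat.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    # Two overlapping detections of one face plus a separate face.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))  # [0, 2]
```

The first two boxes overlap with an IoU of 81/119 (about 0.68), so the lower-scoring one is suppressed, while the distant third box survives.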

The hardware component of our platform centers on a Tello quadrotor drone, a compact UAV developed with support from DJI and Intel. The drone has four motors, a 3.8 V lithium-polymer battery, and an integrated flight controller that provides stable hovering and maneuverability. Key features include a downward-facing positioning system for centimeter-level accuracy at low altitudes and a forward-facing HD camera for image acquisition. The drone communicates with the ground station over Wi-Fi, using the User Datagram Protocol (UDP) for low-latency data transmission. The Tello SDK lets us send control commands and receive the video stream programmatically, which makes the drone well suited to experimental setups.
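The SDK's command channel is plain text over UDP. Based on the Tello SDK documentation, commands are sent to 192.168.10.1 on port 8889, the drone must first receive `command` to enter SDK mode, and it replies with `ok` or `error` (or a value for read commands such as `battery?`). The sketch below is written for Python 3; the helper name `send_command` and the 10-second timeout are our own choices.

```python
import socket

# Tello SDK command address: text commands go to UDP port 8889.
TELLO_ADDR = ("192.168.10.1", 8889)


def send_command(sock, cmd, addr=TELLO_ADDR, timeout=10.0):
    """Send one SDK command string over UDP and return the drone's reply."""
    sock.settimeout(timeout)  # raise socket.timeout if no reply arrives
    sock.sendto(cmd.encode("ascii"), addr)
    reply, _ = sock.recvfrom(1024)
    return reply.decode("ascii", errors="replace").strip()


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # "command" must come first to enter SDK mode; "streamon" starts
        # the video stream, which the drone sends to UDP port 11111.
        for cmd in ("command", "battery?", "streamon"):
            print("%s -> %s" % (cmd, send_command(sock, cmd)))
    finally:
        sock.close()
```

Passing the socket in, rather than creating it inside the helper, keeps one local port for the whole session so that every reply comes back to the same place.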

The ground station is a portable computer running Ubuntu and Python 2.7 scripts that handle drone control, image processing, and face recognition. To meet the computational demands of YOLO, it is equipped with a high-performance GPU. The workflow begins by connecting the ground station to the drone over Wi-Fi. The ground station then sends the takeoff command and starts the video stream. The drone transmits live footage to the ground station, where YOLO processes each frame to detect and identify faces. Recognition results are displayed on screen, and the operator pilots the drone with keyboard commands. This arrangement lets the platform carry out surveillance tasks, such as identifying individuals in a crowd, while the computationally heavy recognition runs off-board.
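Keyboard piloting reduces to translating keypresses into SDK command strings. In the sketch below the key bindings and the 30 cm and 30° step sizes are hypothetical; the command strings (`takeoff`, `land`, `forward 30`, `cw 30`, and so on) follow the Tello SDK text protocol, which accepts translation distances between 20 and 500 cm.

```python
# Hypothetical key bindings for manual piloting. The command strings follow
# the Tello SDK text protocol; STEP_CM and TURN_DEG are illustrative values.
STEP_CM = 30   # translation step in cm (SDK range: 20-500)
TURN_DEG = 30  # rotation step in degrees

KEY_TO_COMMAND = {
    "t": "takeoff",
    "l": "land",
    "w": "forward %d" % STEP_CM,
    "s": "back %d" % STEP_CM,
    "a": "left %d" % STEP_CM,
    "d": "right %d" % STEP_CM,
    "r": "up %d" % STEP_CM,
    "f": "down %d" % STEP_CM,
    "q": "ccw %d" % TURN_DEG,
    "e": "cw %d" % TURN_DEG,
}


def key_to_command(key):
    """Map one keypress to a Tello SDK command string, or None if unbound."""
    return KEY_TO_COMMAND.get(key.lower())


if __name__ == "__main__":
    for key in "twqlx":
        print("%s -> %s" % (key, key_to_command(key)))
```

In a display loop the key would typically come from OpenCV's `cv2.waitKey`, and any non-`None` result would be sent to the drone over the UDP command channel.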

To train YOLO for face recognition, we built a custom dataset using the quadrotor drone itself. A dedicated data collection routine flies the drone in a zigzag pattern within an 80 cm × 80 cm vertical plane, capturing faces from multiple angles and distances. The drone starts at a height of 170 cm and a distance of 100 cm from the subject, then moves to 150 cm for additional variety. This yields facial data under a wide range of viewpoints. We collected 1,996 images of six volunteers and annotated them with bounding boxes and identity labels using LabelImg. The dataset was split into a training set of 1,696 images and a test set of 300 images, and the network was trained for 50,400 iterations. Training minimizes the following loss, which combines localization, confidence, and classification errors:

$$
\begin{aligned}
&\lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
$$

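Here, $\mathbb{1}_{ij}^{\text{obj}}$ indicates whether the j-th bounding box in cell i is responsible for an object, $\mathbb{1}_{ij}^{\text{noobj}}$ marks the boxes that are not, and $\mathbb{1}_{i}^{\text{obj}}$ indicates whether an object appears in cell i. The hyperparameters $\lambda_{\text{coord}}$ and $\lambda_{\text{noobj}}$ balance the terms; the original YOLO paper sets them to 5 and 0.5, up-weighting localization and down-weighting the many boxes that contain no object. Using $\sqrt{w}$ and $\sqrt{h}$ makes a given size error count more for small boxes than for large ones. After training, we evaluated the model on a test set of 300 images containing 585 face instances. The results, summarized in Table 1, show a high recognition rate, with average confidence scores between 0.97 and 0.99.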
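To make the indicator functions concrete, the loss can be written as plain Python that walks over the grid one cell at a time. This is an illustrative sketch rather than the training code used here: the per-cell dictionary layout is an assumption, $\mathbb{1}_{ij}^{\text{noobj}}$ is taken as the complement of $\mathbb{1}_{ij}^{\text{obj}}$, and the defaults λ_coord = 5 and λ_noobj = 0.5 are the values from the original YOLO paper.

```python
import math


def yolo_v1_loss(pred, target, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared YOLOv1 loss, computed term by term.

    pred and target are lists with one dict per grid cell (S*S entries):
      "boxes": list of B tuples (x, y, w, h, C) with w, h >= 0
      "probs": list of class probabilities p_i(c)
    Each target cell also has "responsible": the index j of the box that is
    responsible for the object in that cell, or None if the cell is empty.
    """
    loss = 0.0
    for p_cell, t_cell in zip(pred, target):
        resp = t_cell["responsible"]
        for j, (p_box, t_box) in enumerate(zip(p_cell["boxes"], t_cell["boxes"])):
            px, py, pw, ph, pc = p_box
            tx, ty, tw, th, tc = t_box
            if j == resp:
                # 1_ij^obj = 1: centre, size and confidence terms
                loss += lambda_coord * ((px - tx) ** 2 + (py - ty) ** 2)
                loss += lambda_coord * ((math.sqrt(pw) - math.sqrt(tw)) ** 2
                                        + (math.sqrt(ph) - math.sqrt(th)) ** 2)
                loss += (pc - tc) ** 2
            else:
                # 1_ij^noobj = 1 (assumed complement of 1_ij^obj)
                loss += lambda_noobj * (pc - tc) ** 2
        if resp is not None:
            # 1_i^obj = 1: classification term for cells with an object
            loss += sum((pp - tp) ** 2
                        for pp, tp in zip(p_cell["probs"], t_cell["probs"]))
    return loss
```

For a single cell with two boxes, a responsible box whose confidence is predicted as 0.5 instead of 1.0 contributes 0.25, and a non-responsible box predicting 0.2 instead of 0 adds 0.5 × 0.04 = 0.02, for a total of about 0.27.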

| Volunteer ID | Faces in Test Set | Faces Correctly Detected | Average Confidence Score | Recognition Rate (%) |
|---|---|---|---|---|
| 1 | 98 | 92 | 0.99 | 93.9 |
| 2 | 102 | 96 | 0.98 | 94.1 |
| 3 | 95 | 88 | 0.97 | 92.6 |
| 4 | 105 | 98 | 0.99 | 93.3 |
| 5 | 97 | 90 | 0.98 | 92.8 |
| 6 | 88 | 82 | 0.99 | 93.2 |
| Total | 585 | 546 | 0.98 | 93.3 |

The overall recognition rate of 93.3% (546 of 585 faces) shows that YOLO handles aerial face recognition well. We attribute part of this robustness to the multi-angle views the drone captured during data collection. During real-time tests, the platform maintained stable flight while processing video at 30 frames per second, and the ground station overlaid bounding boxes and identity labels on detected faces, as shown in the inserted image. These results suggest that quadrotor drones are a practical basis for security and monitoring applications.
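The rates in Table 1 follow directly from the counts, so they are easy to re-derive. The sketch below recomputes each volunteer's rate and the pooled rate (total correctly detected faces over total faces); the dictionary layout and function name are our own.

```python
# (faces in test set, faces correctly detected) per volunteer, from Table 1
TABLE_1 = {1: (98, 92), 2: (102, 96), 3: (95, 88),
           4: (105, 98), 5: (97, 90), 6: (88, 82)}


def recognition_rate(total, correct):
    """Share of faces correctly detected, as a percentage rounded to 0.1."""
    return round(100.0 * correct / total, 1)


per_volunteer = {vid: recognition_rate(t, c) for vid, (t, c) in TABLE_1.items()}
total_faces = sum(t for t, _ in TABLE_1.values())
total_correct = sum(c for _, c in TABLE_1.values())
overall = recognition_rate(total_faces, total_correct)

if __name__ == "__main__":
    for vid, rate in sorted(per_volunteer.items()):
        print("volunteer %d: %.1f%%" % (vid, rate))
    print("overall: %d/%d = %.1f%%" % (total_correct, total_faces, overall))
    # overall: 546/585 = 93.3%
```

Averaging the six per-volunteer rates without weighting gives nearly the same figure, because the volunteers have similar numbers of test faces.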

Beyond technical metrics, this quadrotor drone platform offers significant educational benefits. It bridges theoretical concepts in deep learning and practical skills in UAV operation. Students can modify the YOLO architecture, experiment with different datasets, or implement additional features like tracking. The wireless communication aspect introduces them to network protocols and real-time systems. To further illustrate the system’s parameters, Table 2 outlines key specifications of the quadrotor drone and ground station.

| Component | Specification | Description |
|---|---|---|
| Quadrotor Drone Model | Tello | Micro UAV with 4 motors and a 3.8 V battery |
| Camera Resolution | HD 720p | Forward-facing camera for video capture |
| Flight Time | 13 minutes | Typical duration on a single charge |
| Communication | Wi-Fi 802.11n | UDP-based control and video streaming |
| Ground Station OS | Ubuntu 18.04 | Running Python 2.7 and OpenCV |
| GPU | NVIDIA GTX 1080 | Accelerates deep learning computations |
| YOLO Input Size | 448×448 pixels | Resized from original video frames |
| Dataset Size | 1,996 images | Collected using the quadrotor drone |
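Because YOLO predicts normalized coordinates, a box found in the 448×448 network input maps back to the original video frame without undoing the resize explicitly. The sketch below decodes one YOLOv1 prediction into pixel corners, following the convention that (x, y) are offsets within the grid cell and (w, h) are fractions of the whole image. The function name and the 1280×720 frame size (taken from the 720p entry in Table 2) are assumptions.

```python
def decode_yolo_v1_box(row, col, x, y, w, h, S, frame_width, frame_height):
    """Convert one YOLOv1 box prediction to pixel corners (x1, y1, x2, y2).

    (x, y) are offsets of the box centre within grid cell (row, col), in [0, 1];
    (w, h) are fractions of the whole image. Because these values are
    normalised, resizing the frame to 448x448 for the network does not
    change them, so they scale directly to the original frame size.
    """
    cx = (col + x) / S * frame_width    # centre in pixels
    cy = (row + y) / S * frame_height
    bw = w * frame_width                # size in pixels
    bh = h * frame_height
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


if __name__ == "__main__":
    # Centre of the middle cell of a 7x7 grid on an assumed 1280x720 frame.
    print(decode_yolo_v1_box(3, 3, 0.5, 0.5, 0.2, 0.4, 7, 1280, 720))
```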

Modeling the drone's dynamics also helps explain its flight stability. The thrust generated by each rotor is proportional to the square of its rotational speed, so for a quadrotor in the plus (+) configuration the total thrust T and the body torques $\tau$ are given by:

$$ T = k_f (\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) $$
$$ \tau_x = k_f l (\omega_4^2 - \omega_2^2) $$
$$ \tau_y = k_f l (\omega_3^2 - \omega_1^2) $$
$$ \tau_z = k_m (\omega_1^2 - \omega_2^2 + \omega_3^2 - \omega_4^2) $$

Here, $k_f$ is the thrust coefficient, $k_m$ the rotor drag-torque coefficient, $l$ the arm length (rotor to center of mass), and $\omega_i$ the rotor speeds. The flight controller adjusts these speeds to hold attitude and position. Coupling this model with vision-based control would let the quadrotor drone track faces autonomously; in our platform, however, navigation remains manual via ground station commands.
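The four equations form a linear map from the squared rotor speeds to (T, τ_x, τ_y, τ_z), so a controller can invert it to find the rotor speeds that produce a desired thrust and torque. The sketch below implements both directions for the plus configuration used in the equations; the function names are our own, and the coefficient values in the usage example are made up for illustration rather than measured Tello parameters.

```python
import math


def rotor_forces(omegas, k_f, k_m, l):
    """Total thrust and body torques (T, tau_x, tau_y, tau_z) from rotor speeds."""
    s1, s2, s3, s4 = (w * w for w in omegas)
    T = k_f * (s1 + s2 + s3 + s4)
    tau_x = k_f * l * (s4 - s2)
    tau_y = k_f * l * (s3 - s1)
    tau_z = k_m * (s1 - s2 + s3 - s4)
    return T, tau_x, tau_y, tau_z


def rotor_speeds(T, tau_x, tau_y, tau_z, k_f, k_m, l):
    """Invert the model: rotor speeds that give the desired thrust and torques.

    Negative squared speeds (commands the rotors cannot produce) are clipped
    to zero before taking the square root.
    """
    a = T / k_f              # s1 + s2 + s3 + s4
    bx = tau_x / (k_f * l)   # s4 - s2
    by = tau_y / (k_f * l)   # s3 - s1
    cz = tau_z / k_m         # s1 - s2 + s3 - s4
    s13 = (a + cz) / 2.0     # s1 + s3
    s24 = (a - cz) / 2.0     # s2 + s4
    squares = ((s13 - by) / 2.0, (s24 - bx) / 2.0,
               (s13 + by) / 2.0, (s24 + bx) / 2.0)
    return tuple(math.sqrt(max(0.0, s)) for s in squares)


if __name__ == "__main__":
    # Illustrative coefficients only, not measured Tello values.
    k_f, k_m, l = 2e-6, 1e-7, 0.05
    forces = rotor_forces((400.0, 410.0, 420.0, 390.0), k_f, k_m, l)
    print(rotor_speeds(*forces, k_f=k_f, k_m=k_m, l=l))
```

A round trip through both functions recovers the original rotor speeds, and a pure-thrust command with zero torques yields four equal speeds, which is the hover condition.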

In conclusion, this quadrotor drone face recognition platform combines a real-time deep-learning detector with a compact UAV in a single teaching tool. By leveraging YOLO's speed and accuracy, it identifies individuals in real time from aerial viewpoints. The system's modular design encourages experimentation, making it a valuable resource for engineering education. Future work may involve enhancing the quadrotor drone's autonomy with obstacle avoidance or integrating multi-drone swarms for large-scale surveillance. As quadrotor drones become increasingly prevalent, hands-on experience with such systems will prepare students for emerging challenges in automation and robotics.