Traditional livestock herding suffers from low efficiency, high labor costs, and numerous monitoring blind spots in complex terrain. As upgraded consumer demand pushes animal husbandry toward larger-scale operation, there is a pressing need for technological innovation. The “14th Five-Year Plan for the National Animal Husbandry and Veterinary Industry Development” explicitly states that Unmanned Aerial Vehicle (UAV) technology will play a key role in areas such as data collection and dynamic supervision. To improve the automation and precision of herd management in small and medium-sized pastures, our team has designed and implemented a low-cost, multi-rotor intelligent herding UAV system based on the PX4 autopilot running on a Pixhawk 4 flight controller. The system integrates an improved YOLOv5 object detection model, OpenHD high-definition video transmission, Real-Time Kinematic (RTK) centimeter-level positioning, PID flight control algorithms, and binocular vision obstacle avoidance. Controlled by an onboard Raspberry Pi 5 computer within a Robot Operating System (ROS) framework, the UAV supports real-time 1080p video transmission and long-range communication. It enables autonomous patrolling, precise sheep counting, detection of strays, targeted auditory deterrence, and geo-fences that keep the UAV inside designated boundaries. This design offers a new approach to the intelligent management of livestock.

The airframe is the physical foundation of the entire UAV system. Our design prioritizes structural robustness, vibration damping, stability, and low weight to ensure reliable operation in demanding pastoral environments. The primary frame material is PETG (Polyethylene Terephthalate Glycol), chosen for its excellent layer adhesion, impact resistance, and ease of 3D printing. The key structural components are a base plate, a top plate, and four symmetrical arm assemblies. The base plate incorporates mounting holes and dual damping blocks at each corner. Each damping block has a ring-shaped protrusion and elastic semi-conical bumps set at a 60°–90° angle, which disperse impact stress from hard landings. Each arm assembly is a composite structure consisting of a main arm and a support arm. The main arm, printed from PETG, is internally reinforced with a carbon fiber rod running no less than 75% of its length. This pre-embedded rod, as shown in related 3D models, significantly increases torsional stiffness and tensile strength while adding little weight. The top plate includes matching mounting holes, central heat dissipation vents, and cable routing ports. It also has extended curved platforms at the corners for mounting payloads such as gimbals, shaped to reduce aerodynamic drag. All components are rigidly bolted together. The arms are arranged in an ‘X’ configuration with a 90° included angle between adjacent arms, and the angle between the main and support arms is kept between 51° and 70°. Together, the material selection, damping design, and targeted reinforcement give the UAV strong shock resistance, structural stability, and efficient heat dissipation, making it well suited to professional applications such as aerial monitoring in complex pastoral settings.
| Component | Primary Material | Key Feature | Design Purpose |
|---|---|---|---|
| Base Plate | PETG | Damping blocks with conical bumps | Absorb and disperse landing impact |
| Arm Assembly | PETG + Carbon Fiber Rod | Internal composite reinforcement | Maximize strength-to-weight ratio, resist bending |
| Top Plate | PETG | Heat dissipation vents, curved mounts | Improve cooling, reduce aerodynamic drag |
| Overall Layout | — | X-configuration, optimized arm angles | Enhance flight stability and control authority |

The rapid decline in manufacturing costs has made small multi-rotor UAVs highly accessible. Compared with fixed-wing or larger aircraft, multi-rotor platforms offer superior maneuverability, vertical take-off and landing (VTOL) capability, and good extensibility, which makes them better suited to herding in confined or uneven pastures. Our system is architected around integration and cost-effectiveness. The hardware core comprises an onboard Raspberry Pi 5 single-board computer, a Pixhawk 4 flight controller running PX4, electronic speed controllers (ESCs), brushless motors, a multi-constellation GNSS module, a laser rangefinder, and an OpenHD video/data transmission module. The UAV’s motion is governed by the flight controller, which processes inputs from the pilot via radio control or pre-programmed missions from the ground station.

The software ecosystem is centered on the PX4 autopilot stack and ROS. The PX4 firmware has a modular architecture and handles low-level flight stabilization, sensor fusion, and autonomous navigation. It communicates with higher-level systems using the MAVLink protocol. The Raspberry Pi 5 runs ROS, which acts as middleware to integrate the various subsystems. Key software nodes on the Pi handle computer vision processing, communication relay to the ground station, and high-level task management. The primary ground control station (GCS) software is QGroundControl (QGC), which lets the herder plan flight paths, monitor real-time telemetry and video, and send commands. In the overall framework, instructions flow from the herder through QGC and the communication link to the UAV, where PX4 and the Raspberry Pi cooperate to execute tasks and return data.
| Subsystem | Component | Model/Specification | Primary Function |
|---|---|---|---|
| Computation & Control | Flight Controller | Pixhawk 4 running PX4 firmware | Flight stabilization, sensor fusion, autonomous navigation |
| Computation & Control | Onboard Computer | Raspberry Pi 5 | High-level processing, ROS master, vision inference |
| Perception & Navigation | GNSS Module | Multi-constellation (GPS, BDS, GLONASS, Galileo) | Global positioning and velocity |
| Perception & Navigation | RTK System | Base Station + Rover | Centimeter-level absolute positioning |
| Perception & Navigation | Vision Sensor | Binocular Camera | Depth perception for obstacle avoidance |
| Propulsion | Motor & ESC | Brushless Motor, 30 A ESC | Generate lift and control thrust |
| Communication | Video/Data Link | OpenHD 2.4/5.8 GHz Module | Real-time HD video and telemetry transmission |
| Payload | Deterrence System | Directional Wireless Speaker | Emit auditory signals for herd guidance |

The flight control system is the brain of the UAV. The flight controller’s Inertial Measurement Unit (IMU), comprising accelerometers and gyroscopes, is calibrated to minimize sensor biases. Raw data from the IMU, GNSS, magnetometer, and barometer are fused by PX4’s Extended Kalman Filter module (EKF2), which provides a robust estimate of the vehicle’s state (position, velocity, attitude). The core attitude and position control rely on Proportional-Integral-Derivative (PID) controllers. The attitude controller calculates the required torque from the orientation error, while the position/velocity controller calculates the desired acceleration. These outputs are converted into motor speed commands. The control law for a single axis (e.g., pitch) can be simplified as:
$$ \tau = K_p e + K_i \int e \, dt + K_d \frac{de}{dt} $$
where $\tau$ is the control output (torque), $e$ is the error (desired angle – estimated angle), and $K_p$, $K_i$, $K_d$ are the tuned gains. For navigation, we employ a multi-constellation GNSS receiver capable of tracking GPS, BeiDou, GLONASS, and Galileo satellites simultaneously. This increases the number of available satellites, improving accuracy and reliability, especially in environments with partial sky occlusion. The key metric for satellite geometry, Geometric Dilution of Precision (GDOP), is generally lower with more satellites distributed across the sky:
$$ \text{GDOP} = \sqrt{\text{trace}((G^T G)^{-1})} $$
where $G$ is the geometry matrix relating user position and clock bias to satellite ranges. Lower GDOP indicates better positioning accuracy. We improve on this further with Real-Time Kinematic (RTK) positioning. The UAV (rover) receives correction data from a fixed base station with known coordinates, which lets it resolve integer ambiguities in carrier-phase measurements and achieve centimeter-level positioning accuracy. This accuracy is crucial for precise geo-fencing and path following.
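
The GDOP expression above can be evaluated numerically to see why tracking more constellations helps. The following is a minimal sketch, not the receiver's internal computation: the satellite azimuth/elevation pairs are hypothetical, each row of $G$ is taken as the unit line-of-sight vector followed by 1 for the clock-bias term, and the helper names `unit_los`, `invert`, and `gdop` are illustrative.

```python
import math


def unit_los(az_deg, el_deg):
    """Unit line-of-sight vector (east, north, up) for a given azimuth and elevation."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.sin(az), math.cos(el) * math.cos(az), math.sin(el))


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate satellite geometry")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gdop(sats):
    """GDOP = sqrt(trace((G^T G)^-1)) for a list of (azimuth, elevation) pairs in degrees."""
    if len(sats) < 4:
        raise ValueError("need at least four satellites")
    g = [list(unit_los(az, el)) + [1.0] for az, el in sats]
    gtg = [[sum(row[i] * row[j] for row in g) for j in range(4)] for i in range(4)]
    inv = invert(gtg)
    return math.sqrt(sum(inv[i][i] for i in range(4)))


# Hypothetical sky views: one constellation versus two (values are made up).
single = [(0, 60), (90, 30), (180, 45), (270, 20)]
multi = single + [(45, 75), (135, 15), (225, 50), (315, 35)]
print(f"GDOP with 4 satellites: {gdop(single):.2f}")
print(f"GDOP with 8 satellites: {gdop(multi):.2f}")
```

Adding rows to $G$ can only increase $G^T G$ in the positive semi-definite sense, so the GDOP of the larger set is never higher than that of the subset it contains.
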

Reliable communication is vital for command, control, and situational awareness. Our system uses an OpenHD-based link for both telemetry and high-definition video. Unlike standard Wi-Fi, OpenHD uses a WiFi Broadcast technique that transmits data as a one-way broadcast stream without requiring a handshake connection, which makes the link more tolerant of packet loss and interference. Telemetry is carried as MAVLink messages, while the video stream is encoded (typically with H.264/H.265) and transmitted on a separate channel. The theoretical maximum range is up to 5 km under ideal conditions. The ground operator uses QGroundControl to send mission commands (waypoints, actions) and to receive real-time telemetry (altitude, speed, battery status) and the 1080p video feed from the UAV’s camera. This allows remote monitoring of the herd and manual intervention when necessary.

The machine vision unit is the cornerstone of the system’s intelligence. Its task is to automatically detect, count, and analyze sheep from the aerial perspective of the UAV. We based our solution on the YOLOv5 (You Only Look Once) object detection architecture because its balance of speed and accuracy suits real-time inference on edge devices such as the Raspberry Pi. To address the specific challenges of aerial herding (small object size, occlusions, and varying lighting), we modified the standard YOLOv5s model. In the backbone network, we integrated the ConvNeXt block design to improve feature extraction efficiency. We also incorporated the Convolutional Block Attention Module (CBAM) into the network. CBAM sequentially infers attention maps along the channel and spatial dimensions, allowing the model to focus on more informative features. The channel attention $M_c$ and spatial attention $M_s$ are computed as:
$$ \begin{aligned} M_c(F) &= \sigma(MLP(AvgPool(F)) + MLP(MaxPool(F))) \\ M_s(F) &= \sigma(f^{7 \times 7}([AvgPool(F); MaxPool(F)])) \end{aligned} $$
where $F$ is the input feature map, $\sigma$ is the sigmoid function, and $f^{7 \times 7}$ is a convolution with a $7 \times 7$ kernel. The two maps are applied in sequence: $F' = M_c(F) \otimes F$, followed by $F'' = M_s(F') \otimes F'$, where $\otimes$ denotes element-wise multiplication and $F''$ is the final output. We trained the model on a composite dataset comprising publicly available aerial sheep datasets and our own collected imagery. The trained model was optimized, exported to the lightweight ONNX format, and deployed on the Raspberry Pi 5 using the ONNX Runtime engine. The processed video frames, with bounding boxes and counts overlaid, are then sent to the ground station via the OpenHD link, providing the herder with actionable information.
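
The attention computation above can be made concrete with a small forward pass. This is an illustrative NumPy sketch under stated assumptions, not the training code used in this work: it assumes the standard CBAM ordering (channel attention first, then spatial attention on the channel-refined map), a ReLU hidden layer in the shared MLP, no bias terms, and random weights. The tensor sizes and the reduction ratio of 4 are arbitrary choices for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(f, w1, w2):
    """M_c(F): a shared two-layer MLP scores the average- and max-pooled channel descriptors."""
    avg = f.mean(axis=(1, 2))                      # shape (C,)
    mx = f.max(axis=(1, 2))                        # shape (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)        # ReLU hidden layer (assumed)

    return sigmoid(mlp(avg) + mlp(mx))             # shape (C,)


def spatial_attention(f, kernel):
    """M_s(F): 7x7 convolution over the channel-wise average and max maps."""
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])            # shape (2, H, W)
    pad = kernel.shape[-1] // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))     # zero padding keeps H x W
    windows = sliding_window_view(padded, kernel.shape[1:], axis=(1, 2))
    return sigmoid(np.einsum("chwij,cij->hw", windows, kernel))   # shape (H, W)


def cbam(f, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a (C, H, W) feature map."""
    f1 = channel_attention(f, w1, w2)[:, None, None] * f
    return spatial_attention(f1, kernel)[None, :, :] * f1


# Random stand-ins for a feature map and learned weights (illustrative only).
rng = np.random.default_rng(0)
channels, height, width, reduction = 8, 16, 16, 4
features = rng.standard_normal((channels, height, width))
w1 = rng.standard_normal((channels // reduction, channels)) * 0.1
w2 = rng.standard_normal((channels, channels // reduction)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1

out = cbam(features, w1, w2, kernel)
print(out.shape)
```

Because both attention maps pass through a sigmoid, every output value is the corresponding input value scaled by a factor between 0 and 1. The module therefore re-weights features and leaves the tensor shape unchanged, which is why it can be dropped into an existing backbone.
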

We conducted a series of tests to evaluate the UAV system, focusing on flight endurance, control stability, and the accuracy of the vision system.

Flight Performance Tests: The UAV flew autonomously along predefined waypoint paths in both indoor (calm) and outdoor (moderate wind) environments. The primary metrics were flight time until the low-battery warning and system temperature. The UAV was configured to fly at an altitude of 2.5 m and a speed of 1.5 m/s. The results are summarized below:
| Test Environment | Flight # | Endurance (min) | Battery Energy Used (Wh) | Frame Temp (°C) |
|---|---|---|---|---|
| Outdoor (Wind ~3-5 m/s) | 1 | 17.7 | 19.71 | 33 |
| Outdoor (Wind ~3-5 m/s) | 2 | 18.2 | 18.87 | 32 |
| Indoor (Calm) | 3 | 22.3 | 17.85 | 35 |
| Indoor (Calm) | 4 | 23.1 | 17.44 | 35 |

The data shows a clear difference in endurance based on environmental conditions. The outdoor flights, battling variable winds, lasted an average of 17.95 minutes and consumed more energy as the flight control system constantly adjusted the motors to maintain position and heading. The indoor flights, free from wind disturbance, achieved a longer average endurance of 22.7 minutes with lower energy consumption. The frame temperature remained within safe limits in all tests, confirming adequate heat dissipation from the electronic components.
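
The averages quoted above, and the mean electrical power they imply, can be reproduced from the table with a few lines of arithmetic. This is a small sketch using only the four tabulated flights. The per-flight power figure is derived here as energy divided by flight time and is not a separately measured quantity.

```python
from statistics import mean

# (environment, endurance in minutes, energy used in Wh), copied from the flight test table.
flights = [
    ("outdoor", 17.7, 19.71),
    ("outdoor", 18.2, 18.87),
    ("indoor", 22.3, 17.85),
    ("indoor", 23.1, 17.44),
]


def summarize(env):
    """Return (average endurance in min, average power in W) for one environment."""
    rows = [(t, e) for name, t, e in flights if name == env]
    endurance = mean(t for t, _ in rows)
    # Mean power = energy / time, with minutes converted to hours.
    power = mean(e / (t / 60.0) for t, e in rows)
    return endurance, power


for env in ("outdoor", "indoor"):
    endurance, power = summarize(env)
    print(f"{env}: {endurance:.2f} min average endurance, {power:.1f} W average power")
```

On these figures the outdoor flights draw roughly 65 W on average against roughly 47 W indoors, which quantifies the extra control effort spent holding position and heading in wind.
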

Vision Model Evaluation: We evaluated the improved YOLOv5 model against the baseline. The model was trained using a stochastic gradient descent optimizer with a momentum of 0.937 and a weight decay of 0.0005. The key evaluation metrics are Precision (the fraction of positive predictions that are correct), Recall (the fraction of actual positives that are detected), and mean Average Precision (mAP). mAP@0.5 is the mean average precision at an Intersection over Union (IoU) threshold of 0.5, while mAP@0.5:0.95 averages it over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
$$ \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad \text{AP} = \int_{0}^{1} p(r) dr $$
where $TP$ are true positives, $FP$ are false positives, $FN$ are false negatives, and $p(r)$ is the precision-recall curve. The performance comparison and an ablation study on the effect of the CBAM module are presented below:
| Model Version | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | Inference Speed (FPS) |
|---|---|---|---|---|---|
| Baseline YOLOv5s | 0.91 | 0.89 | 0.89 | 0.55 | 33 |
| YOLOv5s + ConvNeXt + CBAM (Ours) | 0.96 | 0.94 | 0.97 | 0.62 | 26 |

| ConvNeXt Block | CBAM Module | mAP@0.5 | Δ mAP@0.5 |
|---|---|---|---|
| ✓ | ✗ | 0.93 | n/a |
| ✓ | ✓ | 0.97 | +0.04 |

The modified model clearly outperforms the baseline: precision rises from 0.91 to 0.96, recall from 0.89 to 0.94, mAP@0.5 from 0.89 to 0.97, and mAP@0.5:0.95 from 0.55 to 0.62. The ablation study attributes 4 percentage points of the mAP@0.5 gain to the CBAM module (0.93 without it, 0.97 with it). Inference speed drops from 33 to 26 FPS on the Raspberry Pi 5, but this frame rate remains sufficient for real-time herding, where sheep move relatively slowly. The training curves showed stable convergence of the loss function and high final metric values, suggesting a well-trained model without obvious overfitting.
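
The precision, recall, and IoU definitions used in this evaluation can be sketched for a single image as follows. This is a simplified illustration with made-up boxes: it uses greedy one-to-one matching in descending confidence order at one IoU threshold, whereas full mAP computation also sweeps the confidence threshold to trace the precision-recall curve. The function names and the box format `(x1, y1, x2, y2)` are assumptions for the example.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, iou_thr=0.5):
    """preds: list of (box, confidence); truths: list of boxes. Returns (precision, recall)."""
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_idx, best_iou = None, iou_thr
        for idx, gt in enumerate(truths):
            if idx in matched:
                continue  # each ground-truth box may be claimed only once
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None:
            matched.add(best_idx)
            tp += 1
    fp = len(preds) - tp      # predictions that matched no ground truth
    fn = len(truths) - tp     # ground-truth sheep that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


# Hypothetical frame: three sheep, three detections (one of them spurious).
truths = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
preds = [((1, 1, 10, 10), 0.9), ((20, 20, 29, 31), 0.8), ((60, 60, 70, 70), 0.7)]
precision, recall = precision_recall(preds, truths)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy frame two detections overlap a ground-truth box with IoU above 0.5, one detection is spurious, and one sheep is missed, so both precision and recall come out at 2/3.
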

In conclusion, the low-cost, multi-rotor intelligent herding UAV system developed in this work, built around the PX4 autopilot, proved effective and reliable in autonomous monitoring and herd management tasks. Systematic design and testing validated the system’s capabilities. The structurally optimized airframe provides durability and stable flight. The integration of RTK-GNSS and sensor fusion provides robust, centimeter-level navigation, which is essential for geo-fencing. The OpenHD link offers a stable, long-range channel for HD video and data. Most importantly, the enhanced YOLOv5-based vision system delivers precise, real-time sheep detection and counting even in complex pastoral settings. The integrated UAV system is a practical, modular, and cost-effective solution for modernizing small and medium-scale livestock management. It addresses the inefficiencies of traditional herding by reducing labor costs, reducing monitoring blind spots, and enabling proactive herd management. Future work will focus on further model optimization for lower power consumption, multi-UAV collaborative herding strategies, and additional sensors for pasture health assessment, expanding the system’s role in precision livestock farming.
