Autonomous Flight Control of a Quadrotor Drone Using Onboard Vision

The pursuit of robust, autonomous navigation for small-scale unmanned aerial vehicles (UAVs), particularly the quadrotor drone, in GPS-denied and unstructured environments remains a significant research challenge. Inspired by the elegant navigation capabilities of flying insects, vision-based methods, and specifically the use of optical flow, present a compelling solution. This article details my comprehensive research and development of a fully autonomous control system for a quadrotor drone, centered on a novel, embedded architecture that processes optical flow entirely onboard.

Traditional methods for quadrotor drone localization, such as GPS, fail indoors or in cluttered areas, while laser range finders often exceed the strict weight and size constraints of micro aerial vehicles. Vision sensors, being lightweight and information-rich, are ideal. Among vision techniques, optical flow—the apparent motion of brightness patterns in a sequence of images—is particularly suitable. It requires no external infrastructure or pre-defined markers, demands less computation than full SLAM (Simultaneous Localization and Mapping) algorithms, and provides direct cues about ego-motion relative to the environment, mimicking biological systems.

My work differentiates itself through a key design philosophy: complete onboard autonomy. Many existing systems offload computationally intensive tasks like optical flow calculation to a ground station, creating dependencies on wireless data links that introduce latency, potential dropouts, and limit operational range. I address this by implementing an embedded control system architecture where all perception, state estimation, and control algorithms run on compact, onboard computing hardware. This design significantly enhances the quadrotor drone’s capability for genuine self-contained operation.

System Architecture and Platform

The foundation of this research is a custom-built quadrotor drone platform designed around the principle of embedded processing. The core airframe is a commercially available model, modified and equipped with the following key components:

  • Vision Sensor: A Logitech QuickCam Pro 4000, providing image streams at 640×480 pixel resolution.
  • Flight Controller Board: This unit integrates a MEMS-based Inertial Measurement Unit (IMU) providing 3-axis accelerometer, gyroscope, and magnetometer data, and a barometer for altitude estimation.
  • Embedded Onboard Computer: An Intel Atom processor running a Linux operating system. This is the computational heart, executing all image processing, state estimation, and high-level control algorithms.
  • Communication Link: A wireless module used primarily for telemetry data downlink and experimental monitoring, not for real-time control loop closure.

The total takeoff mass of the quadrotor drone is approximately 1.4 kg. The data flow within this embedded architecture is sequential and self-contained. Images from the downward-facing camera and inertial data from the flight controller are sent via serial connection to the Atom computer. Here, optical flow is calculated and fused with attitude data to estimate horizontal velocity and position. This estimated state is then passed as a feedback signal to the control module, which generates actuator commands. These commands are sent to the flight controller, which handles the low-level motor mixing and commands the Electronic Speed Controllers (ESCs). This eliminates the ground station from the critical control loop.

Optical Flow Computation and Motion State Estimation

The accurate estimation of the quadrotor drone’s horizontal motion is the cornerstone of the control system. This is achieved through a multi-stage pipeline: calculating raw optical flow, filtering the data, and fusing it with inertial measurements.

Optical Flow Calculation

For robustness to the relatively fast and large motions possible in agile quadrotor drone flight, I employ the Lucas-Kanade method within an image pyramid framework. This approach tracks features at multiple scales (from a coarse, downsampled image to the full resolution), allowing it to handle displacements that would violate the “small motion” assumption of the basic algorithm.

Let $I^L(x, y)$ and $J^L(x, y)$ represent the image intensities of two consecutive frames at pixel $(x, y)$ in level $L$ of the pyramid, where $L=0$ is the original resolution and $L_{max}$ is the coarsest level. The calculation starts at the coarsest level. The goal is to find the displacement vector $\vec{d} = (d_x, d_y)^T$ between the two frames. At each pyramid level $L$, an initial motion guess $\vec{g}^L$ (inherited from the coarser level $L+1$) is refined. The brightness constancy assumption is applied in a warped form:
$$ I^L(x, y) = J^L(x + g^L_x + d^L_x,\ y + g^L_y + d^L_y) $$
The incremental flow $\vec{d}^L$ at this level is computed by minimizing the sum of squared differences in a local window. The guess for the next finer level is then updated:
$$ \vec{g}^{L-1} = 2(\vec{g}^L + \vec{d}^L) $$
The final optical flow vector at the original image level is the accumulation of all incremental flows:
$$ \vec{d} = \vec{g}^0 + \vec{d}^0 $$
In practice, I compute flow for a grid of feature points across the image and take the spatial median to obtain a single, robust $\vec{d}_f = (d_f^x, d_f^y)^T$ representing the frame’s raw optical flow.
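To make the per-point computation concrete, the following Python sketch implements a single-level Lucas-Kanade step (omitting the pyramid for brevity) on a grid of points and takes the spatial median, as described above. The synthetic pattern, window size, and grid spacing are illustrative assumptions, not the values used on the platform:

```python
import numpy as np

def lk_flow_at_point(I0, I1, x, y, win=9):
    """Single-level Lucas-Kanade: solve A d = -It over a local window."""
    Iy, Ix = np.gradient(I0)          # spatial gradients (rows = y, cols = x)
    It = I1 - I0                      # temporal difference between frames
    h = win // 2
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    d, *_ = np.linalg.lstsq(A, -It[sl].ravel(), rcond=None)
    return d                          # (dx, dy) in pixels

# Synthetic check: a smooth pattern translated by a known sub-pixel shift.
yy, xx = np.mgrid[0:120, 0:120].astype(float)
true_d = np.array([0.4, 0.25])                  # (dx, dy)
pattern = lambda x, y: np.sin(0.25 * x) * np.cos(0.2 * y)
I0 = pattern(xx, yy)
I1 = pattern(xx - true_d[0], yy - true_d[1])    # I1(x) = I0(x - d)

# Flow on a coarse grid of interior points, then the spatial median.
grid = [(x, y) for x in range(20, 100, 10) for y in range(20, 100, 10)]
flows = np.array([lk_flow_at_point(I0, I1, x, y) for x, y in grid])
d_f = np.median(flows, axis=0)
print(d_f)   # close to true_d
```

The median across the grid is what gives the robustness: individual points over low-texture regions produce poor estimates, but they cannot pull the median far from the dominant frame motion.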

Flow Filtering and Rotational Compensation

The raw optical flow $\vec{d}_f$ encodes both translational motion of the quadrotor drone and rotational motion due to changes in attitude (roll and pitch). To extract the pure translational component needed for position estimation, the rotational component must be removed. This is done by fusing IMU data.

First, the raw flow is filtered using a discrete Kalman filter to reduce noise from image artifacts. The state vector is $\vec{x}_k = (\hat{d}_x, \hat{d}_y)_k^T$, the filtered flow. The process and measurement equations are:
$$ \vec{x}_k = A \vec{x}_{k-1} + \vec{w}_{k-1} $$
$$ \vec{z}_k = \vec{d}_{f,k} = H \vec{x}_k + \vec{v}_k $$
where $\vec{w}$ and $\vec{v}$ are process and measurement noise, and $A$ and $H$ are identity matrices for this simple velocity model.
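With $A = H = I$, the filter reduces to a simple recursive blend between prediction and measurement. A compact sketch follows; the noise covariances $Q$ and $R$ here are illustrative placeholders, not the tuned flight values:

```python
import numpy as np

class FlowKalmanFilter:
    """Discrete Kalman filter with identity process and measurement models."""
    def __init__(self, q=1e-3, r=1e-1):
        self.x = np.zeros(2)           # filtered flow (dx, dy)
        self.P = np.eye(2)             # state covariance
        self.Q = q * np.eye(2)         # process noise covariance
        self.R = r * np.eye(2)         # measurement noise covariance

    def update(self, z):
        # Predict: with A = I the state estimate simply carries over.
        P_pred = self.P + self.Q
        # Correct: with H = I the Kalman gain is a covariance-weighted blend.
        K = P_pred @ np.linalg.inv(P_pred + self.R)
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(2) - K) @ P_pred
        return self.x

# Noisy measurements of a constant flow of (2.0, -1.0) px/frame.
rng = np.random.default_rng(0)
kf = FlowKalmanFilter()
for _ in range(200):
    z = np.array([2.0, -1.0]) + rng.normal(0, 0.3, size=2)
    d_filtered = kf.update(z)
print(d_filtered)   # settles near (2.0, -1.0)
```

The ratio $Q/R$ sets the trade-off: a smaller ratio smooths more aggressively but responds more slowly to genuine accelerations of the vehicle.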

The filtered flow $\vec{d}_{filtered}$ is then compensated for rotation. The rotational component of flow is directly proportional to the angular rate and the focal length. Using the measured change in roll ($\Delta\phi$) and pitch ($\Delta\theta$) between frames, the compensated translational flow $\vec{d}_p = (d_p^x, d_p^y)^T$ is:
$$ d_p^x = d_{filtered}^x - \frac{f \cdot \Delta\phi}{R_x} $$
$$ d_p^y = d_{filtered}^y - \frac{f \cdot \Delta\theta}{R_y} $$
where $f$ is the camera’s focal length in pixels, and $R_x$, $R_y$ are constants relating angular change to pixel displacement based on the camera’s field of view.
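The compensation itself is a per-axis subtraction; a minimal sketch of the equations above, where the focal length and the $R_x$, $R_y$ constants are placeholder values rather than the calibrated ones:

```python
def compensate_rotation(d_filtered, d_phi, d_theta, f=700.0, Rx=1.0, Ry=1.0):
    """Remove the attitude-induced flow component (angle changes in radians)."""
    dpx = d_filtered[0] - f * d_phi / Rx
    dpy = d_filtered[1] - f * d_theta / Ry
    return dpx, dpy

# Sanity check: a pure roll change of 0.01 rad with no translation should
# leave approximately zero translational flow.
dp = compensate_rotation((7.0, 0.0), d_phi=0.01, d_theta=0.0)
print(dp)   # approximately (0.0, 0.0)
```

This step is where IMU quality matters most: any gyroscope bias passes straight through the subtraction and appears as spurious translational flow.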

Horizontal Velocity and Position Estimation

The translational optical flow $\vec{d}_p$ is fundamentally a measurement of angular velocity relative to the ground. To convert it to linear velocity, the altitude $h$ (measured by the barometer) is essential. The relationship is derived from the pinhole camera model. The estimated incremental displacement of the quadrotor drone in the world frame ($\Delta X, \Delta Y$) between frames is:
$$ \Delta X = \frac{h \cdot d_p^x}{f \cdot s} $$
$$ \Delta Y = \frac{h \cdot d_p^y}{f \cdot s} $$
Here, $s$ is a scaling factor accounting for sensor dimensions. By integrating these incremental displacements over time (e.g., $\hat{X}_k = \hat{X}_{k-1} + \Delta X_k$), an estimate of the quadrotor drone’s horizontal position relative to an arbitrary starting point is obtained. Similarly, instantaneous horizontal velocities can be estimated as $\hat{V}_x = \Delta X / \Delta t$.
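Under the pinhole relationship above, the conversion and the integration over frames can be sketched as follows; the focal length, scaling factor, and frame rate are illustrative assumptions:

```python
def flow_to_displacement(dp, h, f=700.0, s=1.0):
    """Convert translational flow (pixels/frame) to metric displacement."""
    dX = h * dp[0] / (f * s)
    dY = h * dp[1] / (f * s)
    return dX, dY

# Integrate per-frame displacements to track position; divide by the frame
# period to recover instantaneous velocity.
dt = 1.0 / 15.0                        # assumed 15 fps camera
X = Y = 0.0
for dp in [(3.5, 0.0)] * 15:           # constant flow for 1 s at h = 4 m
    dX, dY = flow_to_displacement(dp, h=4.0)
    X, Y = X + dX, Y + dY
Vx = dX / dt
print(X, Vx)   # 0.3 m travelled at 0.3 m/s
```

Note the direct dependence on $h$: a barometer error scales every displacement estimate proportionally, which is one reason altitude stability matters for horizontal accuracy.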

The following table summarizes the key parameters and their roles in the motion estimation pipeline for the quadrotor drone.

| Parameter | Symbol | Source | Role in Estimation |
|---|---|---|---|
| Raw image flow | $\vec{d}_f$ | Camera + LK algorithm | Raw apparent motion (pixels/frame) |
| Filtered flow | $\vec{d}_{filtered}$ | Kalman filter | Denoised apparent motion |
| Attitude change | $\Delta\phi, \Delta\theta$ | IMU gyroscope | Compensates for rotational flow |
| Altitude | $h$ | Barometer | Converts angular to linear motion |
| Focal length | $f$ | Camera calibration | Scaling factor (pixels) |
| Estimated displacement | $\Delta X, \Delta Y$ | Calculation (equations above) | Quadrotor drone’s horizontal movement (meters) |

Control System Design

With a reliable estimate of the quadrotor drone’s horizontal position and velocity, a control system can be designed to achieve autonomous flight. The quadrotor drone is an underactuated, highly coupled nonlinear system. For the scope of this work, which focuses on the embedded perception pipeline, I implemented a classical yet effective nested Proportional-Integral-Derivative (PID) control structure. This structure is well-understood and effectively demonstrates the usability of the vision-based state estimates.

The control architecture consists of two primary loops: an outer loop for position/velocity control and an inner loop for attitude stabilization. The estimated states from the vision-inertial pipeline ($\hat{X}, \hat{Y}, \hat{V}_x, \hat{V}_y$) serve as the feedback for the outer loop.

  1. Outer Loop (Position/Velocity Control): This loop takes the desired setpoint (e.g., hover at $(0,0)$) and the estimated current position $(\hat{X}, \hat{Y})$. A PID controller computes desired velocity commands, or more commonly, directly computes desired roll ($\phi_{des}$) and pitch ($\theta_{des}$) angles. For instance, the desired pitch angle for forward/backward motion might be:
    $$ \theta_{des} = K_{p,x} (X_{des} - \hat{X}) + K_{d,x} (-\hat{V}_x) + K_{i,x} \int (X_{des} - \hat{X}) \, dt $$
    A similar equation governs roll for lateral motion. The desired yaw angle $\psi_{des}$ is typically set independently or held constant.
  2. Inner Loop (Attitude Control): This faster loop runs on the flight controller board. It takes the desired angles $(\phi_{des}, \theta_{des}, \psi_{des})$ from the outer loop and the current estimated angles $(\phi, \theta, \psi)$ from the IMU. A second set of PID controllers (often operating on angular rates for better performance) generates torque commands to drive the angle errors to zero. These torque commands are then translated into differential thrust commands for the four motors.

The control law for the inner loop angular rate (e.g., for pitch rate $q$) can be represented as:
$$ \tau_\theta = K_{p,q} (q_{des} – q) + K_{i,q} \int (q_{des} – q) dt + K_{d,q} \dot{q} $$
where $\tau_\theta$ is the command torque, and $q_{des}$ is derived from the outer-loop pitch angle error. The specific gains used in the experiments for the quadrotor drone are listed below. These were tuned empirically to achieve stable flight.

| Control Loop | Axis | $K_p$ | $K_i$ | $K_d$ |
|---|---|---|---|---|
| Inner Loop (Rate) | Pitch Rate | 0.8 | 0.005 | 0.003 |
| Inner Loop (Rate) | Roll Rate | 0.8 | 0.005 | 0.003 |
| Inner Loop (Rate) | Yaw Rate | 0.2 | 0.020 | 0.000 |
| Inner Loop (Angle) | Pitch | 4.5 | 0.0 | 0.0 |
| Inner Loop (Angle) | Roll | 4.5 | 0.0 | 0.0 |
| Inner Loop (Angle) | Yaw | 6.0 | 0.0 | 0.0 |
| Outer Loop (Velocity) | X-Velocity | 1.0 | 0.5 | 0.0 |
| Outer Loop (Velocity) | Y-Velocity | 1.0 | 0.5 | 0.0 |
| Outer Loop (Position) | X-Position | 0.1 | 0.0 | 0.0 |
| Outer Loop (Position) | Y-Position | 0.1 | 0.0 | 0.0 |
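A compact sketch of one outer-loop channel of the form given in the desired-pitch equation earlier, instantiated with the X-position gains from the table; the loop rate and the example state values are assumptions for illustration:

```python
class PID:
    """Simple PID with explicit integral accumulation."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0

    def step(self, error, derivative):
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Outer loop, following the desired-pitch equation:
# theta_des = Kp (X_des - X) + Kd (-Vx) + Ki * integral of (X_des - X) dt
dt = 0.02                                            # assumed 50 Hz outer loop
pitch_from_x = PID(kp=0.1, ki=0.0, kd=0.0, dt=dt)    # X-position gains (table)

X_des, X_hat, Vx_hat = 0.0, 0.5, 0.0   # vehicle sits 0.5 m ahead of setpoint
theta_des = pitch_from_x.step(X_des - X_hat, -Vx_hat)
print(theta_des)   # -0.05: pitch back toward the setpoint
```

In the actual cascade, this angle command is then tracked by the inner-loop angle and rate controllers, so the outer-loop bandwidth must stay well below the inner loop's for the nested structure to remain stable.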

Experimental Validation and Results

The performance of the complete system—embedding the vision pipeline, state estimation, and control on the quadrotor drone—was evaluated through outdoor flight tests. The primary mission was autonomous hover at a fixed point from an arbitrary manual takeoff. The quadrotor drone was manually flown to an approximate hover point at about 4 meters altitude, and then switched into autonomous mode, relying solely on its onboard sensors and algorithms.

Attitude and Height Control

The inner-loop attitude controller provided a stable base for the outer-loop vision-based control. During autonomous hover, the roll and pitch angles were maintained within ±2° for the majority of the flight, with maximum deviations not exceeding ±3°. The yaw angle was successfully locked to within a few degrees of its initial value. The altitude, controlled using the barometer, remained stable around the 4-meter setpoint with minimal drift. This demonstrates that the low-level stabilization of the quadrotor drone was effective and did not interfere with the higher-level optical flow estimation.

Position Hold Performance

The critical test was the accuracy of the vision-based horizontal position hold. The desired hover point was set as the origin (0,0) in the local frame at the moment of autonomy engagement. The estimated $(\hat{X}, \hat{Y})$ trajectory was logged over a 60-second autonomous hover period. A quantitative analysis of the position error, defined as $e = \sqrt{\hat{X}^2 + \hat{Y}^2}$, yielded impressive results. The statistical distribution of the hover error is summarized below.

| Metric | Value | Interpretation |
|---|---|---|
| Points within 0.35 m radius | 91.25% | High-precision core performance |
| Points within 0.50 m radius | 100% | Complete bounded operation |
| Maximum observed error | < 0.50 m | Stable operational envelope |

This performance indicates that the embedded optical flow system enabled the quadrotor drone to maintain its position with an accuracy superior to typical consumer-grade GPS, all without any external positioning aids. The circular error probable (CEP) for this system, derived from the data, is well below 0.5 meters. The primary limitations on accuracy were identified as the moderate-grade MEMS IMU used for rotational compensation and the inherent drift associated with integrating velocity estimates. Future work with higher-grade IMUs and the inclusion of zero-velocity updates during stable hover could further reduce this drift.

Conclusion and Future Work

This research successfully demonstrated a practical framework for autonomous flight of a quadrotor drone using an embedded, vision-based system. The key contributions are threefold. First, the implementation of a fully onboard processing architecture eliminates dependence on ground station computation and fragile data links, moving closer to true agent-level autonomy for the quadrotor drone. Second, the use of a standard camera with a pyramidal optical flow algorithm, fused with inertial data, provides a versatile and expandable perception solution applicable in diverse GPS-denied environments. Third, the integration of this perception system with a nested PID controller resulted in stable and precise hover control, validating the entire pipeline.

The experimental results confirm that a small quadrotor drone can achieve autonomous position hold with decimeter-level accuracy using only lightweight, onboard sensors. This capability is fundamental for more complex missions such as indoor exploration, infrastructure inspection, or inventory management in warehouses.

Future directions for this work are promising. The control strategy can be advanced from PID to nonlinear or robust control techniques (e.g., sliding mode control, backstepping) to explicitly handle model uncertainties and disturbances, improving performance in windy conditions. The perception system can be enhanced by incorporating a dynamical model of the quadrotor drone into a filtering framework (e.g., an Extended Kalman Filter) to better fuse optical flow with IMU data and reduce drift. Furthermore, expanding the system to include obstacle detection, either from the same optical flow field or by adding a forward-facing depth sensor, would enable autonomous navigation in cluttered environments, making the quadrotor drone an even more capable and intelligent aerial robot.
