In modern aerial robotics, the navigation and positioning of multirotor drones are critical for autonomous operations. Traditional methods heavily rely on Global Navigation Satellite System (GNSS) signals fused with inertial measurement units (IMUs). However, in environments where GNSS signals are denied or interfered with, such as urban canyons or indoor settings, multirotor drones face significant challenges in maintaining accurate localization. To address this, I have designed and implemented a visual inertial odometry (VIO) system that leverages camera and IMU data to provide reliable pose estimates. This system is tailored for multirotor drones, ensuring robustness and real-time performance. In this article, I will detail the overall system design, hardware selection, software architecture, communication protocols, and algorithmic framework, supported by experimental validation.
The core of this system is an improved VINS_Fusion algorithm, which combines visual data from a stereo camera with inertial data from an IMU. By using nonlinear optimization, the system achieves high accuracy while maintaining computational efficiency. The hardware platform is based on the Nvidia Jetson Xavier NX, chosen for its high computing power and compact size, making it ideal for integration into multirotor drones. The software environment utilizes Ubuntu 18.04 and the Robot Operating System (ROS) to facilitate multi-threaded processing and seamless communication between modules. Through this design, the VIO system can operate in real-time, providing pose estimates at 200 Hz even when GNSS is unavailable.
One of the key challenges in developing VIO for multirotor drones is balancing accuracy and computational load. Multirotor drones often operate in dynamic environments with limited onboard resources, so the algorithm must be optimized for speed without sacrificing precision. My approach involves a semi-tightly coupled initialization method and the removal of loop closure and mapping modules to reduce complexity. This allows the system to run efficiently on embedded hardware while still delivering centimeter-level accuracy. In the following sections, I will elaborate on each component of the system and present experimental results that demonstrate its effectiveness.

The overall architecture of the visual inertial odometry system for multirotor drones is depicted in the figure above. It consists of two main modules: the flight control module and the mission computer module. The flight control module handles the interaction with the mission computer, forwarding IMU data and receiving computed pose estimates. The mission computer, equipped with the Nvidia Jetson Xavier NX, processes the sensor data and runs the VIO algorithm. This modular design ensures that the multirotor drone can switch between GNSS-based and vision-based navigation seamlessly, enhancing its adaptability in diverse environments.
In terms of hardware, the mission computer is a critical component. I selected the Nvidia Jetson Xavier NX for its high performance and low power consumption, which are essential for multirotor drones where weight and energy efficiency are paramount. The core specifications of this platform are summarized in the table below.
| Component | Specification |
|---|---|
| Processor | 6-core NVIDIA Carmel ARM®v8.2 64-bit |
| RAM | 8 GB 128-bit LPDDR4x at 51.2 GB/s |
| Storage | 16 GB eMMC 5.1 |
| Weight | <60 grams |
| AI Performance | 14 TOPS (10 W) / 21 TOPS (15 W) |
To support the mission computer, I designed a carrier board that includes power management, communication interfaces, and debugging modules. The board features connectors for the flight control computer, ad-hoc networking, stereo vision, and M.2 interfaces. This design ensures that the multirotor drone can handle various peripherals while maintaining a small form factor. The visual sensor chosen is the Intel RealSense D435 stereo camera, which uses active near-infrared stereo imaging to output depth data directly. This reduces the computational burden on the mission computer, as it does not need to perform stereo matching in software. The key parameters of the camera are listed in the following table.
| Parameter | Description |
|---|---|
| Operating Environment | Indoor/Outdoor |
| Depth Technology | Infrared Stereo |
| Measurement Range | 0.3–20 m (ideal 0.3–3 m) |
| Depth Resolution | 1280 × 720 pixels |
| Depth Frame Rate | 90 fps |
The software system is built on Ubuntu 18.04 with ROS Melodic, providing a robust environment for developing and running the VIO algorithm. ROS enables topic-based communication between nodes, allowing the camera node to publish stereo images and the flight control node to publish IMU data. The VIO node subscribes to these topics, processes the data, and outputs pose estimates. This decoupled architecture facilitates easy integration and testing. For instance, the camera node publishes data to a topic like /camera/stereo, while the flight control node publishes to /imu/data. The VIO node then fuses these inputs to compute the multirotor drone’s position and orientation.
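To make the decoupling concrete, the sketch below shows how a VIO node might wire up these topics with rospy. The /camera/stereo and /imu/data topic names come from the description above; the standard sensor_msgs message types, the single-Image stereo topic, and the /vio/odometry output topic are illustrative assumptions, since the actual estimator runs as a C++ node.

```python
#!/usr/bin/env python
# Minimal sketch of the VIO node's topic wiring, assuming standard
# sensor_msgs types on the topics mentioned in the text. The real
# estimator (the improved VINS_Fusion pipeline) is a C++ node; this
# only illustrates the subscribe/publish structure.
import rospy
from sensor_msgs.msg import Image, Imu
from nav_msgs.msg import Odometry

class VioNode(object):
    def __init__(self):
        # Buffers for raw measurements; an estimator thread would consume
        # them in time order.
        self.imu_buffer = []
        self.image_buffer = []
        rospy.Subscriber("/camera/stereo", Image, self.image_cb, queue_size=10)
        rospy.Subscriber("/imu/data", Imu, self.imu_cb, queue_size=200)
        # Pose estimates published for downstream consumers (assumed topic name).
        self.odom_pub = rospy.Publisher("/vio/odometry", Odometry, queue_size=10)

    def imu_cb(self, msg):
        self.imu_buffer.append(msg)

    def image_cb(self, msg):
        self.image_buffer.append(msg)
        # In a full system, a separate thread aligns IMU samples to the image
        # timestamp, runs the sliding-window optimization, and publishes the
        # resulting pose.

if __name__ == "__main__":
    rospy.init_node("vio_node")
    VioNode()
    rospy.spin()
```

A common design choice, and the reason the callbacks above only buffer data, is to let a separate optimization thread drain the buffers so that image processing never blocks the high-rate IMU stream.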
Communication between the flight control and mission computer modules is defined using custom ROS messages. Two primary message types are used: one for data from the flight control to the mission computer, and another for the reverse direction. The message from the flight control, named FlightToVisual.msg, includes header information, orientation in quaternion form, IMU data, GNSS data (if available), and a boolean flag to toggle visual positioning. This message is published at 200 Hz to ensure timely updates. The table below details its structure.
| Field Number | Data Type | Member Variable | Description |
|---|---|---|---|
| 1 | std_msgs/Header | header | Timestamp and sequence information |
| 2 | geometry_msgs/Quaternion | orientation | 3-axis attitude in quaternion form |
| 3 | IMU_struct | imu_data | IMU sensor data |
| 4 | GNSS_struct | gnss_data | GNSS coordinates (longitude, latitude, altitude) |
| 5 | bool | vis_pos_switch | True to enable visual positioning, false to disable |
Conversely, the message from the mission computer to the flight control, named VisualToFlight.msg, contains the computed pose, GNSS-like coordinates, and a validity flag. This message is also published at 200 Hz via the /visual_to_flight topic. Its structure is outlined in the following table.
| Field Number | Data Type | Member Variable | Description |
|---|---|---|---|
| 1 | std_msgs/Header | header | Timestamp and sequence information |
| 2 | geometry_msgs/Quaternion | orientation | 3-axis attitude in quaternion form |
| 3 | GNSS_struct | gnss_data | GNSS-style coordinates |
| 4 | bool | is_visual_valid | Indicates if visual positioning data is valid |
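A minimal sketch of how this protocol might look on the mission-computer side is given below. The field names, the /visual_to_flight topic, and the 200 Hz rates come from the tables above; the drone_msgs package name and the /flight_to_visual topic name are assumptions, and the outgoing fields are simply echoed rather than filled from the estimator.

```python
#!/usr/bin/env python
# Sketch of the message bridge between flight control and the mission
# computer. The package name "drone_msgs" and the /flight_to_visual topic
# are assumptions; only /visual_to_flight and the field names come from
# the message tables above.
import rospy
from drone_msgs.msg import FlightToVisual, VisualToFlight

def flight_cb(msg, pub):
    # Only respond when the flight controller has enabled visual positioning.
    if not msg.vis_pos_switch:
        return
    out = VisualToFlight()
    out.header.stamp = rospy.Time.now()
    # In the real system these fields come from the VIO estimator; here they
    # are echoed purely to show the field mapping between the two messages.
    out.orientation = msg.orientation
    out.gnss_data = msg.gnss_data
    out.is_visual_valid = True
    pub.publish(out)

if __name__ == "__main__":
    rospy.init_node("visual_to_flight_bridge")
    pub = rospy.Publisher("/visual_to_flight", VisualToFlight, queue_size=10)
    rospy.Subscriber("/flight_to_visual", FlightToVisual, flight_cb,
                     callback_args=pub, queue_size=200)
    rospy.spin()
```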
The algorithmic framework of the VIO system is based on an improved version of VINS_Fusion. The key enhancements include a semi-tightly coupled initialization method and the removal of the loop closure and mapping modules to reduce computational overhead. This is particularly important for multirotor drones, which require real-time performance over long distances. The initialization process estimates the initial states needed for the VIO algorithm, such as gyroscope bias, scale, gravity vector, and velocity. The steps are as follows, with a simplified numerical sketch after the list:
- Input: keyframes in the sliding window and spatiotemporally aligned IMU data.
- Step 1: estimate the gyroscope bias by aligning the relative rotations from purely visual structure-from-motion with the corresponding pre-integrated IMU rotations.
- Step 2: solve a linear system to obtain preliminary estimates of the metric scale, gravity vector, and frame velocities, with the accelerometer bias ignored at this stage.
- Step 3: refine the gravity vector by enforcing its known magnitude.
- Output: camera poses, gyroscope bias, gravity vector, and velocities.
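As a rough illustration of Steps 2 and 3, the sketch below assumes the pre-integrated IMU terms have already been stacked into a linear system A x = b with the unknowns ordered as frame velocities, gravity vector, and scale; building A and b from the actual measurements is omitted, and rescaling gravity to its known magnitude stands in for the full two-degree-of-freedom refinement.

```python
# Highly simplified numerical sketch of the visual-inertial alignment
# (Steps 2 and 3). The construction of A and b from pre-integrated IMU
# measurements and up-to-scale visual poses is omitted.
import numpy as np

GRAVITY_MAG = 9.81  # known gravity magnitude used in the refinement step

def solve_alignment(A, b, n_frames):
    """Step 2: least-squares solve for x = [v_0 ... v_{n-1}, g, s]."""
    x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
    velocities = x[:3 * n_frames].reshape(n_frames, 3)
    gravity = x[3 * n_frames:3 * n_frames + 3]
    scale = x[-1]
    return velocities, gravity, scale

def refine_gravity(gravity):
    """Step 3 (stand-in): enforce the known gravity magnitude.

    The full method re-solves the system with gravity parameterized by two
    degrees of freedom on its tangent plane; simple rescaling is used here
    only to illustrate the constraint being imposed.
    """
    return gravity / np.linalg.norm(gravity) * GRAVITY_MAG
```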
The state estimation in the VIO algorithm involves nonlinear optimization to minimize the error between predicted and observed measurements. The cost function can be expressed as:
$$E(\mathbf{X}) = \sum_{i} \left\| \mathbf{z}_i - h_i(\mathbf{X}) \right\|_{\Sigma_i}^2$$
where \(\mathbf{X}\) is the state vector containing pose, velocity, and bias terms, \(\mathbf{z}_i\) is the measurement, \(h_i\) is the measurement model, and \(\Sigma_i\) is the covariance matrix. For multirotor drones, the state vector typically includes the position \(\mathbf{p}\), velocity \(\mathbf{v}\), orientation quaternion \(\mathbf{q}\), and IMU biases \(\mathbf{b}_g\) and \(\mathbf{b}_a\). The optimization is performed over a sliding window of frames to maintain real-time performance.
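The sketch below shows the structure of one Gauss-Newton step on this cost, assuming each factor supplies its residual \(\mathbf{r}_i(\mathbf{X}) = \mathbf{z}_i - h_i(\mathbf{X})\), its Jacobian with respect to the state, and its covariance. It is only meant to make the normal-equations form explicit; the actual pipeline solves the problem with an on-manifold nonlinear solver (VINS_Fusion uses Ceres for this).

```python
# Sketch of one Gauss-Newton step on the cost E(X) above.
# factors: list of (residual_fn, jacobian_fn, sigma) tuples, where
# jacobian_fn returns the Jacobian of the residual with respect to x.
import numpy as np

def gauss_newton_step(x, factors):
    H = np.zeros((x.size, x.size))   # approximate Hessian  J^T Sigma^-1 J
    g = np.zeros(x.size)             # gradient             J^T Sigma^-1 r
    for residual_fn, jacobian_fn, sigma in factors:
        r = residual_fn(x)
        J = jacobian_fn(x)
        W = np.linalg.inv(sigma)     # information matrix of this factor
        H += J.T @ W @ J
        g += J.T @ W @ r
    dx = -np.linalg.solve(H, g)      # normal equations
    return x + dx                    # (an on-manifold retraction in practice)
```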
To validate the system, I conducted experiments using both dataset simulation and real-world tests on a multirotor drone. For simulation, I used the EuRoC dataset, specifically the V2_03_difficult sequence, which presents challenging conditions such as rapid motion and poor lighting. The performance was evaluated using the Absolute Trajectory Error (ATE), which measures the root-mean-square error (RMSE) between the estimated and ground truth trajectories. The ATE is computed as:
$$\text{ATE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| \log\left( \mathbf{T}_{\text{gt},i}^{-1} \mathbf{T}_{\text{est},i} \right)^\vee \right\|^2}$$
where \(\mathbf{T}_{\text{gt},i}\) and \(\mathbf{T}_{\text{est},i}\) are the ground truth and estimated poses at frame \(i\), respectively, and \(N\) is the total number of frames. The results comparing my improved algorithm with the original VINS_Fusion are shown in the table below.
| Dataset Sequence | RMSE (Our Method) | RMSE (VINS_Fusion) |
|---|---|---|
| V2_03_difficult | 0.3375 m | 0.3395 m |
As seen, my method achieves a slightly lower ATE, indicating improved accuracy. This is attributed to the enhanced initialization and optimization steps, which better handle the biases and scale uncertainties in multirotor drone operations.
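For reference, the sketch below shows how the translational variant of the ATE can be evaluated from two time-aligned trajectories given as homogeneous pose matrices; the formula above additionally includes the rotational component through the SE(3) logarithm, which is omitted here for brevity.

```python
# Sketch of the ATE computation, assuming the trajectories are already
# time-aligned and expressed in a common frame. Only the translational
# part of the per-frame pose error is evaluated.
import numpy as np

def ate_rmse(T_gt, T_est):
    """T_gt, T_est: arrays of shape (N, 4, 4) with homogeneous poses."""
    errors = []
    for Tg, Te in zip(T_gt, T_est):
        T_rel = np.linalg.inv(Tg) @ Te               # relative pose error at frame i
        errors.append(np.linalg.norm(T_rel[:3, 3]))  # translational component
    return np.sqrt(np.mean(np.square(errors)))
```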
For real-world testing, I deployed the system on a multirotor drone in an outdoor environment. Since absolute ground truth is unavailable outdoors, I used RTK-GNSS measurements as a reference. The VIO system was initialized by aligning the visual coordinate system with the GNSS frame. The drone was flown in an open area, and the pose estimates from the VIO were compared to the RTK data. The longitude and latitude over time are plotted, showing that the VIO outputs closely match the RTK values, with errors in the centimeter range. This demonstrates that the system can provide reliable navigation for multirotor drones in GNSS-denied environments.
The effectiveness of the VIO system for multirotor drones can be further analyzed through the covariance of the estimated states. The uncertainty in the pose estimate grows over time due to the incremental nature of odometry, but the fusion with IMU data helps bound this drift. The covariance propagation can be modeled using the error state Kalman filter formulation, where the state error \(\delta \mathbf{X}\) evolves as:
$$\delta \mathbf{X}_{k+1} = \mathbf{F}_k \delta \mathbf{X}_k + \mathbf{G}_k \mathbf{w}_k$$
where \(\mathbf{F}_k\) is the state transition matrix, \(\mathbf{G}_k\) is the noise input matrix, and \(\mathbf{w}_k\) is the process noise. For multirotor drones, the process noise includes accelerometer and gyroscope noises, which are characterized by their power spectral densities. The covariance matrix \(\mathbf{P}_k\) is updated as:
$$\mathbf{P}_{k+1} = \mathbf{F}_k \mathbf{P}_k \mathbf{F}_k^T + \mathbf{G}_k \mathbf{Q}_k \mathbf{G}_k^T$$
where \(\mathbf{Q}_k\) is the noise covariance matrix. In my implementation, I tuned these parameters to match the dynamics of multirotor drones, ensuring stable and accurate estimates.
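A minimal numerical sketch of this propagation step is shown below. The 15-dimensional error state (position, velocity, orientation, and the two IMU biases) and the placeholder matrices for \(\mathbf{F}_k\), \(\mathbf{G}_k\), and \(\mathbf{Q}_k\) are illustrative only; the real matrices are built from the IMU error-state kinematics and the identified noise densities.

```python
# Minimal sketch of the discrete covariance propagation above.
import numpy as np

def propagate_covariance(P, F, G, Q):
    """One step of P_{k+1} = F P F^T + G Q G^T."""
    return F @ P @ F.T + G @ Q @ G.T

# Toy example: 15-dimensional error state driven by 12-dimensional IMU noise.
P = np.eye(15) * 1e-4               # initial error-state covariance
F = np.eye(15)                      # placeholder state-transition matrix
G = np.zeros((15, 12))
G[3:, :] = np.eye(12)               # noise enters velocity/attitude/bias errors
Q = np.eye(12) * 1e-6               # placeholder process-noise covariance
P = propagate_covariance(P, F, G, Q)
```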
In conclusion, the visual inertial odometry system I designed for multirotor drones addresses the critical need for reliable navigation in GNSS-denied environments. By integrating high-performance hardware, a robust software framework, and an optimized algorithm, the system achieves real-time pose estimation with centimeter-level accuracy. The experiments confirm that it can seamlessly replace GNSS-based navigation when needed, making it a valuable tool for various applications, from surveillance to delivery. Future work will focus on enhancing the algorithm’s robustness to visual degradation and extending it to swarm operations for multiple multirotor drones.
The development of this system underscores the importance of sensor fusion in autonomous systems. For multirotor drones, which operate in complex and unpredictable environments, the ability to rely on visual and inertial data alone significantly expands their operational scope. The use of nonlinear optimization ensures high accuracy, while the efficient implementation on embedded hardware meets the real-time demands. I believe that this approach will pave the way for more advanced navigation solutions in the field of aerial robotics.
Moreover, the communication protocol and modular design allow for easy integration with existing multirotor drone platforms. The ROS-based messaging system enables interoperability with other sensors and algorithms, facilitating further research and development. As multirotor drones continue to evolve, systems like this will play a crucial role in enabling fully autonomous missions in challenging conditions.
In summary, the key contributions of this work include the design of a compact hardware system, the development of a real-time software architecture, and the implementation of an improved VIO algorithm tailored for multirotor drones. The experimental results validate the system’s performance, demonstrating its practicality and effectiveness. I am confident that this technology will contribute to the advancement of multirotor drone capabilities, particularly in scenarios where GNSS is unavailable or unreliable.
