In recent years, with the continuous advancement of information technology, quadcopters have become a focal point of research, and multi-rotor unmanned aerial vehicles with vertical take-off and landing capabilities are being applied ever more widely in military and civilian domains. In indoor environments, however, signal occlusion and wall effects mean that traditional positioning technologies such as GPS often struggle to provide accurate position information. This paper therefore proposes a multi-sensor fusion algorithm that significantly enhances the accuracy and stability of indoor positioning by complementing, redundantly processing, and fusing information from multiple sensors. The method offers an effective solution and an advanced control strategy for achieving precise positioning of quadcopters in indoor environments.
The quadcopter hardware platform consists of four main components: the detection module, control module, drive module, and power module. The detection module primarily integrates sensing devices such as gyroscopes, accelerometers, digital compasses, barometers, infrared sensors, ultrasonic sensors, and position sensors. The control module includes key components such as microprocessors, general-purpose processors, digital signal processors, and video processors. Establishing a kinematic model for the quadcopter involves defining coordinate systems and motion states. The model includes two coordinate systems: the global coordinate system and the body coordinate system. The global coordinate system is fixed to the ground, with its origin set near the take-off point of the quadcopter, the positive x-axis pointing north (N), the positive y-axis pointing east (E), and the positive z-axis pointing down (D). The body coordinate system is fixed to the quadcopter, with its origin at the center of gravity, the positive x-axis pointing forward (F), the positive y-axis pointing right (R), and the positive z-axis pointing down (D). The motion state of the quadcopter in the body coordinate system is determined by its linear and angular velocities. The position and velocity of the quadcopter can thus be represented as:
$$p_e = [x \quad y \quad z]^T, \quad v = [v_x \quad v_y \quad v_z]^T$$
The dynamic model of the quadcopter takes the lift and torque generated by the propellers as inputs and outputs the quadcopter's velocity and angular velocity, i.e., the position and attitude of the quadcopter. The attitude kinematics are given by:
$$\dot{\Phi} = [\dot{\phi} \quad \dot{\theta} \quad \dot{\psi}]^T = W \cdot \omega_b$$
where $\Phi = [\phi \quad \theta \quad \psi]^T$ represents the attitude angles of the quadcopter, $\omega_b$ is the body angular velocity, and $W$ is the transformation matrix between them, expressed as:
$$W = \begin{bmatrix}
1 & 0 & 0 \\
0 & \cos\phi & \sin\phi \\
0 & -\sin\phi & \cos\phi
\end{bmatrix}
\begin{bmatrix}
\cos\theta & 0 & -\sin\theta \\
0 & 1 & 0 \\
\sin\theta & 0 & \cos\theta
\end{bmatrix}
\begin{bmatrix}
\cos\psi & \sin\psi & 0 \\
-\sin\psi & \cos\psi & 0 \\
0 & 0 & 1
\end{bmatrix}$$
This rotation matrix is crucial for describing the attitude changes of the quadcopter, converting the motion state from the body coordinate system to the global coordinate system.
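As a concrete illustration of this composition (a minimal Python/NumPy sketch, not the flight code used in this work; the function name and example angles are hypothetical):

```python
import numpy as np

def compose_rotation(phi, theta, psi):
    """Compose the elementary rotations in the order R_x(phi) R_y(theta) R_z(psi),
    matching the matrix product written above (angles in radians)."""
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(phi), np.sin(phi)],
                   [0.0, -np.sin(phi), np.cos(phi)]])
    Ry = np.array([[np.cos(theta), 0.0, -np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [np.sin(theta), 0.0, np.cos(theta)]])
    Rz = np.array([[np.cos(psi), np.sin(psi), 0.0],
                   [-np.sin(psi), np.cos(psi), 0.0],
                   [0.0, 0.0, 1.0]])
    return Rx @ Ry @ Rz

# Example: rotate a vector and recover it with the transpose
# (the matrix is orthogonal, so its transpose is its inverse).
W = compose_rotation(np.radians(5.0), np.radians(-3.0), np.radians(30.0))
v = np.array([1.0, 0.0, 0.0])
v_rotated = W @ v
v_recovered = W.T @ v_rotated  # equals v up to floating-point error
```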
In sensor information fusion technology, data fusion between sensors is categorized into three levels: data-level fusion, feature-level fusion, and decision-level fusion. The specifics are summarized in the table below.
| Classification | Fusion Definition | Common Methods |
|---|---|---|
| Data-Level Fusion | Fusion of raw data from a series of sensors to enable inference and recognition, improving data accuracy and reliability. | D-S evidence theory, Kalman filtering, weighted average, multi-Bayesian estimation |
| Feature-Level Fusion | Integration and processing of feature information from observed data collected by various sensors to comprehensively understand the target’s state and behavior. | Weighted fusion, feature concatenation, feature stacking, feature selection |
| Decision-Level Fusion | Fusion of results obtained from processing and analyzing observed data by individual sensors. | Fuzzy logic, neural networks, expert systems, artificial intelligence |
Data-level fusion is primarily suitable for homogeneous sensors, leveraging complementarity between data to reflect observed features more comprehensively and accurately. This approach maximizes the retention of original data information while avoiding data loss during fusion. However, as sensor types and data complexity increase, the difficulty and computational load of data-level fusion also rise. Feature-level fusion automatically discards some raw data but adapts well to heterogeneous sensor data. By extracting features from different sensors and combining them organically, it enables multi-angle, multi-level descriptions of the observed object. However, it may lose some detailed information, leading to less comprehensive results. Decision-level fusion aggregates and processes data from distributed sensors, offering robustness and flexibility. It can adaptively select fusion algorithms and strategies based on task requirements and scenarios. Nonetheless, it requires certain learning and reasoning capabilities for higher-level data analysis. For quadcopter position and attitude measurement, data-level fusion is adopted from a practical application perspective, especially in GPS-denied indoor environments.
To further enhance indoor positioning accuracy, an improved algorithm for indoor quadcopter positioning technology is proposed. This algorithm first analyzes data on a two-dimensional plane in the quadcopter’s flight space, then extends to three-dimensional spatial data to obtain accurate position data and distance data relative to reference points. Through attitude estimation, optimal matching with reference data is achieved, and using lidar, the pose information between the current scan point and the scan sequence is calculated based on angular scanning information.
Preprocessing of the lidar data involves using a median filter to remove noise and improve accuracy. In addition, the lidar data are marked using a threshold method, which avoids unnecessary interpolation between scan points that belong to different targets. Assuming all data points initially lie in the same region, the start point $(r_1, \xi_1)$ is the first point and the end point $(r_n, \xi_n)$ is the last point. The difference $D$ between the $i$-th and $(i-1)$-th points is calculated, and by comparing $D$ with the threshold Max_diff it is determined whether the point belongs to the same region as the previous data point. This method ultimately yields multiple disjoint regions.
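A minimal sketch of this preprocessing step is shown below (Python/NumPy; the function name, the filter window size, and the `MAX_DIFF` value are illustrative assumptions rather than the parameters used in this work):

```python
import numpy as np

MAX_DIFF = 0.20  # hypothetical threshold; the actual Max_diff value is not specified here

def preprocess_scan(ranges, window=3, max_diff=MAX_DIFF):
    """Median-filter the ranges, then split the scan into regions wherever the
    difference D between consecutive points exceeds max_diff."""
    ranges = np.asarray(ranges, dtype=float)

    # Sliding-window median filter to suppress isolated noise points.
    half = window // 2
    padded = np.pad(ranges, half, mode="edge")
    filtered = np.array([np.median(padded[i:i + window]) for i in range(len(ranges))])

    # Threshold-based marking: start a new region whenever D > max_diff,
    # so points from different targets end up in disjoint regions.
    region_ids = np.zeros(len(filtered), dtype=int)
    current = 0
    for i in range(1, len(filtered)):
        D = abs(filtered[i] - filtered[i - 1])
        if D > max_diff:
            current += 1
        region_ids[i] = current
    return filtered, region_ids

# Usage with synthetic data: two surfaces at different distances give two regions.
filtered, regions = preprocess_scan([2.0, 2.01, 1.99, 2.02, 4.50, 4.51, 4.49])
print(regions)  # -> [0 0 0 0 1 1 1]
```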
After data preprocessing, projection transformation is applied to the data points. Assuming the current scan data is $\{(r_{c1}, \xi_{c1}), (r_{c2}, \xi_{c2}), \ldots, (r_{cn}, \xi_{cn}), \ldots\}$, the reference scan data is $\{(r_{r1}, \xi_{r1}), (r_{r2}, \xi_{r2}), \ldots, (r_{rn}, \xi_{rn}), \ldots\}$, and the pose of the current scan position in the reference scan coordinate system is $(x_c, y_c, \psi_c)$, the polar coordinates $(r'_{ci}, \xi'_{ci})$ of the $i$-th projected point are given by:
$$r'_{ci} = \sqrt{(r_{ci} \cos(\xi_{ci} + \psi_c) + x_c)^2 + (r_{ci} \sin(\xi_{ci} + \psi_c) + y_c)^2}$$
$$\xi'_{ci} = \operatorname{atan2}\left(r_{ci} \sin(\xi_{ci} + \psi_c) + y_c,\; r_{ci} \cos(\xi_{ci} + \psi_c) + x_c\right)$$
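A minimal sketch of this projection (Python/NumPy; the function name is hypothetical, angles are in radians, and the inputs are assumed to be arrays of ranges and bearings):

```python
import numpy as np

def project_scan(r_c, xi_c, x_c, y_c, psi_c):
    """Project the current scan (r_c, xi_c), taken at pose (x_c, y_c, psi_c) in the
    reference frame, into the polar coordinate system of the reference scan."""
    r_c = np.asarray(r_c, dtype=float)
    xi_c = np.asarray(xi_c, dtype=float)
    x = r_c * np.cos(xi_c + psi_c) + x_c
    y = r_c * np.sin(xi_c + psi_c) + y_c
    r_proj = np.hypot(x, y)        # r'_ci
    xi_proj = np.arctan2(y, x)     # xi'_ci (two-argument arctangent)
    return r_proj, xi_proj
```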
After obtaining the relationship between the new interpolation points and the data points in the current and reference scans, displacement estimation can be performed. To reduce the impact of registration errors on displacement estimation, a weighted least-squares method is adopted and the projection equations are linearized. The Jacobian matrix $H$ of the projected ranges with respect to the position parameters is:
$$H = \begin{bmatrix}
\frac{\partial r'_{c1}}{\partial x_c} & \frac{\partial r'_{c1}}{\partial y_c} \\
\frac{\partial r'_{c2}}{\partial x_c} & \frac{\partial r'_{c2}}{\partial y_c} \\
\vdots & \vdots \\
\frac{\partial r'_{cn}}{\partial x_c} & \frac{\partial r'_{cn}}{\partial y_c}
\end{bmatrix}$$
The weighted sum of squared errors function is:
$$S = \sum_{i=1}^{n} w_i (r'_{ci} - r_{ri})^2$$
Taking the derivative and setting it to zero yields:
$$\frac{\partial S}{\partial \Delta} = 2 \sum_{i=1}^{n} w_i H_i^T (r'_{ci} - r_{ri}) = 0$$
Solving for $\Delta$:
$$\Delta = (H^T W H)^{-1} H^T W (r'_{c} - r_{r})$$
where $W$ is the weight matrix. The weight expression is:
$$w_i = \frac{1}{d_i^2 + \epsilon}$$
Here, $d_i$ denotes the matching distance (residual) of the $i$-th point and $\epsilon$ is a small positive constant that prevents division by zero. The larger $d_i$ is, the smaller the weight $w_i$, so the influence of erroneous points on the displacement estimate is effectively reduced.
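The displacement step can be sketched as follows. This is a minimal Python/NumPy illustration rather than the exact implementation: it interprets $d_i$ as the range residual of the $i$-th matched point, assumes the reference ranges have already been interpolated at the projected bearings (the `r_ref_interp` argument is hypothetical), and uses the standard Gauss-Newton sign convention, which may differ from the expression above depending on how the residual is defined.

```python
import numpy as np

def wls_displacement_step(r_c, xi_c, r_ref_interp, x_c, y_c, psi_c, eps=1e-6):
    """One weighted least-squares correction (dx, dy) to the scan position."""
    r_c = np.asarray(r_c, dtype=float)
    xi_c = np.asarray(xi_c, dtype=float)

    # Projected ranges at the current pose estimate (same expressions as above).
    X = r_c * np.cos(xi_c + psi_c) + x_c
    Y = r_c * np.sin(xi_c + psi_c) + y_c
    r_proj = np.hypot(X, Y)

    # Jacobian H of the projected ranges with respect to (x_c, y_c).
    H = np.column_stack((X / r_proj, Y / r_proj))

    # Residuals d_i and weights w_i = 1 / (d_i^2 + eps): large residuals get small weights.
    d = r_ref_interp - r_proj
    w = 1.0 / (d**2 + eps)

    # Solve the weighted normal equations for the correction Delta = (dx, dy).
    W = np.diag(w)
    delta = np.linalg.solve(H.T @ W @ H, H.T @ W @ d)
    return delta
```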
For yaw angle estimation, the yaw angle of the current scan position point in the reference scan coordinate system is denoted as $\psi_c$. A change in $\psi_c$ shifts the lidar's angular measurement range to the left or right. Assuming the displacement of the current scan position point relative to the reference scan position point is $(x_c, y_c)$, and that the same target is detected in both scans, the goal is to find the angle estimate that makes the current scan data points overlap with the reference scan data points as closely as possible. The yaw angle of the current scan position point is assumed to vary within $\pm 20^\circ$: the angles of all current scan projection points $(r''_{ci}, \xi''_{ci})$ are offset by $-20^\circ, -19^\circ, \ldots, 19^\circ, 20^\circ$, and the mean absolute error is calculated for each offset. Yaw angle estimation thus reduces to finding the minimum of a parabola, with the parabola equation:
$$e(\psi) = a \psi^2 + b \psi + c$$
Taking the derivative:
$$\frac{de}{d\psi} = 2a \psi + b$$
Setting it to zero gives:
$$\psi_{\text{min}} = -\frac{b}{2a}$$
To find $a$ and $b$, substitute $e_{t-1}$, $e_t$, $e_{t+1}$ into the parabola equation:
$$\begin{cases}
e_{t-1} = a (\psi_{t-1})^2 + b \psi_{t-1} + c \\
e_t = a (\psi_t)^2 + b \psi_t + c \\
e_{t+1} = a (\psi_{t+1})^2 + b \psi_{t+1} + c
\end{cases}$$
Solving for $a$ and $b$:
$$a = \frac{e_{t-1} - 2e_t + e_{t+1}}{2 \Delta \psi^2}$$
$$b = \frac{e_{t+1} - e_{t-1}}{2 \Delta \psi}$$
The horizontal coordinate of the lowest point is:
$$\psi_c = \psi_t - \frac{b}{2a}$$
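A sketch of the coarse $\pm 20^\circ$ search followed by the parabolic refinement is given below (Python/NumPy; the simple periodic interpolation of the reference scan and the function name are illustrative assumptions):

```python
import numpy as np

def estimate_yaw(r_proj, xi_proj, r_ref, xi_ref, search_deg=20):
    """Try integer-degree offsets in [-search_deg, +search_deg], compute the mean
    absolute range error for each, then refine the best offset with a parabola fit."""
    offsets = np.arange(-search_deg, search_deg + 1)  # degrees, step = 1 degree
    errors = []
    for off in offsets:
        shifted = xi_proj + np.radians(off)
        # Reference ranges at the shifted bearings (simple periodic interpolation).
        r_ref_at = np.interp(shifted, xi_ref, r_ref, period=2 * np.pi)
        errors.append(np.mean(np.abs(r_proj - r_ref_at)))
    errors = np.asarray(errors)

    t = int(np.argmin(errors))
    if t == 0 or t == len(offsets) - 1:
        return float(offsets[t])  # minimum lies on the search boundary

    # Parabola through (e_{t-1}, e_t, e_{t+1}) with delta_psi = 1 degree:
    # a = (e_{t-1} - 2 e_t + e_{t+1}) / 2,  b = (e_{t+1} - e_{t-1}) / 2,
    # and the refined estimate is psi_t - b / (2a).
    e_prev, e_t, e_next = errors[t - 1], errors[t], errors[t + 1]
    a = (e_prev - 2.0 * e_t + e_next) / 2.0
    b = (e_next - e_prev) / 2.0
    return float(offsets[t] - b / (2.0 * a)) if a != 0 else float(offsets[t])
```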
The positioning algorithm proceeds as follows. In quadcopter attitude estimation, the first step is to establish a dynamic model that includes kinematic equations and sensor observation models. In indoor environments without GPS, the quadcopter relies on IMU sensors to obtain attitude and acceleration information: the IMU measures acceleration and angular velocity, and attitude information is derived through integration. After obtaining initial attitude estimates, the system reads the reference and current scan data points. To improve data quality, median filtering is applied to remove noise points, far-away points and regions are marked, and the current scan data is then projected using the transformation described above. When the iteration count reaches the maximum, the system outputs the relative pose of the current scan point. Otherwise, displacement estimation continues and the condition $|dx| + |dy| < 1$ is checked; if it is satisfied, offset (yaw) estimation is performed to obtain the current relative pose, and if not, the iteration count is rechecked to decide whether to continue the loop. Because indoor environments are complex and variable, regular algorithm optimization and adjustment are necessary to ensure accuracy and real-time performance.
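Putting the pieces together, the loop described above can be sketched roughly as follows, reusing the hypothetical helper functions from the earlier sketches; the data layout, maximum iteration count, and stopping threshold are illustrative assumptions rather than the parameters of the actual system.

```python
import numpy as np

def match_scans(current, reference, x0=0.0, y0=0.0, psi0=0.0,
                max_iter=30, stop_threshold=1.0):
    """Iteratively estimate the pose (x_c, y_c, psi_c) of the current scan relative
    to the reference scan, alternating displacement and yaw estimation."""
    # Preprocessing: median filtering and region marking (see earlier sketch).
    r_c, _ = preprocess_scan(current["ranges"])
    r_r, _ = preprocess_scan(reference["ranges"])
    xi_c, xi_r = current["angles"], reference["angles"]

    x, y, psi = x0, y0, psi0
    for _ in range(max_iter):
        # Project the current scan into the reference frame at the current estimate.
        r_proj, xi_proj = project_scan(r_c, xi_c, x, y, psi)
        r_ref_at = np.interp(xi_proj, xi_r, r_r, period=2 * np.pi)

        # Displacement estimation (weighted least squares).
        dx, dy = wls_displacement_step(r_c, xi_c, r_ref_at, x, y, psi)
        x, y = x + dx, y + dy

        # Once the translation update is small enough, refine the yaw angle.
        if abs(dx) + abs(dy) < stop_threshold:
            psi += np.radians(estimate_yaw(r_proj, xi_proj, r_r, xi_r))
    return x, y, psi
```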

For quadcopter positioning, 3D attitude estimation involves rotating and converting the attitude angle information measured by the inertial navigation module and projecting it onto the 2D x-y plane for position estimation. Notably, the lidar origin coincides with the quadcopter body origin, and the lidar's x and y axes align with the quadcopter's flight direction. To study the quadcopter's pose in the world coordinate system, a virtual coordinate system is used: the body coordinate system is rotated about its x and y axes to obtain the projection coordinate system. Standard Euler angles $\eta = [\phi, \theta, \psi]$ represent the quadcopter's attitude, where $\phi$ is the roll angle (rotation around the x-axis), $\theta$ is the pitch angle (rotation around the y-axis), and $\psi$ is the yaw angle (rotation around the z-axis). In attitude estimation, the transformation is typically represented by a 3×3 rotation matrix $R$ determined by these attitude angles. The relationship between the Euler angles and the rotation matrix is:
$$R = R_z(\psi) R_y(\theta) R_x(\phi)$$
where $R_z$, $R_y$, $R_x$ are rotation matrices around the z, y, and x axes, respectively. The transformation matrices from the body coordinate system $B$ to the virtual coordinate system $V$ along various axes are:
$$T_x = \begin{bmatrix}
1 & 0 & 0 \\
0 & \cos\phi & \sin\phi \\
0 & -\sin\phi & \cos\phi
\end{bmatrix}$$
$$T_y = \begin{bmatrix}
\cos\theta & 0 & -\sin\theta \\
0 & 1 & 0 \\
\sin\theta & 0 & \cos\theta
\end{bmatrix}$$
Through rotational transformations of the body coordinate system $B$ around the x and y axes to the virtual coordinate system $V$, we have:
$$T = T_y T_x$$
Assuming the polar coordinates of a lidar scan data point in the body coordinate system $B$ are $(r_i, \xi_i)$, the Cartesian coordinates are $[P_i]_B = [r_i \cos(\xi_i), r_i \sin(\xi_i), 0]^T$. After transformation by $T$, we get:
$$[P'_i]_V = T\, [P_i]_B$$
Converting $[P'_i]_V$ to polar coordinates and using the angle information for pose estimation:
$$r'_i = \sqrt{(P'_{ix})^2 + (P'_{iy})^2}$$
$$\xi'_i = \operatorname{atan2}(P'_{iy},\; P'_{ix})$$
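This body-to-virtual-frame projection can be sketched as follows (Python/NumPy; the function name is hypothetical and angles are in radians):

```python
import numpy as np

def project_to_virtual_frame(r, xi, phi, theta):
    """Rotate lidar points from the body frame B into the virtual frame V using
    T = T_y(theta) T_x(phi), then return their polar form in the x-y plane."""
    Tx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(phi), np.sin(phi)],
                   [0.0, -np.sin(phi), np.cos(phi)]])
    Ty = np.array([[np.cos(theta), 0.0, -np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [np.sin(theta), 0.0, np.cos(theta)]])
    T = Ty @ Tx

    r = np.asarray(r, dtype=float)
    xi = np.asarray(xi, dtype=float)
    P_B = np.stack((r * np.cos(xi), r * np.sin(xi), np.zeros_like(r)))  # 3 x N
    P_V = T @ P_B

    r_proj = np.hypot(P_V[0], P_V[1])      # r'_i
    xi_proj = np.arctan2(P_V[1], P_V[0])   # xi'_i
    return r_proj, xi_proj
```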
In practical tests, to obtain more accurate position scan information, the quadcopter's lidar was placed at two diagonally opposite points, A and B, in the laboratory. Using the different positions at points A and B, reference and current scan maps were constructed. After acquiring the data, the polar scan matching (PSM) method was used to process these two datasets offline, and the results were compared with those obtained using the conventional iterative closest point (ICP) method. Scenario one is a laboratory environment, relatively complex compared with an empty indoor space, with the current scan pose set as $x_c = 45$ cm, $y_c = 60$ cm, $\psi_c = 45^\circ$. Scenario two is an indoor environment with numerous obstacles, with the scan pose set as $x_c = -60$ cm, $y_c = -25$ cm, $\psi_c = -90^\circ$. Scenario three is an outdoor corridor of the drone laboratory, relatively open, with the scan pose set as $x_c = -65$ cm, $y_c = 60$ cm, $\psi_c = 0^\circ$. The flight trajectory forms a closed triangle. After selecting the take-off reference point, when the quadcopter returns to the reference point, the errors in the x and y flight directions and in the yaw angle are -12.63 cm, 8.84 cm, and 7.09°, respectively. Thus, the pose estimation method based on angle matching can be applied to indoor position measurement of quadcopters. In this experiment, the PSM method was first used to find the relative pose between two consecutive lidar scans, and simple accumulation was then performed to derive the initial position of the quadcopter. The table below shows the pose estimation results for the different scenarios.
| Scenario | $\hat{x}_c$ (cm) | $\hat{y}_c$ (cm) | $\hat{\psi}_c$ (°) | Time (ms) | Iterations |
|---|---|---|---|---|---|
| Scenario 1 | 42.65 | 63.25 | -44.9 | 178.51 | 24 |
| Scenario 2 | -61.65 | -25.37 | -90.3 | 75.62 | 10 |
| Scenario 3 | -65.84 | -7.15 | 0.92 | 208.1 | 26 |
Experimental results indicate that the multi-sensor fusion algorithm based on Kalman filtering proposed in this paper can effectively improve the accuracy and stability of indoor positioning technology for quadcopters. Compared to traditional single-sensor positioning techniques, this method enhances positioning accuracy by over 30% and stability by more than 20%.
In conclusion, this study focuses on quadcopters, combining quadcopter control technology with multi-sensor fusion technology so that devices such as 2D lidar, inertial sensors, and sonar enable navigation and positioning in complex indoor environments lacking GPS signals. By integrating more advanced 3D lidar and odometry, the quadcopter can obtain higher-precision sensor data and flight parameters indoors, improving computational accuracy so that complex tasks such as simultaneous localization and mapping, autonomous obstacle avoidance, and autonomous navigation can be completed successfully.
