In the rapidly evolving field of drone technology, accurate state estimation for Unmanned Aerial Vehicles (UAVs) is critical for autonomous navigation, environmental monitoring, and complex mission execution. Traditional methods for nonlinear state estimation often suffer from large errors, poor robustness to noise, and computational inefficiency, particularly in time-varying systems. This article introduces an Extended Sparse Gaussian Variational Inference (ESGVI) approach for batch state estimation and parameter learning in nonlinear systems, tailored to UAV applications. By casting state estimation as Gaussian variational inference, we approximate the true posterior distribution and incorporate learnable parameters to enhance accuracy. The method utilizes Stein’s lemma, sparsity in the inverse covariance matrix, and Gaussian quadrature to derive an efficient iterative scheme. Furthermore, we employ expectation-maximization (EM) for noise parameter learning and introduce an inverse Wishart prior to mitigate the impact of measurement noise and outliers. Through extensive simulations on a UAV trajectory model, we demonstrate that our approach achieves precise trajectory estimation without requiring the true noise values, effectively suppressing disturbances and improving robustness in real-world scenarios. The integration of drone technology and advanced inference methods underscores the potential for autonomous systems in dynamic environments.
The core of our method lies in the Gaussian Variational Inference (GVI) framework, which approximates the posterior distribution of states given sensor observations. For a state vector \( \mathbf{x} \in \mathbb{R}^N \) and observation data \( \mathbf{z} \in \mathbb{R}^D \), we assume a Gaussian distribution for the estimated posterior:
$$ q(\mathbf{x}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{\sqrt{(2\pi)^N |\boldsymbol{\Sigma}|}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) $$
where \( \boldsymbol{\mu} \) is the mean and \( \boldsymbol{\Sigma} \) is the covariance matrix. The goal is to minimize the Kullback-Leibler (KL) divergence between the true posterior \( p(\mathbf{x} | \mathbf{z}) \) and the approximation \( q(\mathbf{x}) \), leading to the loss function:
$$ \mathcal{V}(q) = \mathbb{E}_q[\phi(\mathbf{x})] + \frac{1}{2} \ln |\boldsymbol{\Sigma}^{-1}| $$
Here, \( \phi(\mathbf{x}) = -\ln p(\mathbf{x}, \mathbf{z}) \) represents the negative log-likelihood. By introducing learnable parameters \( \boldsymbol{\theta} \), such as noise covariances, the loss function becomes \( \mathcal{V}(q, \boldsymbol{\theta}) = \mathbb{E}_q[\phi(\mathbf{x}, \boldsymbol{\theta})] + \frac{1}{2} \ln |\boldsymbol{\Sigma}^{-1}| \). The iterative updates for the mean and inverse covariance are derived using Stein’s lemma and Gauss-Newton optimization:
$$ \boldsymbol{\Sigma}^{-1}_{(i+1)} = \mathbb{E}_{q_{(i)}} \left[ \frac{\partial^2 \phi(\mathbf{x}, \boldsymbol{\theta})}{\partial \mathbf{x}^T \partial \mathbf{x}} \right] $$
$$ \boldsymbol{\Sigma}^{-1}_{(i+1)} \delta \boldsymbol{\mu} = - \mathbb{E}_{q_{(i)}} \left[ \frac{\partial \phi(\mathbf{x}, \boldsymbol{\theta})}{\partial \mathbf{x}^T} \right] $$
$$ \boldsymbol{\mu}_{(i+1)} = \boldsymbol{\mu}_{(i)} + \delta \boldsymbol{\mu} $$
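As a concrete illustration, the iteration above can be sketched on a one-dimensional toy problem with a Gaussian prior and a quadratic measurement model, evaluating the expectations by Gauss-Hermite quadrature. The problem values, quadrature order, iteration count, and the small initial covariance (which keeps the undamped Newton iteration stable) are all choices made for this sketch, not part of the method:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: integrates against exp(-x^2/2).
xs, ws = hermegauss(11)
ws = ws / np.sqrt(2.0 * np.pi)          # normalize so the weights sum to 1

def expect(f, mu, sigma):
    """E_{N(mu, sigma^2)}[f(x)] by Gauss-Hermite quadrature."""
    return np.sum(ws * f(mu + sigma * xs))

# Toy problem (illustrative values): prior N(x0, P), measurement z = x^2 + noise
# with variance R, so phi(x) = (x - x0)^2 / (2P) + (z - x^2)^2 / (2R).
x0, P, z, R = 1.0, 0.5, 1.2, 0.1
phi_d1 = lambda x: (x - x0) / P + (x**2 - z) * 2.0 * x / R   # dphi/dx
phi_d2 = lambda x: 1.0 / P + (6.0 * x**2 - 2.0 * z) / R      # d^2 phi/dx^2

mu, Sigma = x0, 0.01                     # small initial covariance for stability
for _ in range(30):
    Sigma_inv = expect(phi_d2, mu, np.sqrt(Sigma))           # precision update
    dmu = -expect(phi_d1, mu, np.sqrt(Sigma)) / Sigma_inv    # mean step
    mu, Sigma = mu + dmu, 1.0 / Sigma_inv
```

At convergence the expected gradient \( \mathbb{E}_q[\partial \phi / \partial x] \) vanishes, so the mean settles between the prior and the nonlinear measurement.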
To handle computational complexity, we exploit the sparsity of the inverse covariance matrix and decompose the negative log-likelihood into factors. For \( K \) factors, \( \phi(\mathbf{x}, \boldsymbol{\theta}) = \sum_{k=1}^K \phi_k(\mathbf{x}_k, \boldsymbol{\theta}) \), where \( \mathbf{x}_k \) is a subset of states associated with the \( k \)-th factor. Using projection matrices \( \mathbf{P}_k \) such that \( \mathbf{x}_k = \mathbf{P}_k \mathbf{x} \), the marginal distributions \( q_k(\mathbf{x}_k) = \mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_{kk}) \) allow for efficient updates:
$$ \boldsymbol{\Sigma}^{-1}_{(i+1)} = \sum_{k=1}^K \mathbf{P}_k^T \mathbb{E}_{q_{(i)k}} \left[ \frac{\partial^2 \phi_k(\mathbf{x}_k, \boldsymbol{\theta})}{\partial \mathbf{x}_k^T \partial \mathbf{x}_k} \right] \mathbf{P}_k $$
$$ \boldsymbol{\Sigma}^{-1}_{(i+1)} \delta \boldsymbol{\mu} = - \sum_{k=1}^K \mathbf{P}_k^T \mathbb{E}_{q_{(i)k}} \left[ \frac{\partial \phi_k(\mathbf{x}_k, \boldsymbol{\theta})}{\partial \mathbf{x}_k^T} \right] $$
This sparse formulation reduces computational load while maintaining accuracy, making it suitable for real-time applications in drone technology.
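Because each projection matrix \( \mathbf{P}_k \) merely selects the states a factor touches, the sums above reduce to scatter-adds of small blocks into the global precision matrix. The sketch below assembles a chain of binary factors this way; the Hessian and gradient blocks are placeholders standing in for the quadrature expectations \( \mathbb{E}_{q_k}[\cdot] \) of the real factors:

```python
import numpy as np

# 4 states of dimension 2; each factor k couples the pair (x_{k-1}, x_k).
N, d = 4, 2
Sigma_inv = np.zeros((N * d, N * d))
rhs = np.zeros(N * d)

for k in range(1, N):
    idx = np.arange((k - 1) * d, (k + 1) * d)               # states the factor touches
    H_k = np.eye(2 * d) + 0.1 * np.ones((2 * d, 2 * d))     # stand-in E[d^2 phi_k]
    g_k = 0.01 * np.ones(2 * d)                             # stand-in E[d phi_k]
    Sigma_inv[np.ix_(idx, idx)] += H_k   # P_k^T H_k P_k without forming P_k
    rhs[idx] -= g_k                      # -P_k^T E[d phi_k]

delta_mu = np.linalg.solve(Sigma_inv, rhs)   # Gauss-Newton step for the mean
```

The assembled precision is block-tridiagonal, so in practice the solve can use a sparse Cholesky factorization rather than a dense solver.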

Parameter learning is integral to our ESGVI framework, particularly for adapting to unknown noise statistics in UAV systems. We employ the expectation-maximization (EM) algorithm to learn parameters such as measurement noise covariances. The EM decomposition of the negative log-likelihood is given by:
$$ -\ln p(\mathbf{z} | \boldsymbol{\theta}) = \int q(\mathbf{x}) \ln \frac{p(\mathbf{x} | \mathbf{z}, \boldsymbol{\theta})}{q(\mathbf{x})} d\mathbf{x} - \int q(\mathbf{x}) \ln \frac{p(\mathbf{x}, \mathbf{z} | \boldsymbol{\theta})}{q(\mathbf{x})} d\mathbf{x} $$
The second integral is the evidence lower bound (ELBO), which enters with a negative sign; the first integral is the negative KL divergence between \( q(\mathbf{x}) \) and the true posterior. In the E-step, we fix the parameters and optimize the variational distribution, while in the M-step, we update the parameters based on the current distribution. For a constant covariance matrix \( \mathbf{W} \), the negative log-likelihood is \( \phi(\mathbf{x}, \mathbf{W}) = \frac{1}{2} \left[ \mathbf{e}(\mathbf{x})^T \mathbf{W}^{-1} \mathbf{e}(\mathbf{x}) - \ln |\mathbf{W}^{-1}| \right] \), where \( \mathbf{e}(\mathbf{x}) \) is the error vector. The M-step update for \( \mathbf{W} \) is:
$$ \mathbf{W}_{\text{min}} = \frac{1}{K} \sum_{k=1}^K \mathbb{E}_{q_k} \left[ \mathbf{e}_k(\mathbf{x}_k) \mathbf{e}_k(\mathbf{x}_k)^T \right] $$
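The M-step is thus an average of expected error outer products. A minimal sketch, in which the expectations \( \mathbb{E}_{q_k}[\mathbf{e}_k \mathbf{e}_k^T] \) are stood in by residuals sampled from a known covariance (chosen here only to check the update):

```python
import numpy as np

rng = np.random.default_rng(0)
W_true = np.array([[0.5, 0.1],
                   [0.1, 0.3]])          # illustrative "true" noise covariance
K = 5000
errors = rng.multivariate_normal(np.zeros(2), W_true, size=K)  # e_k stand-ins

# W_min = (1/K) * sum_k E[e_k e_k^T]  (here: empirical outer products)
W_est = errors.T @ errors / K
```

With enough factors the update recovers the underlying covariance, which is exactly what lets ESGVI run without the true noise values.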
For UAV trajectory estimation, we adopt a white-noise-on-acceleration (WNOA) motion prior. The state vector includes position, velocity, and orientation, and the prior factor for the WNOA model is defined as:
$$ \phi_p(\mathbf{x}, \mathbf{Q}_C) = \sum_{k=2}^K \phi_{p,k}(\mathbf{x}_{k-1,k}, \mathbf{Q}_C) = \sum_{k=2}^K \frac{1}{2} \left( \mathbf{e}_{p,k}^T \mathbf{Q}_k^{-1} \mathbf{e}_{p,k} + \ln |\mathbf{Q}_k| \right) $$
where \( \mathbf{e}_{p,k} \) is the error term involving pose and velocity, and \( \mathbf{Q}_k = \mathbf{Q}_{\Delta t} \otimes \mathbf{Q}_C \) is the covariance with power spectral density matrix \( \mathbf{Q}_C \). The optimal estimate for \( \mathbf{Q}_C \) is derived as:
$$ \mathbf{Q}_{C_{i,j}} = \frac{\text{tr} \left( \sum_{k=2}^K \mathbb{E}_{q_{k-1,k}} \left[ \mathbf{e}_{p,k} \mathbf{e}_{p,k}^T \right] (\mathbf{Q}_{\Delta t}^{-1} \otimes \mathbf{1}_{i,j}) \right)}{(K-1) \dim(\mathbf{Q}_{\Delta t})} $$
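The trace expression follows from the Kronecker structure of \( \mathbf{Q}_k \): by the mixed-product property, \( \text{tr}\big((\mathbf{Q}_{\Delta t} \otimes \mathbf{Q}_C)(\mathbf{Q}_{\Delta t}^{-1} \otimes \mathbf{1}_{i,j})\big) = \dim(\mathbf{Q}_{\Delta t})\, \mathbf{Q}_{C_{i,j}} \). The sketch below checks this identity numerically for a single factor; the matrix values are illustrative (a WNOA-style time block with \( \Delta t = 1 \)):

```python
import numpy as np

Q_dt = np.array([[1/3, 1/2],
                 [1/2, 1.0]])            # WNOA time block for dt = 1
Q_C = np.array([[0.4, 0.1],
                [0.1, 0.2]])             # power spectral density matrix
Q_k = np.kron(Q_dt, Q_C)                 # full prior covariance block

d = Q_dt.shape[0]
Q_C_rec = np.zeros_like(Q_C)
for i in range(2):
    for j in range(2):
        one_ij = np.zeros((2, 2))
        one_ij[i, j] = 1.0               # single-entry matrix 1_{i,j}
        # Trace formula with E[e e^T] = Q_k and a single (K-1 = 1) term
        Q_C_rec[i, j] = np.trace(Q_k @ np.kron(np.linalg.inv(Q_dt), one_ij)) / d
```

When the expected outer products match the prior exactly, the formula recovers \( \mathbf{Q}_C \) entry by entry; in the EM loop the expectations come from the marginals \( q_{k-1,k} \) instead.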
To enhance robustness against outliers and time-varying noise, we incorporate an inverse Wishart (IW) prior for the covariance matrices. The joint likelihood becomes \( p(\mathbf{x}, \mathbf{z}, \mathbf{A}) = p(\mathbf{x}, \mathbf{z} | \mathbf{A}) p(\mathbf{A}) \), where \( \mathbf{A} = (\mathbf{A}_1, \mathbf{A}_2, \dots, \mathbf{A}_K) \) are covariance matrices treated as random variables. The IW prior for each \( \mathbf{A}_k \) is:
$$ p(\mathbf{A}_k = \boldsymbol{\Upsilon}_k) = \frac{|\boldsymbol{\Psi}|^{\frac{\nu}{2}}}{2^{\frac{\nu d}{2}} \Gamma_d\left( \frac{\nu}{2} \right)} |\boldsymbol{\Upsilon}_k|^{-\frac{\nu + d + 1}{2}} \exp\left( -\frac{1}{2} \text{tr}(\boldsymbol{\Psi} \boldsymbol{\Upsilon}_k^{-1}) \right) $$
where \( d \) is the dimension, \( \boldsymbol{\Psi} \) is a positive definite scale matrix, and \( \nu > d - 1 \) is the degrees of freedom. The loss function with IW prior is:
$$ \mathcal{V}(q', \boldsymbol{\Upsilon}, \boldsymbol{\Psi}) = \sum_{k=1}^K \mathbb{E}_{q_k} \left[ \phi^m_k(\mathbf{x}_k, \boldsymbol{\Upsilon}_k) + \phi^w_k(\boldsymbol{\Upsilon}_k, \boldsymbol{\Psi}) \right] + \frac{1}{2} \ln |\boldsymbol{\Sigma}^{-1}| $$
In the E-step, we optimize \( \boldsymbol{\Upsilon}_k \) with fixed \( \boldsymbol{\Psi} \):
$$ \boldsymbol{\Upsilon}_k = \frac{1}{\alpha} \boldsymbol{\Psi} + \frac{1}{\alpha} \mathbb{E}_{q_k} \left[ \mathbf{e}_k(\mathbf{x}_k) \mathbf{e}_k(\mathbf{x}_k)^T \right] $$
where \( \alpha = \nu + d + 2 \). In the M-step, we update \( \boldsymbol{\Psi} \) with fixed \( \boldsymbol{\Upsilon} \):
$$ \boldsymbol{\Psi}^{-1} = \frac{1}{K \nu} \sum_{k=1}^K \boldsymbol{\Upsilon}_k^{-1} $$
A constraint constant \( \beta \) is applied to keep the scale matrix positive definite and well scaled, \( \boldsymbol{\Psi}_c \leftarrow (\beta\, |\boldsymbol{\Psi}^{-1}|)^{\frac{1}{d}} \boldsymbol{\Psi} \), which fixes the determinant \( |\boldsymbol{\Psi}_c| = \beta \). This IW prior effectively reduces the influence of measurement noise and outliers, which is crucial for reliable UAV operations.
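The alternation between the two updates can be sketched as follows. The residuals are simulated, one of them made a gross outlier; in ESGVI the outer products \( \mathbb{E}_{q_k}[\mathbf{e}_k \mathbf{e}_k^T] \) would come from the current marginals, and \( d \), \( K \), \( \nu \), and the iteration count here are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, nu = 2, 200, 10.0
alpha = nu + d + 2                       # E-step denominator

W_true = np.diag([0.2, 0.1])             # illustrative nominal noise covariance
E_eeT = [np.outer(e, e)
         for e in rng.multivariate_normal(np.zeros(d), W_true, K)]
E_eeT[0] = np.diag([50.0, 50.0])         # one gross outlier

Psi = np.eye(d)
for _ in range(20):
    # E-step: Upsilon_k = (Psi + E[e_k e_k^T]) / alpha
    Upsilon = [(Psi + E) / alpha for E in E_eeT]
    # M-step: Psi^{-1} = (1 / (K * nu)) * sum_k Upsilon_k^{-1}
    Psi = np.linalg.inv(sum(np.linalg.inv(U) for U in Upsilon) / (K * nu))
```

The outlying factor receives an inflated \( \boldsymbol{\Upsilon}_k \), so its residual is automatically downweighted in the state update, which is the mechanism behind the robustness results reported below.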
We validate our ESGVI method through simulations on a UAV trajectory estimation problem. The UAV model involves nonlinear motion and observation equations:
$$ \mathbf{x}_k = f(\mathbf{x}_{k-1}, \mathbf{u}_k, \mathbf{w}_k), \quad \mathbf{y}_k = g(\mathbf{x}_k, \mathbf{n}_k) $$
where \( \mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Omega}_k) \) is process noise and \( \mathbf{n}_k \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}_k) \) is measurement noise. The state vector includes pose and landmark positions:
$$ \mathbf{x} = [\mathbf{x}_0, \mathbf{x}_1, \dots, \mathbf{x}_K, \mathbf{m}_1, \dots, \mathbf{m}_L]^T, \quad \mathbf{x}_k = [x_k, y_k, \theta_k, \dot{x}_k, \dot{y}_k, \dot{\theta}_k]^T, \quad \mathbf{m}_l = [x_l, y_l]^T $$
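A simulation of this model pair can be sketched with a planar unicycle for \( f \) and bearings to landmarks for \( g \); the control sequence, noise levels, landmark positions, and time step are all made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 0.1
landmarks = np.array([[5.0, 0.0], [0.0, 5.0]])   # m_l = [x_l, y_l]

def f(x, u, w):
    """x_k = f(x_{k-1}, u_k, w_k): unicycle kinematics with noisy (v, omega)."""
    v, om = u + w
    th = x[2]
    return np.array([x[0] + v * np.cos(th) * dt, x[1] + v * np.sin(th) * dt,
                     th + om * dt, v * np.cos(th), v * np.sin(th), om])

def g(x, n):
    """y_k = g(x_k, n_k): bearing from the pose to each landmark plus noise."""
    return np.arctan2(landmarks[:, 1] - x[1], landmarks[:, 0] - x[0]) - x[2] + n

x = np.zeros(6)                          # [x, y, theta, xdot, ydot, thetadot]
traj, meas = [x], []
for k in range(50):
    w = rng.normal(0.0, [0.05, 0.02])    # process noise on speed / turn rate
    n = rng.normal(0.0, 0.01, size=len(landmarks))
    x = f(x, np.array([1.0, 0.1]), w)
    traj.append(x)
    meas.append(g(x, n))
```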
Factors include linear priors, nonlinear odometry, and bearing measurements. The joint likelihood is:
$$ -\ln p(\mathbf{x}, \mathbf{z}) = \sum_{k=0}^K \phi_k + \sum_{k=0}^K \psi_k + \sum_{k=1}^K \sum_{l=1}^L \psi_{l,k} + \text{const} $$
We learn the parameters \( \mathbf{Q}_C \), \( \boldsymbol{\Psi} \), and \( \mathbf{W}_{gt} \) using the EM algorithm with the IW prior. The dataset comprises 10,000 time steps split into 10 subsequences of 1,000 steps each. We compare trajectory estimates obtained with and without access to the true noise values, demonstrating the method’s accuracy and robustness.
The following table summarizes the mean absolute translation errors for trajectory estimation across subsequences, highlighting the performance of ESGVI compared to standard GVI:
| Sequence | ESGVI without True Values (m) | ESGVI with True Values (m) | Standard GVI (m) |
|---|---|---|---|
| 1 | 0.2306 | 0.2335 | 0.3534 |
| 2 | 0.1223 | 0.1196 | 0.2166 |
| 3 | 0.1893 | 0.1815 | 0.2993 |
| 4 | 0.0895 | 0.0963 | 0.1452 |
| 5 | 0.2059 | 0.2264 | 0.3198 |
| 6 | 0.1146 | 0.1155 | 0.2116 |
| 7 | 0.1324 | 0.1365 | 0.2266 |
| 8 | 0.0979 | 0.0997 | 0.1336 |
| 9 | 0.1996 | 0.2115 | 0.3725 |
| 10 | 0.1556 | 0.1650 | 0.3355 |
The average translation error for ESGVI without true values is 0.1538 m, compared to 0.1586 m with true values and 0.2614 m for standard GVI. This indicates that ESGVI achieves high accuracy even without ground truth, making it suitable for practical UAV applications where true states are unavailable.
To assess robustness, we inject additional noise into pose measurements with standard deviation \( \sigma \) ranging from 0.25 m to 1 m. The results show that while measurement errors increase, trajectory estimation errors remain below 0.5 m, demonstrating the method’s resilience in noisy environments common in drone technology:
| Measurement Error (m) | Estimation Translation Error (m) |
|---|---|
| 0.1133 | 0.1653 |
| 0.3069 | 0.2745 |
| 0.5341 | 0.3016 |
| 0.9716 | 0.3651 |
| 1.3842 | 0.4126 |
| 1.6530 | 0.4679 |
Furthermore, we evaluate the impact of outliers by introducing perturbations in 5% of measurements. The IW prior significantly reduces estimation errors, as shown below:
| Sequence | Without IW Prior (m) | With IW Prior (m) |
|---|---|---|
| 1 | 6.2593 | 0.1336 |
| 2 | 7.1566 | 0.1789 |
| 3 | 5.5535 | 0.0962 |
| 4 | 5.9949 | 0.1466 |
| 5 | 6.5613 | 0.1864 |
| 6 | 6.3380 | 0.1536 |
| 7 | 6.8543 | 0.1992 |
| 8 | 5.9576 | 0.0861 |
| 9 | 6.8122 | 0.2016 |
| 10 | 5.3641 | 0.1139 |
The average error with IW prior is 0.1496 m, compared to 6.2852 m without, underscoring its effectiveness in suppressing outliers. This is particularly important for Unmanned Aerial Vehicles operating in cluttered or dynamic environments.
In conclusion, the Extended Sparse Gaussian Variational Inference method provides a robust and accurate framework for nonlinear state estimation and parameter learning in UAV systems. By integrating Gaussian variational inference, sparsity exploitation, and inverse Wishart priors, our approach handles noise and outliers effectively, enabling precise trajectory estimation without reliance on true noise values. Advances in drone technology and autonomous navigation benefit greatly from such probabilistic methods, which ensure reliability in real-world applications. Future work will focus on extending this framework to multi-UAV systems and on real-time implementation for enhanced scalability.
