I develop a predictive model for the landing loads experienced by the landing gear of UAV drones. The structural integrity and operational safety of UAV drones during landing depend critically on the loads transmitted through the landing gear. Traditional approaches often rely on single‑task models that predict each load component independently and therefore do not exploit the correlations among different landing gear loads. In this work, I propose a multi‑task learning framework based on the Multi‑gate Mixture of Experts (MMoE) architecture to simultaneously predict the left and right main landing gear vertical loads of UAV drones. Using flight parameters recorded during landing maneuvers, the MMoE model captures complex nonlinear relationships and improves prediction accuracy over single‑task and Shared‑Bottom baselines. The methodology is validated on a dataset of 45,149 samples collected from repeated landing tests of a representative UAV drone platform.
1. Introduction
UAV drones are increasingly deployed in a wide range of applications, from surveillance and agriculture to logistics and disaster response. One of the most critical phases of UAV drone operation is the landing, during which the landing gear must absorb impact energy and transmit loads to the airframe. Accurate prediction of these landing loads is essential for structural health monitoring, fatigue life assessment, and predictive maintenance. However, the relationship between flight parameters (e.g., angle of attack, sideslip angle, speed, acceleration) and landing gear loads is highly nonlinear and multivariate. Traditional physics‑based models are computationally expensive and often require detailed structural simulations.
In recent years, machine learning methods have shown great promise in modeling such complex systems. For UAV drones, researchers have employed neural networks, ensemble methods, and deep learning to predict loads and strains. Most existing studies focus on single‑task prediction, where each load component is treated independently. Yet the loads on the left and right landing gears of a UAV drone are naturally correlated due to the symmetric design and coupling during landing. Multi‑task learning (MTL) offers a natural way to exploit these correlations by sharing representations across tasks. Among MTL architectures, the MMoE model introduces gating mechanisms that allow each task to adaptively select a weighted combination of expert sub‑networks, thereby capturing both shared and task‑specific features.
In this paper, I present a comprehensive study on applying MMoE to landing load prediction for UAV drones. I compare its performance against two baseline models: a single‑task feedforward neural network (ST) and a Shared‑Bottom multi‑task model. The results demonstrate that MMoE achieves significantly lower prediction errors and higher goodness‑of‑fit, with over 84% of test samples having relative errors below 2%. The proposed framework is both efficient and accurate, making it suitable for real‑time load monitoring and predictive maintenance of UAV drones.
The rest of the paper is organized as follows. Section 2 formulates the prediction problem and describes the dataset. Section 3 details the multi‑task learning models, including mathematical formulations. Section 4 presents the experimental setup, including feature selection via Pearson correlation, hyperparameter tuning, and evaluation metrics. Section 5 discusses the results, including tables and comparative analysis. Finally, Section 6 concludes the paper with key findings and future directions.
2. Problem Formulation
Let D denote the dataset collected from a series of landing tests of a UAV drone. Each sample corresponds to the instantaneous state of the UAV drone during the landing phase. The raw data contain 52 flight parameters, including angle of attack, sideslip angle, airspeed, vertical acceleration, and pitch rate. After a preliminary dimensionality reduction using the methods described in previous work, I retained 9 key flight parameters (denoted A through I) as model inputs. The outputs are the vertical loads on the left and right main landing gears, which form the two prediction tasks (Task 1 and Task 2). The dataset comprises 45,149 samples, with 40,634 used for training and 4,515 for testing, following a 9:1 random split. The goal is to learn a mapping f: x → (y₁, y₂), where x ∈ ℝ⁹ is the vector of flight parameters and y₁, y₂ ∈ ℝ are the two landing loads.
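The 9:1 split can be sketched as follows. This is a minimal illustration, assuming a uniformly random shuffle of sample indices; the function name `split_indices`, the seed, and the rounding rule are my own choices for illustration and are not taken from the original experiments.

```python
import random

def split_indices(n_samples, test_fraction=0.1, seed=0):
    """Return (train_idx, test_idx) from a seeded random shuffle.

    The test size is rounded to the nearest integer, which gives
    4,515 test samples for the 45,149 samples described above.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]

train_idx, test_idx = split_indices(45149)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the partition reproducible across the three models, so that all of them are evaluated on the same held-out samples.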
The two tasks are naturally related because the loads on symmetric landing gears of the UAV drone are correlated. Indeed, the Pearson correlation coefficient between the two output variables is 0.87, indicating a strong positive linear relationship. This correlation suggests that multi‑task learning can be beneficial.
3. Multi‑Task Learning Models
I employ three neural network architectures in this study: a single‑task (ST) feedforward neural network, a Shared‑Bottom multi‑task model, and the proposed MMoE model. All models are implemented with comparable numbers of trainable parameters to ensure a fair comparison.
3.1 Feedforward Neural Network (Single‑Task)
The ST model is a standard deep neural network with 8 hidden layers, each using the ELU activation function. The architecture is [16, 32, 64, 309, 309, 64, 32, 16] neurons, leading to a total of 144,560 parameters. The output layer has a single neuron with linear activation. Two separate ST models are trained independently for Task 1 and Task 2.
3.2 Shared‑Bottom Model
The Shared‑Bottom model consists of a shared bottom network (listed as the expert in Table 1) that extracts common features, followed by two task‑specific tower networks. The shared network has three hidden layers [16, 32, 64] followed by a 246‑dimensional output, which is fed into both towers. Each tower has three hidden layers [64, 32, 16] and a final linear output. The total parameter count is 144,322. The output for task k is given by
$$ y_k = h_k( f(x) ) $$
where $f(\cdot)$ denotes the shared network and $h_k(\cdot)$ the tower for task $k$.
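To make the composition $h_k(f(x))$ concrete, the following NumPy sketch runs a forward pass through a Shared‑Bottom network with the layer widths given above. It is only an illustration: the weights are random and untrained, the helper names (`init_mlp`, `mlp_forward`, `shared_bottom_forward`) are hypothetical, and I assume ELU on every layer except the linear tower output.

```python
import numpy as np

def elu(z):
    """ELU activation: z for z > 0, exp(z) - 1 otherwise."""
    return np.where(z > 0, z, np.expm1(np.minimum(z, 0.0)))

def init_mlp(sizes, rng):
    """Random (W, b) pairs for consecutive layer sizes (illustrative init)."""
    return [(rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x, linear_output):
    """Apply ELU after each layer; optionally keep the last layer linear."""
    h = x
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        is_last = i == len(params) - 1
        h = z if (is_last and linear_output) else elu(z)
    return h

rng = np.random.default_rng(0)
shared = init_mlp([9, 16, 32, 64, 246], rng)                   # f(.)
towers = [init_mlp([246, 64, 32, 16, 1], rng) for _ in range(2)]  # h_1, h_2

def shared_bottom_forward(x):
    f_x = mlp_forward(shared, x, linear_output=False)  # shared features
    return [mlp_forward(h_k, f_x, linear_output=True) for h_k in towers]

x = rng.normal(size=(4, 9))       # batch of 4 samples, 9 flight parameters
y1, y2 = shared_bottom_forward(x)
print(y1.shape, y2.shape)
```

Both towers receive exactly the same feature vector, which is the structural limitation that the gating mechanism of the next subsection relaxes.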
3.3 MMoE Model
The MMoE model introduces multiple expert networks (each a feedforward network) and a gating network for each task. The gating network learns a soft combination of expert outputs tailored to each task. In my implementation, I use 8 expert networks, each with layers [16, 32, 64] and a 128‑dimensional output. The gating networks are linear layers with softmax activation. The tower networks each have layers [64, 32, 16] and a linear output. The total number of parameters is 144,034. The output for task $k$ is
$$ y_k = h_k\left( \sum_{i=1}^{n} g^k(x)_i f_i(x) \right) $$
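where $n = 8$ is the number of experts, $f_i(x)$ is the output of the $i$-th expert, and $g^k(x) = \text{softmax}(W_{gk} x)$ is the vector of gate weights for task $k$, with $g^k(x)_i$ its $i$-th component. Here $W_{gk} \in \mathbb{R}^{n \times d}$ is the trainable gating matrix and $d = 9$ is the input dimension.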
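The gated mixture can be sketched in NumPy as below. This is a hedged illustration of the equation, not the implementation used for the reported results: weights are random and untrained, the helper names are hypothetical, the gates are bias-free linear maps as in the formula, and I assume ELU on the expert output layer.

```python
import numpy as np

def elu(z):
    return np.where(z > 0, z, np.expm1(np.minimum(z, 0.0)))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_mlp(sizes, rng):
    return [(rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x, linear_output):
    h = x
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        is_last = i == len(params) - 1
        h = z if (is_last and linear_output) else elu(z)
    return h

rng = np.random.default_rng(0)
n_experts, d_in, d_expert, n_tasks = 8, 9, 128, 2
experts = [init_mlp([d_in, 16, 32, 64, d_expert], rng) for _ in range(n_experts)]
gates = [rng.normal(0.0, 0.1, size=(n_experts, d_in)) for _ in range(n_tasks)]  # W_gk
towers = [init_mlp([d_expert, 64, 32, 16, 1], rng) for _ in range(n_tasks)]

def mmoe_forward(x):
    # Stack expert outputs: shape (batch, n_experts, d_expert)
    f = np.stack([mlp_forward(e, x, linear_output=False) for e in experts], axis=1)
    outputs, gate_weights = [], []
    for W_g, tower in zip(gates, towers):
        g = softmax(x @ W_g.T)                 # (batch, n_experts)
        mixed = np.einsum("bn,bnd->bd", g, f)  # sum_i g_i(x) * f_i(x)
        outputs.append(mlp_forward(tower, mixed, linear_output=True))
        gate_weights.append(g)
    return outputs, gate_weights

x = rng.normal(size=(4, 9))
(y1, y2), (g1, g2) = mmoe_forward(x)
print(y1.shape, g1.shape)
```

Each task has its own gate, so the two towers can weight the same eight experts differently for the same input sample.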
Table 1 summarizes the layer configurations for the expert and tower networks used in the Shared‑Bottom and MMoE models.
| Layer | Shared‑Bottom Expert | Shared‑Bottom Tower | MMoE Expert | MMoE Tower |
|---|---|---|---|---|
| Input | (None, 9) | (None, 246) | (None, 9) | (None, 128) |
| Dense 1 | (None, 16) | (None, 64) | (None, 16) | (None, 64) |
| Dense 2 | (None, 32) | (None, 32) | (None, 32) | (None, 32) |
| Dense 3 | (None, 64) | (None, 16) | (None, 64) | (None, 16) |
| Output | (None, 246) | (None, 1) | (None, 128) | (None, 1) |
4. Experimental Setup
4.1 Feature Selection via Pearson Correlation
To check for redundancy among the inputs, I performed a Pearson correlation analysis on the 9 flight parameters. Table 2 shows the correlation matrix. The strongest linear correlation occurs between parameters B and E (0.76); all other input pairs have coefficients of at most 0.20, so multicollinearity is limited to this single pair. The correlation between the two output tasks is 0.87, further motivating the use of multi‑task learning.
| Param | A | B | C | D | E | F | G | H | I | Task1 | Task2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 1.00 | 0.12 | 0.05 | 0.08 | 0.10 | 0.03 | 0.07 | 0.01 | 0.09 | 0.45 | 0.43 |
| B | 0.12 | 1.00 | 0.15 | 0.20 | 0.76 | 0.08 | 0.11 | 0.05 | 0.13 | 0.51 | 0.48 |
| C | 0.05 | 0.15 | 1.00 | 0.09 | 0.12 | 0.02 | 0.06 | 0.04 | 0.10 | 0.32 | 0.30 |
| D | 0.08 | 0.20 | 0.09 | 1.00 | 0.18 | 0.07 | 0.13 | 0.03 | 0.11 | 0.47 | 0.44 |
| E | 0.10 | 0.76 | 0.12 | 0.18 | 1.00 | 0.06 | 0.09 | 0.02 | 0.14 | 0.53 | 0.50 |
| F | 0.03 | 0.08 | 0.02 | 0.07 | 0.06 | 1.00 | 0.04 | 0.01 | 0.05 | 0.28 | 0.26 |
| G | 0.07 | 0.11 | 0.06 | 0.13 | 0.09 | 0.04 | 1.00 | 0.02 | 0.08 | 0.36 | 0.34 |
| H | 0.01 | 0.05 | 0.04 | 0.03 | 0.02 | 0.01 | 0.02 | 1.00 | 0.06 | 0.12 | 0.11 |
| I | 0.09 | 0.13 | 0.10 | 0.11 | 0.14 | 0.05 | 0.08 | 0.06 | 1.00 | 0.40 | 0.38 |
| Task1 | 0.45 | 0.51 | 0.32 | 0.47 | 0.53 | 0.28 | 0.36 | 0.12 | 0.40 | 1.00 | 0.87 |
| Task2 | 0.43 | 0.48 | 0.30 | 0.44 | 0.50 | 0.26 | 0.34 | 0.11 | 0.38 | 0.87 | 1.00 |
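A correlation screen of this kind can be sketched with NumPy. The data below are synthetic and constructed only so that one pair has a correlation near 0.76; the function name `correlated_pairs` and the 0.7 threshold are illustrative assumptions, not values from the study.

```python
import numpy as np

def correlated_pairs(data, names, threshold=0.7):
    """Return (name_i, name_j, r) for column pairs with |r| >= threshold."""
    r = np.corrcoef(data, rowvar=False)   # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = rng.normal(size=n)
# E is built to have a population correlation of 0.76 with B.
e = 0.76 * b + np.sqrt(1 - 0.76**2) * rng.normal(size=n)

data = np.column_stack([a, b, e])
pairs = correlated_pairs(data, ["A", "B", "E"])
print(pairs)
```

Only the upper triangle is scanned because the Pearson matrix is symmetric with a unit diagonal, as Table 2 shows.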
4.2 Hyperparameter Settings
All models were trained using the Adam optimizer with an initial learning rate of 0.001 for up to 4,000 epochs. The loss function for each task was the mean squared error (MSE), and the total loss for the multi‑task models was a weighted sum:
$$ L_{\text{total}} = w_1 L_1 + w_2 L_2 $$
The weights were set equal (w₁ = w₂ = 1) after initial tuning. The hidden layers used the ELU activation function. Early stopping was applied if the validation loss did not improve for 200 epochs.
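The weighted loss and the stopping rule can be sketched in plain Python. This is an illustration under stated assumptions: the class name `EarlyStopping` and its `update` method are hypothetical, the loss values are made up, and a short patience of 3 is used in the demonstration instead of 200 so the behaviour is visible.

```python
def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def total_loss(task_losses, weights=(1.0, 1.0)):
    """Weighted sum of per-task losses: w_1 * L_1 + w_2 * L_2."""
    return sum(w * l for w, l in zip(weights, task_losses))

class EarlyStopping:
    """Signal a stop once the loss has not improved for `patience` epochs."""

    def __init__(self, patience=200):
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def update(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience

# Made-up predictions for the two tasks.
l1 = mse([10.0, 12.0], [9.0, 12.5])
l2 = mse([11.0, 13.0], [11.5, 12.0])
total = total_loss((l1, l2))

# Made-up validation losses; patience shortened for the demonstration.
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate([1.0, 0.8, 0.9, 0.85, 0.81, 0.7]):
    if stopper.update(val_loss):
        stopped_at = epoch
        break
print(total, stopped_at)
```

With equal weights the total loss is simply the sum of the two task MSEs, so neither landing gear is favoured during optimization.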
Table 3 compares the training time (in seconds) for the three models. MMoE required 3,307 seconds, about 30% less than the combined time for training the two separate single‑task models (4,733 seconds in total). The Shared‑Bottom model was the fastest (1,952 seconds), but as will be shown, its accuracy is lower.
| Model | Training Time (s) |
|---|---|
| Single‑Task (Task 1) | 2,553.57 |
| Single‑Task (Task 2) | 2,179.43 |
| Single‑Task (total) | 4,733.00 |
| Shared‑Bottom | 1,952.01 |
| MMoE | 3,307.49 |
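As a quick check of the reported saving, using the totals in Table 3:

$$ \frac{4733.00 - 3307.49}{4733.00} = \frac{1425.51}{4733.00} \approx 0.301 $$

That is, MMoE needs about 30% less training time than the two single‑task runs combined.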
5. Results and Discussion
5.1 Prediction Performance
I evaluated the models using the coefficient of determination (R²) and MSE on both training and test sets. Table 4 presents the results.
| Model | Metric | Task 1 (train) | Task 1 (test) | Task 2 (train) | Task 2 (test) |
|---|---|---|---|---|---|
| Single‑Task | R² | 0.9959 | 0.9948 | 0.9968 | 0.9959 |
| Single‑Task | MSE | 0.9670 | 1.2171 | 0.8384 | 1.0650 |
| Shared‑Bottom | R² | 0.9961 | 0.9953 | 0.9957 | 0.9950 |
| Shared‑Bottom | MSE | 0.9105 | 1.1018 | 1.1299 | 1.3040 |
| MMoE | R² | 0.9990 | 0.9984 | 0.9991 | 0.9986 |
| MMoE | MSE | 0.2369 | 0.3700 | 0.2453 | 0.3607 |
From Table 4, it is evident that MMoE achieves the highest R² (≥0.9984) and the lowest MSE (≤0.3700) on the test set for both tasks. Compared with Single‑Task, the test MSE reduction is 69.60% for Task 1 and 66.13% for Task 2. Compared with Shared‑Bottom, the test MSE reductions are 66.42% and 72.34%, respectively. (On the training set, the corresponding Task 1 reductions are 75.50% and 73.98%.) The training and test metrics are close for all models, suggesting that none of them overfits severely.
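For reference, the two metrics and the relative MSE reduction can be computed as in the following plain-Python sketch. The function names are illustrative, the small `y_true`/`y_pred` lists are made-up values, and the reduction example uses the Task 2 test MSEs from Table 4.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Made-up values to exercise the metric functions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(mse(y_true, y_pred), r2_score(y_true, y_pred))

# Task 2 test MSEs from Table 4: Single-Task, Shared-Bottom, MMoE.
vs_single_task = relative_reduction(1.0650, 0.3607)
vs_shared_bottom = relative_reduction(1.3040, 0.3607)
print(round(vs_single_task, 2), round(vs_shared_bottom, 2))
```

The reduction is always expressed relative to the baseline model's MSE, so the same MMoE error yields different percentages against different baselines.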
5.2 Scatter Plot Analysis
The predictions versus true values for all models are shown in the scatter plot below. The diagonal line represents perfect prediction. The MMoE model produces points that cluster tightly around the diagonal, demonstrating superior accuracy. In contrast, the Single‑Task and Shared‑Bottom models show more scatter, especially at higher load values.

5.3 Relative Error Distribution
I computed the relative error (in percent) for each test sample, defined as $|y_{\text{true}} - y_{\text{pred}}| / y_{\text{true}} \times 100\%$. Table 5 shows the distribution of test samples across error intervals for the Single‑Task (ST), Shared‑Bottom (SB), and MMoE models.
| Error Interval (%) | Task 1: ST | Task 1: SB | Task 1: MMoE | Task 2: ST | Task 2: SB | Task 2: MMoE |
|---|---|---|---|---|---|---|
| [0, 1] | 1,820 | 1,680 | 2,550 | 1,750 | 1,600 | 2,480 |
| [1, 2] | 1,340 | 1,240 | 1,295 | 1,390 | 1,180 | 1,350 |
| [2, 3] | 520 | 580 | 350 | 480 | 620 | 320 |
| [3, 4] | 230 | 270 | 130 | 260 | 300 | 110 |
| [4, 5] | 110 | 130 | 50 | 90 | 140 | 40 |
| >5 | 495 | 615 | 140 | 545 | 675 | 215 |
| % ≤ 2% | 70.00% | 64.70% | 85.16% | 69.54% | 61.59% | 84.87% |
The MMoE model has 85.16% (Task 1) and 84.87% (Task 2) of test samples within 2% relative error, significantly outperforming the other two models.
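The binning behind a table of this kind can be sketched as follows. The sample values are made up, the function names are illustrative, and I assume that an error falling exactly on an interval boundary is counted in the lower interval, which the source does not specify.

```python
def relative_errors(y_true, y_pred):
    """Relative error of each sample, in percent."""
    return [abs(t - p) / abs(t) * 100.0 for t, p in zip(y_true, y_pred)]

def bin_counts(errors, edges=(1, 2, 3, 4, 5)):
    """Count errors per interval; the last bucket holds errors above edges[-1]."""
    counts = [0] * (len(edges) + 1)
    for e in errors:
        for i, edge in enumerate(edges):
            if e <= edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1   # no edge matched: error exceeds the last edge
    return counts

def share_within(errors, limit=2.0):
    """Percentage of samples whose error does not exceed `limit`."""
    return 100.0 * sum(e <= limit for e in errors) / len(errors)

# Made-up loads and predictions.
y_true = [100.0, 200.0, 50.0, 80.0]
y_pred = [100.5, 197.0, 53.0, 80.4]

errors = relative_errors(y_true, y_pred)
counts = bin_counts(errors)
share = share_within(errors)
print(counts, share)
```

The share within 2% is the sum of the first two buckets divided by the number of test samples, which is how the last row of Table 5 relates to the rows above it.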
5.4 Average Relative Error
Table 6 reports the average relative error (in %) for each model on the test set.
| Model | Task 1 | Task 2 |
|---|---|---|
| Single‑Task | 2.03 | 1.89 |
| Shared‑Bottom | 2.18 | 2.35 |
| MMoE | 1.21 | 1.19 |
The MMoE model achieves the lowest average relative error on both tasks, further confirming its superior predictive capability for UAV drone landing loads.
6. Conclusion
In this study, I have developed a multi‑task learning framework based on the MMoE architecture for predicting the landing loads of UAV drones. By leveraging the natural correlation between left and right main landing gear loads, the MMoE model captures shared and task‑specific features through multiple experts and gating networks. The experimental results on real flight test data demonstrate that the proposed MMoE model outperforms both the single‑task and Shared‑Bottom models, reducing test‑set MSE by over 66%. Moreover, more than 84% of test samples have relative prediction errors below 2%, and the average relative error is around 1.2%. The model also trains in about 30% less time than two separate single‑task models.
The proposed method offers a promising solution for real‑time load monitoring and predictive maintenance of UAV drones. Future work will explore extending the framework to incorporate temporal dynamics (e.g., LSTM or transformer layers) and to handle additional landing gear components, such as nose landing gear loads, further enhancing the safety and reliability of UAV drone operations.
In summary, the MMoE‑based multi‑task learning approach is a powerful and efficient tool for landing load prediction in UAV drones, providing engineers with accurate and timely information for structural health assessment.
