Integrating Attention Mechanism and Meta-Learning for Fixed-Wing Drone Fault Diagnosis

In recent years, unmanned aerial vehicles (UAVs), particularly fixed-wing drones, have attracted widespread attention and developed rapidly. Thanks to their flexibility, efficiency, and moderate cost, their applications have expanded from military operations to agriculture, communication, fire rescue, surveying, and many other civilian scenarios. Owing to their aerodynamic efficiency, fixed-wing drones are especially well suited to long-range and high-altitude missions. However, they rely on precision embedded components such as sensors and actuators, which are prone to malfunction under harsh operating conditions. Such failures can lead to mission failure or even crashes, resulting in substantial economic losses, so fault diagnosis for fixed-wing drones is of paramount importance. Traditional deep learning-based fault diagnosis methods depend on large amounts of labeled data; with small sample sizes and complex flight environments, they tend to generalize poorly, extract key features insufficiently, and overfit. To address these challenges, we propose a fault diagnosis method that combines meta-learning with efficient channel attention (MLECA) to improve the accuracy and robustness of fault diagnosis for fixed-wing drones. Specifically, we preprocess the raw sensor data and construct meta-tasks. To capture and emphasize important features, we design a feature encoder that combines convolutional neural networks (CNN) with efficient channel attention (ECA). This encoder serves as the base model, which is trained with model-agnostic meta-learning (MAML) to optimize its initialization parameters and acquire prior representational knowledge. The learned knowledge is then used to diagnose faults of fixed-wing drones in unknown environments.
Experimental results on real flight data show that MLECA achieves higher diagnostic accuracy than the compared methods in most settings and generalizes more consistently across wind conditions.

1. Problem Statement and Preliminaries

In real-world applications, fixed-wing drones operate under complex and varying flight conditions, such as different wind speeds, altitudes, and mission profiles. These variations shift the feature distribution of fault data from one condition to another. For instance, under high wind, sensor measurements may exhibit larger fluctuations and more noise, so the same fault can look different in the data. Traditional deep learning models trained under one condition often generalize poorly to such shifted conditions. Moreover, because fixed-wing drones operate normally most of the time, acquiring sufficient fault samples is extremely difficult, and this scarcity exacerbates overfitting in conventional data-driven methods. Consequently, efficient and reliable fault diagnosis for fixed-wing drones under scarce data and complex working conditions remains a critical challenge.

The proposed MLECA method uses meta-learning to adapt rapidly to new fault types and unseen operating conditions from limited samples, transferring knowledge across tasks. In addition, the ECA mechanism focuses on critical feature channels, suppressing irrelevant noise and improving the quality of the extracted features. During the meta-testing phase, the meta-learned initial parameters are fine-tuned on a few labeled samples, so the model quickly adapts to the new fault distribution and diagnoses it accurately.

1.1 Fixed-Wing Drone Faults

Fixed-wing drones generate lift through the relative motion of their wings with respect to the air, enabling sustained flight. As a complex intelligent system, a fixed-wing drone primarily consists of the airframe, actuator, and sensor subsystems. Actuators play a vital role in ensuring aerodynamic stability, attitude control, and angular rate regulation. However, under harsh environmental conditions, actuators are prone to faults that compromise flight performance. Common actuator faults in fixed-wing drones fall into four types: (1) float, where the actuator provides no control effect; (2) lock-in-place, where the actuator is stuck at a fixed position; (3) hard-over, where the actuator is stuck at its minimum or maximum physical limit; and (4) loss of effectiveness, where the actuator's response to control commands is reduced. The actuator fault model is expressed as:

$$
u_{\text{app}} = D u_{\text{com}} + E
$$

where $u_{\text{com}}$ is the commanded control surface deflection, $u_{\text{app}}$ is the actually applied deflection, $D$ is the control-surface effectiveness factor, and $E$ is the bias error. A healthy actuator has $D = 1$ and $E = 0$, so that $u_{\text{app}} = u_{\text{com}}$, while $D < 1$ indicates a loss of effectiveness. Different fault types are injected by setting $D$ and $E$ as summarized in Table 1.

**Table 1: Parameter settings for actuator fault model**
| Fault Type | $D$ | $E$ | Explanation |
|---|---|---|---|
| Float | 0 | 0 | No control effect |
| Lock-in-place | 0 | Constant | Fixed position |
| Hard-over | 0 | Constant | Stuck at min/max limit |
| Loss of effectiveness | Constant ($0 < D < 1$) | 0 | Reduced response efficiency |
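To make Table 1 concrete, the following Python sketch applies $u_{\text{app}} = D u_{\text{com}} + E$ to a sequence of commanded deflections. The numeric constants (a 25° deflection limit, a 5° stuck angle, and $D = 0.3$ for loss of effectiveness) are hypothetical values chosen for illustration, not settings from the dataset, and the `normal` preset ($D = 1$, $E = 0$) is added only for comparison.

```python
from typing import Dict, List, Tuple

# Hypothetical physical deflection limit in degrees (illustrative only).
U_MAX = 25.0

# (D, E) pairs following Table 1; the numeric constants are illustrative only.
FAULT_PRESETS: Dict[str, Tuple[float, float]] = {
    "normal": (1.0, 0.0),                 # healthy actuator: u_app = u_com
    "float": (0.0, 0.0),                  # no control effect
    "lock_in_place": (0.0, 5.0),          # stuck at a (hypothetical) fixed angle
    "hard_over": (0.0, U_MAX),            # stuck at the maximum limit
    "loss_of_effectiveness": (0.3, 0.0),  # reduced response, 0 < D < 1
}


def apply_fault(u_com: List[float], fault: str) -> List[float]:
    """Map commanded deflections to applied ones via u_app = D * u_com + E."""
    d, e = FAULT_PRESETS[fault]
    return [d * u + e for u in u_com]


print(apply_fault([10.0, -4.0, 0.0], "lock_in_place"))  # [5.0, 5.0, 5.0]
```

Lock-in-place and hard-over share the same structure ($D = 0$, constant $E$); they differ only in whether $E$ is an arbitrary angle or a physical limit.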

In this work, we focus on two aerodynamic control surfaces (right and left). The fault model for these two surfaces is given by:

$$
\begin{bmatrix}
u_{\text{app}_1} \\
u_{\text{app}_2}
\end{bmatrix}
=
\begin{bmatrix}
d_1 & 0 \\
0 & d_2
\end{bmatrix}
\begin{bmatrix}
u_{\text{com}_1} \\
u_{\text{com}_2}
\end{bmatrix}
+
\begin{bmatrix}
e_1 \\
e_2
\end{bmatrix}
$$

where the subscript 1 denotes the right control surface and 2 denotes the left control surface. By adjusting $d_1$, $d_2$, $e_1$, and $e_2$, different fault scenarios are simulated.
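Because $\mathrm{diag}(d_1, d_2)$ is diagonal, the two-surface model acts on each surface independently. The sketch below assumes commands arrive as (right, left) pairs; the function name `apply_two_surface_fault` and the scenario (right surface locked at 2°, left surface healthy) are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (right surface, left surface)


def apply_two_surface_fault(u_com: Sequence[Pair], d: Pair, e: Pair) -> List[Pair]:
    """Apply [u_app1, u_app2] = diag(d1, d2) [u_com1, u_com2] + [e1, e2] at each time step."""
    # diag(d1, d2) scales each surface independently, so the matrix product
    # reduces to element-wise operations on the (right, left) pair.
    return [(d[0] * right + e[0], d[1] * left + e[1]) for right, left in u_com]


# Hypothetical scenario: right surface locked at 2 deg (d1 = 0, e1 = 2), left surface healthy.
commands = [(5.0, -5.0), (0.0, 3.0)]
applied = apply_two_surface_fault(commands, d=(0.0, 1.0), e=(2.0, 0.0))
print(applied)  # [(2.0, -5.0), (2.0, 3.0)]
```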

1.2 Meta-Learning

Meta-learning is a machine learning paradigm that focuses on enabling models to acquire meta-knowledge from previous tasks, thereby facilitating rapid adaptation to new tasks without retraining from scratch. Model-agnostic meta-learning (MAML) is a popular optimization-based meta-learning algorithm. The goal of MAML is to train model parameters such that a small number of gradient updates on a new task yields strong performance. MAML consists of two loops: an inner loop and an outer loop. In the inner loop, the model performs local updates on each specific task using its support set. The update rule is:

$$
\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{T_i}(f_\theta, D_i^{\text{train}})
$$

where $\alpha$ is the inner learning rate, $D_i^{\text{train}}$ is the support set of task $T_i$, and $\mathcal{L}_{T_i}$ is the loss function (e.g., cross-entropy). After the inner update, we obtain task-specific parameters $\theta_i'$. In the outer loop, the initial parameters $\theta$ are updated based on the performance of the adapted model on the query set of each task:

$$
\theta \leftarrow \theta - \beta \sum_i \nabla_{\theta} \mathcal{L}_{T_i}(f_{\theta_i'}, D_i^{\text{val}})
$$

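where $\beta$ is the outer (meta) learning rate and $D_i^{\text{val}}$ is the query set of task $T_i$. Because $\theta_i'$ depends on $\theta$ through the inner update, this gradient is propagated through that update and involves second-order derivatives of the support-set loss. Through this bi-level optimization, the initial parameters become well suited for fast adaptation across a distribution of tasks.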
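The bi-level update can be checked on a toy problem where every gradient has a closed form. The sketch below is not the MLECA model: it assumes a scalar linear model $f_\theta(x) = \theta x$, squared-error loss, and tasks $y = w x$ with a task-specific slope $w$. Because the loss is quadratic in $\theta$, the outer gradient equals the query gradient at $\theta_i'$ times $(1 - \alpha \cdot \text{support Hessian})$ exactly, which makes the second-order term visible. The inner rate $\alpha = 0.1$ matches Section 2.3, while $\beta = 0.05$ and the batch of four tasks are toy choices so that it converges within a few hundred iterations.

```python
import random
from typing import List, Tuple

Data = List[Tuple[float, float]]


def loss(theta: float, data: Data) -> float:
    """Mean squared error of f_theta(x) = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def grad(theta: float, data: Data) -> float:
    """dL/dtheta of the mean squared error."""
    return sum(2 * x * (theta * x - y) for x, y in data) / len(data)


def hess(data: Data) -> float:
    """d^2L/dtheta^2; constant because the loss is quadratic in theta."""
    return sum(2 * x * x for x, _ in data) / len(data)


def meta_grad(theta: float, support: Data, query: Data, alpha: float) -> float:
    # Inner loop: theta_i' = theta - alpha * grad_S(theta)
    theta_i = theta - alpha * grad(theta, support)
    # Chain rule through the inner update keeps the second-order term:
    # d L_Q(theta_i') / d theta = grad_Q(theta_i') * (1 - alpha * hess_S)
    return grad(theta_i, query) * (1 - alpha * hess(support))


def sample_task(rng: random.Random, n: int = 10) -> Tuple[Data, Data]:
    """A task is y = w * x with a task-specific slope w; returns (support, query)."""
    w = rng.uniform(1.5, 2.5)
    pts = [(x, w * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(2 * n))]
    return pts[:n], pts[n:]


def maml(alpha: float = 0.1, beta: float = 0.05, batch: int = 4,
         iters: int = 300, seed: int = 0) -> float:
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iters):
        tasks = [sample_task(rng) for _ in range(batch)]
        # Outer loop: sum the meta-gradients over the batch, then take one step
        theta -= beta * sum(meta_grad(theta, s, q, alpha) for s, q in tasks)
    return theta
```

Since each task's post-adaptation optimum is $\theta = w$, the learned initialization settles in the middle of the slope range (around 2). A finite-difference derivative of $\theta \mapsto \mathcal{L}_Q(\theta - \alpha \nabla \mathcal{L}_S(\theta))$ reproduces `meta_grad`, confirming the chain-rule term.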

1.3 Efficient Channel Attention

Channel attention mechanisms allow neural networks to focus on important feature channels while suppressing irrelevant ones. The Squeeze-and-Excitation Network (SENet) is a classic approach that uses global average pooling followed by two fully connected layers to learn channel weights. However, these fully connected layers add parameters and computation, and the dimensionality reduction between them breaks the direct correspondence between each channel and its weight. The Efficient Channel Attention (ECA) module addresses this by using a one-dimensional convolution to perform local cross-channel interaction without dimensionality reduction and with far fewer parameters. For an input feature map $X \in \mathbb{R}^{C \times H \times W}$ (for the 1D sensor features used in this work, $H = 1$ and $W$ is the sequence length), global average pooling first extracts channel-wise global information:

$$
z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{c,i,j}
$$

where $z \in \mathbb{R}^C$. Then, a 1D convolution with kernel size $k$ is applied to model local channel interactions:

$$
a = \text{Conv1D}(z, k), \quad s = \sigma(a)
$$

where $\sigma$ is the sigmoid activation function and $s \in \mathbb{R}^C$ is the vector of channel attention weights. The output feature map is obtained by multiplying each channel by its corresponding weight:

$$
Y_{c,i,j} = s_c \cdot X_{c,i,j}
$$

Because the 1D convolution uses a single kernel of size $k$ shared across all channels, this structure adds very few parameters and little computation while maintaining, or even improving, performance. Figure 1 illustrates the ECA module.
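The four ECA steps (pooling, 1D convolution across channels, sigmoid, rescaling) can be sketched in pure Python for a 1D feature map of shape $(C, L)$, as used in this work. The adaptive kernel-size rule with $\gamma = 2$ and $b = 1$ follows the original ECA-Net paper and is an assumption here, since this document does not state $k$; the kernel weights passed to `eca` are placeholders for values a network would learn.

```python
import math
from typing import List


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """k = |(log2(C) + b) / gamma| truncated to an integer, bumped to odd if even."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(x: List[List[float]], weights: List[float]) -> List[List[float]]:
    """ECA on a (C, L) feature map; `weights` is the shared 1D kernel of odd size k."""
    c, k = len(x), len(weights)
    pad = k // 2
    # Squeeze: global average pooling over the temporal axis gives z in R^C
    z = [sum(row) / len(row) for row in x]
    # Local cross-channel interaction: 1D convolution along the channel axis
    # (cross-correlation with zero padding, as in deep learning frameworks)
    zp = [0.0] * pad + z + [0.0] * pad
    a = [sum(weights[j] * zp[i + j] for j in range(k)) for i in range(c)]
    # Excitation: the sigmoid turns scores into channel weights s in (0, 1)
    s = [1.0 / (1.0 + math.exp(-v)) for v in a]
    # Rescale: Y_c = s_c * X_c
    return [[s_c * v for v in row] for s_c, row in zip(s, x)]
```

For $C = 64$, `eca_kernel_size(64)` gives $k = 3$, i.e., three weights per ECA module, whereas an SE block with the commonly used reduction ratio $r = 16$ would need $2C^2/r = 512$ weights (biases excluded).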

2. Proposed Method: MLECA

The overall architecture of the MLECA model is designed specifically for fault diagnosis of fixed-wing drones. The method comprises three main stages: (1) data preprocessing and meta-task construction, (2) feature encoding using a CNN-ECA backbone, and (3) meta-training with MAML for parameter initialization and fast adaptation.

2.1 Meta-Task Construction

Meta-learning requires a set of tasks derived from the available data. Each task mimics a few-shot learning scenario where the model must diagnose faults using only a small number of labeled examples. In our approach, we treat each fault type as a distinct class. For constructing an $N$-way $K$-shot meta-task, we randomly select $N$ classes from the entire set of fault types. From each selected class, we randomly pick $K$ samples for the support set and $Q$ samples for the query set. The support set is used for inner-loop adaptation, while the query set evaluates the model’s generalization performance after adaptation. This process is repeated to generate a large collection of meta-training tasks. Similarly, meta-validation and meta-testing tasks are created using data from unseen flight conditions.
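A minimal sketch of $N$-way $K$-shot task construction follows. It assumes a hypothetical `data_by_class` dictionary mapping each fault label to its preprocessed samples, draws support and query samples without overlap, and relabels the selected classes as $0, \dots, N-1$ within each task; these details are assumptions, since the text above does not specify them.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Sequence[float]
Labeled = List[Tuple[Sample, int]]


def sample_episode(data_by_class: Dict[str, List[Sample]], n_way: int, k_shot: int,
                   q_query: int, rng: random.Random) -> Tuple[Labeled, Labeled]:
    """Draw one N-way K-shot task with Q query samples per class."""
    classes = rng.sample(sorted(data_by_class), n_way)  # choose N fault types
    support: Labeled = []
    query: Labeled = []
    for label, cls in enumerate(classes):  # task-local labels 0..N-1
        # Sampling without replacement keeps support and query disjoint
        picks = rng.sample(data_by_class[cls], k_shot + q_query)
        support += [(x, label) for x in picks[:k_shot]]
        query += [(x, label) for x in picks[k_shot:]]
    return support, query
```

For the 9-class subsets, `sample_episode(data, n_way=9, k_shot=5, q_query=3, rng=random.Random(0))` returns 45 support and 27 query pairs; the query size of 3 is an arbitrary example value.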

2.2 CNN-ECA Feature Encoder

To extract discriminative features from sensor data, we design a feature encoder that integrates convolutional neural networks with ECA modules. The encoder consists of four consecutive feature extraction blocks, each comprising a 1D convolutional layer, batch normalization, ReLU activation, max-pooling, and an ECA module. The convolutional layers capture local temporal patterns in the sensor signals, while the ECA modules adaptively reweight the channels to emphasize fault-relevant ones and suppress noise. After the four blocks, the features are flattened and a fully connected layer maps them to class scores. Table 2 summarizes the architecture parameters; batch normalization and ReLU, which do not change the tensor shape, are omitted from the table.

**Table 2: Architecture parameters of the CNN-ECA feature encoder**
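| Layer | Kernel Size | Stride | Channels | Input (Length × Channels) | Output (Length × Channels) |
|---|---|---|---|---|---|
| Conv1D | 3 | 1 | 64 | 160×1 | 160×64 |
| MaxPool1D | 2 | 2 | 64 | 160×64 | 80×64 |
| ECA | - | - | 64 | 80×64 | 80×64 |
| Conv1D | 3 | 1 | 64 | 80×64 | 80×64 |
| MaxPool1D | 2 | 2 | 64 | 80×64 | 40×64 |
| ECA | - | - | 64 | 40×64 | 40×64 |
| Conv1D | 3 | 1 | 64 | 40×64 | 40×64 |
| MaxPool1D | 2 | 2 | 64 | 40×64 | 20×64 |
| ECA | - | - | 64 | 20×64 | 20×64 |
| Conv1D | 3 | 1 | 64 | 20×64 | 20×64 |
| MaxPool1D | 2 | 2 | 64 | 20×64 | 10×64 |
| ECA | - | - | 64 | 10×64 | 10×64 |
| FC | - | - | $N$ | 640 | $N$ |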
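The shapes in Table 2 can be reproduced with the standard output-length formula $\lfloor (L + 2p - k)/s \rfloor + 1$. The sketch assumes padding $p = 1$ for the kernel-3, stride-1 convolutions (the only choice that keeps the length unchanged) and no padding for pooling; `n_way = 9` is just an example value for $N$.

```python
from typing import List, Tuple


def out_len(length: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length of a 1D convolution or pooling window."""
    return (length + 2 * padding - kernel) // stride + 1


def encoder_shapes(length: int = 160, in_ch: int = 1, width: int = 64,
                   blocks: int = 4, n_way: int = 9) -> List[Tuple[str, Tuple[int, int]]]:
    shapes = [("input", (length, in_ch))]
    for b in range(1, blocks + 1):
        length = out_len(length, kernel=3, stride=1, padding=1)  # Conv1D (+BN, ReLU)
        shapes.append((f"conv{b}", (length, width)))
        length = out_len(length, kernel=2, stride=2)             # MaxPool1D
        shapes.append((f"pool{b}", (length, width)))
        shapes.append((f"eca{b}", (length, width)))              # ECA keeps the shape
    # FC entry is (input features, output classes) after flattening
    shapes.append(("fc", (length * width, n_way)))
    return shapes


for name, shape in encoder_shapes():
    print(name, shape)
```

The walk-through ends at a $10 \times 64$ map, which flattens to the 640 inputs of the final fully connected layer.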

The input to the encoder is a 160-dimensional vector (20 time steps × 8 features: linear accelerations $\alpha_x, \alpha_y, \alpha_z$, angular velocities $\omega_x, \omega_y, \omega_z$, and autopilot commands $u_{\text{com}_1}, u_{\text{com}_2}$), treated as a single-channel sequence of length 160. The output dimension equals the number of fault classes $N$ in the current task.
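The 160-dimensional input can be built by sliding a 20-step window over the multivariate flight log and flattening it. The stride of 1, the time-major flattening order, and the feature ordering in `FEATURES` are assumptions made for this sketch; the text only states that 20 consecutive time steps of the 8 features are concatenated.

```python
from typing import List, Sequence

# Illustrative feature order; the paper lists these eight signals but not their order.
FEATURES = ("acc_x", "acc_y", "acc_z", "omega_x", "omega_y", "omega_z", "u_com1", "u_com2")


def make_windows(log: Sequence[Sequence[float]], steps: int = 20,
                 stride: int = 1) -> List[List[float]]:
    """Flatten each window of `steps` consecutive rows (8 features each) into one vector."""
    if any(len(row) != len(FEATURES) for row in log):
        raise ValueError("each time step must contain exactly 8 features")
    windows = []
    for start in range(0, len(log) - steps + 1, stride):
        window = log[start:start + steps]
        windows.append([v for row in window for v in row])  # time-major: 20 x 8 -> 160
    return windows
```

A log of 25 time steps yields 6 overlapping windows with stride 1; a larger stride would trade sample count for less overlap between windows.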

2.3 Meta-Training Procedure

During meta-training, we sample a batch of tasks from the meta-training set. For each task $T_i$, we perform the following steps:

Inner-loop adaptation: Compute the loss on the support set $S_i$ using the current model parameters $\theta$:
\[
\mathcal{L}_{S_i}(\theta) = \frac{1}{|S_i|} \sum_{(x_j, y_j) \in S_i} \ell(f_\theta(x_j), y_j)
\]
where $\ell$ is cross-entropy loss. Update parameters via gradient descent:
\[
\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{S_i}(\theta)
\]
Outer-loop evaluation: Evaluate the adapted model on the query set $Q_i$:
\[
\mathcal{L}_{Q_i}(\theta_i') = \frac{1}{|Q_i|} \sum_{(x_j, y_j) \in Q_i} \ell(f_{\theta_i'}(x_j), y_j)
\]
Meta-update: Accumulate gradients from all tasks in the batch and update the initial parameters:
\[
\theta \leftarrow \theta - \beta \sum_{T_i} \nabla_\theta \mathcal{L}_{Q_i}(\theta_i')
\]

This procedure trains the model to acquire initialization parameters that enable fast adaptation with only a few gradient steps. The inner learning rate $\alpha$ is set to 0.1, the outer meta-learning rate $\beta$ is 0.003, and the number of inner gradient steps is 1.
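Both $\mathcal{L}_{S_i}$ and $\mathcal{L}_{Q_i}$ are averages of the per-sample cross-entropy $\ell$. A pure-Python sketch of that average, using the log-sum-exp trick for numerical stability, is shown below; the function names are illustrative. As a sanity check, uniform logits in a 9-way task give $\ln 9 \approx 2.197$, the loss of random guessing.

```python
import math
from typing import List, Sequence, Tuple


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """l(f(x), y) = -log softmax(logits)[y], computed with log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def set_loss(batch: List[Tuple[Sequence[float], int]]) -> float:
    """Average of l over a support set S_i or query set Q_i."""
    return sum(cross_entropy(z, y) for z, y in batch) / len(batch)
```

Subtracting the maximum logit before exponentiating leaves the softmax unchanged but prevents overflow for large logits.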

3. Experiments and Results

3.1 Experimental Setup

All experiments are conducted on a workstation running Ubuntu 18.04 with an Intel Xeon W-2123 CPU (4 cores, 8 threads), an NVIDIA GeForce RTX 2080 Ti GPU, and PyTorch 1.7.1. The meta-learning rate is 0.003, the inner learning rate is 0.1, and the number of inner steps is 1, as in Section 2.3. For each experiment, we report the average accuracy over 10 independent runs.

3.2 Dataset

We use a publicly available dataset collected from real flight experiments of a small fixed-wing drone. The flight tests were conducted on different days with varying wind conditions. The dataset includes multiple variables recorded by onboard sensors and autopilot commands. To reduce computational cost and remove irrelevant information, we select the eight features most relevant to actuator faults: linear accelerations ($\alpha_x, \alpha_y, \alpha_z$), angular velocities ($\omega_x, \omega_y, \omega_z$), and control commands for the two control surfaces ($u_{\text{com}_1}, u_{\text{com}_2}$). Time dynamics are incorporated by concatenating 20 consecutive time steps, resulting in a 160-dimensional input vector. The dataset is divided into four subsets (A, B, C, D) based on flight date and wind speed, as summarized in Table 3.

**Table 3: Dataset summary**
| Subset | Fault Types | Date | Wind Speed (m/s) | Number of Samples |
|---|---|---|---|---|
| A | Normal, right control surface efficiency 30% | Day 12 | <2.0 | 8,980 |
| B | Normal, right control surface efficiency 30% | Day 13 | 8.0 | 8,980 |
| C | 9 types (normal + various actuator faults) | Day 21 | 2.5 | 21,980 |
| D | 9 types (normal + various actuator faults) | Day 23 | 5.0 | 22,480 |

Subsets A and B contain two classes (normal and fault), while subsets C and D contain nine classes (including normal and various fault combinations). The wind speeds vary significantly across days, creating a challenging domain shift that tests generalization capability.

3.3 Binary Fault Diagnosis Results

First, we evaluate the model on binary classification tasks (normal vs. fault) using a 2-way 1-shot setting. The training set is one subset (e.g., A) and the test set is another (e.g., B). We compare MLECA with several baselines: support vector machine (SVM), a plain 4-layer 1D CNN with the same architecture as the encoder but without ECA, siamese hybrid neural network (SHNN) with two shared-parameter CNN-ECA encoders, and ML-CNN (same as MLECA but without ECA). Results are shown in Table 4.

**Table 4: Binary classification accuracy (%) for different methods**
| Method | A (wind < 2 m/s) → B (wind 8 m/s) | B (wind 8 m/s) → A (wind < 2 m/s) |
|---|---|---|
| SVM | 59.47 | 77.03 |
| CNN | 87.97 | 79.10 |
| SHNN | 90.08 | 87.50 |
| ML-CNN | 90.22 | 85.00 |
| MLECA (ours) | 92.19 | 90.53 |

MLECA achieves the highest accuracy in both directions. In the A→B transfer (from low wind to high wind), MLECA outperforms SVM by 32.72 percentage points, CNN by 4.22, SHNN by 2.11, and ML-CNN by 1.97. Because ML-CNN differs from MLECA only in the absence of ECA, this last margin indicates the benefit of the ECA module. In the reverse direction (B→A), MLECA still reaches 90.53%, 3.03 points above the next-best method (SHNN), demonstrating robustness under domain shift. The SVM accuracy fluctuates strongly between the two directions (59.47% vs. 77.03%), while the deep learning methods are more stable.
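The percentage-point margins quoted above follow directly from Table 4; the short script below recomputes them from the copied table values so the figures can be checked.

```python
# Values copied from Table 4: (A -> B, B -> A) accuracy in %.
TABLE4 = {
    "SVM": (59.47, 77.03),
    "CNN": (87.97, 79.10),
    "SHNN": (90.08, 87.50),
    "ML-CNN": (90.22, 85.00),
    "MLECA": (92.19, 90.53),
}


def margins(table, ours="MLECA", col=0):
    """Percentage-point lead of `ours` over every other method in one column."""
    return {m: round(table[ours][col] - acc[col], 2)
            for m, acc in table.items() if m != ours}


print("A->B:", margins(TABLE4, col=0))
print("B->A:", margins(TABLE4, col=1))
```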

3.4 Multi-Class Fault Diagnosis Results

Multi-class fault diagnosis is more challenging due to the increased number of classes and overlapping feature distributions. We use subsets C and D (9 classes each) with 9-way 1-shot and 9-way 5-shot settings. The results are presented in Table 5.

**Table 5: Multi-class classification accuracy (%) for different methods**
| Method | C (2.5 m/s) → D (5 m/s) | D (5 m/s) → C (2.5 m/s) |
|---|---|---|
| SVM | 50.44 | 67.52 |
| CNN | 53.17 | 60.20 |
| SHNN-1shot | 57.50 | 61.29 |
| SHNN-5shot | 60.08 | 60.43 |
| ML-CNN-1shot | 53.93 | 58.92 |
| ML-CNN-5shot | 59.51 | 60.10 |
| MLECA-1shot | 60.19 | 60.08 |
| MLECA-5shot | 60.19 | 60.43 |

In C→D, MLECA achieves the highest accuracy (60.19%) with both 1 and 5 shots, slightly ahead of SHNN-5shot (60.08%). In D→C, MLECA-5shot (60.43%) ties with SHNN-5shot but is below SHNN-1shot (61.29%) and SVM (67.52%). Note that MLECA-1shot already reaches 60.19% in C→D, comparable to the 5-shot performance of the other few-shot methods (59.51% to 60.08%), which indicates strong one-shot capability. The improvement from 1-shot to 5-shot is marginal for MLECA (0.00 and 0.35 points), suggesting that most of the benefit is already obtained from the meta-learned initialization with a single labeled example per class. SVM performs well in D→C (67.52%) but poorly in C→D (50.44%), showing instability across domains, and the plain CNN also struggles in C→D (53.17%). By contrast, MLECA delivers the most balanced performance, with less than 0.3 points between the two directions.

3.5 Impact of Wind Speed

The influence of wind speed on model performance is analyzed by examining the flight trajectories and sensor signals. Low-wind conditions (e.g., subset A, <2 m/s) produce stable, low-noise data from which models can learn clean features. High-wind conditions (e.g., subset B, 8 m/s) introduce significant measurement fluctuations, making it harder to extract fault-relevant patterns. In the binary tasks, training on low-wind data and testing on high-wind data (A→B) yields higher accuracy than the reverse (B→A) for all deep learning models, possibly because features learned from clean data transfer well to noisy data. In the multi-class tasks the trend reverses: training on the higher-wind subset D (5 m/s) and testing on the lower-wind subset C (2.5 m/s) yields higher accuracy for every method except MLECA-1shot, possibly because training on noisier data encourages robustness to noise. MLECA, however, maintains stable performance regardless of the transfer direction, with a gap of 1.66 points in the binary task and at most 0.24 points in the multi-class task, the smallest among all methods.
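The direction effects discussed above can be quantified as the forward-minus-reverse accuracy gap for each method, using the values from Tables 4 and 5. A small absolute gap means accuracy depends little on which subset is used for training; this metric is a convenience for the discussion, not one reported in the original experiments.

```python
# Values copied from Table 4 (A->B, B->A) and Table 5 (C->D, D->C), accuracy in %.
BINARY = {"SVM": (59.47, 77.03), "CNN": (87.97, 79.10), "SHNN": (90.08, 87.50),
          "ML-CNN": (90.22, 85.00), "MLECA": (92.19, 90.53)}
MULTI = {"SVM": (50.44, 67.52), "CNN": (53.17, 60.20),
         "SHNN-1shot": (57.50, 61.29), "SHNN-5shot": (60.08, 60.43),
         "ML-CNN-1shot": (53.93, 58.92), "ML-CNN-5shot": (59.51, 60.10),
         "MLECA-1shot": (60.19, 60.08), "MLECA-5shot": (60.19, 60.43)}


def direction_gaps(table):
    """Forward minus reverse accuracy in percentage points for each method."""
    return {m: round(fwd - rev, 2) for m, (fwd, rev) in table.items()}


def most_stable(table):
    """Method whose accuracy changes least when the transfer direction is flipped."""
    gaps = direction_gaps(table)
    return min(gaps, key=lambda m: abs(gaps[m]))


print(direction_gaps(BINARY), most_stable(BINARY))
print(direction_gaps(MULTI), most_stable(MULTI))
```

Negative gaps in the multi-class table correspond to the reversed trend (D→C better than C→D).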

4. Conclusion

We have proposed a novel fault diagnosis method for fixed-wing drones that integrates meta-learning with an efficient channel attention mechanism. The MLECA method addresses the critical challenges of limited fault data and complex flight environments by leveraging MAML for rapid adaptation and ECA for salient feature extraction. Experiments on real flight data show that MLECA outperforms SVM, a plain CNN, a siamese network, and meta-learning without attention in both binary transfer directions and in the C→D multi-class transfer, although SVM and SHNN-1shot are more accurate in the D→C multi-class transfer. The method generalizes well under domain shifts caused by varying wind speeds and shows the most stable performance across transfer directions in both binary and multi-class tasks. Its consistent gains over ML-CNN indicate that ECA enhances feature discrimination. Future work could explore more advanced meta-learning frameworks, incorporate temporal attention, and test the method on larger-scale datasets and in real-time deployment scenarios.