A Full-Coverage Path Planning Method for UAVs Based on Graph Transformer

Unmanned Aerial Vehicle (UAV) technology has revolutionized structural inspection tasks through its efficiency and flexibility, particularly in visual damage assessment of three-dimensional structures. Achieving complete coverage while avoiding collisions requires solving the path planning problem—a critical challenge we address by modeling it as a Traveling Salesman Problem (TSP) variant. Traditional TSP solvers like Concorde or LKH face computational bottlenecks for real-time applications. Recent deep learning approaches, including graph neural networks (GNNs), offer promising alternatives but struggle with sparse message passing and non-Euclidean distance constraints inherent to drone navigation. To overcome these limitations, we propose GraphFormer, a novel Graph Transformer model that integrates graph convolution with global attention for robust feature extraction. Our method predicts edge probabilities in solutions and combines beam search with local optimization to generate efficient paths, significantly advancing drone technology in complex environments.

GraphFormer Methodology

We model the full-coverage path planning as a TSP on a fully connected graph $ \mathcal{G} = \langle \mathcal{V}, \mathcal{E} \rangle $, where $ \mathcal{V} = \{v_1, v_2, \ldots, v_n\} $ represents viewpoints and $ \mathcal{E} = \{e_{ij} \mid 1 \leq i,j \leq n\} $ defines edges with costs:

$$
\text{cost}(v_i, v_j) =
\begin{cases}
d_{ij} & \text{if } m_{ij} = 1 \\
W & \text{if } m_{ij} = 0
\end{cases}
$$

Here, $ d_{ij} $ is the Euclidean distance, $ m_{ij} $ is a collision-avoidance mask (1 if traversable, 0 otherwise), and $ W $ is a large penalty weight. The objective minimizes the tour length $ L(\pi) $:

$$
L(\pi) = \sum_{i=1}^{N-1} \text{cost}(\pi_i, \pi_{i+1}) + \text{cost}(\pi_N, \pi_1)
$$

Encoder Architecture

The encoder processes node coordinates $ \mathbf{x}_i $ and edge features $ \epsilon_{ij} $ through:

Input Embedding: Linear projections lift features to $ h $-dimensional space:
$$
\mathbf{x}_i^{(0)} = \mathbf{A}_1 \mathbf{v}_i + \mathbf{b}_1, \quad \mathbf{e}_{ij}^{(0)} = \mathbf{A}_2 \epsilon_{ij} + \mathbf{b}_2
$$
Graph Convolution Layers: Iteratively update node and edge features using neighbor aggregation:
$$
\mathbf{x}_i^{(l+1)} = \mathbf{x}_i^{(l)} + \text{ReLU}\left(\text{BN}\left( \sum_{j \in \mathcal{N}_i} \gamma^{(l)}\left( \alpha^{(l)}(\mathbf{x}_j^{(l)}) \odot \beta^{(l)}(\mathbf{e}_{ij}^{(l)}) \right) \right)\right)
$$
$$
\mathbf{e}_{ij}^{(l+1)} = \mathbf{e}_{ij}^{(l)} + \text{ReLU}\left(\text{BN}\left( \mathbf{W}_1^{(l)} \mathbf{e}_{ij}^{(l)} + \mathbf{W}_2^{(l)} \mathbf{x}_i^{(l)} + \mathbf{W}_3^{(l)} \mathbf{x}_j^{(l)} \right)\right)
$$
where $ \mathcal{N}_i $ denotes the top-$ 2\sqrt{N} $ neighbors by cost.
Gated Transformer: Enhances global context via multi-head attention (MHA) and gating:
$$
\mathbf{Y} = \text{MHA}\left(\text{BN}(\mathbf{X}^{(l)})\right), \quad \mathbf{r} = \sigma(\mathbf{W}_r \mathbf{Y} + \mathbf{V}_r \mathbf{X}^{(l)})
$$
$$
\mathbf{z} = \sigma(\mathbf{W}_z \mathbf{Y} + \mathbf{V}_z \mathbf{X}^{(l)} + \mathbf{b}_z), \quad \mathbf{h} = \tanh(\mathbf{W}_h \mathbf{Y} + \mathbf{V}_h (\mathbf{r} \odot \mathbf{X}^{(l)}))
$$
$$
\mathbf{X}^{(l+1)} = (1 – \mathbf{z}) \odot \mathbf{X}^{(l)} + \mathbf{z} \odot \mathbf{h}
$$

Decoder and Optimization

The decoder uses a multilayer perceptron to predict edge probabilities $ \hat{p}_{ij}^1 $ (inclusion in optimal tour) with weighted binary cross-entropy loss:

$$
\mathcal{L} = -\frac{1}{B \cdot N^2} \sum_{b=1}^{B} \sum_{i,j} w_{y_{ij}} \log(\hat{p}_{ij, y_{ij}})
$$

where weights $ w_1 = \frac{2}{N} $ and $ w_0 = \frac{2}{N(N-2)} $ balance class imbalance. For inference, beam search (width=1280) generates initial tours, refined via 2-opt local optimization:

$$
\pi^* = \underset{\pi \in \mathcal{S}}{\text{argmin}} \ L(\pi)
$$

Experimental Results

2D Path Planning

Tests on random instances and TSPLIB benchmarks show GraphFormer outperforms deep learning baselines in solution quality and generalizability. Key metrics include tour length (Len) and optimality gap (Gap):

$$
\text{Gap} = \frac{\hat{L} – L}{L}
$$

Table 1: Performance on Random 2D Instances (Euclidean Distance)
Method	TSP-20	TSP-50	TSP-100
Concorde (Optimal)	3.830 / 0%	5.692 / 0%	7.765 / 0%
AM	3.841 / 0.28%	5.785 / 1.64%	8.101 / 4.33%
POMO	3.831 / 0.02%	5.697 / 0.10%	7.790 / 0.32%
GraphFormer	3.830 / -0.01%	5.692 / 0.00%	7.781 / 0.21%

Table 2: Generalization to TSPLIB Instances
Method	TSP 1-100	TSP 101-200	TSP 201-500
POMO	19,639 / 0.95%	30,760 / 3.90%	55,111 / 14.61%
GraphFormer	19,777 / 1.66%	30,362 / 2.56%	49,648 / 3.25%

For non-Euclidean cases, GraphFormer dominates classical and learning-based methods:

Table 3: Non-Euclidean 2D Path Planning
Method	TSP-20 (Len/Time)	TSP-50 (Len/Time)
Max-Min Ant Colony	4.135 / 14.09s	6.675 / 82.64s
Genetic Algorithm	4.221 / 67.66s	7.549 / 125.61s
GraphFormer	4.079 / 0.59s	6.394 / 3.27s

3D Path Planning for Drone Technology

In UAV-based structural inspection, we evaluate GraphFormer on aircraft models. Collision detection assigns high costs ($ W $) to non-traversable edges. Tests confirm superior performance:

Table 4: 3D Aircraft Inspection (60 Viewpoints)
Method	Tour Length (m)	Time (s)
Max-Min Ant Colony	11.57	121.0
Genetic Algorithm	18.11	166.7
GraphFormer	11.14	5.40

Ablation studies highlight the encoder’s attention mechanism (MHA) as critical for feature richness:

$$
\begin{array}{lcc}
\text{Encoder Variant} & \text{Euclidean Len} & \text{Non-Euclidean Len} \\
\hline
\text{w/o MHA} & 8.047 & 10.538 \\
\text{GraphFormer} & \mathbf{7.807} & \mathbf{9.503} \\
\end{array}
$$

Conclusion

GraphFormer advances drone technology by solving full-coverage path planning as a generalized TSP. Our Graph Transformer integrates graph convolution with global attention to handle Euclidean and non-Euclidean distances in 2D/3D spaces. Experiments demonstrate state-of-the-art performance in solution quality, speed, and generalizability across random and real-world benchmarks. The approach enables efficient Unmanned Aerial Vehicle deployment for structural inspection with collision avoidance. Future work will optimize search strategies for larger-scale instances, further enhancing UAV navigation capabilities.