Unmanned Aerial Vehicle (UAV) technology has revolutionized structural inspection tasks through its efficiency and flexibility, particularly in visual damage assessment of three-dimensional structures. Achieving complete coverage while avoiding collisions requires solving the path planning problem—a critical challenge we address by modeling it as a Traveling Salesman Problem (TSP) variant. Traditional TSP solvers like Concorde or LKH face computational bottlenecks for real-time applications. Recent deep learning approaches, including graph neural networks (GNNs), offer promising alternatives but struggle with sparse message passing and non-Euclidean distance constraints inherent to drone navigation. To overcome these limitations, we propose GraphFormer, a novel Graph Transformer model that integrates graph convolution with global attention for robust feature extraction. Our method predicts edge probabilities in solutions and combines beam search with local optimization to generate efficient paths, significantly advancing drone technology in complex environments.

GraphFormer Methodology
We model the full-coverage path planning as a TSP on a fully connected graph \( \mathcal{G} = \langle \mathcal{V}, \mathcal{E} \rangle \), where \( \mathcal{V} = \{v_1, v_2, \ldots, v_n\} \) represents viewpoints and \( \mathcal{E} = \{e_{ij} \mid 1 \leq i,j \leq n\} \) defines edges with costs:
$$
\text{cost}(v_i, v_j) =
\begin{cases}
d_{ij} & \text{if } m_{ij} = 1 \\
W & \text{if } m_{ij} = 0
\end{cases}
$$
Here, \( d_{ij} \) is the Euclidean distance, \( m_{ij} \) is a collision-avoidance mask (1 if traversable, 0 otherwise), and \( W \) is a large penalty weight. The objective minimizes the tour length \( L(\pi) \):
$$
L(\pi) = \sum_{i=1}^{N-1} \text{cost}(\pi_i, \pi_{i+1}) + \text{cost}(\pi_N, \pi_1)
$$
Encoder Architecture
The encoder processes node coordinates \( \mathbf{x}_i \) and edge features \( \epsilon_{ij} \) through:
- Input Embedding: Linear projections lift features to \( h \)-dimensional space:
$$
\mathbf{x}_i^{(0)} = \mathbf{A}_1 \mathbf{v}_i + \mathbf{b}_1, \quad \mathbf{e}_{ij}^{(0)} = \mathbf{A}_2 \epsilon_{ij} + \mathbf{b}_2
$$ - Graph Convolution Layers: Iteratively update node and edge features using neighbor aggregation:
$$
\mathbf{x}_i^{(l+1)} = \mathbf{x}_i^{(l)} + \text{ReLU}\left(\text{BN}\left( \sum_{j \in \mathcal{N}_i} \gamma^{(l)}\left( \alpha^{(l)}(\mathbf{x}_j^{(l)}) \odot \beta^{(l)}(\mathbf{e}_{ij}^{(l)}) \right) \right)\right)
$$
$$
\mathbf{e}_{ij}^{(l+1)} = \mathbf{e}_{ij}^{(l)} + \text{ReLU}\left(\text{BN}\left( \mathbf{W}_1^{(l)} \mathbf{e}_{ij}^{(l)} + \mathbf{W}_2^{(l)} \mathbf{x}_i^{(l)} + \mathbf{W}_3^{(l)} \mathbf{x}_j^{(l)} \right)\right)
$$
where \( \mathcal{N}_i \) denotes the top-\( 2\sqrt{N} \) neighbors by cost. - Gated Transformer: Enhances global context via multi-head attention (MHA) and gating:
$$
\mathbf{Y} = \text{MHA}\left(\text{BN}(\mathbf{X}^{(l)})\right), \quad \mathbf{r} = \sigma(\mathbf{W}_r \mathbf{Y} + \mathbf{V}_r \mathbf{X}^{(l)})
$$
$$
\mathbf{z} = \sigma(\mathbf{W}_z \mathbf{Y} + \mathbf{V}_z \mathbf{X}^{(l)} + \mathbf{b}_z), \quad \mathbf{h} = \tanh(\mathbf{W}_h \mathbf{Y} + \mathbf{V}_h (\mathbf{r} \odot \mathbf{X}^{(l)}))
$$
$$
\mathbf{X}^{(l+1)} = (1 – \mathbf{z}) \odot \mathbf{X}^{(l)} + \mathbf{z} \odot \mathbf{h}
$$
Decoder and Optimization
The decoder uses a multilayer perceptron to predict edge probabilities \( \hat{p}_{ij}^1 \) (inclusion in optimal tour) with weighted binary cross-entropy loss:
$$
\mathcal{L} = -\frac{1}{B \cdot N^2} \sum_{b=1}^{B} \sum_{i,j} w_{y_{ij}} \log(\hat{p}_{ij, y_{ij}})
$$
where weights \( w_1 = \frac{2}{N} \) and \( w_0 = \frac{2}{N(N-2)} \) balance class imbalance. For inference, beam search (width=1280) generates initial tours, refined via 2-opt local optimization:
$$
\pi^* = \underset{\pi \in \mathcal{S}}{\text{argmin}} \ L(\pi)
$$
Experimental Results
2D Path Planning
Tests on random instances and TSPLIB benchmarks show GraphFormer outperforms deep learning baselines in solution quality and generalizability. Key metrics include tour length (Len) and optimality gap (Gap):
$$
\text{Gap} = \frac{\hat{L} – L}{L}
$$
Method | TSP-20 | TSP-50 | TSP-100 |
---|---|---|---|
Concorde (Optimal) | 3.830 / 0% | 5.692 / 0% | 7.765 / 0% |
AM | 3.841 / 0.28% | 5.785 / 1.64% | 8.101 / 4.33% |
POMO | 3.831 / 0.02% | 5.697 / 0.10% | 7.790 / 0.32% |
GraphFormer | 3.830 / -0.01% | 5.692 / 0.00% | 7.781 / 0.21% |
Method | TSP 1-100 | TSP 101-200 | TSP 201-500 |
---|---|---|---|
POMO | 19,639 / 0.95% | 30,760 / 3.90% | 55,111 / 14.61% |
GraphFormer | 19,777 / 1.66% | 30,362 / 2.56% | 49,648 / 3.25% |
For non-Euclidean cases, GraphFormer dominates classical and learning-based methods:
Method | TSP-20 (Len/Time) | TSP-50 (Len/Time) |
---|---|---|
Max-Min Ant Colony | 4.135 / 14.09s | 6.675 / 82.64s |
Genetic Algorithm | 4.221 / 67.66s | 7.549 / 125.61s |
GraphFormer | 4.079 / 0.59s | 6.394 / 3.27s |
3D Path Planning for Drone Technology
In UAV-based structural inspection, we evaluate GraphFormer on aircraft models. Collision detection assigns high costs (\( W \)) to non-traversable edges. Tests confirm superior performance:
Method | Tour Length (m) | Time (s) |
---|---|---|
Max-Min Ant Colony | 11.57 | 121.0 |
Genetic Algorithm | 18.11 | 166.7 |
GraphFormer | 11.14 | 5.40 |
Ablation studies highlight the encoder’s attention mechanism (MHA) as critical for feature richness:
$$
\begin{array}{lcc}
\text{Encoder Variant} & \text{Euclidean Len} & \text{Non-Euclidean Len} \\
\hline
\text{w/o MHA} & 8.047 & 10.538 \\
\text{GraphFormer} & \mathbf{7.807} & \mathbf{9.503} \\
\end{array}
$$
Conclusion
GraphFormer advances drone technology by solving full-coverage path planning as a generalized TSP. Our Graph Transformer integrates graph convolution with global attention to handle Euclidean and non-Euclidean distances in 2D/3D spaces. Experiments demonstrate state-of-the-art performance in solution quality, speed, and generalizability across random and real-world benchmarks. The approach enables efficient Unmanned Aerial Vehicle deployment for structural inspection with collision avoidance. Future work will optimize search strategies for larger-scale instances, further enhancing UAV navigation capabilities.