With the rapid advancement of Unmanned Aerial Vehicle technology, UAV ad hoc networks have been widely applied in various fields such as military operations, environmental monitoring, and post-disaster rescue. Most current UAV ad hoc networks rely on Radio Frequency technology. However, due to the limited RF spectrum resources and the massive access of IoT devices, the available bandwidth cannot meet the growing spectrum demands, leading to congestion and competition for RF spectrum resources, which compromises communication rates and quality. To address these issues, Free Space Optical communication is considered a viable solution for UAV communication networks. Compared to RF links, laser communication, with its high bandwidth, low latency, and larger modulation bandwidth, has become a significant research direction for UAV ad hoc networks. However, the precise alignment requirements of laser communication and its high sensitivity to environmental conditions, such as meteorological factors, make the topology and routing optimization of UAV networks more complex.
In traditional UAV routing optimization, researchers widely use classical path planning algorithms such as Dijkstra’s algorithm, A* algorithm, and genetic algorithms. While these methods provide effective path selection in static and simple environments, they lack autonomous learning and adaptive capabilities, making it difficult to cope with dynamic changes and complex environments. Dijkstra’s algorithm has the advantage of simplicity and can find globally optimal paths, but its drawback is high computational complexity, especially in large-scale networks, resulting in low execution efficiency and inability to handle dynamically changing network topologies effectively. The A* algorithm is an improvement over Dijkstra’s algorithm. Although it can provide faster path computation in some scenarios compared to Dijkstra’s algorithm, it still relies on static network topologies and cannot adapt to rapidly changing network environments and real-time node position changes, thus facing significant limitations in UAV networks. The Genetic Algorithm, as an optimization algorithm based on natural selection and gene recombination, is commonly used to solve complex combinatorial optimization problems with large search spaces. However, it relies on population evolution processes, has slow convergence speeds, often requires long computation times, and due to its stochastic nature, cannot guarantee finding the global optimum.
To address these challenges, an increasing number of studies are turning to deep learning, particularly Graph Neural Network methods. GNNs flexibly model graph structures, capturing complex dependencies between nodes. They can not only discover data features but also use graph structures to reveal relationships between them, greatly enhancing data analysis capabilities and adapting to dynamically changing network environments, thus showing significant advantages in optimizing path planning and node positioning. Traditional neural networks need to process data from all users simultaneously, but due to the message-passing mechanism, GNNs only need to process data from each user. Therefore, the number of trainable parameters is significantly lower than that of traditional neural network architectures, greatly reducing computational complexity. Unlike traditional algorithms, GNN-based path optimization methods can update network topologies in real-time and dynamically adjust UAV paths, providing more flexible and efficient routing selections.

Problem Analysis
UAV ad hoc networks are inherently distributed networks. In such networks, Unmanned Aerial Vehicles can self-configure the network, dynamically exchange data, and communicate without relying on pre-existing network configurations. Each node acts as both a resource provider and a resource user, significantly improving system fault tolerance, scalability, and flexibility.
Network Model
In this study, we consider a communication system with O pairs of users, including senders and receivers, and M Unmanned Aerial Vehicles. To simplify the problem, we assume that each sender transmits data to a specific receiver. In this context, each UAV can communicate with other user UAVs and relay UAVs in the laser network, with a maximum range of ξ meters. We frame the relay problem in a generalized framework where any UAV can serve as a relay node for users and other UAVs within its coverage. There is no hop limit until data is successfully transmitted to the receiver, so the minimum number of hops is 1, while the maximum number of hops depends on the number of UAVs M. Each user pair can independently choose their respective relay paths, and the number of hops can vary between different user pairs. For example, Sender 2 can choose UAV 1 or UAV 2 as the first-hop relay UAV since they are within communication coverage. If UAV 2 receives data from Sender 2, it will forward the data to Receiver 2 or UAV 3 based on link quality. However, it is important to note that UAV 2 cannot forward data to UAV 1 to avoid forming loops in the relay path. Thus, Sender 2 will have four possible relay paths, and Sender 1’s relay path selection follows a similar pattern.
Channel Model
Research on channel modeling for laser networks primarily focuses on analyzing key factors affecting signal attenuation, including transmission path loss and atmospheric turbulence effects. We systematically analyze the combined impact of these parameters, constructing mathematical models to multi-dimensionally quantify the comprehensive effects of different environmental variables on laser transmission performance. The total channel loss is given by:
$$L_{total}(\lambda, d) = L_p(\lambda, d) + L_{atm}(\lambda, d)$$
Typically, the channel model consists of large-scale path loss and small-scale channel fading. However, since small-scale channel information changes on time scales of milliseconds, it is impractical to design UAV positions based on such rapidly changing channel state information. Therefore, in this study, we only consider large-scale characteristics. Thus, the traffic $f_s$ for sender $s$ can be calculated as:
$$f_s = \min_i r_{s_i} = \min_i B \log_2 \left(1 + \frac{g_0 P}{BN_0 d_{s_i}^{-L}}\right)$$
where $r_{s_i}$ is the transmission rate for the $i$-th hop from sender $s$, $d_{s_i}$ is the distance between two nodes in the $i$-th hop, $g_0$ is the attenuation loss of the laser beam per unit distance, $L$ is the attenuation loss exponent of the laser beam, and $N_0$ is the received Additive White Gaussian Noise. To emphasize the impact of UAV positioning and relay paths on the average communication rate, we set equal bandwidth $B$ and transmit power $P$. From the equation, it can be seen that the traffic $f$ is determined by the hop with the minimum distance $d_i$. In this study, given $O$ pairs of senders and receivers, we need to jointly optimize the position of each UAV and the paths to maximize the total system traffic, i.e.,
$$\max \sum_{m=1}^{O} f_m$$
Path Loss Model
Quantifying path loss is crucial for evaluating the effective communication distance of a link and the required transmit power. Path loss is primarily caused by absorption and scattering of the laser signal during propagation through atmospheric conditions such as fog, rain, and dust. The Beer-Lambert Law provides a method to quantify these effects, typically describing the transmittance of an optical signal after passing through an atmospheric channel. The transmittance $T$ is expressed as:
$$T(\lambda, d) = \frac{P_R}{P_T} = e^{-\alpha(\lambda) d}$$
Path loss $L_p$ can be derived from transmittance:
$$L_p = -10 \log_{10}(T) = -10 \log_{10}(e^{-\alpha(\lambda) d}) = 10 \alpha(\lambda) d \log_{10}(e)$$
Atmospheric Turbulence Model
In Free Space Optical communication, atmospheric turbulence effects significantly interfere with beam propagation, causing random fluctuations in the intensity and phase distribution of the optical signal. This phenomenon is mainly caused by surface heat absorption and temperature gradient changes, leading to uneven air temperature distribution near the ground, resulting in density differences and the formation of air vortices. These air vortices act as lenses of different sizes and refractive indices. When an optical signal passes through such an inhomogeneous atmospheric layer, it significantly degrades the transmission quality of the optical communication link. Based on the variation and inhomogeneity of the refractive index, atmospheric turbulence is classified into four levels: weak turbulence, moderate turbulence, strong turbulence, and saturated turbulence. Two probability density function models are commonly used: the log-normal distribution model and the Gamma-Gamma model, applicable to different turbulence intensities.
This study primarily considers weak turbulence, where atmospheric disturbances are small and do not cause drastic changes in light intensity. Therefore, using the log-normal distribution provides sufficient accuracy and simplicity. The light signal intensity affected by turbulence follows a log-normal distribution. Its specific probability density function is:
$$f(I) = \frac{1}{I \sigma \sqrt{2\pi}} \exp\left\{-\frac{[\ln(I) – \mu]^2}{2\sigma^2}\right\}$$
where $\sigma$ is the standard deviation of the logarithmic intensity, i.e., the standard deviation of the natural logarithm of the light signal intensity, $I$ is the received light signal intensity, and $\mu$ is the expected value of the logarithmic intensity $\ln(I)$.
Therefore, the turbulence loss is calculated as:
$$L_{atm} = -10 \log_{10}\left(\frac{I}{I_0}\right)$$
where $I$ is the received light intensity, and $I_0$ is the average light intensity under no turbulence conditions.
Problem Formulation
To formulate the problem, the topology of all users and UAVs is represented by a weighted undirected graph $G = (V, E)$. The vertex set $V$ represents nodes in the topology, including users and relay UAVs. The edge set $E$ represents laser transmission links. $N(u)$ is the set of all adjacent nodes of node $u$. The set $V$ consists of the user set and the UAV set $V_u$, where the user set is divided into the sender set $V_s$ and the receiver set $V_r$, with $|V_s| = |V_r| = O$, $|V_u| = M$, so $V = \{V_s, V_r, V_u\}$. Define $V_{s_r}$ as the set of receivers for sender $s$. The feature of node $i$ is its position $\mathbf{d}_i$. A transmission link $e_{ij} \in E$ exists if and only if $i, j \in V_u$ or $i \in V_s \cup V_r$, $j \in V_u$. The weight of transmission link $e_{ij}$ is the transmission rate $r_{ij}$ between node $i$ and node $j$.
Given graph $G$, consider the following decision variables, we formulate the problem as:
- A set of binary variables $X = \{x_1, x_2, x_3, \dots, x_{|V_s|}\}$, where $x_s = \{x_{ij}^s | i, j \in V\}$ is a set of relay paths for sender $s$. If nodes $i, j$ transmit data from sender $s$, then $x_{ij}^s = 1$; otherwise, $x_{ij}^s = 0$.
- A set of continuous variables $F = \{f_1, f_2, f_3, \dots, f_{|V_s|}\}$, where $f_s = \{f_{ij}^s | i, j \in V\}$ is the data flow from sender $s$, and $f_{ij}^s$ is the traffic forwarded from node $i$ to node $j$ for data from sender $s$.
- A set of continuous variables $D = \{d_1, d_2, d_3, \dots, d_{|V_u|}\}$, where $d_u$ is the position information of UAV $u$.
Thus, the communication rate maximization problem for all user pairs can be formulated as the following network traffic maximization problem:
$$\max \sum_{s \in V_s} \sum_{u \in V_u} f_{su}$$
Subject to:
$$f_{ij}^s \leq r_{ij}^s \quad \forall i, j \in V, \forall s \in V_s$$
$$\sum_{i \in N(j)} x_{ij}^s f_{ij}^s – \sum_{i \in N(j)} x_{ji}^s f_{ji}^s = 0 \quad \forall j \in V_u, \forall s \in V_s$$
$$\sum_{u \in V_u} x_{su}^s f_{su}^s – \sum_{u \in V_u} x_{ur}^s f_{ur}^s = 0 \quad \forall r \in V_{s_r}, \forall s \in V_s$$
$$\sum_{j \in N(i)} x_{ij}^s \leq 1 \quad \forall i \in V_s, \forall s \in V_s$$
$$x_{ij}^s = \begin{cases} 1 & \text{if } f_{ij}^s \neq 0 \\ 0 & \text{otherwise} \end{cases} \quad \forall i \in V_s, \forall j \in N(i), \forall s \in V_s$$
$$r_{ij}^s = \begin{cases} B \log_2 \left(1 + \frac{g_0 P}{B N_0 \|\mathbf{d}_i – \mathbf{d}_j\|^{-L}}\right) & \text{if } \|\mathbf{d}_i – \mathbf{d}_j\| \leq \xi \\ 0 & \text{otherwise} \end{cases} \quad \forall s \in V_s, \forall i, j \in V$$
Equation (9) indicates that the communication rate between two nodes must be greater than their traffic. Equation (10) states that the traffic received from all neighbor nodes $\sum_{i \in N(j)} x_{ij}^s f_{ij}^s$ should equal the traffic transmitted to other neighbor nodes $\sum_{i \in N(j)} x_{ji}^s f_{ji}^s$, as the role of a relay UAV is to forward all received data to another neighbor node. Equation (11) ensures that the traffic for a sender-receiver pair is equal. Equations (12) and (13) ensure that data from a sender is only transmitted to one adjacent node. Equation (14) determines the communication rate between node $i$ and node $j$. When the distance between two devices (user and UAV or two UAVs) is greater than $\xi$ meters, a communication link cannot be established, and the rate $r_{ij}^s$ is 0. Additionally, to simplify the problem, we assume all nodes have the same transmit power.
Two-Stage Optimization Algorithm Based on GNN
This algorithm can be simplified into two smaller objectives: determining the positions of all Unmanned Aerial Vehicles and selecting relay paths for each pair of senders and receivers. Since the ultimate goal is to maximize the overall network traffic, the positions of the UAVs should be determined based on the relay paths chosen by the users. Therefore, we first propose the HA-GNN algorithm based on a heterogeneous attention mechanism to optimize relay path selection, and then use the UL-GNN algorithm based on unsupervised learning to determine the optimal UAV positioning based on the optimized relay paths, thereby achieving maximized network traffic.
Proposed UAV Relay Path Optimization Sub-Algorithm Based on Heterogeneous Attention Mechanism
Relay Path Optimization with HA-GNN
We propose using reinforcement learning to select the best relay paths for all users. The RL framework of HA-GNN is described as follows:
State: The state space for each pair of sender $s$ and receiver $r$ is modeled as a graph structure $G_s = (V_s^{\text{all}}, E_s)$, where $V_s^{\text{all}} = V_u \cup \{s, r\}$, including all nodes participating in the communication, and $E_s$ is a subset of $E$, removing edges formed by nodes outside $V_s^{\text{all}}$. Additionally, to enable HA-GNN to know which nodes can be selected as the next-hop relay nodes, the state space also includes the currently selected relay nodes $V_s^{\text{sel}}$ and the nodes that can be selected $V_s^n$, where $V_s^n = V_s^{\text{all}} \setminus V_s^{\text{sel}}$. Thus, the state $S = \{G_s, V_s^{\text{sel}}, V_s^n\}$, and the state at the $t$-th hop is denoted as $s_t$. Moreover, in the initial state, $V_s^{\text{sel}} = \{s\}$.
Action: Since HA-GNN needs to select a node as the next hop, the action space $A = V_s^n$, and the action at the $t$-th hop is denoted as $a_t$. When the receiver $r$ is in $V_s^{\text{sel}}$, the selection is complete.
Reward: Since the communication rate for a pair of users is represented by distance, the agent should minimize the maximum distance between any two nodes in the relay path by selecting nodes.
$$R_t = -\max_{v \in V_s^{\text{sel}}} \|\mathbf{d}_{v_{t+1}} – \mathbf{d}_{v_t}\|$$
where $v_t$ is the relay node at the $t$-th hop. The reward is negative, so when RL maximizes the reward, the distance decreases.
Based on Equation (15), we define the objective function of HA-GNN as:
$$\max \alpha = \mathbb{E} \left[ \sum_t R_t \right] = \mathbb{E} \left[ -\min_{s \in V_s} \max_{v \in V_s^{\text{sel}}} \|\mathbf{d}_{v_{t+1}} – \mathbf{d}_{v_t}\| \right]$$
Since RL is used to select the best next-hop relay node, HA-GNN cannot know how good the selected path is until a complete path is selected. Therefore, we cannot directly obtain the gradient of parameter updates after each update. Thus, we compute the policy gradient of HA-GNN to update the parameters in HA-GNN. The policy gradient for parameters $\theta$ used to optimize the objective function $\alpha$ can be described as:
$$\nabla_\theta \alpha = \nabla_\theta \log \pi_\theta$$
Therefore, the parameters $\theta_{\text{HA-GNN}}$ of HA-GNN can be updated as follows, minimizing the maximum distance between relay nodes:
$$\theta_{\text{HA-GNN}} = \theta_{\text{HA-GNN}} + \mu_{\text{HA}} \sum_{s \in V_s} \left( -\min_{s \in V_s} \max_{v \in V_s^{\text{sel}}} \|\mathbf{d}_{v_{t+1}} – \mathbf{d}_{v_t}\| \right) \nabla_\theta \log \pi_\theta$$
where $\mu_{\text{HA}}$ is the learning rate of HA-GNN.
Structure of HA-GNN
Laser networks use directional beams to establish high-speed communication links between UAV nodes, but their actual performance is susceptible to dynamic factors such as node mobility, physical obstructions, and atmospheric attenuation. Traditional schemes rely on fixed weight allocation for link resources and cannot adjust based on real-time channel states, making it difficult to effectively cope with environmental dynamic changes. The heterogeneous attention mechanism adaptively weights multi-dimensional features, providing a new solution for dynamic path selection optimization. Therefore, we adopt a heterogeneous attention mechanism to establish a nonlinear mapping relationship between node attributes and communication rates. Additionally, the input to the relay path selection problem is an unordered sequence of nodes, and the solution is a sequentially ordered node sequence. Hence, we use an encoder-decoder-like policy network.
We utilize a heterogeneous attention mechanism to learn representations. First, define the input data. The input node sequence is represented as $X = \{v_1, v_2, \dots, v_{2O+M}\}$, where each node $v_i$ contains the following three-dimensional features: two-dimensional coordinate identifiers and a binary state identifier $d_{\text{act}}$. If the node is a user (i.e., sender or receiver), the binary state identifier $d_{\text{act}}$ equals 1; otherwise, it equals 0; $i$ is the node index. Then, perform node embedding by projecting the original features into a latent space via linear projection, generating initial node embeddings $h_i^0$ with dimension $d_h = 128$. Then, through $L$ layers of attention layer stacking, the initial node embeddings are transformed into advanced node representations $h_i^L$. Each attention layer consists of multi-head heterogeneous attention sub-layers and feed-forward sub-layers.
Traditional Multi-Head Attention uses self-attention to learn the relationship between any two nodes in the input sequence. However, unlike classic routing problems such as TSP or CVRP where all nodes have homogeneous roles, the nodes in our problem have heterogeneous roles. To address this, we propose a dual-mode attention mechanism: target node self-attention for user nodes and relay UAVs, learning their internal feature representations separately, and cross-role cross-attention modeling the heterogeneous associations between user nodes and relay nodes.
To extract features, self-attention computes three vectors based on the input sequence: Query, Key, and Value. $h_i^{l-1}$ represents the node embedding of the $(l-1)$-th attention layer ($l \in L$). Map it using learnable parameter matrices:
$$\mathbf{q}_i = W_q h_i^{l-1}, \quad \mathbf{k}_i = W_k h_i^{l-1}, \quad \mathbf{v}_i = W_v h_i^{l-1}$$
where $\mathbf{q}_i, \mathbf{k}_i, \mathbf{v}_i$ are the query, key, and value for node $i$, respectively, $W_q \in \mathbb{R}^{d_h \times d_q}$, $W_k \in \mathbb{R}^{d_h \times d_k}$, and $W_v \in \mathbb{R}^{d_h \times d_v}$ are trainable parameters, $d_q, d_k, d_v$ represent the dimensions of query/key and value, respectively. The compatibility between node $i$ and node $j$ is computed as the scalar product of node $i$’s query and node $j$’s key, then further processed with the softmax function to obtain node weights as follows:
$$\mu_{ij} = \frac{\mathbf{q}_i \cdot \mathbf{k}_j}{\sqrt{d_q}}$$
$$a_{ij} = \frac{\exp(\mu_{ij})}{\sum_{j’} \exp(\mu_{ij’})}$$
Simultaneously, use a cross-role cross-attention mechanism to learn the relationship between relay nodes and users. Let $h_i^t$ be the embedding of the $i$-th relay node, and $h_j^c$ be the embedding of the $j$-th user. Correspondingly, the representations of query, key, and value are as follows:
$$\mathbf{q}_i^t = W_{qt} h_i^t, \quad \mathbf{k}_j^c = W_{kc} h_j^c, \quad \mathbf{v}_j^c = W_{vc} h_j^c$$
where all parameter matrices are trainable. Therefore, to measure the association strength between relay node $i$ and user $j$, we use an additive interaction form and generate an attention distribution via the softmax function:
$$\mu_{ij}^{tc} = \frac{\mathbf{q}_i^t \cdot \mathbf{k}_j^c}{\sqrt{d_q}}$$
$$a_{ij}^{tc} = \frac{\exp(\mu_{ij}^{tc})}{\sum_{j’} \exp(\mu_{ij’}^{tc})}$$
The head vector $h_i^m$ is the weighted sum of the outputs of self-attention and cross-attention, as follows:
$$h_i^m = \sum_j a_{ij} \mathbf{v}_j + \sum_j a_{ij}^{tc} \mathbf{v}_j^c$$
where user nodes only participate in cross-attention calculations, and their self-attention contributions are set to 0 to avoid information redundancy. Then, integrate the heterogeneous information captured by different attention heads through linear projection, specifically as follows:
$$h_i’ = \text{BN}(h_i^{l-1} + W_{\text{out}} \text{Cat}(h_i^1, h_i^2, \dots, h_i^M))$$
where $W_{\text{out}}$ is a trainable parameter, and Cat is the concatenation operator.
Finally, through $L$ layers of attention mechanism iteration, advanced node representations are obtained, and the graph representation is computed to represent global network information as follows:
$$h = \frac{1}{2O + M} \sum_{i=1}^{2O+M} h_i^L$$
The obtained graph representation will serve as input to the decoder. Then, through the decoder, the attention value for each node is computed. A higher attention value indicates that the node has a greater impact on the communication rate of the relay path.
Proposed UAV Position Optimization Sub-Algorithm Based on Unsupervised Learning
Position Optimization with UL-GNN
Supervised learning methods rely on large amounts of labeled data to train neural networks, but this is difficult to achieve in UAV positioning optimization. In the scenario of UAV positioning optimization, supervised learning faces the following challenges: high annotation costs—when the number of users and UAVs increases, the computational complexity of generating optimal position labels grows exponentially; and poor adaptability to dynamic environments—predefined optimal solutions难以覆盖复杂场景. This study adopts an unsupervised learning framework to train UL-GNN. Its core advantage is that it does not rely on costly annotated data, can learn features directly from sensor data, adapt to complex dynamic environments, and has higher robustness compared to supervised learning. First, the target value is jointly determined by $D$ and $X$, so the objective function is:
$$J(D, X | G) = \sum_{s \in V_s} \sum_{u \in V_u} f_{su}$$
The gradient of the objective function with respect to $D$ is:
$$\nabla_D J = \nabla_D \sum_{s \in V_s} \sum_{u \in V_u} f_{su}$$
The positions of the UAVs are dynamically generated by UL-GNN based on the network state, so our $D$ is obtained by:
$$D = \text{UL-GNN}_\theta(G)$$
Replacing $D$ in the gradient equation with the above expression, and computing the derivative of $D$ with respect to $\theta_{\text{UL-GNN}}$, the gradient of the objective function with respect to $\theta_{\text{UL-GNN}}$ is:
$$\nabla_{\theta_{\text{UL-GNN}}} J = \left. \nabla_D \sum_{s \in V_s} \sum_{u \in V_u} f_{su} \right|_{D=\text{UL-GNN}_\theta(G)} \cdot \nabla_{\theta_{\text{UL-GNN}}} \text{UL-GNN}_\theta(G)$$
Therefore, $\theta_{\text{UL-GNN}}$ can be updated via gradient descent as:
$$\theta_{\text{UL-GNN}} = \theta_{\text{UL-GNN}} + \mu_{\text{UL}} \left( \left. \nabla_D \sum_{s \in V_s} \sum_{u \in V_u} f_{su} \right|_{D=\text{UL-GNN}_\theta(G)} \cdot \nabla_{\theta_{\text{UL-GNN}}} \text{UL-GNN}_\theta(G) \right)$$
where $\mu_{\text{UL}}$ is the learning rate for updating UL-GNN, $\left. \nabla_D \sum_{s \in V_s} \sum_{u \in V_u} f_{su} \right|_{D=\text{UL-GNN}_\theta(G)}$ is the gradient of the UL-GNN output $D$, and $\nabla_{\theta_{\text{UL-GNN}}} \text{UL-GNN}_\theta(G)$ is the gradient computed by the chain rule.
Structure of UL-GNN
To achieve the goal of maximizing the communication rate for all user pairs, the system needs to dynamically identify user pairs with interaction requirements. The traditional method is to add an identifier to each user’s feature to distinguish associations, but it is difficult to adapt to dynamic scenarios. Therefore, we propose a method based on Long Short-Term Memory to solve this problem. LSTM is a special type of Recurrent Neural Network suitable for learning long sequences. First, the features of user pairs with communication requirements are constructed into two-dimensional time-series data as the input sequence to the LSTM. Then, after processing by this network, each communication node will obtain its corresponding embedding vector representation. It is worth noting that since LSTM processes only one user pair at a time, during backpropagation, the gradient only acts on the associated users, ensuring that only the two users with communication requirements produce parameter coupling, while computations between other nodes are independent.
After completing the feature embedding for all nodes, introduce a GNN to extract the relationships between users and UAVs. This module uses the position information of users and relay UAVs to enable relay UAVs to optimize their positions. Considering the energy consumption constraints of device movement, each UAV should stay as close as possible to its initial coordinates while ensuring service quality. For this purpose, we use an additive attention-based adaptive weight allocation mechanism to quantify the differential impact of different user pairs on the UAV. The specific process of UL-GNN is as follows:
Process information from neighbor nodes, encoding node interaction features through a Multi-Layer Perceptron:
$$m_{i,j}^l = \text{MLP}(x_i^l, x_j^l), \quad \forall (i,j) \in E$$
Use an additive attention mechanism to quantify the differential impact of users on UAVs:
$$\alpha_{ij} = \frac{\exp\left(\text{LeakyReLU}\left( \mathbf{W}_A [ \mathbf{W}_S x_i^l \| \mathbf{W}_S x_j^l ] \right)\right)}{\sum_{k \in \mathcal{N}(i)} \exp\left(\text{LeakyReLU}\left( \mathbf{W}_A [ \mathbf{W}_S x_i^l \| \mathbf{W}_S x_k^l ] \right)\right)}$$
where $\mathbf{W}_A$ and $\mathbf{W}_S$ are trainable parameters, mapping features to a shared space, combined with a nonlinear activation function to enhance relationship modeling capability.
Then, use an aggregation function to integrate weighted messages:
$$x_i^{l+1} = \text{MLP}\left(x_i^l, \rho\left( \{ \alpha_{ij} m_{i,j}^l : (i,j) \in E \} \right)\right)$$
To capture the temporal dependencies of interactions between user pairs and UAVs, use a bidirectional LSTM to extract contextual features:
$$x_i^b = \text{concat}\left(\text{LSTM}_1(x_i^l), \text{LSTM}_2(x_i^l)\right)$$
where $\text{LSTM}_1$ and $\text{LSTM}_2$ process the sequence in opposite directions.
Generate the final optimized position via linear projection:
$$\mathbf{d}_i = W_L x_i^b$$
where $\mathbf{d}_i$ is the optimized position of UAV $i$, and $W_L$ is a trainable parameter.
Simulation Setup and Result Analysis
Simulation Setup
This section conducts simulation verification of the proposed two-stage optimization framework and compares it with the Genetic Algorithm and Bellman-Ford algorithm, evaluating the characteristics of different algorithms from two dimensions: communication rate and convergence speed. Communication rate represents the average data transmission rate of all user pairs, and convergence speed represents the running time required for different algorithms to determine the optimal UAV positions and relay paths for each user pair. The Genetic Algorithm is used as the benchmark for the positioning model UL-GNN, and the Bellman-Ford algorithm is used as the benchmark for the relay path search model HA-GNN.
In the simulation, the positions of the Unmanned Aerial Vehicles are discretized into a 20×20 grid, where each grid is a possible deployment point. The Point-to-Point laser transmission channel is used, the physical layer uses the MAC_PTP protocol, the MAC layer uses the MAC_PTP protocol, and the traffic flow model is CBR. The simulation parameter configuration is shown in Table 1, and the laser communication parameters are shown in Table 2.
| Simulation Parameter | Value |
|---|---|
| Number of user pairs in small network | 10 |
| Number of user pairs in large network | 150 |
| Number of UAVs in small network | 4 |
| Number of UAVs in large network | 35 |
| HA-GNN learning rate | 0.001 |
| UL-GNN learning rate | 0.0001 |
| Maximum communication distance of UAV | 20 km |
| Hidden layer dimension | 128 |
| Weight parameter in attention mechanism | 0.2 |
| Channel Model Parameter | Value |
|---|---|
| Laser wavelength | 1500 nm |
| Transmit power | 10 dBm |
| Receiver sensitivity | -80 dBm |
| Beam divergence angle | 0.5 mrad |
| Atmospheric absorption coefficient | 0.2 dB/km |
| Atmospheric scattering coefficient | 0.1 dB/km |
Simulation Analysis
Figures 2 and 3 show the convergence speed and communication rate performance of the four algorithms in a simulation scenario with 10 pairs of user nodes and 4 relay UAVs. Due to the long convergence time of the genetic algorithm, it is demonstrated in Figure 3. From Figures 2 and 3, it can be seen that in small networks, when the number of users and UAVs is not large and the iteration time is sufficient, the genetic algorithm can obtain a local optimal solution, but the proposed UL-GNN-based algorithm exhibits performance close to the global optimum and has a faster convergence speed. In small networks, BF-UL-GNN converges slightly faster than HA-UL-GNN, but its communication rate is limited by the fixed path strategy.
As the scale of user pairs and relay UAVs increases, the complexity of the network topology grows, and the performance differences of different algorithms become significant. In large-scale networks, since the genetic algorithm requires a lot of time to obtain the optimal solution, it is impossible for the genetic algorithm to obtain a solution within an acceptable time range. Therefore, we introduce the MLP algorithm. As shown in Figure 4, as the number of users increases, the performance of the UL-GNN algorithm gradually declines, mainly due to the difficulty in extracting user associations in complex topologies, leading to performance degradation. The MLP algorithm uses predefined link priorities and fixed routing strategies and does not require explicit modeling of user relationships. Therefore, the increase in the number of users has little impact on it. As shown in Figure 5, the computation time of the BF algorithm is longer than that of the HA-GNN algorithm because the heterogeneous attention mechanism models the relationships between nodes and combines GPU parallel computing, greatly reducing computation time. Additionally, the MLP algorithm has shorter computation time compared to the UL-GNN algorithm because the MLP algorithm can be computed in parallel and uses GPU acceleration.
As shown in Figure 6, as the number of relay UAVs increases, the performance of the proposed UL-GNN algorithm improves more significantly compared to the MLP algorithm because the UL-GNN algorithm can extract the relationships between relay UAVs, thus achieving better performance. As shown in Figure 7, when the number of relay UAVs increases, the computation time of the BF algorithm is still longer than that of the HA-GNN algorithm because the BF algorithm needs to traverse all possible paths globally, and the computation time increases linearly with the number of UAVs, while the MLP algorithm, based on rule-based routing strategies and parallel computing architecture, still has shorter computation time than the UL-GNN algorithm. According to Figures 6 and 7, it is not difficult to see that changes in the number of UAVs have a greater impact on the communication rate than changes in the number of user pairs because each user pair selects its own relay path, so when the relay UAVs remain unchanged, the average communication rate of all users does not change much.
Furthermore, in large-scale laser networks, scalability is an important metric for evaluating algorithms. Therefore, we conducted simulation experiments in large-scale networks to verify the algorithm’s scalability in complex networks, setting the number of user pairs to 150 and the number of relay UAVs to 35. In large-scale networks, since both the GA algorithm and the BF algorithm require a lot of time to obtain a solution, it is impossible for either the GA algorithm or the BF algorithm to obtain a solution within an acceptable time range. Therefore, we only compared with the MLP algorithm. In large-scale networks, the communication rate of the HA-UL-GNN algorithm is 3.22 Mb/s, and the time is 22.08 seconds, while the communication rate of the MLP algorithm is 2.15 Mb/s, and the time is 25.88 seconds.
In large-scale laser networks, HA-UL-GNN not only has a higher transmission rate but also shorter running time, indicating that the algorithm can process more data per unit time and has higher network throughput. This is because the heterogeneous attention mechanism of the HA-UL-GNN algorithm can dynamically weight features, distinguish the functional differences of user and UAV nodes through attention scores, and only model interactions of key nodes, reducing computational complexity and significantly improving convergence efficiency, while the MLP algorithm uses rule-based routing strategies and cannot extract network features to optimize the problem. Additionally, the message-passing mechanism of graph neural networks integrates multi-hop node features through neighborhood aggregation and utilizes GPU parallelization to handle large-scale matrix multiplication, greatly improving algorithm performance.
Conclusion
This study investigates the problem of maximizing traffic in UAV laser communication networks and proposes a two-stage optimization framework based on Graph Neural Networks. By decoupling the traffic maximization problem into two sub-problems—UAV positioning optimization and relay path optimization—we jointly solve them using the HA-GNN relay path optimization algorithm and the UL-GNN UAV positioning optimization algorithm, respectively. Simulation results demonstrate that the performance of this algorithm is superior to BF-UL-GNN, HA-GNN-GA, BF-GA, HA-GNN-MLP, and BF-MLP, and it exhibits certain scalability in large-scale laser networks. In future research, we will consider multi-modal service hybrid scheduling and payload constraint issues.
