In recent years, the rapid development of artificial intelligence (AI) and the Internet of Things (IoT) has led to an increasing demand for real-time processing of massive data, driving the advancement of edge computing technologies. Traditional centralized computing models, limited by bandwidth, computational resources, and communication delays, struggle to meet the requirements for low-latency and high-efficiency data processing. Consequently, distributed computing paradigms have gained attention, with Federated Learning (FL) emerging as a novel distributed machine learning approach. FL addresses data silo issues while preserving data privacy, making it applicable in sensitive domains such as healthcare, finance, and mobile devices. By leveraging the computational power of edge devices, FL transforms data value into model capabilities, enabling collaborative model optimization and further advancing AI with broad application prospects.
To enhance system performance, extensive research has been conducted on federated learning. However, existing methods often impose high demands on the computational capabilities and participation enthusiasm of ground users, which affects overall system efficiency. To address this, we propose a game optimization algorithm in Unmanned Aerial Vehicle (UAV)-assisted federated learning systems. We construct a multi-UAV, multi-user collaborative federated learning system model based on blockchain, introducing a blockchain-based incentive mechanism to reward both UAVs and users. To improve data collection efficiency, we employ the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method to cluster ground users. UAVs incentivize ground users to upload data through rewards, perform local training after data collection, and elect a UAV to complete global model aggregation. We build a data trading model based on Stackelberg game theory, where UAVs act as leaders to formulate reward strategies, and ground users act as followers to determine optimal data upload schemes. The follower utility function consists of rewards allocated by UAVs and the energy consumption of data transmission, while the leader utility function includes rewards from local training models, rewards allocated to ground users, hovering energy consumption, and training energy consumption. We use the simulated annealing algorithm to optimize the follower utility function, obtaining the optimal solution for transmission power. The leader utility function optimization problem is divided into three subproblems, solved using particle swarm optimization, theoretical derivation, and the golden section method, respectively, to obtain the optimal UAV positions, local training CPU frequencies, and reward allocation ratios. Finally, by solving the Stackelberg game equilibrium, we derive the optimal UAV reward allocation ratio and ground user data upload ratio. 
Simulation results demonstrate that our algorithm effectively improves data collection efficiency and federated learning performance.

System Model
Our system model comprises N randomly distributed ground users and M UAVs. The horizontal position of ground user n is denoted as \( q_n^{\text{UE}} = \{ x_n^{\text{UE}}, y_n^{\text{UE}} \} \), and the horizontal position of UAV m is denoted as \( q_m^{\text{UAV}} = \{ x_m^{\text{UAV}}, y_m^{\text{UAV}} \} \), both located in a region \( \mathbb{R}^2 \). UAVs operate at a fixed safe height H above the target area, collecting data from ground users and performing federated learning. To reduce flight energy consumption during data collection, we first cluster the ground users, deploying one UAV per cluster for data collection. After collection, UAVs return to the base and use the collected cluster user data for FL local model training. Then, a UAV elected in each round completes global model aggregation. To ensure data transaction security, we incorporate blockchain technology, adopting a tightly coupled architecture where UAVs not only undertake federated learning local training tasks but also participate as blockchain nodes in consensus verification and data security.
Federated Learning
Let the local model parameters of UAV m be parameterized as \( \omega_m \). The local training model for UAV m participating in federated learning is defined by the loss function measuring the difference between the local model output \( T(x; \omega_m) \) and the true label y on the local dataset \( D_m \):
$$F_m(\omega_m) = \frac{1}{D_m} \sum_{i=1}^{D_m} f(x_{m,i}, y_{m,i}; \omega_m), \quad \forall m \in M$$
where \( D_m = \sum_{n \in N_m} D_n' \), representing the total data uploaded by users in the m-th cluster, and \( f(x_{m,i}, y_{m,i}; \omega_m) \) denotes the loss function associated with data point \( (x_{m,i}, y_{m,i}) \).
To minimize the loss function, each UAV optimizes its local model in parallel on its own data; we employ the Adam optimizer in practice, but for clarity the update is written below as a generic gradient step with learning rate \( \eta \). At the start of global iteration j+1, each device updates the local model as:
$$\omega_{m,j+1} = \omega_{m,j} - \eta \nabla F_m(\omega_{m,j})$$
After local training, a UAV is selected from the participating UAVs as the aggregation UAV. Each UAV generates its model and sends it to the aggregation UAV for global model aggregation. The global model loss is defined as:
$$F(\omega) = \frac{1}{D} \sum_{m=1}^{M} D_m F_m(\omega_m) = \frac{1}{D} \sum_{m=1}^{M} \sum_{i=1}^{D_m} f(x_{m,i}, y_{m,i}; \omega_m)$$
where \( D = \sum_{m=1}^{M} D_m \) is the total data volume over the aggregate dataset \( D_1 \cup D_2 \cup \cdots \cup D_M \). Thus, the FL process can be represented as the following optimization problem:
$$\min_{\omega} F(\omega)$$
The traditional federated averaging algorithm (FedAvg) weights local models by data volume alone, so its aggregation quality is strongly influenced by local model quality and data authenticity. To address this, we incorporate both the local model accuracy \( \mu_m \) and the data volume \( D_m \) into the aggregation. The optimized model aggregation method is:
$$\omega = \frac{\sum_{m=1}^{M} \mu_m D_m \omega_m}{\sum_{m=1}^{M} \mu_m D_m}$$
In practice, a threshold \( \delta \) can be set to indicate that the training result meets expectations. When \( F(\omega) \leq \delta \), the model has reached a good performance level, and the task is completed. Alternatively, a global iteration count \( N_{\text{global}} \) can be set, and training ends after reaching this count. We adopt the latter approach for federated learning.
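The weighted aggregation rule above can be sketched in a few lines of Python. This is a minimal illustration assuming flat parameter vectors; the accuracies \( \mu_m \), data volumes \( D_m \), and model values below are placeholders, not our experimental settings:

```python
# Sketch of the accuracy- and data-weighted aggregation
# omega = sum(mu_m * D_m * omega_m) / sum(mu_m * D_m).

def aggregate(models, accuracies, data_volumes):
    """Weighted FedAvg variant: weight each local model by mu_m * D_m."""
    weights = [mu * d for mu, d in zip(accuracies, data_volumes)]
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[k] for w, m in zip(weights, models)) / total
            for k in range(dim)]

# Two local models; the more accurate, data-rich model dominates the average.
models = [[1.0, 2.0], [3.0, 4.0]]
global_model = aggregate(models, accuracies=[0.9, 0.6], data_volumes=[300, 100])
```

With these placeholder values the weights are \( 0.9 \times 300 = 270 \) and \( 0.6 \times 100 = 60 \), so the first model contributes the larger share of the global model.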
Blockchain-Based Incentive Mechanism
In the incentive mechanism, the immutability of blockchain ensures that incentive rules cannot be arbitrarily modified, and smart contracts automatically allocate rewards, reducing human intervention. Therefore, we introduce a blockchain-based incentive mechanism. All UAVs know the total task reward in advance and participate as blockchain nodes in reward distribution. Smart contracts compute the relevant data, execute reward allocation, and store transaction data in blocks.
After task completion, according to the reward mechanism, UAVs distribute the total reward. The reward obtained by a UAV is based on model accuracy and data volume; higher accuracy and more training data result in higher rewards, ensuring high-quality participation. For UAV m, the reward after participation is:
$$R_m = \frac{\mu_m D_m}{\sum_{i=1}^{M} \mu_i D_i} R$$
where R is the total reward for the federated learning task.
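The proportional reward split above can be sketched as follows; the accuracies and data volumes are illustrative placeholders:

```python
# Sketch of the blockchain reward rule R_m = (mu_m * D_m / sum_i mu_i * D_i) * R.

def split_reward(total_reward, accuracies, data_volumes):
    """Each UAV's share is proportional to its accuracy-weighted data volume."""
    scores = [mu * d for mu, d in zip(accuracies, data_volumes)]
    s = sum(scores)
    return [total_reward * sc / s for sc in scores]

rewards = split_reward(100.0, accuracies=[0.9, 0.6], data_volumes=[300, 100])
```

By construction the shares sum to the total reward R, so the smart contract never over- or under-pays.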
Communication Model
The set of all UAVs is denoted as M. The cluster covered by UAV m is referred to as cluster m; the set of ground users in cluster m is denoted as \( N_m \), and \( |N_m| \) denotes the number of users in the cluster. The data volume held by user n is denoted as \( D_n \). For user n, the uploaded data volume is:
$$D_n' = a_n D_n$$
where \( a_n \) is the data upload ratio of user n, \( a_n \in [0,1] \). When \( a_n = 0 \), no data is uploaded; when \( a_n = 1 \), all local data of user n is uploaded. When \( a_n \in (0,1) \), random sampling is used to select data for upload.
Communication between ground users and UAVs uses Frequency Division Multiple Access (FDMA). The total communication bandwidth for each UAV is B, equally allocated to covered ground users. Thus, the bandwidth for user n is:
$$b_n = \frac{B}{|N_m|}$$
The distance between UAV m and ground user n is:
$$d_{n,m} = \sqrt{(x_n^{\text{UE}} - x_m^{\text{UAV}})^2 + (y_n^{\text{UE}} - y_m^{\text{UAV}})^2 + H^2}$$
Assuming the maximum transmission distance between UAV and ground user is \( d_{\text{max}} \), then:
$$d_{n,m} \leq d_{\text{max}}$$
The channel gain between UAV m and ground user n is:
$$g_{n,m} = h_0 \left( \frac{d_0}{d_{n,m}} \right)^\alpha$$
where \( h_0 \) is the channel gain at reference distance \( d_0 = 1 \) m, and \( \alpha \) is the path loss exponent. Therefore, the data transmission rate (in bits per second) between UAV m and ground user n can be further expressed as:
$$r_{n,m} = b_n \log_2 \left(1 + \frac{g_{n,m} P_n}{\sigma^2}\right)$$
where \( b_n \) is the bandwidth allocated by UAV m to user n, \( P_n \) is the transmit power of user equipment n, and \( \sigma^2 \) is the power of additive white Gaussian noise.
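The distance, channel-gain, and rate formulas above compose as in the sketch below; all numeric parameters (reference gain, path-loss exponent, noise power, bandwidth) are illustrative assumptions, not the paper's simulation settings:

```python
import math

# Sketch of the communication model: 3-D distance d_{n,m}, channel gain
# g = h0 * (d0 / d)^alpha, and rate r = b * log2(1 + g * P / sigma^2).

def rate_bps(user_xy, uav_xy, H, b_n, P_n,
             h0=1e-4, d0=1.0, alpha=2.0, noise=1e-13):
    dx, dy = user_xy[0] - uav_xy[0], user_xy[1] - uav_xy[1]
    d = math.sqrt(dx * dx + dy * dy + H * H)       # distance d_{n,m}
    g = h0 * (d0 / d) ** alpha                     # path-loss channel gain
    return b_n * math.log2(1.0 + g * P_n / noise)  # achievable rate (bit/s)

r = rate_bps(user_xy=(10.0, 0.0), uav_xy=(0.0, 0.0), H=100.0, b_n=1e6, P_n=0.1)
```

As expected, the rate decreases monotonically with the user-UAV distance, which is what drives the UAV placement subproblem later.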
Latency and Energy Consumption Calculation
(1) Latency: The total latency in data collection and model training includes user latency and UAV latency. User latency is the time for each user to upload data, and UAV latency includes data collection latency and local training latency.
User latency is the time for user n to upload data, expressed as:
$$T_{n,m}^{\text{up}} = \frac{D_n’}{r_{n,m}}$$
The maximum data collection time is limited to \( T_{\text{max}}^{\text{up}} \), so:
$$T_{n,m}^{\text{up}} \leq T_{\text{max}}^{\text{up}}$$
The latency for UAV m is the data collection latency, which is the maximum latency among users in its coverage uploading data, expressed as:
$$T_m^{\text{up}} = \max_{n \in N_m} \{ T_{n,m}^{\text{up}} \}$$
The data volume collected by UAV m is the sum of data uploaded by covered ground users:
$$D_m = \sum_{n \in N_m} D_n'$$
The total latency for local training of UAV m is expressed as:
$$T_m^{\text{loc}} = N_m^{\text{global}} N_m^{\text{local}} \frac{C_m D_m}{f_m}$$
where \( N_m^{\text{global}} \) is the number of global iterations, \( N_m^{\text{local}} \) is the number of local iterations, \( C_m \) represents the number of CPU cycles required by UAV m to compute each sample, and \( f_m \) is the CPU frequency of UAV m for local training.
(2) Energy Consumption: The energy consumption of user n during data upload is:
$$E_{n,m}^{\text{up}} = P_n T_{n,m}^{\text{up}}$$
The total energy consumption for local training of UAV m over \( N_m^{\text{global}} \) global iterations is:
$$E_m^{\text{loc}} = N_m^{\text{global}} N_m^{\text{local}} C_m D_m \zeta_m f_m^2$$
where \( \zeta_m \) is the energy coefficient of the UAV's CPU; \( N_m^{\text{local}} \), \( C_m \) (CPU cycles required per sample), and \( f_m \) are as defined for \( T_m^{\text{loc}} \).
UAVs hover in the air during data collection, generating hovering energy consumption. Let k denote the hovering power of a UAV, i.e., energy consumption per unit time while hovering. The hovering energy consumption during data collection is:
$$E_m^{\text{hover}} = k T_m^{\text{up}}$$
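The latency and energy bookkeeping above can be sketched directly from the formulas; the hovering power, data sizes, and rates are illustrative assumptions:

```python
# Sketch of the cost model: upload time T = D'/r, upload energy E = P*T,
# local training time/energy, and hovering energy E = k * T_up.

def upload_time(bits, rate):
    return bits / rate                 # T_{n,m}^up

def upload_energy(power, t):
    return power * t                   # E_{n,m}^up

def training_cost(n_glob, n_loc, cycles_per_sample, samples, freq, zeta):
    t = n_glob * n_loc * cycles_per_sample * samples / freq            # T_m^loc
    e = n_glob * n_loc * cycles_per_sample * samples * zeta * freq**2  # E_m^loc
    return t, e

t_up = upload_time(bits=8e6, rate=1e6)   # 8 s to upload 1 MB at 1 Mbit/s
e_up = upload_energy(power=0.1, t=t_up)  # 0.8 J spent transmitting
e_hover = 50.0 * t_up                    # hovering power k = 50 W (assumed)
```

Note the quadratic dependence of training energy on CPU frequency \( f_m \), which is why the leader later trades off training speed against energy cost.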
Stackelberg Game Model
We model the data trading behavior between UAVs and ground users using Stackelberg game theory. Stackelberg game is a leader-follower game model where one party (the leader) makes decisions first, and the other party (the follower) reacts after observing the leader’s decisions. We jointly consider the interests of both UAVs and ground users, constructing a Stackelberg game model where UAVs are leaders and ground users are followers. Leader UAVs determine reward allocation ratio decisions based on the channel transmission environment and their own needs for a given data volume. When ground users receive data collection requests from UAVs, they optimize their data upload ratio based on the allocated rewards to increase their gains in the collaboration process.
Utility Functions
(1) Ground User Utility Function: After each training round, UAV m receives a reward \( R_m \) from the total reward R according to the blockchain incentive mechanism. Assume UAV m allocates a fraction \( \beta_m \) of its reward to the covered ground users, so the total income for the users in cluster m is \( \beta_m R_m \). For the ground users covered by UAV m, we account for the mutual influence among users: a user's income depends not only on its own uploaded data volume but also on the volumes uploaded by the other users in the same cluster, since the relative data contribution determines the reward split. The income of user n is therefore \( \frac{D_n'}{\sum_{i \in N_m} D_i'} \beta_m R_m \). Considering user income and transmission energy consumption, the utility function of ground user n is defined as:
$$U_n = \frac{D_n'}{\sum_{i \in N_m} D_i'} \beta_m R_m - \lambda_1 E_{n,m}^{\text{up}}$$
where \( \lambda_1 \) represents the unit cost of energy consumption for data transmission, reflecting the user’s emphasis on transmission energy consumption.
(2) UAV Utility Function: Since participating UAVs allocate a portion of their reward \( R_m \) to ground users, their income is defined as \( (1 – \beta_m) R_m \). The utility function of a UAV is composed of the reward obtained from participating in federated learning, the energy consumption of local training, and hovering energy consumption. For any participating UAV m, its utility function can be expressed as:
$$U_m = (1 - \beta_m) R_m - \lambda_2 E_m^{\text{loc}} - \lambda_3 E_m^{\text{hover}}$$
where \( \lambda_2 \) represents the unit cost of energy consumption for local model training, and \( \lambda_3 \) represents the unit cost of hovering energy consumption, reflecting the UAV’s emphasis on training energy consumption and hovering energy consumption, respectively.
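The two utility functions can be sketched as plain functions of their inputs; the energies are taken as precomputed values, and every number below (rewards, \( \lambda \) weights, energies) is an illustrative assumption:

```python
# Sketch of the follower utility U_n and leader utility U_m.

def user_utility(my_bits, cluster_bits, beta, R_m, lam1, e_up):
    """U_n = contribution share of beta*R_m minus weighted upload energy."""
    return (my_bits / cluster_bits) * beta * R_m - lam1 * e_up

def uav_utility(beta, R_m, lam2, e_loc, lam3, e_hover):
    """U_m = retained reward minus weighted training and hovering energy."""
    return (1.0 - beta) * R_m - lam2 * e_loc - lam3 * e_hover

u_n = user_utility(my_bits=2e6, cluster_bits=8e6, beta=0.3, R_m=40.0,
                   lam1=1.0, e_up=0.8)        # 0.25 * 12 - 0.8 = 2.2
u_m = uav_utility(beta=0.3, R_m=40.0, lam2=0.01, e_loc=500.0,
                  lam3=0.005, e_hover=400.0)  # 28 - 5 - 2 = 21
```

The coupling is visible here: raising \( \beta \) lowers `u_m` directly but raises every user's income, which in the game feeds back into how much data (and thus \( R_m \)) the UAV obtains.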
Problem Description
In the Stackelberg game, there is a direct interest competition between leader UAVs and follower ground users. UAVs have the power to decide the reward allocation ratio, which directly affects the amount of data ground users choose to offload to each UAV. As the amount of data uploaded by ground users changes, UAVs dynamically adjust their reward allocation ratios to maximize their gains in the collaboration process; conversely, the adjusted reward allocation ratios influence ground users’ decisions on data upload ratios. Both parties game based on the reward allocation ratio and data upload ratio. Below, we describe the optimization problems for followers and leaders separately.
(1) Follower Optimization Problem: The utility function of ground user n consists of the reward based on their data contribution and the energy consumption of data transmission. We optimize the association between user and UAV \( z_{n,m} \), transmit power \( P_n \), and data upload ratio \( a_n \) to maximize the user’s utility. The optimization problem for ground user n can be expressed as:
$$
\begin{aligned}
\text{P1: } & \max_{P_n, a_n, z_{n,m}} U_n = \frac{D_n'}{\sum_{i \in N_m} D_i'} \beta_m R_m - \lambda_1 E_{n,m}^{\text{up}} \\
\text{s.t. } & \text{C1: } 0 \leq a_n \leq 1, \quad \forall n \in N \\
& \text{C2: } P_n^{\min} \leq P_n \leq P_n^{\max}, \quad \forall n \in N \\
& \text{C3: } d_{n,m} \leq d_{\max}, \quad \forall n \in N \\
& \text{C4: } 0 < T_{n,m}^{\text{up}} \leq T_{\max}^{\text{up}}, \quad \forall n \in N \\
& \text{C5: } z_{n,m} \in \{0,1\}, \quad \sum_{m=1}^{M} z_{n,m} = 1, \quad \forall n \in N \\
& \text{C6: } U_n > 0, \quad \forall n \in N
\end{aligned}
$$
where C1 is the range of the user’s data upload ratio, C2 is the limitation on user transmit power, C3 indicates that the association between user and UAV needs to satisfy a certain distance, C4 limits the time spent by the user transmitting data, C5 indicates that a user can only be associated with one UAV, and C6 indicates that the income must be positive for the user to participate in contribution.
(2) Leader Optimization Problem: For UAV m, the utility comprises the reward retained after allocating a fraction \( \beta_m \) to users, minus the energy cost of local training and the hovering energy cost during data collection. When the collected data volume \( D_m \) is fixed, allocating a smaller share of the reward to users increases the UAV's own income; however, a lower reward allocation ratio inevitably affects the data upload ratio decisions of the ground-user followers. Therefore, in the game process, the UAV's reward allocation ratio is influenced by various factors, such as the UAV's position relative to ground users and the local training CPU frequency. The optimization problem for UAV m can be expressed as:
$$
\begin{aligned}
\text{P2: } & \max_{q_m^{\text{UAV}}, f_m, \beta_m} U_m = (1 - \beta_m) R_m - \lambda_2 E_m^{\text{loc}} - \lambda_3 E_m^{\text{hover}} \\
\text{s.t. } & \text{C7: } f_m^{\min} \leq f_m \leq f_m^{\max}, \quad \forall m \in M \\
& \text{C8: } \| q_m^{\text{UAV}} – q_n^{\text{UE}} \| \leq d_{\max}, \quad \forall n \in N_m \\
& \text{C9: } 0 < \beta_m < 1, \quad \forall m \in M \\
& \text{C10: } T_m^{\text{loc}} \leq T^{\text{loc}}, \quad \forall m \in M
\end{aligned}
$$
where \( d_{\max} \) represents the maximum communication distance of the UAV, \( q_m^{\text{UAV}} \) is the horizontal position of UAV m. C7 limits the CPU frequency of UAV participation in federated learning training, C8 represents the range of UAV positions, C9 indicates that the reward allocated to users cannot exceed the reward received by the UAV, and C10 indicates that the total latency of UAV local training must not exceed \( T^{\text{loc}} \).
Stackelberg Game Equilibrium
Problems P1 and P2 constitute a Stackelberg game. UAVs, as leaders, first determine the reward parameter \( \beta_m \); ground users, as followers, then make data upload decisions \( a_n \) (i.e., their uploaded data volume ratio) based on the UAVs' reward allocation. The decisions of ground users determine how much data each UAV obtains, while the UAVs' reward allocation determines how much data users are willing to upload. Within the Stackelberg framework, UAVs can guide users toward decisions that improve the overall system utility by designing the rewards \( \beta_m \) appropriately, achieving a mutually beneficial interaction between UAVs and ground users. Both parties game over rewards and available data.
Considering the given association coefficients between users and UAVs \( \{ z_{n,m} \}_{n \in N_m} \), UAV position \( q_m^{\text{UAV}} \), local training CPU frequency \( f_m \), and ground user transmit power \( P_n \), we optimize the remaining variables of UAVs and ground users to maximize their utilities based on this. We define the Stackelberg game equilibrium as follows:
Definition 1 (Stackelberg Game Equilibrium): Suppose \( \{ z_{n,m} \}_{n \in N_m} \) represents the association coefficients between UAV m and all users in the cluster, \( q_m^{\text{UAV}} \) represents the position of UAV m, \( f_m \) represents the local training CPU frequency of UAV m, \( P_n \) represents the transmit power of ground user n, \( \beta_m^* \) is the optimal reward allocation ratio decision of UAV m, and \( a_n^* \) is the optimal data upload ratio decision of user n. Among all solution cases, the Stackelberg game equilibrium can be represented as \( \{ (\beta_m^*, a_n^*) \}_{n \in N_m} \), which satisfies the following conditions:
$$U_m(\beta_m^*, \{ a_n^* \}_{n \in N_m}) \geq U_m(\beta_m, \{ a_n^* \}_{n \in N_m})$$
$$U_n(a_n^*, \beta_m^*) \geq U_n(a_n, \beta_m^*), \quad \forall n \in N_m$$
The first inequality states that, given the followers' optimal responses \( \{ a_n^* \}_{n \in N_m} \), \( \beta_m^* \) maximizes the leader's utility. The second states that, given the leader's optimal decision \( \beta_m^* \), \( a_n^* \) maximizes each follower's utility.
Theorem 1: Given the association coefficients between users and UAVs \( \{ z_{n,m} \}_{n \in N_m} \), UAV position \( q_m^{\text{UAV}} \), UAV local training CPU frequency \( f_m \), and ground user transmit power \( P_n \), there exists a unique Nash equilibrium for the decisions of UAVs and ground users.
Proof: Assume the association coefficient between user n and UAV m \( z_{n,m} = 1 \). Then the utility function of user n under the influence of UAV m’s reward allocation can be expressed as:
$$U_n(a_n, \beta_m) = \frac{\beta_m R_m a_n D_n}{a_n D_n + \sum_{i \in N_m, i \neq n} a_i D_i} - \lambda_1 P_n \frac{a_n D_n}{b_n \log_2 \left(1 + \frac{h_0 d_0^\alpha P_n}{\sigma^2 d_{n,m}^\alpha}\right)}$$
By deriving the first-order and second-order derivative functions of the above equation, we can show that the second derivative is always negative, proving that \( U_n(a_n, \beta_m) \) is a strictly concave function, thus having an optimal solution denoted as \( a_n^* \) for user n under UAV m’s decision \( \beta_m \). Similarly, for UAV m, the utility function \( U_m(\beta_m, a_n^*) \) can be shown to be strictly concave, ensuring a unique optimal decision \( \beta_m^* \). Therefore, given the fixed parameters, the Stackelberg game has a unique Nash equilibrium. □
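For completeness, the derivatives referenced in the proof can be written out explicitly. Let \( S_{-n} = \sum_{i \in N_m, i \neq n} a_i D_i \) and let \( c_n = \lambda_1 P_n / r_{n,m} \) denote the per-bit transmission cost, which is constant in \( a_n \). Then:

$$\frac{\partial U_n}{\partial a_n} = \frac{\beta_m R_m D_n S_{-n}}{(a_n D_n + S_{-n})^2} - c_n D_n, \qquad \frac{\partial^2 U_n}{\partial a_n^2} = -\frac{2 \beta_m R_m D_n^2 S_{-n}}{(a_n D_n + S_{-n})^3} < 0$$

Since \( S_{-n} > 0 \) whenever at least one other user in the cluster uploads data, the second derivative is strictly negative, confirming that \( U_n \) is strictly concave in \( a_n \).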
Optimization Algorithms
For the follower’s nonlinear multi-variable optimization problem, we decompose it into two subproblems: user-UAV association algorithm and user upload data ratio and transmit power optimization algorithm, solved using traversal and simulated annealing algorithms, respectively. For the leader’s optimization problem, we decompose it into three subproblems: UAV position optimization, UAV computational resource optimization, and UAV reward allocation ratio optimization, solved using particle swarm optimization, theoretical derivation method, and golden section method, respectively. By iterating to reach Nash equilibrium, we obtain the optimal solutions for all parameters. Based on the solving algorithms for each layer, we further provide a joint optimization algorithm.
Follower Optimization Problem Solving
This section proposes corresponding solving algorithms for the ground user optimization problem. Given the UAVs’ reward allocation ratio decision \( \beta_m \), we first solve the user association problem, then for the ground user transmit power decision and data upload ratio decision problem, we adopt a heuristic optimization algorithm based on simulated annealing.
User-UAV Association Algorithm
The user-UAV association proceeds in two steps: first, cluster the randomly distributed ground users to establish a preliminary association between ground users and UAVs; then, for users in overlapping coverage areas of multiple UAVs, apply a traversal algorithm to finalize the user-UAV association.
(1) User Clustering Algorithm: We use an improved DBSCAN algorithm to cluster ground users, achieving low-complexity device-cluster matching by adapting the clustering radius \( R_D \) to the resulting cluster sizes. To plan the number and positions of UAVs reasonably, we assume each UAV covers at least 3 ground users. First, the original DBSCAN method is applied to the ground users; the diameter of each resulting cluster is then checked. If a cluster's diameter is at most the UAV coverage diameter \( 2R_{\max} \), the cluster is final and one UAV can be deployed for it; if a cluster's diameter exceeds \( 2R_{\max} \), the clustering radius \( R_D \) is reduced and clustering is repeated, until every cluster fits within the UAV coverage radius. Specific steps are shown in Algorithm 1.
Algorithm 1: User Clustering Algorithm
Input: Ground user set \( \{ q_n^{\text{UE}} \}_{n \in N} \), user transmit power \( \{ P_n \}_{n \in N} \), total bandwidth B, UAV movement range \( \mathbb{R}^2 \), maximum number of served users C, UAV coverage radius \( R_{\max} \), DBSCAN parameters (clustering radius \( R_D \), minimum sample count \( N_{\text{MinPts}} \))
Output: UAV position set \( \{ q_m^{\text{UAV}} \}_{m \in M} \), clustering result \( N_m \)
- Initialize: \( R_D = 70 \)
- Clustering: Use the original DBSCAN method to cluster the N ground users, obtaining \( \{ N_1, N_2, \ldots, N_M \} \), where M is the number of clusters.
- for m in \( \{ N_1, N_2, \ldots, N_M \} \) do
- \( d_m = \max_{i,j \in N_m} \| q_i^{\text{UE}} - q_j^{\text{UE}} \| \)
- if \( d_m > 2 R_{\max} \) then
- \( R_D = R_D – 5 \)
- Go to step 2
- end if
- end for
- for m in \( \{ N_1, N_2, \ldots, N_M \} \) do
- Compute cluster center \( c_m = \frac{1}{|N_m|} \sum_{n \in N_m} q_n^{\text{UE}} \)
- Each cluster center as UAV initial position \( q_m^{\text{UAV}} \leftarrow c_m \)
- end for
- return \( \{ q_m^{\text{UAV}} \} \), \( N_m \)
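Algorithm 1's radius-shrinking loop can be sketched around a minimal, naive \( O(n^2) \) DBSCAN; `min_pts=3` mirrors the "at least 3 users per UAV" assumption, and the start radius 70 and step 5 follow the algorithm, while the user coordinates are toy data:

```python
import math

def dbscan(points, eps, min_pts=3):
    """Minimal O(n^2) DBSCAN; label -1 marks unvisited/noise points."""
    labels = [-1] * len(points)

    def nbrs(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cid = 0
    for i in range(len(points)):
        if labels[i] != -1 or len(nbrs(i)) < min_pts:
            continue
        labels[i] = cid
        stack = [i]
        while stack:
            q = stack.pop()
            for j in nbrs(q):
                if labels[j] == -1:
                    labels[j] = cid
                    if len(nbrs(j)) >= min_pts:  # expand only from core points
                        stack.append(j)
        cid += 1
    return labels, cid

def cluster_users(points, r_max, r_d=70.0, step=5.0):
    """Shrink the clustering radius until each cluster fits one UAV's coverage."""
    while True:
        labels, k = dbscan(points, eps=r_d)
        clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
        diam = max((math.dist(a, b) for cl in clusters for a in cl for b in cl),
                   default=0.0)
        if diam <= 2 * r_max or r_d <= step:
            # cluster centers become the UAVs' initial hover positions
            centers = [(sum(x for x, _ in cl) / len(cl),
                        sum(y for _, y in cl) / len(cl)) for cl in clusters]
            return clusters, centers
        r_d -= step

# Two well-separated groups of three users each.
users = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
         (100.0, 100.0), (101.0, 100.0), (100.0, 101.0)]
clusters, uav_positions = cluster_users(users, r_max=50.0)
```

On this toy layout the loop terminates immediately with two clusters whose centroids serve as the two UAVs' initial positions.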
(2) User Association Optimization in Overlapping Coverage Areas: When a user lies within the coverage of two or more UAVs, we further optimize the user-UAV association to maximize the user's income, based on the determined transmit power \( \{ P_n^* \}_{n \in N_m} \) and data upload ratio \( \{ a_n^* \}_{n \in N_m} \). The optimization problem is expressed as:
$$
\begin{aligned}
\text{P1-1: } & \max_{z_{n,m}} U_n = \frac{D_n'}{\sum_{i \in N_m} D_i'} \beta_m R_m - \lambda_1 E_{n,m}^{\text{up}} \\
\text{s.t. } & \text{C3, C4, C5, C6}
\end{aligned}
$$
We use a traversal method to solve optimization problem P1-1. For ground user n, traverse all UAVs that satisfy the constraints, compute \( U_n \), and take the association method corresponding to the maximum \( U_n \) to determine the best pairing between ground user n and UAV m. The specific process is shown in Algorithm 2.
Algorithm 2: User-UAV Association Optimization Algorithm
Input: UAV reward allocation ratio \( \beta_m \), transmit power \( \{ P_n^* \}_{n \in N_m} \) and data upload ratio \( \{ a_n^* \}_{n \in N_m} \) of all users in the cluster, maximum upload time \( T_{\max}^{\text{up}} \), maximum communication distance between UAV and user \( d_{\max} \)
Output: Best association \( z_{n,m} \) between ground user n and UAV m
- Initialize: \( U_n^{\max} = 0 \), \( m^* = \emptyset \), \( z_{n,m} = 0 \) for all m
- for m from 1 to M do
- if constraints C3, C4, C5, C6 hold then
- Compute objective function value \( U_n \) under association with UAV m
- if \( U_n > U_n^{\max} \) then
- \( m^* = m \)
- \( U_n^{\max} = U_n \)
- end if
- end if
- end for
- if \( m^* \neq \emptyset \) then \( z_{n,m^*} = 1 \)
- return \( z_{n,m} \)
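Algorithm 2's traversal can be sketched as below; the `utility` and `feasible` callables are stand-ins for the objective \( U_n \) and constraints C3-C6, and the toy numbers are illustrative:

```python
def best_association(num_uavs, utility, feasible):
    """Traverse all UAVs; keep the feasible association with the largest U_n."""
    best_m, best_u = None, 0.0          # C6: only associate if U_n > 0
    for m in range(num_uavs):
        if not feasible(m):             # stand-in for constraints C3, C4
            continue
        u = utility(m)
        if u > best_u:
            best_m, best_u = m, u
    z = [0] * num_uavs
    if best_m is not None:
        z[best_m] = 1                   # C5: at most one association
    return z, best_u

# Toy example: three UAVs; the second is infeasible (e.g. too far away).
z, u = best_association(3,
                        utility=lambda m: [2.0, 5.0, 3.0][m],
                        feasible=lambda m: m != 1)
```

Recording the best index and setting a single entry of z at the end guarantees the one-association-per-user constraint C5 holds by construction.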
User Upload Data Ratio and Transmit Power Optimization Algorithm
After determining the user-UAV association, constraints C3 and C5 of problem P1 are satisfied. Therefore, problem P1 can be transformed into:
$$
\begin{aligned}
\text{P1-2: } & \max_{P_n, a_n} U_n = \frac{D_n'}{\sum_{i \in N_m} D_i'} \beta_m R_m - \lambda_1 E_{n,m}^{\text{up}} \\
\text{s.t. } & \text{C1, C2, C4, C6}
\end{aligned}
$$
We use the simulated annealing algorithm to solve this. Simulated Annealing (SA) is a global optimization algorithm inspired by the physical annealing process. In annealing, the molecules of a solid at high temperature are in a disordered state; as the temperature gradually decreases, the molecules tend toward order and eventually reach the lowest-energy state. This process is analogous to solving optimization problem P1-2, where the solution state consists of the variables \( P_n \) and \( a_n \), each state representing a set of possible parameter values. We randomly initialize \( P_n \) and \( a_n \), compute the current objective value \( U_n \), and set the initial temperature \( T_{\max} \). Then, a small perturbation of the current solution generates a new solution \( (P_n', a_n') \) with objective value \( U_n' \). If the new solution is better than the current one, it is accepted; if it is worse, it is accepted with a certain probability to avoid local optima. Meanwhile, an exponential cooling strategy gradually reduces the temperature. Finally, when the temperature drops to a set threshold or no improvement occurs over several consecutive iterations, the algorithm terminates and outputs the optimal solution \( (P_n^*, a_n^*) \). By randomly perturbing the solution in the search space and accepting worse solutions with a certain probability, SA avoids being trapped in local optima and tends to converge toward the global optimum.
By using the simulated annealing algorithm, we jointly solve \( a_n \) and \( P_n \) to maximize the utility function of user n. Specific steps are shown in Algorithm 3.
Algorithm 3: User Data Upload Ratio and Transmit Power Optimization Algorithm Based on Simulated Annealing
Input: UAV reward allocation ratio \( \beta_m \), data upload ratios \( \{ a_i \}_{i \in N_m, i \neq n} \) of other users in the cluster except user n, transmit power range \( [P_n^{\min}, P_n^{\max}] \), maximum upload time \( T_{\max}^{\text{up}} \)
Output: User upload ratio \( a_n \), transmit power \( P_n \)
- Initialize: initial temperature \( T_{\max} \) and final temperature \( T_{\min} \), maximum iteration count max_iter, cooling rate cooling_rate, penalty factor \( \rho \); randomly generate initial solution \( (P_n, a_n) \); iteration = 0
- while iteration <= max_iter and \( T \geq T_{\min} \) do
- Generate new solution \( (P_n', a_n') \) by perturbation
- if \( (P_n', a_n') \) satisfies constraints then
- Compute objective function value \( U_n(P_n', a_n') \)
- Compute \( \Delta U = U_n(P_n', a_n') - U_n(P_n, a_n) \)
- if \( \Delta U > 0 \) or random() < exp(\( \Delta U / T \)) then
- Accept new solution: \( (P_n, a_n) \leftarrow (P_n', a_n') \)
- end if
- end if
- Update temperature \( T = \text{cooling\_rate} \cdot T \)
- iteration = iteration + 1
- end while
- return \( (P_n^*, a_n^*) \)
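Algorithm 3 can be sketched as a generic SA loop over \( (P_n, a_n) \) for a maximization objective with exponential cooling. The objective below is a smooth concave stand-in for \( U_n \), and all temperatures, step sizes, and bounds are illustrative assumptions:

```python
import math
import random

def anneal(objective, p_bounds, t_max=1.0, t_min=1e-4, cooling=0.98,
           max_iter=2000, seed=0):
    """Simulated annealing (maximization) over transmit power and upload ratio."""
    rng = random.Random(seed)
    p_lo, p_hi = p_bounds
    p = rng.uniform(p_lo, p_hi)      # transmit power P_n
    a = rng.random()                 # upload ratio a_n in [0, 1]
    best = (p, a)
    t = t_max
    it = 0
    while it < max_iter and t >= t_min:
        # perturb the current solution and clip to the feasible box
        p2 = min(max(p + rng.gauss(0.0, 0.1 * (p_hi - p_lo)), p_lo), p_hi)
        a2 = min(max(a + rng.gauss(0.0, 0.1), 0.0), 1.0)
        delta = objective(p2, a2) - objective(p, a)
        # maximization: always accept improvements; accept worse moves
        # with probability exp(delta / T) to escape local optima
        if delta > 0 or rng.random() < math.exp(delta / t):
            p, a = p2, a2
            if objective(p, a) > objective(*best):
                best = (p, a)
        t *= cooling
        it += 1
    return best

# Toy concave stand-in objective with maximum near P = 0.5, a = 0.8.
p_star, a_star = anneal(lambda p, a: -(p - 0.5) ** 2 - (a - 0.8) ** 2,
                        p_bounds=(0.0, 1.0))
```

Because the acceptance test compares against \( e^{\Delta U / T} \) with \( \Delta U \leq 0 \), the probability of accepting a worse solution shrinks as the temperature cools, turning the search into greedy hill climbing near termination.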
Leader Optimization Problem Solving
Since the objective function in problem P2 contains \( \max_{n \in N_m} \{ T_{n,m}^{\text{up}} \} \), we introduce an auxiliary variable \( \nu_m = \max_{n \in N_m} \{ T_{n,m}^{\text{up}} \} \), adding constraint C11:
C11: \( \nu_m \geq T_{n,m}^{\text{up}}, \quad \forall n \in N_m \)
Thus, the original optimization problem P2 is rewritten as P3:
$$
\begin{aligned}
\text{P3: } & \max_{q_m^{\text{UAV}}, f_m, \beta_m, \nu_m} U_m = (1 - \beta_m) R_m - \lambda_2 E_m^{\text{loc}} - \lambda_3 k \nu_m \\
\text{s.t. } & \text{C7, C8, C9, C10, C11}
\end{aligned}
$$
Since problem P3 is non-convex and cannot be directly solved, we decouple P3 into 3 subproblems: UAV position optimization subproblem, UAV local training CPU frequency subproblem, and UAV reward allocation ratio subproblem, solved using particle swarm optimization, direct solving method, and golden section method, respectively.
UAV Position Optimization Subproblem
UAV position affects the hovering time during data collection, thus affecting hovering energy consumption. Therefore, when the UAV reward allocation ratio \( \beta_m \) and local training CPU frequency \( f_m \) are determined, optimize UAV position \( q_m^{\text{UAV}} \). Problem P3 transforms into:
$$
\begin{aligned}
\text{P3-1: } & \min_{q_m^{\text{UAV}}, \nu_m} \nu_m \\
\text{s.t. } & \text{C8, C10}
\end{aligned}
$$
Particle Swarm Optimization (PSO) is a global optimization method based on swarm intelligence, inspired by the cooperative behavior of bird flocks or fish schools. In PSO, each solution is regarded as a particle, particles move in the search space according to a certain velocity, and continuously update positions through individual and group experience, eventually converging to the optimal solution. In this optimization problem, our goal is to minimize the maximum data collection latency of the UAV. Each particle p consists of UAV position \( q_m^{\text{UAV}} \) and data collection maximum latency \( \nu_m \). Randomly initialize a group of particles in the search space, each particle’s position \( (q_m^{\text{UAV}}, \nu_m) \) set within a reasonable range, each particle’s velocity \( v_p \) also randomly initialized. Then for each particle, compute the fitness function, and check if all constraints are satisfied. If not, use a penalty function for adjustment. Each particle updates velocity according to the following formula:
$$v_p = \omega v_p + c_1 r_1 (p_{\text{best}} - p) + c_2 r_2 (g_{\text{best}} - p)$$
where \( \omega \) is the inertia weight controlling the particle's exploration ability; \( c_1 \) and \( c_2 \) are learning factors weighting the pull toward the particle's own and the swarm's best solutions; \( r_1 \) and \( r_2 \) are random numbers that inject stochasticity; \( p_{\text{best}} \) is the particle's historical best solution; and \( g_{\text{best}} \) is the swarm's historical best solution.
The particle’s position is updated by \( p = p + v_p \), and boundary checks are performed on \( q_m^{\text{UAV}} \) to ensure satisfaction of communication coverage constraints. After each iteration, update the global optimal solution \( g_{\text{best}} \) until the maximum iteration count is reached or the optimal solution converges.
We use the particle swarm algorithm to solve problem P3-1, enabling the UAV to optimize its position to minimize hovering time under fixed data collection conditions, thereby minimizing hovering energy consumption during data collection. Specific steps are shown in Algorithm 4.
Algorithm 4: UAV Position and Collection Time Optimization Algorithm Based on Particle Swarm Optimization
Input: Ground user set \( \{ q_n^{\text{UE}} \}_{n \in N_m} \), user upload ratio \( \{ a_n \}_{n \in N_m} \), transmit power \( P_n \), total bandwidth B, noise power \( \sigma^2 \), signal attenuation factor \( \alpha \), maximum communication distance between UAV and user \( d_{\max} \)
Output: UAV position \( q_m^{\text{UAV}} \) and collection time \( \nu_m \)
- Initialize: particle swarm size \( N_{\text{particles}} \), maximum iteration count \( N_{\max} \), inertia weight \( \omega \), cognitive coefficient \( c_1 \), social coefficient \( c_2 \), particle position \( p_i = (x_i, y_i, \nu_i) \) and velocity \( v_i = (v_{xi}, v_{yi}, v_{\nu i}) \), where \( i = 1,2,\ldots,N_{\text{particles}} \)
- Compute each particle’s objective function value \( \nu_m(p_i) \)
- Set each particle’s individual optimal solution \( p_{\text{best},i} \) and global optimal solution \( p_{\text{best,global}} \)
- for each iteration t from 1 to \( N_{\max} \) do
- for each particle i from 1 to \( N_{\text{particles}} \) do
- Compute current particle’s objective function value \( \nu_m(p_i) \)
- if \( \nu_m(p_i) < \nu_m(p_{\text{best},i}) \) then
- Update individual optimal solution \( p_{\text{best},i} = p_i \)
- end if
- if \( \nu_m(p_i) < \nu_m(p_{\text{best,global}}) \) then
- Update global optimal solution \( p_{\text{best,global}} = p_i \)
- end if
- Update particle velocity and position
- Check updated particle position \( p_i(t+1) \) satisfies constraints
- end for
- end for
- return global optimal solution \( p_{\text{best,global}} \), corresponding objective function value \( \nu_m(p_{\text{best,global}}) \)
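As an illustration, the PSO loop of Algorithm 4 can be sketched in Python. The latency model below is a simplified stand-in, assuming equal bandwidth sharing among users, the free-space-style channel from the simulation setup (\( h_0 = 10^{-8} \), \( \alpha = 2 \), noise power -170 dBm), and illustrative payload sizes; the paper's full constraint handling and penalty function are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scenario: 8 user positions (m) and per-user payload sizes (bits).
users = rng.uniform(0, 400, size=(8, 2))
bits = rng.uniform(1e5, 5e5, size=8)
B, P, sigma2, H = 10e6 / 8, 0.1, 1e-20, 50.0  # per-user bandwidth, power, noise (-170 dBm), height

def collection_latency(q):
    """nu_m: maximum per-user upload time at horizontal UAV position q."""
    d2 = np.sum((users - q) ** 2, axis=1) + H ** 2      # squared 3D distance
    rate = B * np.log2(1.0 + P * 1e-8 / (sigma2 * d2))  # h0 = 1e-8 at d0 = 1 m, alpha = 2
    return np.max(bits / rate)

# PSO over the 2-D horizontal position; nu_m follows from the position.
n_particles, n_iter, w, c1, c2 = 30, 100, 0.7, 1.5, 1.5
pos = rng.uniform(0, 400, size=(n_particles, 2))
vel = rng.uniform(-10, 10, size=(n_particles, 2))
pbest = pos.copy()
pbest_val = np.array([collection_latency(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()
gbest_val = pbest_val.min()
init_best = gbest_val

for _ in range(n_iter):
    r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, 400)                    # boundary check on q_m^UAV
    vals = np.array([collection_latency(p) for p in pos])
    improved = vals < pbest_val                         # update individual bests
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    if vals.min() < gbest_val:                          # update global best
        gbest, gbest_val = pos[np.argmin(vals)].copy(), vals.min()
```

The converged `gbest` is the hovering position that minimizes the slowest user's upload time under this model.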
UAV Computational Resource Optimization Subproblem
The UAV’s computational frequency determines the energy consumption of local training. When the UAV reward allocation ratio and position are determined, the UAV computational resource optimization subproblem can be expressed as:
$$
\begin{aligned}
\text{P3-2: } & \min_{f_m} \lambda_2 \zeta_m f_m^2 \\
\text{s.t. } & \text{C7, C10}
\end{aligned}
$$
Let \( \phi(f_m) = \lambda_2 \zeta_m f_m^2 \). Obviously, when \( f_m > 0 \), this expression is monotonically increasing with \( f_m \). Therefore, when \( f_m \) takes the minimum value satisfying the constraints, the objective function value of subproblem P3-2 is minimized. Thus, the optimal solution for \( f_m \) is:
$$f_m^* = \max \left\{ f_m^{\min}, \frac{C_m D_m N_m^{\text{global}} N_m^{\text{local}}}{T^{\text{loc}}} \right\}$$
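With the simulation settings of Table 1, the closed-form frequency can be evaluated directly. In this sketch the local data volume \( D_m \) is an assumption (600 MNIST images stored as raw 28×28 bytes); the other numbers follow the paper's setup.

```python
# Closed-form CPU frequency from subproblem P3-2 (illustrative numbers).
C_m = 15             # CPU cycles required per bit of data (simulation setup)
D_m = 600 * 784 * 8  # bits of local data: 600 MNIST images of 28x28 bytes (assumed encoding)
N_global, N_local = 6, 5
T_loc = 1.0          # s, maximum local training time
f_min = 0.1e9        # Hz, lower end of the UAV CPU frequency range

# f_m must satisfy C_m * D_m * N_global * N_local / f_m <= T_loc (constraint C7),
# while the energy term lambda_2 * zeta_m * f_m^2 grows monotonically in f_m,
# so the optimum sits at the smallest feasible frequency.
f_star = max(f_min, C_m * D_m * N_global * N_local / T_loc)
```

Here `f_star` evaluates to about 1.69 GHz, inside the assumed range \( f_m = 0.1 \sim 2 \) GHz.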
UAV Reward Allocation Ratio Optimization Subproblem
The UAV’s reward allocation mechanism determines the payoff ground users obtain from contributing data, and thus affects their data offloading decisions. When the UAV position and CPU frequency are determined, the UAV reward allocation ratio is optimized. The reward allocation ratio optimization subproblem can be expressed as:
$$
\begin{aligned}
\text{P3-3: } & \max_{\beta_m} U_m = (1 - \beta_m) R_m - \lambda_2 E_m^{\text{loc}} - \lambda_3 k \nu_m \\
\text{s.t. } & \text{C9}
\end{aligned}
$$
Given the UAV position and CPU frequency, a larger reward allocation ratio \( \beta_m \) induces a larger data upload ratio \( a_n \) from ground user n, i.e., \( \beta_m \) and \( a_n \) are positively correlated; with more data, the UAV's local model quality improves and it earns a larger reward. However, as \( a_n \) increases, the energy consumption cost of UAV m also grows, and \( a_n \) itself is bounded. Therefore, once \( \beta_m \) increases past a certain threshold, the utility of UAV m reaches its maximum. We use the golden section method to solve subproblem P3-3.
The process of solving subproblem P3-3 with the golden section method is shown in Algorithm 5. In the initialization phase, obtain the lower bound \( \beta_m^{\min} \) and upper bound \( \beta_m^{\max} \) of the reward allocation ratio \( \beta_m \) of UAV m, and define the golden section ratio \( \phi \). As shown in lines 2-3 of Algorithm 5, while the gap between the upper bound \( \beta_m^{\max} \) and lower bound \( \beta_m^{\min} \) exceeds 0.01, compute the two golden section points c and d. As shown in lines 4-11, for each of c and d, execute Algorithm 2 for the users \( N_m \) covered by UAV m to obtain the proportion of data \( \{ a_n^* \}_{n \in N_m} \) that ground users upload under the current reward allocation ratio; then, given \( \{ a_n^* \}_{n \in N_m} \), execute Algorithm 4 to obtain the optimal UAV position \( (q_m^{\text{UAV}})^* \); finally, from \( \{ a_n^* \}_{n \in N_m} \), \( (q_m^{\text{UAV}})^* \), \( f_m^* \) and formula (44), compute the corresponding utilities \( U_m(c, a_n^*) \) and \( U_m(d, a_n^*) \) of UAV m. Next, as shown in lines 12-21, compare \( U_m(c, a_n^*) \) with \( U_m(d, a_n^*) \) to update the bounds \( \beta_m^{\min} \) and \( \beta_m^{\max} \), until their gap is at most 0.01. As shown in line 22, after the loop, take the larger of \( U_m(\beta_m^{\min}, a_n^*) \) and \( U_m(\beta_m^{\max}, a_n^*) \) as the optimal value of UAV m, and the corresponding point as the optimal solution \( \beta_m^* \).
Algorithm 5: UAV Reward Allocation Ratio Optimization Algorithm Based on Golden Section Method
Input: User-UAV association coefficients \( \{ z_{n,m} \}_{n \in N_m} \), UAV position \( q_m^{\text{UAV}} \)
Output: UAV optimal reward allocation ratio \( \beta_m^* \), maximum utility function value \( U_m(\beta_m^*, a_n^*) \), user data upload ratio \( a_n^* \)
- Initialize: set initial interval [0,1], golden section ratio \( \phi = \frac{\sqrt{5}-1}{2} \), \( \beta_m^{\min} = 0 \), \( \beta_m^{\max} = 1 \)
- while \( \beta_m^{\max} - \beta_m^{\min} > 0.01 \) do
- Compute \( c = \beta_m^{\min} + (1 - \phi)(\beta_m^{\max} - \beta_m^{\min}) \), \( d = \beta_m^{\min} + \phi (\beta_m^{\max} - \beta_m^{\min}) \)
- for c, d do:
- for n from 1 to \( N_m \) do
- Execute Algorithm 2, obtain ground user feedback data upload ratio \( a_n^* \)
- end for
- According to formula (43) solve UAV local training optimal CPU frequency \( f_m^* \)
- Run federated learning, obtain \( \{ \mu_m \}_{m \in M} \)
- Execute Algorithm 3, and according to formula (44) compute UAV m’s current utility \( U_m(c, a_n^*) \) or \( U_m(d, a_n^*) \)
- end for
- if \( U_m(c, a_n^*) < U_m(d, a_n^*) \) then
- \( \beta_m^{\min} = c \), \( U_m(\beta_m^{\min}, a_n^*) = U_m(c, a_n^*) \)
- end if
- if \( U_m(c, a_n^*) = U_m(d, a_n^*) \) then
- \( \beta_m^{\min} = c \), \( \beta_m^{\max} = d \), \( U_m(\beta_m^{\min}, a_n^*) = U_m(c, a_n^*) \), \( U_m(\beta_m^{\max}, a_n^*) = U_m(d, a_n^*) \)
- end if
- if \( U_m(c, a_n^*) > U_m(d, a_n^*) \) then
- \( \beta_m^{\max} = d \), \( U_m(\beta_m^{\max}, a_n^*) = U_m(d, a_n^*) \)
- end if
- end while
- \( U_m(\beta_m^*, a_n^*) = \max \{ U_m(\beta_m^{\min}, a_n^*), U_m(\beta_m^{\max}, a_n^*) \} \), \( \beta_m^* = \arg \max U_m(\beta_m, a_n^*) \)
- return \( \beta_m^* \), \( U_m(\beta_m^*, a_n^*) \)
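Stripped of the federated learning subroutines, the interval update in Algorithm 5 is a standard golden-section search over a unimodal utility. A compact sketch, with a toy concave function standing in for \( U_m \):

```python
import math

def golden_section_max(U, lo=0.0, hi=1.0, tol=0.01):
    """Golden-section search for the maximizer of a unimodal utility U on [lo, hi]."""
    phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = lo + (1 - phi) * (hi - lo)
        d = lo + phi * (hi - lo)
        if U(c) < U(d):        # maximum lies to the right of c
            lo = c
        else:                  # maximum lies to the left of d (ties included)
            hi = d
    best = max(lo, hi, key=U)  # pick the better endpoint of the final interval
    return best, U(best)

# Toy stand-in for U_m(beta): unimodal with its peak at beta = 0.3.
beta_star, u_star = golden_section_max(lambda b: -(b - 0.3) ** 2)
```

Each iteration shrinks the bracket by the factor \( \phi \approx 0.618 \), so reaching a width of 0.01 from [0, 1] takes about ten iterations; in Algorithm 5 each evaluation of `U` additionally runs Algorithms 2 and 4.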
Joint Optimization Algorithm
In this paper, Algorithms 1 and 2 obtain the optimal association decisions of the follower ground users under given leader UAV decisions. Algorithm 3 solves the optimal strategy of the follower ground users given the leader UAV decisions. Algorithm 4 solves the optimal position of the leader UAVs given the strategies fed back by the followers. Algorithm 5 solves the optimal reward allocation ratio of the leader UAVs given the followers' strategies, the UAV positions, and the UAV local training CPU frequencies. Algorithms 1-3 are optimizations performed by ground users for a fixed leader reward allocation ratio \( \beta_m \); when \( \beta_m \) changes, the ground users' optimal solutions change accordingly. Algorithms 4-5 are optimizations performed by UAVs for a fixed upload data ratio; when that ratio changes, the UAVs' solutions change as well. Ground users and UAVs must therefore play the game iteratively. When the utilities of the ground users and UAVs reach the Stackelberg equilibrium, the optimal values of all optimization variables are obtained. The specific process is shown in Algorithm 6.
As shown in lines 1-2 of Algorithm 6, the initialization phase sets the data reward allocation ratio for all UAVs in \( M = \{1,2,\ldots,M\} \) and the data upload ratio and transmit power for all ground users in \( N = \{1,2,\ldots,N\} \). As shown in line 3, Algorithm 1 is used to obtain the UAV position set \( \{ q_m^{\text{UAV}} \}_{m \in M} \), the clustering result \( N_m \), and the number of UAVs M. As shown in line 4, the iteration terminates once the utility of every UAV changes by less than the given accuracy \( \epsilon \). As shown in lines 5-11, in each round, after observing the rewards allocated by the UAVs and the decisions of all users, every user covered by two or more UAVs runs Algorithm 2 to decide which UAV to associate with, yielding that user's association decision \( z_{n,m} \). As shown in lines 12-15, each UAV in turn runs Algorithm 5 to optimize its own reward allocation ratio, iterating until every UAV's utility change falls within the accuracy \( \epsilon \). As shown in lines 16-17, the optimized variables \( a_n^*(\tau) \) and \( P_n^*(\tau) \) are substituted into formula (36) to compute the final utility \( \{ U_n \}_{n \in N} \) of each user.
Algorithm 6: Joint Optimization Algorithm
Input: Ground user set \( \{ q_n^{\text{UE}} \}_{n \in N} \), total bandwidth B, noise power \( \sigma^2 \), signal attenuation factor \( \alpha \), maximum communication distance between UAV and user \( d_{\max} \)
Output: UAV optimal reward allocation ratio \( \{ \beta_m^* \}_{m \in M} \), UAV maximum utility function value \( \{ U_m \}_{m \in M} \), user optimal data upload ratio \( \{ a_n^* \}_{n \in N} \), user utility function value \( \{ U_n \}_{n \in N} \)
- Initialize: UAV reward allocation ratio \( \{ \beta_m \}_{m \in M} \), all users’ transmit power \( \{ P_n^* \}_{n \in N} \), user data upload ratio \( \{ a_n^* \}_{n \in N} \), accuracy \( \epsilon \), \( \tau = 0 \)
- Execute Algorithm 1, obtain UAV position set \( \{ q_m^{\text{UAV}} \}_{m \in M} \), clustering result \( N_m \), number of UAVs M
- while \( \frac{|U_m(\tau) - U_m(\tau-1)|}{|U_m(\tau-1)|} > \epsilon \) do
- \( \tau = \tau + 1 \)
- for n from 1 to N do
- for m from 1 to M do
- if \( d_{n,m} \leq d_{\max} \) then
- Execute Algorithm 2, obtain best association \( z_{n,m} \) between ground user n and UAV m
- end if
- end for
- end for
- for m from 1 to M do
- Execute Algorithm 5, obtain UAV optimal reward allocation ratio \( \beta_m^*(\tau) \), maximum utility function value \( U_m(\tau) \), user data upload ratio \( a_n^*(\tau) \), user transmit power \( P_n^*(\tau) \)
- end for
- end while
- for n from 1 to N do
- Execute Algorithm 2, obtain user utility function value \( \{ U_n \}_{n \in N} \)
- end for
- return \( \beta_m^*(\tau) \), \( U_m(\tau) \), \( a_n^*(\tau) \), \( \{ U_n \}_{n \in N} \)
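The leader-follower structure of Algorithm 6 can be illustrated with a toy single-UAV, single-user instance, where the Stackelberg equilibrium reduces to one backward-induction step (the paper's algorithm iterates because multiple UAVs and users are coupled). The utility functions below are simple stand-ins, not the paper's formulas (36) and (44).

```python
def follower_best_response(beta):
    """Toy follower: uploads more data as the reward share beta grows, capped at 1."""
    return min(1.0, 2.0 * beta)

def leader_best_response():
    """Toy leader: anticipates the follower's reply and line-searches beta on a grid."""
    best_b, best_u = 0.0, float("-inf")
    for i in range(101):
        b = i / 100
        a = follower_best_response(b)       # anticipated follower reaction
        u = (1.0 - b) * 10.0 * a - 1.0 * a  # reward kept minus a toy collection cost
        if u > best_u:
            best_b, best_u = b, u
    return best_b

# Backward induction: leader commits first, follower replies.
beta_star = leader_best_response()
a_star = follower_best_response(beta_star)
```

In this toy instance the leader's utility is \( (1-b)\,20b - 2b = 18b - 20b^2 \) for \( b \le 0.5 \), peaking at \( b = 0.45 \), to which the follower replies with \( a = 0.9 \).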
In Algorithm 1, the complexity of the DBSCAN method is \( O(N^2) \). After clustering, the for loop that checks and adjusts cluster diameters compares the distances between all users in each cluster, which is also \( O(N^2) \); hence the complexity of Algorithm 1 is \( O(N^2) \). Assume the while loop of Algorithm 6 executes L times. Inside the while loop, a double for loop executes Algorithm 2 whenever constraint C3 is satisfied. Algorithm 2 traverses all clusters and checks whether constraints C3~C5 hold, so its complexity is \( O(M) \), and the while loop's complexity is \( O(L N M) \). In the subsequent for loop, the optimal solutions for the UAVs and ground users are obtained by executing Algorithm 5, which traverses the ground users and executes Algorithm 2 to obtain their optimal solutions, then executes Algorithm 3 (complexity \( O(1) \)) to obtain the UAV's optimal solution; thus Algorithm 5 is \( O(N M) \), and so is this for loop. Finally, traversing the ground users to compute their utility values costs \( O(N) \). In summary, the complexity of Algorithm 6 is \( O(L N M) \).
Simulation Results and Analysis
We use Python to simulate the proposed algorithm and verify its effectiveness. Users are randomly distributed in a 400 m × 400 m square area. UAVs hover at a fixed height of 50 m with a depression angle of 45°, giving a coverage radius of 50 m and a maximum transmission distance \( d_{\max} = 50\sqrt{2} \) m. We use the MNIST dataset (60,000 training samples and 10,000 test samples across 10 classes) to train a Convolutional Neural Network (CNN) with federated learning and verify the proposed scheme. Each user holds 600 MNIST training samples; the minimum and maximum transmit powers are 0.01 W and 0.1 W; the channel gain at reference distance \( d_0 = 1 \) m is \( h_0 = 10^{-8} \); training requires \( C_m = 15 \) CPU cycles per bit of data; the UAV CPU frequency for federated training ranges over \( f_m = 0.1 \sim 2 \) GHz; and the total bandwidth is \( B = 10 \) MHz. The main simulation parameters are listed in Table 1 [15, 20].
| Parameter | Value |
|---|---|
| Path loss exponent \( \alpha \) | 2 |
| Effective capacitance coefficient \( \zeta_m \) | \( 10^{-28} \) |
| Hovering power k (J/s) | 145 |
| Local iteration count \( N_m^{\text{local}} \) | 5 |
| Global iteration count \( N_m^{\text{global}} \) | 6 |
| Maximum collection time \( T_{\max}^{\text{up}} \) (s) | 0.8 |
| Maximum local training total time \( T^{\text{loc}} \) (s) | 1 |
| Weighting factor \( \lambda_1 \) | 10 |
| Weighting factor \( \lambda_2 \) | 100 |
| Weighting factor \( \lambda_3 \) | 0.01 |
| White noise power \( \sigma^2 \) (dBm) | -170 |
To verify the performance of our proposed algorithm, we compare it with five algorithms:
- Algorithm 1: Literature [15] algorithm. This algorithm does not consider the interest game between UAVs and ground users; UAVs directly collect all data from ground users.
- Algorithm 2: Literature [21] algorithm. This algorithm assumes uniform distribution for data volume used in local training and ground user transmit power.
- Algorithm 3: UAV position fixed. The horizontal position of the UAV is the centroid of the current cluster’s ground users.
- Algorithm 4: Random UAV local training CPU frequency. The UAV CPU frequency is not optimized; it is drawn randomly from \( [f_m^{\min}, f_m^{\max}] \).
- Algorithm 5: Random ground user transmit power. The user transmit power is not optimized; it is drawn randomly from \( [P_n^{\min}, P_n^{\max}] \).
Figure 4 shows the ground user clustering results with 30 ground users, DBSCAN radius \( R_D = 70 \) m, and minimum sample number \( N_{\text{MinPts}} = 3 \). Using Algorithm 1 to cluster the ground users, Figure 4 shows that 23 of the 30 users are successfully clustered into 6 clusters, each marked with a different color; accordingly, 6 UAVs are deployed, covering 4, 3, 5, 3, 4, and 4 ground users, respectively. The initial UAV positions are the arithmetic mean centers of the clusters.
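The clustering step can be sketched with a minimal self-contained DBSCAN (the paper's improved variant additionally checks and adjusts cluster diameters, which is omitted here). The user positions are random; the parameters \( R_D = 70 \) m and \( N_{\text{MinPts}} = 3 \) follow the simulation setup.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, -1 for unclustered (noise) points."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # includes the point itself
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                        # already assigned, or not a core point
        stack, labels[i] = [i], cluster     # grow a new cluster from core point i
        while stack:
            j = stack.pop()
            if len(neighbors[j]) >= min_pts:      # expand only through core points
                for k in neighbors[j]:
                    if labels[k] == -1:
                        labels[k] = cluster
                        stack.append(k)
        cluster += 1
    return labels

rng = np.random.default_rng(1)
users = rng.uniform(0, 400, size=(30, 2))   # 30 users in the 400 m x 400 m area
labels = dbscan(users, eps=70, min_pts=3)

# One UAV per cluster, initially placed at the arithmetic mean center of its members.
centers = {c: users[labels == c].mean(axis=0) for c in set(labels.tolist()) if c != -1}
```

With a different random seed the number of clusters and unclustered users will differ from the Figure 4 instance, but the procedure is the same.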
Figure 5 compares the proposed algorithm with the baseline algorithms under different total bandwidths: Figure 5(a) compares the utility of UAVs across schemes, and Figure 5(b) compares the total utility of ground users. As shown in Figure 5(a), as the total bandwidth increases, the average utility of UAVs rises under every scheme. As shown in Figure 5(b), the average utility of ground users declines. This is because a larger total bandwidth gives each user more bandwidth and a higher transmission rate, so more data is transmitted and the UAVs reduce the reward allocation ratio to maximize their utility; hence UAV average utility grows with bandwidth while ground user average utility falls.
From Figures 5(a) and 5(b), the proposed algorithm yields the highest utility for both UAVs and ground users. A random UAV local training CPU frequency yields the lowest UAV utility, and random ground user transmit power yields the lowest user utility, because random values cannot exploit computational and power resources effectively. Compared with the algorithms of literature [15] and [21], the proposed scheme improves the utility of both ground users and UAVs. In [15] there is no game between UAVs and ground users: the upload ratio is fixed at 1, so the energy consumption of both UAV local training and user data transmission rises, and the average utility of both sides drops. In [21] the local training data volume and user transmit power are assumed uniformly distributed, which leaves the average utilities of UAVs and ground users well below those achieved after optimization by our algorithm; by optimizing user transmit power and adopting a non-uniform data resource allocation mechanism, our algorithm improves system utility at the game equilibrium. Compared with fixed UAV positions, optimizing the UAV positions lets the UAVs collect more data at lower cost under the same conditions, yielding higher average UAV utility.
Figure 6 shows the impact of different UAV hovering heights H on the utility of UAVs and ground users. It can be seen that the average utility of UAVs decreases as the UAV hovering height increases, and the average utility of ground users increases as the UAV hovering height increases. This is because the increase in UAV height reduces the channel gain between UAVs and ground users, increasing the cost of user data upload, reducing the amount of data transmitted. To obtain more data, UAVs increase the reward allocation ratio to users, so the average utility of UAVs decreases, and the average utility of users increases.
Figure 7 shows the comparison of UAV average utility and user average utility under different maximum data collection latencies. From Figure 7(a), it can be seen that changes in maximum collection latency have little impact on UAV average utility. This is because when the maximum collection time increases, ground users upload more data, and to maximize their utility, UAVs reduce the reward allocation ratio \( \beta_m \), reducing the expenditure to users, but hovering energy consumption increases, and local training energy consumption also increases, so the average utility of UAVs changes little. From Figure 7(b), it can be seen that the average utility of users decreases as the maximum collection time increases. This is because UAVs incentivize users to contribute data with a smaller reward allocation ratio \( \beta_m \), the average income of ground users decreases, and at the same time, the user upload time increases, transmission energy consumption increases, so the average utility of users decreases.
Figure 8 shows the comparison of UAV average utility and user average utility under different numbers of ground users. From Figure 8(a) and Figure 8(b), it can be seen that the average utility of both UAVs and ground users decreases as the number of ground users increases. This is because the increase in ground users leads to UAVs competing with each other to obtain more benefits by collecting more data, resulting in increased training energy consumption and reward allocation ratios for each, thus reducing average utility. The increase in the number of ground users participating in data upload also reduces the average utility of ground users.
Conclusion
In multi-UAV, multi-user scenarios, to address the insufficient computational capability and low participation enthusiasm of ground users in UAV-assisted federated learning systems, we constructed a UAV-assisted federated learning system model based on data collection and introduced blockchain to ensure data security and reliability. We analyzed the costs incurred by UAVs and ground users in data collection and federated training and constructed their utility functions. We first used an improved DBSCAN algorithm to cluster ground users, achieving low-complexity device-cluster matching, and then proposed a UAV-assisted federated learning algorithm based on the Stackelberg game, with UAVs as leaders formulating reward strategies and ground users as followers optimizing data upload schemes; through dynamic iteration, we solved for the Stackelberg equilibrium of this game model. Finally, we simulated the algorithm in Python. The results show that the proposed algorithm effectively improves the efficiency of UAV data collection and federated learning, and its advantage grows with the number of users: with 40 users, utility improves by 25% and 11% over literature [15] and [21], respectively, and with 50 users the improvements reach 92% and 33%, significantly outperforming existing schemes.
The integration of drone technology in federated learning systems presents a promising direction for future research. The use of Unmanned Aerial Vehicles not only enhances data collection efficiency but also provides a flexible and dynamic platform for distributed machine learning. Our work demonstrates the potential of Stackelberg game-based optimization in balancing the interests of UAVs and ground users, ensuring sustainable participation and efficient resource utilization. Future work could explore more complex scenarios, such as dynamic environments and multi-objective optimization, to further advance the capabilities of UAV-assisted federated learning systems.
