In recent years, the rapid development of drone technology has revolutionized various fields, particularly in Internet of Things (IoT) systems. As a researcher in this domain, I have focused on addressing the challenges of computational offloading and resource allocation in multi-Unmanned Aerial Vehicle (UAV) assisted mobile edge computing (MEC) networks. The integration of drone technology into MEC systems offers flexible and reliable communication, but it also introduces complexities related to energy consumption, data privacy, and fairness among user devices. In this article, I explore a novel approach that combines federated learning with reinforcement learning to optimize UAV trajectories and task offloading decisions while ensuring user fairness. The proposed method leverages the Actor-Critic network within a federated framework, enabling efficient utilization of heterogeneous resources and minimizing total energy consumption. Throughout this work, I emphasize the importance of drone technology in enhancing MEC systems and discuss how Unmanned Aerial Vehicle networks can be coordinated to achieve balanced performance.

The proliferation of IoT devices has led to an exponential growth in computational demands, often exceeding the capabilities of individual user devices (UDs). These devices typically suffer from limited battery life and processing power, making task offloading to edge servers a viable solution. Traditional MEC systems rely on fixed ground stations, but in remote or disaster-stricken areas, such infrastructure may be unavailable. This is where drone technology comes into play; Unmanned Aerial Vehicles equipped with MEC servers can provide on-demand computing services. However, single UAVs often lack sufficient capacity for dense task scenarios, necessitating multi-UAV collaboration. In my research, I address the issues of data privacy and unfair resource allocation by incorporating federated learning and fairness metrics. The Jain fairness index is introduced to measure user equity, ensuring that devices with poorer channel conditions receive adequate resources. By optimizing UAV trajectories and offloading decisions, I aim to maximize system efficiency and fairness simultaneously.
To model the system, I consider a scenario with multiple UAVs flying at a fixed height, serving a set of UDs over discrete time slots. The communication model accounts for path loss and line-of-sight probabilities, while the computation model includes local processing, offloading to UAVs, and task migration between UAVs. The optimization problem involves minimizing total energy consumption and maximizing user fairness, subject to constraints like UAV mobility and resource limits. I formulate this as a non-convex problem and solve it using a federated reinforcement learning algorithm combined with Actor-Critic networks (FRLACN). This algorithm allows each UAV to learn optimal policies locally and aggregate knowledge globally without sharing raw data, thus preserving privacy. Simulations demonstrate that FRLACN outperforms existing methods in terms of energy savings, data transmission costs, and fairness improvement.
System Model and Formulation
In this section, I describe the system model in detail, covering the communication, computation, and fairness aspects. The network consists of N user devices (UDs), M UAVs equipped with MEC servers, and a cloud server for global coordination. The UAVs operate at a fixed altitude H over a square area with side length l_max. Each UAV’s position at time slot t is defined by coordinates (X_m,t, Y_m,t, H), which evolve based on the flight direction angle α_m,t and distance d_m,t. The horizontal distance between UAV m and UD n is given by:
$$d_{n,m,t} = \sqrt{(X_{m,t} - x_n)^2 + (Y_{m,t} - y_n)^2}$$
where (x_n, y_n) is the UD’s location. To ensure coverage, this distance must not exceed the maximum coverage radius d_max. The path loss between UD n and UAV m is modeled using a probabilistic line-of-sight (LoS) channel, with the LoS probability expressed as:
$$P_{\text{LoS}} = \frac{1}{1 + a \exp(-b(\theta - a))}$$
Here, θ is the elevation angle, and a, b are environment-dependent parameters. The total path loss L combines free-space loss and additional losses for LoS and NLoS links:
$$L = L_{\text{FS}} + P_{\text{LoS}} \eta_{\text{LoS}} + (1 - P_{\text{LoS}}) \eta_{\text{NLoS}}$$
where L_FS is the free-space path loss. For task offloading, I consider two strategies: task partitioning and task migration. Task partitioning divides a computation task O_n,t = {C_n,t, D_n,t, T_max_n,t} into Q_n equal parts, where C_n,t is the required CPU cycles, D_n,t is the data size, and T_max_n,t is the maximum tolerable delay. Each part can be processed locally, offloaded to a UAV, or migrated to another UAV. The data rate for offloading from UD n to UAV m is:
$$R_{n,m,t} = B \log_2 \left(1 + \frac{\rho_n g_{n,m}}{\sigma^2 + \sum_{m' \neq m} \rho_{m'} g_{m,m'}}\right)$$
where B is the bandwidth, ρ_n is the transmission power of UD n, g_{n,m} is the channel gain between UD n and UAV m, σ^2 is the noise power, and the summation in the denominator accounts for co-channel interference. For task migration between UAVs m and m', the data rate is:
$$R_{m,m',t} = B \log_2 \left(1 + \frac{\rho_m g_{m,m'}}{\sigma^2}\right)$$
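To make the channel model concrete, the following is a minimal sketch of how the LoS probability, average path loss, and offloading rate might be computed. The environment parameters a, b, η_LoS, η_NLoS, the carrier frequency, and the interference value are illustrative assumptions rather than values from the model above.
```python
import numpy as np

def los_probability(theta_deg, a=9.61, b=0.16):
    """LoS probability for elevation angle theta (degrees); a, b are
    illustrative environment-dependent parameters."""
    return 1.0 / (1.0 + a * np.exp(-b * (theta_deg - a)))

def path_loss_db(d_3d_m, theta_deg, fc_hz=2e9, eta_los=1.0, eta_nlos=20.0):
    """Average path loss in dB: free-space loss plus LoS/NLoS excess losses."""
    c = 3e8
    l_fs = 20.0 * np.log10(4.0 * np.pi * fc_hz * d_3d_m / c)  # free-space path loss
    p_los = los_probability(theta_deg)
    return l_fs + p_los * eta_los + (1.0 - p_los) * eta_nlos

def offload_rate(bandwidth_hz, tx_power_w, gain, noise_w, interference_w=0.0):
    """Achievable rate R = B log2(1 + SINR) for UD-to-UAV offloading."""
    return bandwidth_hz * np.log2(1.0 + tx_power_w * gain / (noise_w + interference_w))

# Example: UD at the origin, UAV hovering above (300 m, 400 m) at height H = 100 m.
d_horizontal = np.hypot(300.0, 400.0)
d_3d = np.hypot(d_horizontal, 100.0)
theta = np.degrees(np.arctan2(100.0, d_horizontal))   # elevation angle
gain = 10.0 ** (-path_loss_db(d_3d, theta) / 10.0)    # linear channel gain
rate = offload_rate(1e6, 0.2, gain, 1e-9)             # B = 1 MHz, rho_n = 0.2 W, sigma^2 = 1e-9 W
```
The same rate expression, without the interference term, would give the UAV-to-UAV migration rate when ρ_m and the inter-UAV gain are used instead.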
The computation model includes local execution, offloading to a UAV, and migration. The local computation time for a task part is:
$$T^{\text{loc}}_{n,t} = \frac{C_{n,t} D_{n,t}}{f_n Q_n}$$
where f_n is the UD’s computational capability. The energy consumption for local computation is:
$$E^{\text{loc}}_{n,t} = \frac{\kappa_n f_n^2 C_{n,t} D_{n,t}}{Q_n}$$
For offloading to UAV m, the transmission time is:
$$T^{\text{tran}}_{n,m,t} = \frac{D_{n,t}}{R_{n,m,t} Q_n}$$
and the computation time on UAV m is:
$$T^{\text{com}}_{n,m,t} = \frac{C_{n,t} D_{n,t}}{f_{m,n} Q_n}$$
where f_{m,n} is the computational resource allocated by UAV m to UD n. The total energy consumption for offloading is the sum of the transmission and computation energies, and the times and energies for task migration between UAVs are defined analogously. The total delay for completing a task is the maximum of the local, offloading, and migration components. To ensure fairness, I use the Jain fairness index φ, defined as:
$$\phi = \frac{\left( \sum_{n=1}^N \sum_{m=1}^M \sum_{t=1}^T \frac{R_{n,m,t}}{D_{n,t}} \right)^2}{N M T \sum_{n=1}^N \sum_{m=1}^M \sum_{t=1}^T \left( \frac{R_{n,m,t}}{D_{n,t}} \right)^2}$$
This metric ranges from 0 to 1, with higher values indicating better fairness. The overall optimization problem is to maximize Ψ = μ_1 φ - μ_2 E_total, where E_total is the total energy consumption, and μ_1, μ_2 are weight factors. The constraints include UAV mobility limits, coverage requirements, resource capacity, and delay bounds.
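To show how these pieces fit together numerically, here is a minimal sketch that computes per-part energies and delays, the Jain index over the rate-to-data-size ratios, and the weighted objective Ψ. The helper names, the toy arrays, and the weights μ_1 = μ_2 = 1 are assumptions made for illustration.
```python
import numpy as np

def local_energy(kappa, f_local, cycles, data_bits, q_parts):
    """Energy to process one of the Q_n task parts locally: kappa * f^2 * C * D / Q."""
    return kappa * f_local ** 2 * cycles * data_bits / q_parts

def offload_delay(data_bits, rate, cycles, f_alloc, q_parts):
    """Transmission time plus UAV computation time for one task part."""
    return data_bits / (rate * q_parts) + cycles * data_bits / (f_alloc * q_parts)

def jain_index(ratios):
    """Jain fairness index over the flattened R_{n,m,t} / D_{n,t} ratios."""
    x = np.asarray(ratios, dtype=float).ravel()
    return x.sum() ** 2 / (x.size * np.square(x).sum())

def objective(phi, total_energy, mu1=1.0, mu2=1.0):
    """Weighted objective Psi = mu1 * phi - mu2 * E_total (weights are illustrative)."""
    return mu1 * phi - mu2 * total_energy

# Toy example: N = 2 UDs, M = 1 UAV, T = 3 time slots, indexed [n][m][t].
rates = np.array([[[1.2e6, 0.9e6, 1.1e6]],
                  [[0.4e6, 0.5e6, 0.45e6]]])   # R_{n,m,t} in bit/s
sizes = np.full_like(rates, 8e6)               # D_{n,t} in bits
phi = jain_index(rates / sizes)                # closer to 1 means more equitable service
```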
| Parameter | Description | Typical Value |
|---|---|---|
| N | Number of user devices | 20-25 |
| M | Number of UAVs | 4 |
| H | UAV flight height | 100 m |
| l_max | Flight area side length | 1000 m |
| d_max | Maximum coverage radius | 200 m |
| B | Channel bandwidth | 1 MHz |
| ρ_n | UD transmission power | 0.2 W |
| σ^2 | Noise power | 10^{-9} W |
Problem Formulation and Optimization
The core problem I address is the joint optimization of UAV trajectories and task offloading decisions to maximize system efficiency and fairness. Formally, the problem is stated as:
$$\max \Psi = \mu_1 \phi - \mu_2 E_{\text{total}}, \quad E_{\text{total}} = \sum_{t=1}^{T} \sum_{n=1}^{N} \left( E^{\text{loc}}_{n,t} + E_{n,m,t} + E_{m,m',t} \right)$$
subject to constraints on UAV positions, distances, resource allocation, and delays. This is a mixed-integer non-linear programming problem, which is NP-hard due to its non-convexity and combinatorial nature. Traditional optimization methods struggle with such problems, so I employ a machine learning-based approach. The problem is framed as a Markov Decision Process (MDP), where each UAV acts as an agent interacting with the environment. The state space for UAV m at time t includes its position, the task data sizes, and inter-UAV distances:
$$s_{m,t} = \left( X_{m,t}, Y_{m,t}, H, \{D_{n,t}\}_{n=1}^{N}, d_{m,m',t} \right)$$
The action space consists of the flight angle and distance:
$$a_{m,t} = (\alpha_{m,t}, d_{m,t})$$
The reward function combines the objective Ψ with penalties for collisions and boundary violations:
$$r_{m,t} = \Psi - P_{\text{crash}} - P_{\text{cross}}$$
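A minimal sketch of how the per-UAV reward might be assembled is given below; the penalty magnitudes, the minimum separation distance, and the simple geometric checks are assumptions used only for illustration.
```python
import numpy as np

def uav_reward(psi, positions, uav_idx, area_side=1000.0,
               min_separation=10.0, p_crash=5.0, p_cross=5.0):
    """Reward r_{m,t} = Psi - P_crash - P_cross for the UAV at index `uav_idx`.

    `positions` is an (M, 2) array of horizontal UAV coordinates; the
    separation threshold and penalty magnitudes are illustrative choices.
    """
    x, y = positions[uav_idx]
    reward = psi

    # Boundary-violation penalty: leaving the l_max x l_max service area.
    if not (0.0 <= x <= area_side and 0.0 <= y <= area_side):
        reward -= p_cross

    # Collision penalty: flying too close to any other UAV at the same height.
    others = np.delete(positions, uav_idx, axis=0)
    if others.size and np.min(np.linalg.norm(others - positions[uav_idx], axis=1)) < min_separation:
        reward -= p_crash

    return reward
```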
To solve this MDP, I propose the Federated Reinforcement Learning Combined with Actor-Critic Network (FRLACN) algorithm. This approach integrates federated learning with deep reinforcement learning to handle data privacy and heterogeneity. Each UAV maintains an Actor network that outputs actions and a Critic network that evaluates action values. Local training is performed using policy gradient methods, and periodically, gradients are aggregated at a central server using a weighted average based on rewards. This federated aggregation reduces communication overhead and enhances privacy by avoiding raw data sharing. The algorithm proceeds as follows: in each time slot, UAVs collect experiences and update their networks locally. After a fixed number of local training steps, UAVs with rewards above a threshold upload their gradients to the server. The server computes a global gradient update and broadcasts it back. UAVs then test the new policy and decide whether to adopt it or revert to their local version. This process iterates until convergence.
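The training loop just described can be summarized in the following schematic sketch. The `agents`, `env`, and `server` interfaces (`act`, `store`, `local_update`, `aggregate`, and so on) are hypothetical placeholders chosen to mirror the steps in the text, not the actual implementation.
```python
import copy
import numpy as np

def frlacn_training(agents, env, server, episodes, local_steps, reward_threshold):
    """Schematic FRLACN loop over hypothetical agent/env/server interfaces."""
    for episode in range(episodes):
        states = env.reset()
        episode_rewards = np.zeros(len(agents))

        # 1) Local experience collection and Actor-Critic updates on each UAV.
        for _ in range(local_steps):
            actions = [agent.act(s) for agent, s in zip(agents, states)]
            next_states, rewards, done = env.step(actions)
            for i, agent in enumerate(agents):
                agent.store(states[i], actions[i], rewards[i], next_states[i])
                agent.local_update()            # policy-gradient / TD update on a local batch
                episode_rewards[i] += rewards[i]
            states = next_states
            if done:
                break

        # 2) Only UAVs whose episode reward clears the threshold share their updates.
        uploads = [(agent.actor_update(), episode_rewards[i], agent.batch_size)
                   for i, agent in enumerate(agents)
                   if episode_rewards[i] >= reward_threshold]
        if not uploads:
            continue

        # 3) Reward- and batch-size-weighted aggregation at the server, then broadcast.
        global_update = server.aggregate(uploads)

        # 4) Each UAV tests the broadcast policy and reverts if it performs worse locally.
        for agent in agents:
            backup = copy.deepcopy(agent.parameters())
            agent.apply(global_update)
            if agent.evaluate(env) < agent.last_local_score:
                agent.load(backup)
```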
The FRLACN algorithm leverages the strengths of Actor-Critic methods for continuous action spaces and federated learning for distributed optimization. The Actor network uses a policy function π(a|s) parameterized by ω, updated via the policy gradient:
$$\nabla_\omega J(\omega) = \mathbb{E} \left[ \nabla_\omega \log \pi(a|s) Q(s,a) \right]$$
where Q(s,a) is the action-value function estimated by the Critic network. The Critic is updated using temporal difference learning. In the federated setting, the global update for the Actor parameters is:
$$\omega_{\text{global}} = \frac{1}{\sum_{m'=1}^{M} \Omega_{m'}} \sum_{m=1}^{M} \frac{r_{m,t}}{\sum_{m'=1}^{M} r_{m',t}} \Omega_m \omega_m$$
where Ω_m is the batch size for UAV m. This weighting ensures that UAVs with higher rewards contribute more to the global model. The use of drone technology in this framework allows for dynamic adaptation to changing environments, while the Unmanned Aerial Vehicle networks enable scalable and efficient task offloading.
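A minimal sketch of this reward- and batch-size-weighted aggregation is given below; representing each UAV's Actor parameters as a list of NumPy arrays (one per layer) is an assumption made for illustration.
```python
import numpy as np

def aggregate_actor_params(param_sets, rewards, batch_sizes):
    """Reward- and batch-size-weighted average of per-UAV Actor parameters.

    `param_sets[m]` is a list of NumPy arrays (one per layer) for UAV m,
    `rewards[m]` is its episode reward, and `batch_sizes[m]` is Omega_m.
    """
    rewards = np.asarray(rewards, dtype=float)
    batch_sizes = np.asarray(batch_sizes, dtype=float)

    # Combined weight: reward share times batch size, normalized by the total batch size.
    weights = (rewards / rewards.sum()) * batch_sizes / batch_sizes.sum()

    # Weighted sum, layer by layer.
    global_params = []
    for layer_idx in range(len(param_sets[0])):
        layer = sum(w * params[layer_idx] for w, params in zip(weights, param_sets))
        global_params.append(layer)
    return global_params
```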
| Component | Description | Role in FRLACN |
|---|---|---|
| Actor Network | Generates actions based on states | Outputs flight and offloading decisions |
| Critic Network | Evaluates action quality | Provides gradient directions for updates |
| Federated Aggregation | Combines local models globally | Enhances privacy and reduces communication |
| Experience Replay | Stores past interactions | Stabilizes training with batch learning |
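To make the Actor and Critic components listed above concrete, the sketch below shows one way they might be defined in TensorFlow (the framework used in the simulations that follow); the layer widths, activations, and tanh-based action scaling are assumptions, not the architecture used in the evaluation.
```python
import tensorflow as tf

def build_actor(state_dim):
    """Actor: maps a state to two outputs in [-1, 1], later rescaled to the
    flight-angle and flight-distance ranges; layer sizes are illustrative."""
    inputs = tf.keras.Input(shape=(state_dim,))
    x = tf.keras.layers.Dense(256, activation="relu")(inputs)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    raw_action = tf.keras.layers.Dense(2, activation="tanh")(x)
    return tf.keras.Model(inputs, raw_action)

def build_critic(state_dim, action_dim=2):
    """Critic: estimates Q(s, a) for a state-action pair."""
    state_in = tf.keras.Input(shape=(state_dim,))
    action_in = tf.keras.Input(shape=(action_dim,))
    x = tf.keras.layers.Concatenate()([state_in, action_in])
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    q_value = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model([state_in, action_in], q_value)

# The tanh outputs would be mapped to alpha in [0, 2*pi) and d in [0, d_max]
# outside the network, e.g. alpha = (raw[0] + 1) * pi and d = (raw[1] + 1) * d_max / 2.
```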
Simulation Results and Analysis
To evaluate the performance of FRLACN, I conducted simulations in a Python environment using TensorFlow. The setup included 4 UAVs and up to 25 UDs in a 1000 m × 1000 m area. Key parameters are summarized in the table below. I compared FRLACN against three baseline algorithms: Centralized DDPG (Centralized), Multi-Agent DDPG (MADDPG), and Federated Averaging (FedAvg). The metrics of interest were total energy consumption, data transmission cost, user fairness, and convergence behavior.
The results show that FRLACN achieves superior performance across all metrics. In terms of energy consumption, FRLACN reduced total energy by up to 11% compared to FedAvg, 36% compared to Centralized, and 68% compared to MADDPG when N=20 UDs. This is attributed to the efficient trajectory planning and resource allocation learned by the Actor-Critic networks. Data transmission costs were also lower, with FRLACN achieving reductions of 8.7% over FedAvg, 49% over MADDPG, and 51% over Centralized for N=25 UDs. This highlights the benefits of federated learning in minimizing communication overhead while preserving privacy. User fairness, measured by the Jain index, improved with increasing time slots, reaching values above 0.9 for FRLACN, compared to 0.85 for FedAvg and lower for others. This demonstrates the effectiveness of the fairness-aware optimization.
| Parameter | Value |
|---|---|
| UAV computational resource (f_m,n, f_m',m) | 10 GHz |
| UD computational resource (f_n) | 2 GHz |
| Task CPU cycles (C_n,t) | 1-2.5 Gcycles |
| System bandwidth (B) | 1 MHz |
| UD transmission power (ρ_n) | 0.2 W |
| UAV transmission power (ρ_m) | 1.0 W |
| Task data size (D_n,t) | 100 MB |
| UAV flight speed (v_max) | 15 m/s |
| Discount factor | 0.95 |
| Learning rate | 0.0001 |
Convergence analysis revealed that FRLACN stabilizes faster than other algorithms, reaching a steady-state reward within 500 training episodes, whereas Centralized and MADDPG required over 1000 episodes. This is due to the federated aggregation mechanism, which filters out poor policies and accelerates learning. The incorporation of task migration further enhanced performance by balancing loads among UAVs, as shown in comparative tests where non-cooperative schemes had higher energy consumption and lower fairness. These results underscore the importance of multi-UAV coordination in drone technology for MEC systems.
In addition, I analyzed the impact of varying the number of UDs on system performance. As N increased from 10 to 25, FRLACN maintained low energy growth rates, while other algorithms exhibited steep increases. For instance, at N=25, the energy consumption for FRLACN was 70.39% lower than FedAvg, 68.21% lower than MADDPG, and 36.11% lower than Centralized. Similarly, data transmission costs rose linearly for the baseline methods but remained moderate for FRLACN, thanks to the optimized offloading decisions and trajectory planning. The fairness index φ improved with more time slots, indicating that longer operational periods allow for better resource distribution. In summary, the simulations validate the efficacy of FRLACN and demonstrate its robustness and scalability as the network grows.
Conclusion and Future Directions
In this article, I have presented a comprehensive study on cooperative trajectory optimization and task offloading for UAV-assisted MEC systems with a focus on fairness. The proposed FRLACN algorithm combines federated learning and reinforcement learning to address key challenges such as energy efficiency, data privacy, and equitable resource allocation. By leveraging drone technology, the system achieves dynamic adaptation to user demands while minimizing costs. The integration of the Actor-Critic network enables precise policy updates, and the federated framework ensures privacy preservation. Simulation results confirm that FRLACN outperforms existing methods in terms of energy savings, communication efficiency, and fairness improvement.
Looking ahead, several avenues for future research emerge. First, extending the approach to dynamic environments with mobile UDs and time-varying channels could enhance robustness. This might involve real-time adjustment of federated aggregation periods or incorporation of transfer learning. Second, exploring multi-objective optimization that balances energy, latency, and security could lead to more comprehensive solutions. For instance, integrating physical layer security techniques might mitigate privacy risks during task migration. Third, the fusion of blockchain technology with federated learning could provide immutable records of model updates, increasing trust and transparency in UAV networks. Additionally, developing digital twin models for UAV clusters could enable offline pre-training and online fine-tuning, reducing computational overhead in real-time decision-making. As drone technology continues to evolve, Unmanned Aerial Vehicle networks will play a pivotal role in shaping the future of edge computing, and I believe that approaches like FRLACN will be crucial for harnessing their full potential.
