Energy Efficiency Optimization in Drone-Assisted Edge Computing for Ground Interference Environments

In recent years, the rapid advancement of information technologies, such as the Internet of Things and 5G, has led to a surge in demand for real-time data processing and transmission from numerous devices. Mobile edge computing (MEC) has emerged as a promising computing architecture that shifts computational tasks and data sources from the cloud to edge nodes near users, significantly enhancing real-time data processing and analysis capabilities. As a result, MEC has become a focal point of research and application. However, traditional ground-based MEC deployments suffer from low flexibility and are prone to significant signal attenuation due to obstacles, which directly compromises service stability and user experience.

The integration of drone technology, particularly unmanned aerial vehicles (UAVs), has introduced new possibilities for MEC systems. UAVs offer high mobility, rapid deployment, flexible communication, and line-of-sight (LoS) transmission links, making them well suited to carrying edge servers that provide computational offloading services. Their dynamic flight capabilities overcome the inflexibility of traditional ground-based MEC networks. Despite these advantages, UAV-assisted MEC systems face critical challenges, including increased energy consumption from the additional payload of the edge server, heightened computational energy demands, and elevated technical complexity and production cost. Furthermore, ground-based interference sources, such as radio towers and high-power base stations, significantly limit UAV operation. It is therefore essential to design energy-efficient UAV-assisted MEC systems that can adapt to interference-rich environments.

In this article, I address the problem of maximizing energy efficiency (EE) in UAV-assisted MEC systems under ground interference conditions. I propose a joint optimization framework that combines UAV trajectory planning and user scheduling over a specified task time, modeled as a Markov Decision Process (MDP). To solve this complex problem, I employ a Dueling Double Deep Q-Network (D3QN) algorithm, which enhances the UAV’s ability to intelligently perceive interference, optimize communication links, and improve trajectory stability, thereby maximizing system EE. My contributions include the development of a comprehensive system model that accounts for communication, computation, and flight energy consumption, and the demonstration of the superiority of the proposed approach through simulations compared to benchmark schemes.

The system model considers a UAV-assisted MEC communication system with ground interference. Let $\mathcal{T}$ denote the set of $K$ fixed ground terminal users, with $q_k = [x_k, y_k, 0]$ representing the coordinates of user $k \in \mathcal{T}$. Similarly, let $\mathcal{M}$ represent the set of $M$ fixed ground interferers, with $q_m = [x_m, y_m, 0]$ denoting the position of interferer $m \in \mathcal{M}$. The UAV carries an onboard MEC server that provides computational offloading services for the ground users. $D_k$ denotes the amount of computational task data that user $k$ needs to offload to the UAV's MEC server; once all of user $k$'s tasks have been processed, that user no longer requests offloading. The UAV departs from the initial horizontal point $[x_0, y_0]$ and flies at a fixed altitude $H_u$, so its coordinates at time slot $n$ are $q_u[n] = [x_u[n], y_u[n], H_u]$.
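To make the notation concrete, the following minimal sketch sets up an illustrative scenario in Python (the interferer coordinates anticipate those used in the simulation section; the remaining values are placeholders):

```python
import numpy as np

# Illustrative scenario (placeholder values).
K, M = 4, 3                      # number of ground users and interferers
H_u = 50.0                       # UAV flight altitude in meters

q_users = np.array([[200.0, 300.0, 0.0],   # q_k = [x_k, y_k, 0]
                    [600.0, 250.0, 0.0],
                    [350.0, 700.0, 0.0],
                    [800.0, 800.0, 0.0]])
q_interf = np.array([[50.0, 450.0, 0.0],   # q_m = [x_m, y_m, 0]
                     [700.0, 150.0, 0.0],
                     [400.0, 880.0, 0.0]])
q_uav = np.array([100.0, 100.0, H_u])      # UAV position q_u[n]
D_k = np.full(K, 100e6)                    # remaining task bits per user
```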

To simplify the problem, the task execution time $T$ is divided into $N$ sufficiently small time slots $t_n$, where $n \in \mathcal{N} \triangleq \{1, \ldots, N\}$, ensuring that the positions of all nodes remain approximately constant within each slot. The communication link model accounts for air-to-ground channels using free-space path loss. The channel power gain between ground user $k$ and the UAV $u$ at time slot $n$ is expressed as:

$$g_{k,u}[n] = \beta_0 \, \| q_k - q_u[n] \|^{-2}, \quad \forall n \in \mathcal{N}, \forall k \in \mathcal{T}$$

where $\beta_0$ is the channel power gain at a reference distance of 1 m. For interference, assuming the worst-case scenario with LoS channels, the channel power gain from interferer $m$ to the UAV at time slot $n$ is:

$$j_{m,u}[n] = \beta_0 \, \| q_m - q_u[n] \|^{-2}, \quad \forall n \in \mathcal{N}, \forall m \in \mathcal{M}$$

The achievable communication rate from user $k$ to the UAV at time slot $n$ is then:

$$r_{k,u}[n] = B \log_2 \left( 1 + \frac{P_k g_{k,u}[n]}{\sum_{m=1}^{M} P_m j_{m,u}[n] + \sigma^2} \right)$$

where $B$ is the bandwidth allocated to each user, $P_k$ and $P_m$ are the transmit powers of user $k$ and interferer $m$, respectively, and $\sigma^2$ is the power of additive white Gaussian noise (AWGN) at the UAV receiver.

For computational offloading, I assume the UAV has sufficient computational resources to handle all user tasks. In each time slot, the UAV can only serve one user, defined by the scheduling variable $b_k[n]$, where $b_k[n] = 1$ if user $k$ is served in slot $n$, and $b_k[n] = 0$ otherwise. The amount of offloaded task bits received by the UAV from user $k$ in slot $n$ is:

$$R_{k,u}[n] = b_k[n] t_n r_{k,u}[n]$$

where $t_n$ is the duration of the time slot.
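Under these definitions, the per-slot link and offloading computation can be sketched as follows (a minimal illustration; `beta_0`, `sigma2`, `B`, and the transmit powers are the linear-scale counterparts of the values listed later in Table 1):

```python
import numpy as np

def channel_gain(q_node, q_uav, beta_0=1e-4):
    """Free-space LoS channel power gain beta_0 / d^2 between a ground node and the UAV."""
    d2 = np.sum((q_node - q_uav) ** 2)        # squared 3-D distance
    return beta_0 / d2

def offloaded_bits(q_uav, q_users, q_interf, k, P_k=0.1, P_m=0.1,
                   B=1e6, sigma2=1e-19, t_n=1.0):
    """Bits offloaded by the scheduled user k in one slot, i.e. R_{k,u}[n] with b_k[n] = 1."""
    g = channel_gain(q_users[k], q_uav)
    interference = sum(P_m * channel_gain(q_m, q_uav) for q_m in q_interf)
    sinr = P_k * g / (interference + sigma2)
    rate = B * np.log2(1.0 + sinr)            # achievable rate r_{k,u}[n]
    return t_n * rate
```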

The system energy consumption comprises communication, computation, and flight-related energy. For simplicity, I focus on computation and flight energy, since the communication energy is negligible by comparison for the small task sizes considered. Based on rotary-wing flight dynamics, the flight energy in time slot $n$ consists of horizontal (induced) flight energy and blade drag energy; vertical energy is ignored because the UAV flies at a constant altitude. The horizontal flight energy over slot $n$ is:

$$E_{\text{level}}[n] = t_n \cdot \frac{W^2}{\sqrt{2}\,\rho A} \cdot \frac{1}{\sqrt{\|(v_x[n], v_y[n])\|^2 + \sqrt{\|(v_x[n], v_y[n])\|^4 + 4V_h^4}}}$$

where $W = mg$ is the UAV’s weight, $\rho$ is air density, $A$ is the rotor disk area, $\|(v_x[n], v_y[n])\|$ is the horizontal speed, and $V_h = \sqrt{\frac{W}{2\rho A}}$ is the hover-induced velocity. The blade drag energy is:

$$E_{\text{drag}}[n] = t_n \cdot \frac{1}{8} C_{D0} \rho A \|(v_x[n], v_y[n])\|^3$$

where $C_{D0}$ is the profile drag coefficient. Thus, the total flight energy at slot $n$ is:

$$E_{u,f}[n] = E_{\text{level}}[n] + E_{\text{drag}}[n]$$
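A sketch of this flight-energy model in code (the mass, rotor area, air density, and drag coefficient follow Table 1; taking the gravitational constant as 9.8 m/s² is an assumption):

```python
import numpy as np

def flight_energy(v_xy, t_n=1.0, mass=4.0, rho=1.225, A=0.18,
                  C_D0=0.08, g=9.8):
    """Per-slot flight energy E_{u,f}[n] for horizontal velocity v_xy = (v_x, v_y)."""
    W = mass * g                                   # UAV weight
    V_h = np.sqrt(W / (2.0 * rho * A))             # hover-induced velocity
    v = np.linalg.norm(v_xy)                       # horizontal speed
    P_level = (W ** 2 / (np.sqrt(2.0) * rho * A)) / np.sqrt(
        v ** 2 + np.sqrt(v ** 4 + 4.0 * V_h ** 4))
    P_drag = 0.125 * C_D0 * rho * A * v ** 3       # blade drag power
    return t_n * (P_level + P_drag)
```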

The computational energy consumption for offloaded tasks is:

$$E_{u,c}[n] = \sum_{k \in \mathcal{T}} \gamma_u R_{k,u}[n] (f_{k,u})^2$$

where $\gamma_u$ is the effective capacitance coefficient of the UAV’s MEC server, and $f_{k,u}$ is the CPU frequency allocated to user $k$’s tasks. The total system energy consumption is:

$$E_{\text{total}} = \sum_{n=1}^{N} E_{u,f}[n] + \sum_{n=1}^{N} E_{u,c}[n]$$

and the system energy efficiency (EE) is defined as:

$$E_{\text{EE}} = \frac{\sum_{n=1}^{N} \sum_{k \in \mathcal{T}} R_{k,u}[n]}{E_{\text{total}}}$$
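The corresponding energy and EE bookkeeping might look like this (a sketch; `gamma_u` and `f_cpu` follow Table 1, and the per-slot sequences are assumed to be plain Python lists or arrays):

```python
def computation_energy(offloaded_bits_n, f_cpu=1e9, gamma_u=1e-22):
    """Per-slot computation energy E_{u,c}[n] for the bits offloaded in that slot."""
    return gamma_u * offloaded_bits_n * f_cpu ** 2

def energy_efficiency(bits_per_slot, flight_energy_per_slot, comp_energy_per_slot):
    """System EE: total offloaded bits divided by total energy consumption (bits/Joule)."""
    total_bits = sum(bits_per_slot)
    total_energy = sum(flight_energy_per_slot) + sum(comp_energy_per_slot)
    return total_bits / total_energy
```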

I formulate the energy efficiency maximization problem as follows:

$$\begin{aligned}
\max_{\mathbf{Q}, \mathbf{B}} \quad & E_{\text{EE}} \\
\text{s.t.} \quad & \sum_{k \in \mathcal{T}} b_k[n] \leq 1, \quad b_k[n] \in \{0,1\}, \quad \forall n \in \mathcal{N} \\
& \sum_{n=1}^{N} R_{k,u}[n] \geq D_k, \quad \forall k \in \mathcal{T} \\
& x_{\text{min}} \leq x_u[n] \leq x_{\text{max}}, \quad \forall n \in \mathcal{N} \\
& y_{\text{min}} \leq y_u[n] \leq y_{\text{max}}, \quad \forall n \in \mathcal{N}
\end{aligned}$$

where $\mathbf{Q} \triangleq \{q_u[n]\}$ is the UAV trajectory, $\mathbf{B} \triangleq \{b_k[n]\}$ is the user scheduling strategy, and the constraints ensure that only one user is served per slot, all tasks are completed, and the UAV stays within defined boundaries. This problem is non-convex and challenging to solve with traditional methods, so I propose a deep reinforcement learning (DRL) approach using the D3QN algorithm.
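Before turning to the learning formulation, a small helper can make these constraints concrete (a sketch; storing the scheduling variables `b` and the offloaded bits `R` as $N \times K$ arrays is an assumption):

```python
import numpy as np

def check_constraints(b, R, x_u, y_u, D, area):
    """Verify scheduling, task-completion, and boundary constraints for a candidate solution."""
    one_user_per_slot = np.all(b.sum(axis=1) <= 1)            # sum_k b_k[n] <= 1
    tasks_completed = np.all(R.sum(axis=0) >= D)              # sum_n R_{k,u}[n] >= D_k
    inside_area = (np.all((area["x_min"] <= x_u) & (x_u <= area["x_max"])) and
                   np.all((area["y_min"] <= y_u) & (y_u <= area["y_max"])))
    return one_user_per_slot and tasks_completed and inside_area
```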

The problem is modeled as an MDP in which the UAV acts as the agent. The state $s[n]$ at time slot $n$ includes the UAV's horizontal position $q_u[n]$, all user positions $q_k$, all interferer positions $q_m$, the remaining task amounts $D_k[n]$, the remaining time $N - n$, and the UAV activity boundaries $x_{\text{area}}, y_{\text{area}}$. Thus, the state is defined as:

$$s[n] = \left[ q_u[n],\ \{q_k\}_{k \in \mathcal{T}},\ \{q_m\}_{m \in \mathcal{M}},\ \{D_k[n]\}_{k \in \mathcal{T}},\ N - n,\ x_{\text{area}},\ y_{\text{area}} \right]$$
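In code, the state vector can be assembled as follows (a sketch; the flattening order is an assumption):

```python
import numpy as np

def build_state(q_uav, q_users, q_interf, D_remaining, n, N, x_area, y_area):
    """Flatten the MDP state s[n] described above into a single feature vector."""
    return np.concatenate([
        q_uav[:2],                     # UAV horizontal position
        q_users[:, :2].ravel(),        # fixed user positions
        q_interf[:, :2].ravel(),       # fixed interferer positions
        D_remaining,                   # remaining task bits per user
        [N - n],                       # remaining time slots
        [x_area, y_area],              # activity boundaries
    ])
```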

The action $a[n]$ comprises the UAV's horizontal movement and the user scheduling decision. The movement component is restricted to adjacent grid points: $a_u[n] \in \mathcal{A}_u \triangleq \{(0, y_s), (0, -y_s), (x_s, 0), (-x_s, 0), (0,0)\}$, where $x_s$ and $y_s$ are the grid step sizes. The UAV's horizontal position is then updated as $[x_u[n+1], y_u[n+1]] = [x_u[n], y_u[n]] + a_u[n]$, while the scheduling component determines $b_k[n]$, i.e., which user (if any) is served in slot $n$.
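Since the learning algorithm works with a single discrete action index, the movement and scheduling components must be packed into one action space; the following sketch assumes a simple Cartesian-product encoding (the exact indexing scheme is an assumption):

```python
def decode_action(a_idx, x_s=10.0, y_s=10.0):
    """Map a discrete action index to (horizontal move, scheduled user index)."""
    moves = [(0.0, y_s), (0.0, -y_s), (x_s, 0.0), (-x_s, 0.0), (0.0, 0.0)]
    move = moves[a_idx % len(moves)]   # movement component a_u[n]
    user = a_idx // len(moves)         # index of the user served in this slot
    return move, user
```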

The reward function $r[n]$ for state-action pair $(s[n], a[n])$ is designed to maximize EE while penalizing undesirable behaviors:

$$r[n] = E_{\text{EE}}[n] - P_0 - P_1 - P_2 - P_3 + R_0$$

where $E_{\text{EE}}[n]$ is the instantaneous EE, $P_0$ penalizes exceeding the area boundaries, $P_1$ penalizes scheduling a user whose task is already completed, $P_2$ penalizes a signal-to-interference-plus-noise ratio (SINR) below the threshold $S_0$, $P_3$ penalizes an SINR between $S_0$ and $S_1$, and $R_0$ rewards completing a user's task.
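A sketch of this reward shaping (the penalty, threshold, and bonus values follow Table 1; treating the SINR thresholds as dB values is an assumption):

```python
def reward(ee_n, out_of_bounds, served_finished_user, sinr_db, task_completed,
           S0=-1.5, S1=-0.5, P0=50.0, P1=50.0, P2=10.0, P3=5.0, R0=10.0):
    """Per-slot reward r[n]: instantaneous EE minus penalties plus a completion bonus."""
    r = ee_n
    if out_of_bounds:
        r -= P0                        # flying outside the allowed area
    if served_finished_user:
        r -= P1                        # scheduling a user whose task is done
    if sinr_db < S0:
        r -= P2                        # SINR below the lower threshold
    elif sinr_db < S1:
        r -= P3                        # SINR between the two thresholds
    if task_completed:
        r += R0                        # bonus for finishing a user's task
    return r
```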

The D3QN algorithm combines double Q-learning, which mitigates the overestimation bias of standard DQN, with a dueling network architecture that separately estimates the state-value and advantage functions. The Q-value is computed as:

$$Q(s[n], a[n]; \theta, \alpha, \beta) = V(s[n]; \theta, \beta) + A(s[n], a[n]; \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s[n], a'; \theta, \alpha)$$

where $\theta$ denotes the shared network parameters, $\alpha$ and $\beta$ are the parameters of the advantage and value streams, respectively, and $|\mathcal{A}|$ is the size of the action space. The target Q-value for training is:

$$Q_n^{\text{target}} = \begin{cases}
r[n] + \gamma Q(s[n+1], a_{\text{max}}; \theta^{-}, \alpha^{-}, \beta^{-}), & \text{if } d[n] = 0 \\
r[n], & \text{if } d[n] = 1
\end{cases}$$

where $\gamma$ is the discount factor, $a_{\text{max}} = \arg\max_a Q(s[n+1], a; \theta, \alpha, \beta)$ is selected by the online network and evaluated by the target network, and $d[n]$ indicates a terminal state. The loss function is:

$$L(\theta, \alpha, \beta) = \mathbb{E} \left[ \left( Q_n^{\text{target}} - Q(s[n], a[n]; \theta, \alpha, \beta) \right)^2 \right]$$
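The following PyTorch sketch shows the dueling head together with the double-DQN target and loss described above (the two-layer trunk is a simplification of the four 128-neuron hidden layers used in the simulations; the hyperparameter defaults are assumptions):

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk with separate value and advantage heads."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
        )
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)      # Q(s, a)

def d3qn_loss(online, target, batch, gamma=0.9):
    """Double-DQN target: the online net picks a_max, the target net evaluates it."""
    s, a, r, s_next, done = batch                       # done is 0/1 float tensor
    q_sa = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a_max = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_max).squeeze(1)
        q_target = r + gamma * (1.0 - done) * q_next
    return nn.functional.mse_loss(q_sa, q_target)
```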

I use prioritized experience replay and Dropout regularization for stability. The algorithm involves initialization, exploration, and training phases, as summarized in Algorithm 1.
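A simplified, list-based sketch of prioritized experience replay is shown below; a full implementation would typically use a sum-tree and importance-sampling weights, with Dropout applied inside the network layers:

```python
import numpy as np

class PrioritizedReplay:
    """Simplified proportional prioritized experience replay (illustrative sketch)."""
    def __init__(self, capacity=1_000_000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def push(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        p = np.array(self.priorities); p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        return [self.data[i] for i in idx], idx
```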

Table 1: Simulation Environment Parameters

| Parameter | Value |
| --- | --- |
| Number of time slots $N$ | 400 |
| Time slot length $t_n$ (s) | 1 |
| Flight grid size (m × m) | 10 × 10 |
| Bandwidth $B$ (MHz) | 1 |
| User/interferer transmit power $P_k$, $P_m$ (W) | 0.1 |
| Carrier wavelength (m) | 750 |
| Reference channel gain $\beta_0$ (dB) | -40 |
| AWGN power $\sigma^2$ (dBm) | -160 |
| Effective capacitance $\gamma_u$ | $1 \times 10^{-22}$ |
| CPU frequency $f_{k,u}$ (GHz) | 1 |
| Rotor disk area $A$ (m²) | 0.18 |
| Air density $\rho$ (kg/m³) | 1.225 |
| UAV mass $m$ (kg) | 4 |
| Profile drag coefficient $C_{D0}$ | 0.08 |
| Penalties $P_0, P_1, P_2, P_3$ | 50, 50, 10, 5 |
| SINR thresholds $S_0, S_1$ | -1.5, -0.5 |
| Task-completion reward $R_0$ | 10 |

For the simulation analysis, I compare the proposed D3QN-based joint optimization scheme with three benchmarks: (1) D3QN-based resource scheduling with a fixed diagonal trajectory, (2) D3QN-based resource scheduling with sequential user visitation, and (3) DDQN-based joint optimization. The UAV operates in a 1000 m × 1000 m area at an altitude of 50 m, starting above the point [100, 100, 0]. Ground interferers are located at [50, 450, 0], [700, 150, 0], and [400, 880, 0]. Users are randomly distributed, each with a task size $D_k$ between 86 and 128 Mbits. The D3QN network has an input layer, four hidden layers of 128 neurons each, and an output layer, using LeakyReLU activations and the Adam optimizer. Training uses 9000 episodes, an experience replay capacity of 1,000,000, a mini-batch size of 2000, a discount factor $\gamma = 0.9$, and a learning rate of 0.001.
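For reference, the training setup can be collected into a single configuration record (the values restate the parameters above):

```python
# Training hyperparameters as stated above.
config = {
    "episodes": 9000,
    "replay_capacity": 1_000_000,
    "batch_size": 2000,
    "discount_gamma": 0.9,
    "learning_rate": 1e-3,
    "hidden_layers": [128, 128, 128, 128],   # four hidden layers, LeakyReLU
    "optimizer": "Adam",
}
```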

The convergence analysis shows that the proposed D3QN scheme achieves higher cumulative rewards than DDQN and stabilizes in the later stages of training, indicating that the UAV effectively learns the trajectory and user-scheduling policy. The trajectory comparison reveals that the D3QN-based approach steers the UAV efficiently toward users while avoiding interferers, whereas the benchmarks suffer from fixed paths or unnecessary detours. The energy-efficiency performance, evaluated via the cumulative distribution function (CDF) of EE, shows that the D3QN scheme significantly outperforms the others, with higher and more stable EE values; in particular, the D3QN-based joint optimization achieves up to a 30% EE improvement over the DDQN scheme, highlighting the benefit of interference-aware trajectory planning.

In conclusion, I have developed a novel framework for optimizing energy efficiency in UAV-assisted MEC systems under ground interference. By leveraging the D3QN algorithm, the UAV dynamically plans its trajectory and schedules users to maximize EE while adapting to interference. Simulation results validate the superiority of this approach, emphasizing the potential of drone technology in enhancing MEC systems. Future work could explore multi-UAV scenarios and advanced interference mitigation techniques to further improve performance.
