Rapid advances in IoT and 5G have intensified the demand for real-time data processing. Mobile Edge Computing (MEC) addresses this demand by pushing computation to the network edge, yet traditional ground-based deployments suffer from limited flexibility and from signal degradation caused by obstacles. Unmanned Aerial Vehicles (UAVs), with their mobility and Line-of-Sight (LoS) advantages, can provide edge computing services dynamically. However, mounting an edge server on a UAV introduces critical challenges: added payload that increases flight energy, elevated computational energy consumption, and greater system complexity. Ground-based interference sources such as radio towers further degrade performance in UAV-assisted MEC systems. This work proposes an energy efficiency optimization framework based on deep reinforcement learning to address these constraints.

System Model and Problem Formulation
Consider a UAV-assisted MEC system with $K$ ground users $\mathcal{T}$ at horizontal positions $q_k = [x_k, y_k]$, $M$ ground interferers $\mathcal{M}$ at $q_m = [x_m, y_m]$, and a UAV flying at fixed altitude $H_u$ with horizontal position $q_u = [x_u, y_u]$. The time horizon $T$ is discretized into $N$ slots $\mathcal{N} \triangleq \{1,\dots,N\}$, each of duration $t_n$. The UAV trajectory is defined as $Q \triangleq \{q_u[n]\}_{n \in \mathcal{N}}$ with $q_u[n] = [x_u[n], y_u[n]]$.
Communication Model
The channel gain between user $k$ and the UAV is:
$$g_{k,u}[n] = \beta_0 \left( \lVert q_k - q_u[n] \rVert^2 + H_u^2 \right)^{-1}$$
The interferer-to-UAV channel gain likewise follows LoS propagation:
$$j_{m,u}[n] = \beta_0 \left( \lVert q_m - q_u[n] \rVert^2 + H_u^2 \right)^{-1}$$
The achievable uplink rate is:
$$r_{k,u}[n] = B \log_2 \left(1 + \frac{P_k g_{k,u}[n]}{\sum_{m=1}^M P_m j_{m,u}[n] + \sigma^2}\right)$$
where $B$ is the bandwidth, $P_k$ and $P_m$ are the transmit powers of user $k$ and interferer $m$, $\beta_0$ is the channel gain at unit reference distance, and $\sigma^2$ is the noise power.
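As a concrete numerical check of the rate model, the sketch below evaluates $g_{k,u}$ and $r_{k,u}$ for one slot. The reference gain $\beta_0 = 10^{-3}$ (i.e., $-30$ dB at 1 m) and the sample coordinates are illustrative assumptions; $B$, the transmit powers, and $\sigma^2$ match the parameter table given later.

```python
import math

def channel_gain(q_a, q_u, beta0=1e-3, H_u=50.0):
    """LoS channel gain: beta0 / (horizontal distance^2 + altitude^2)."""
    d2 = (q_a[0] - q_u[0]) ** 2 + (q_a[1] - q_u[1]) ** 2
    return beta0 / (d2 + H_u ** 2)

def uplink_rate(q_k, q_u, interferers, B=1e6, P_k=0.1, P_m=0.1, sigma2=1e-19):
    """Achievable uplink rate (bit/s) of user k under ground interference.
    sigma2 = 1e-19 W corresponds to -160 dBm."""
    signal = P_k * channel_gain(q_k, q_u)
    interference = sum(P_m * channel_gain(q_m, q_u) for q_m in interferers)
    return B * math.log2(1 + signal / (interference + sigma2))
```

Moving the UAV toward a user (and away from interferers) raises the SINR and hence the achievable rate, which is exactly the trade-off the trajectory optimization exploits.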
Computation Offloading Model
Let $b_k[n] \in \{0,1\}$ indicate whether user $k$ is scheduled in slot $n$. The data offloaded to the UAV is then:
$$R_{k,u}[n] = b_k[n] t_n r_{k,u}[n]$$
Energy Consumption Model
Total energy combines flight and computation components. With $\mathbf{v}[n] \triangleq (v_x[n], v_y[n])$ denoting the horizontal velocity, the level-flight (induced) energy per slot is:
$$E_{\text{level}}[n] = \frac{W t_n}{\sqrt{2\rho A}} \cdot \frac{1}{\sqrt{\lVert \mathbf{v}[n] \rVert^2 + \sqrt{\lVert \mathbf{v}[n] \rVert^4 + 4V_h^4}}}$$
The blade drag energy per slot is:
$$E_{\text{drag}}[n] = \frac{1}{8} C_{D0} \rho A t_n \lVert \mathbf{v}[n] \rVert^3$$
where $W$ is the UAV weight, $\rho$ the air density, $A$ the rotor disk area, $C_{D0}$ the drag coefficient, and $V_h$ the mean rotor induced velocity in hover. The total flight energy per slot is their sum:
$$E_{u,f}[n] = \frac{W t_n}{\sqrt{2\rho A}} \cdot \frac{1}{\sqrt{\lVert \mathbf{v}[n] \rVert^2 + \sqrt{\lVert \mathbf{v}[n] \rVert^4 + 4V_h^4}}} + \frac{C_{D0}\rho A t_n}{8} \lVert \mathbf{v}[n] \rVert^3$$
The computation energy for processing the offloaded data is:
$$E_{u,c}[n] = \sum_{k \in \mathcal{T}} \gamma_u R_{k,u}[n] f_{k,u}^2$$
where $\gamma_u$ is the effective capacitance coefficient of the UAV's processor and $f_{k,u}$ is the CPU frequency allocated to user $k$'s task.
Total system energy:
$$E_{\text{total}} = \sum_{n=1}^N E_{u,f}[n] + \sum_{n=1}^N E_{u,c}[n]$$
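The two energy components can be sketched numerically as follows. The flight parameters not listed in the parameter table ($W = 20$ N, $V_h = 4.03$ m/s, $t_n = 1$ s) and the computation parameters ($\gamma_u = 10^{-27}$, $f_{k,u} = 1$ GHz) are assumed values for illustration only, and the power expressions are multiplied by the slot duration to yield per-slot energy:

```python
import math

def flight_energy(vx, vy, t_n=1.0, W=20.0, rho=1.225, A=0.18, V_h=4.03, C_D0=0.08):
    """Per-slot flight energy: induced (level-flight) power plus blade-drag
    power, multiplied by the slot duration t_n."""
    v2 = vx * vx + vy * vy
    induced = (W / math.sqrt(2 * rho * A)) / math.sqrt(v2 + math.sqrt(v2 ** 2 + 4 * V_h ** 4))
    drag = C_D0 * rho * A * v2 ** 1.5 / 8.0
    return (induced + drag) * t_n

def computation_energy(R_ku, gamma_u=1e-27, f_ku=1e9):
    """Per-slot computation energy for R_ku offloaded bits processed at
    CPU frequency f_ku; gamma_u is the effective capacitance coefficient."""
    return gamma_u * R_ku * f_ku ** 2
```

Note that the induced term is largest at hover and decays with speed, while the drag term grows cubically, so there is an energy-optimal cruising speed between the two regimes.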
Energy Efficiency Maximization
System energy efficiency (EE) is defined as:
$$EE = \frac{\sum_{n=1}^N \sum_{k \in \mathcal{T}} R_{k,u}[n]}{E_{\text{total}}}$$
We formulate the joint optimization problem:
$$
\begin{aligned}
\max_{\mathbf{Q},\mathbf{B}} \quad & EE \\
\text{s.t.} \quad & \sum_{k \in \mathcal{T}} b_k[n] \leq 1, \; b_k[n] \in \{0,1\} \quad \forall n \in \mathcal{N} \\
& \sum_{n=1}^N R_{k,u}[n] \geq D_k \quad \forall k \in \mathcal{T} \\
& x_{\min} \leq x_u[n] \leq x_{\max} \\
& y_{\min} \leq y_u[n] \leq y_{\max}
\end{aligned}
$$
| Parameter | Value |
|---|---|
| Area Dimensions | 1000 m × 1000 m |
| UAV Altitude ($H_u$) | 50 m |
| Bandwidth ($B$) | 1 MHz |
| Transmit Power ($P_k$, $P_m$) | 0.1 W |
| Noise Power ($\sigma^2$) | -160 dBm |
| Rotor Disk Area ($A$) | 0.18 m² |
| Drag Coefficient ($C_{D0}$) | 0.08 |
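Given per-slot throughputs and energies, the objective and constraints above can be checked numerically. The sketch below is a minimal evaluator; the argument layout (per-slot lists indexed by user) is an implementation choice, not part of the formulation:

```python
def energy_efficiency(offloaded_bits, flight_energies, comp_energies):
    """System EE: total offloaded bits over total flight + computation energy."""
    return sum(offloaded_bits) / (sum(flight_energies) + sum(comp_energies))

def feasible(b, R, traj, D, lo=0.0, hi=1000.0):
    """Check the constraints of the EE maximization problem.

    b[n][k]  -- binary scheduling indicators (at most one user per slot)
    R[n][k]  -- bits offloaded by user k in slot n
    traj[n]  -- UAV horizontal position (x, y)
    D[k]     -- data demand of user k
    """
    N, K = len(b), len(D)
    # (C1) binary indicators, at most one scheduled user per slot
    if any(sum(b[n]) > 1 or any(v not in (0, 1) for v in b[n]) for n in range(N)):
        return False
    # (C2) each user's cumulative offloaded data meets its demand
    if any(sum(R[n][k] for n in range(N)) < D[k] for k in range(K)):
        return False
    # (C3) the UAV stays inside the service area
    return all(lo <= x <= hi and lo <= y <= hi for x, y in traj)
```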
D3QN-Based Joint Optimization Algorithm
The optimization problem is modeled as a Markov Decision Process (MDP) solved via Dueling Double Deep Q-Network (D3QN) to handle high-dimensional state spaces.
MDP Formulation
State Space: $s[n] = [q_u[n], \{q_k\}, \{q_m\}, \{D_k[n]\}, N-n, x_{\text{area}}, y_{\text{area}}]$
Action Space: $\mathcal{A} \triangleq \{(0, y_s), (0, -y_s), (x_s, 0), (-x_s, 0), (0,0)\}$ (four horizontal moves of step size $x_s$ or $y_s$, plus hovering)
Reward Function:
$r[n] = EE[n] - P_0 - P_1 - P_2 - P_3 + R_0$
where penalties $P_i$ enforce constraints and $R_0$ rewards task completion.
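The environment transition induced by this action space is simple to state: apply the chosen displacement and clamp the result to the service area. The step sizes $x_s = y_s = 50$ m below are illustrative assumptions:

```python
def step(q_u, action, x_s=50.0, y_s=50.0, lo=0.0, hi=1000.0):
    """Apply one of the five discrete moves and clamp the UAV to the area.

    Actions: 0 = +y, 1 = -y, 2 = +x, 3 = -x, 4 = hover.
    """
    moves = {0: (0.0, y_s), 1: (0.0, -y_s), 2: (x_s, 0.0), 3: (-x_s, 0.0), 4: (0.0, 0.0)}
    dx, dy = moves[action]
    return (min(max(q_u[0] + dx, lo), hi), min(max(q_u[1] + dy, lo), hi))
```

Clamping keeps every transition feasible with respect to the area constraints, so the boundary penalties in the reward only need to fire when a move is truncated.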
D3QN Architecture
The Q-value decomposition:
$$Q(s,a;\theta,\alpha,\beta) = V(s;\theta,\beta) + A(s,a;\theta,\alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s,a';\theta,\alpha)$$
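The dueling combination itself is a few lines once the value and advantage heads have been evaluated; this sketch abstracts the networks away and shows only the aggregation with the mean-advantage baseline:

```python
def dueling_q(V, A):
    """Combine the scalar state value V and per-action advantages A into
    Q-values, subtracting the mean advantage as the identifiability baseline."""
    mean_A = sum(A) / len(A)
    return [V + a - mean_A for a in A]
```

Subtracting the mean makes the decomposition identifiable: shifting every advantage by a constant leaves the resulting Q-values unchanged.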
The target Q-value follows double Q-learning: the greedy action $a_{\max} = \arg\max_{a} Q(s[n+1], a; \theta, \alpha, \beta)$ is selected by the online network and evaluated by the target network with parameters $(\theta^{-}, \alpha^{-}, \beta^{-})$:
$$Q^{\text{target}}_n = \begin{cases}
r[n] + \gamma Q(s[n+1], a_{\max}; \theta^{-},\alpha^{-},\beta^{-}) & d[n]=0 \\
r[n] & d[n]=1
\end{cases}$$
where $d[n]$ is the episode-termination flag.
Loss function for network updates:
$$\mathcal{L}(\theta,\alpha,\beta) = \mathbb{E}\left[ \left( Q^{\text{target}}_n – Q(s[n],a[n];\theta,\alpha,\beta) \right)^2 \right]$$
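The target and loss computations above can be sketched independently of any neural-network library; Q-values are represented here as plain per-action lists:

```python
def d3qn_target(r, q_next_online, q_next_target, done, gamma=0.9):
    """Double-DQN target: the online network picks the greedy next action,
    the target network evaluates it; no bootstrapping on terminal steps."""
    if done:
        return r
    a_max = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return r + gamma * q_next_target[a_max]

def td_loss(targets, predictions):
    """Mean squared TD error over a sampled minibatch."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)
```

Decoupling action selection (online net) from action evaluation (target net) is what removes the overestimation bias of vanilla DQN targets.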
| Component | Specification |
|---|---|
| Neural Network | 4 hidden layers (128 neurons each) |
| Activation Function | LeakyReLU |
| Optimizer | Adam (LR=0.001) |
| Experience Replay Size | 1,000,000 |
| Discount Factor ($\gamma$) | 0.9 |
Simulation Results and Analysis
We compare our D3QN-based approach against three baselines: Fixed Trajectory with D3QN Scheduling, User-Sequential with D3QN Scheduling, and DDQN Joint Optimization.
Convergence Performance
The D3QN algorithm demonstrates superior convergence with higher cumulative rewards compared to DDQN, achieving stability after 12,000 training episodes. This indicates effective policy learning for interference-aware trajectory planning:
$$R_{\text{cumulative}} = \sum_{n=1}^N \gamma^{n-1} r[n]$$
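The discounted return used as the convergence metric is a one-liner (with `enumerate` starting at $n{=}1$ of the formula, i.e., the first reward is undiscounted):

```python
def cumulative_reward(rewards, gamma=0.9):
    """Discounted return: sum over n of gamma^(n-1) * r[n], n starting at 1."""
    return sum(gamma ** n * r for n, r in enumerate(rewards))
```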
Trajectory Optimization
The D3QN-based UAV trajectory shows intelligent avoidance of interference sources while minimizing flight distance to users. Compared with the baseline paths, it eliminates redundant movements and maintains larger distances from interferers, significantly improving communication quality.
Energy Efficiency Comparison
Cumulative Distribution Function (CDF) analysis confirms our approach’s superiority:
$$F_{EE}(x) = P(EE \leq x)$$
The D3QN-based system achieves a 38% higher median EE than DDQN and a 123% improvement over the fixed-trajectory baseline, demonstrating the benefit of jointly learned, interference-aware trajectory and scheduling decisions.
| Scheme | Median EE (Mbits/J) | Improvement |
|---|---|---|
| Fixed Trajectory + D3QN | 1.82 | Baseline |
| User-Sequential + D3QN | 2.15 | +18% |
| DDQN Joint Optimization | 2.94 | +62% |
| D3QN Joint Optimization | 4.06 | +123% |
Conclusion
This work presents a D3QN-based optimization framework for UAV-assisted MEC systems operating in ground-interference environments. By jointly optimizing the UAV trajectory and user scheduling through deep reinforcement learning, the proposed approach significantly improves system energy efficiency: the UAV learns to perceive and avoid interference sources while limiting flight energy expenditure. Simulation results show a 123% EE improvement over the fixed-trajectory baseline. Future work will explore multi-UAV coordination and dynamic interference prediction for enhanced robustness.
