Rapid advances in IoT and 5G have intensified the demand for real-time data processing. Mobile Edge Computing (MEC) addresses this demand by pushing computation to the network edge, yet traditional ground-based deployments suffer from limited flexibility and from signal degradation caused by obstacles. Unmanned Aerial Vehicles (UAVs), with their mobility and Line-of-Sight (LoS) advantages, can provide edge computing services dynamically. However, mounting an edge server on a UAV introduces critical challenges: added payload that increases flight energy, elevated computational energy consumption, and greater system complexity. Ground-based interference sources such as radio towers further degrade performance in UAV-assisted MEC systems. This work proposes an energy efficiency optimization framework based on deep reinforcement learning to address these constraints.

System Model and Problem Formulation
Consider a UAV-assisted MEC system with $K$ ground users $\mathcal{T}$ at horizontal positions $q_k = [x_k, y_k]$, $M$ ground interferers $\mathcal{M}$ at $q_m = [x_m, y_m]$, and a UAV flying at fixed altitude $H_u$ with horizontal position $q_u = [x_u, y_u]$. The time horizon $T$ is discretized into $N$ slots $\mathcal{N} \triangleq \{1,\dots,N\}$, each of duration $t_n$. The UAV trajectory is defined as $Q \triangleq \{q_u[n]\}_{n \in \mathcal{N}}$ with $q_u[n] = [x_u[n], y_u[n]]$.
Communication Model
The channel gain between user $k$ and the UAV is:
$$g_{k,u}[n] = \beta_0 \left( \lVert q_k - q_u[n] \rVert^2 + H_u^2 \right)^{-1}$$
The interferer-to-UAV channel gain likewise follows LoS propagation:
$$j_{m,u}[n] = \beta_0 \left( \lVert q_m - q_u[n] \rVert^2 + H_u^2 \right)^{-1}$$
The achievable uplink rate is:
$$r_{k,u}[n] = B \log_2 \left(1 + \frac{P_k g_{k,u}[n]}{\sum_{m=1}^M P_m j_{m,u}[n] + \sigma^2}\right)$$
where $B$ is the bandwidth, $P_k$ and $P_m$ are the transmit powers of user $k$ and interferer $m$, $\beta_0$ is the channel gain at unit reference distance, and $\sigma^2$ is the noise power.
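As a concrete numerical check of the rate model, the sketch below evaluates $g_{k,u}$ and $r_{k,u}$ for one slot. The reference gain $\beta_0 = 10^{-3}$ (i.e., $-30$ dB at 1 m) and the sample coordinates are illustrative assumptions; $B$, the transmit powers, and $\sigma^2$ match the parameter table given later.

```python
import math

def channel_gain(q_a, q_u, beta0=1e-3, H_u=50.0):
    """LoS channel gain: beta0 / (horizontal distance^2 + altitude^2)."""
    d2 = (q_a[0] - q_u[0]) ** 2 + (q_a[1] - q_u[1]) ** 2
    return beta0 / (d2 + H_u ** 2)

def uplink_rate(q_k, q_u, interferers, B=1e6, P_k=0.1, P_m=0.1, sigma2=1e-19):
    """Achievable uplink rate (bit/s) of user k under ground interference.
    sigma2 = 1e-19 W corresponds to -160 dBm."""
    signal = P_k * channel_gain(q_k, q_u)
    interference = sum(P_m * channel_gain(q_m, q_u) for q_m in interferers)
    return B * math.log2(1 + signal / (interference + sigma2))
```

Moving the UAV toward a user (and away from interferers) raises the SINR and hence the achievable rate, which is exactly the trade-off the trajectory optimization exploits.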
Computation Offloading Model
Let $b_k[n] \in \{0,1\}$ indicate whether user $k$ is scheduled in slot $n$. The data offloaded to the UAV is then:
$$R_{k,u}[n] = b_k[n] t_n r_{k,u}[n]$$
Energy Consumption Model
Total energy combines flight and computation components. With $\mathbf{v}[n] \triangleq (v_x[n], v_y[n])$ denoting the horizontal velocity, the level-flight (induced) energy per slot is:
$$E_{\text{level}}[n] = \frac{W t_n}{\sqrt{2\rho A}} \cdot \frac{1}{\sqrt{\lVert \mathbf{v}[n] \rVert^2 + \sqrt{\lVert \mathbf{v}[n] \rVert^4 + 4V_h^4}}}$$
The blade drag energy per slot is:
$$E_{\text{drag}}[n] = \frac{1}{8} C_{D0} \rho A t_n \lVert \mathbf{v}[n] \rVert^3$$
where $W$ is the UAV weight, $\rho$ the air density, $A$ the rotor disk area, $C_{D0}$ the drag coefficient, and $V_h$ the mean rotor induced velocity in hover. The total flight energy per slot is their sum:
$$E_{u,f}[n] = \frac{W t_n}{\sqrt{2\rho A}} \cdot \frac{1}{\sqrt{\lVert \mathbf{v}[n] \rVert^2 + \sqrt{\lVert \mathbf{v}[n] \rVert^4 + 4V_h^4}}} + \frac{C_{D0}\rho A t_n}{8} \lVert \mathbf{v}[n] \rVert^3$$
The computation energy for processing the offloaded data is:
$$E_{u,c}[n] = \sum_{k \in \mathcal{T}} \gamma_u R_{k,u}[n] f_{k,u}^2$$
where $\gamma_u$ is the effective capacitance coefficient of the UAV's processor and $f_{k,u}$ is the CPU frequency allocated to user $k$'s task.
Total system energy:
$$E_{\text{total}} = \sum_{n=1}^N E_{u,f}[n] + \sum_{n=1}^N E_{u,c}[n]$$
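The two energy components can be sketched numerically as follows. The flight parameters not listed in the parameter table ($W = 20$ N, $V_h = 4.03$ m/s, $t_n = 1$ s) and the computation parameters ($\gamma_u = 10^{-27}$, $f_{k,u} = 1$ GHz) are assumed values for illustration only, and the power expressions are multiplied by the slot duration to yield per-slot energy:

```python
import math

def flight_energy(vx, vy, t_n=1.0, W=20.0, rho=1.225, A=0.18, V_h=4.03, C_D0=0.08):
    """Per-slot flight energy: induced (level-flight) power plus blade-drag
    power, multiplied by the slot duration t_n."""
    v2 = vx * vx + vy * vy
    induced = (W / math.sqrt(2 * rho * A)) / math.sqrt(v2 + math.sqrt(v2 ** 2 + 4 * V_h ** 4))
    drag = C_D0 * rho * A * v2 ** 1.5 / 8.0
    return (induced + drag) * t_n

def computation_energy(R_ku, gamma_u=1e-27, f_ku=1e9):
    """Per-slot computation energy for R_ku offloaded bits processed at
    CPU frequency f_ku; gamma_u is the effective capacitance coefficient."""
    return gamma_u * R_ku * f_ku ** 2
```

Note that the induced term is largest at hover and decays with speed, while the drag term grows cubically, so there is an energy-optimal cruising speed between the two regimes.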
Energy Efficiency Maximization
System energy efficiency (EE) is defined as:
$$EE = \frac{\sum_{n=1}^N \sum_{k \in \mathcal{T}} R_{k,u}[n]}{E_{\text{total}}}$$
We formulate the joint optimization problem:
$$
\begin{aligned}
\max_{\mathbf{Q},\mathbf{B}} \quad & EE \\
\text{s.t.} \quad & \sum_{k \in \mathcal{T}} b_k[n] \leq 1, \; b_k[n] \in \{0,1\} \quad \forall n \in \mathcal{N} \\
& \sum_{n=1}^N R_{k,u}[n] \geq D_k \quad \forall k \in \mathcal{T} \\
& x_{\min} \leq x_u[n] \leq x_{\max} \\
& y_{\min} \leq y_u[n] \leq y_{\max}
\end{aligned}
$$
| Parameter | Value |
|---|---|
| Area Dimensions | 1000 m × 1000 m |
| UAV Altitude ($H_u$) | 50 m |
| Bandwidth ($B$) | 1 MHz |
| Transmit Power ($P_k$, $P_m$) | 0.1 W |
| Noise Power ($\sigma^2$) | -160 dBm |
| Rotor Disk Area ($A$) | 0.18 m² |
| Drag Coefficient ($C_{D0}$) | 0.08 |
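Given per-slot throughputs and energies, the objective and constraints above can be checked numerically. The sketch below is a minimal evaluator; the argument layout (per-slot lists indexed by user) is an implementation choice, not part of the formulation:

```python
def energy_efficiency(offloaded_bits, flight_energies, comp_energies):
    """System EE: total offloaded bits over total flight + computation energy."""
    return sum(offloaded_bits) / (sum(flight_energies) + sum(comp_energies))

def feasible(b, R, traj, D, lo=0.0, hi=1000.0):
    """Check the constraints of the EE maximization problem.

    b[n][k]  -- binary scheduling indicators (at most one user per slot)
    R[n][k]  -- bits offloaded by user k in slot n
    traj[n]  -- UAV horizontal position (x, y)
    D[k]     -- data demand of user k
    """
    N, K = len(b), len(D)
    # (C1) binary indicators, at most one scheduled user per slot
    if any(sum(b[n]) > 1 or any(v not in (0, 1) for v in b[n]) for n in range(N)):
        return False
    # (C2) each user's cumulative offloaded data meets its demand
    if any(sum(R[n][k] for n in range(N)) < D[k] for k in range(K)):
        return False
    # (C3) the UAV stays inside the service area
    return all(lo <= x <= hi and lo <= y <= hi for x, y in traj)
```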
D3QN-Based Joint Optimization Algorithm
The optimization problem is modeled as a Markov Decision Process (MDP) solved via Dueling Double Deep Q-Network (D3QN) to handle high-dimensional state spaces.
MDP Formulation
State Space: $s[n] = [q_u[n], \{q_k\}, \{q_m\}, \{D_k[n]\}, N-n, x_{\text{area}}, y_{\text{area}}]$
Action Space: $\mathcal{A} \triangleq \{(0, y_s), (0, -y_s), (x_s, 0), (-x_s, 0), (0,0)\}$ (four horizontal moves of step size $x_s$ or $y_s$, plus hovering)
Reward Function:
$r[n] = EE[n] - P_0 - P_1 - P_2 - P_3 + R_0$
where penalties $P_i$ enforce constraints and $R_0$ rewards task completion.
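The environment transition induced by this action space is simple to state: apply the chosen displacement and clamp the result to the service area. The step sizes $x_s = y_s = 50$ m below are illustrative assumptions:

```python
def step(q_u, action, x_s=50.0, y_s=50.0, lo=0.0, hi=1000.0):
    """Apply one of the five discrete moves and clamp the UAV to the area.

    Actions: 0 = +y, 1 = -y, 2 = +x, 3 = -x, 4 = hover.
    """
    moves = {0: (0.0, y_s), 1: (0.0, -y_s), 2: (x_s, 0.0), 3: (-x_s, 0.0), 4: (0.0, 0.0)}
    dx, dy = moves[action]
    return (min(max(q_u[0] + dx, lo), hi), min(max(q_u[1] + dy, lo), hi))
```

Clamping keeps every transition feasible with respect to the area constraints, so the boundary penalties in the reward only need to fire when a move is truncated.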
D3QN Architecture
The Q-value decomposition:
$$Q(s,a;\theta,\alpha,\beta) = V(s;\theta,\beta) + A(s,a;\theta,\alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s,a';\theta,\alpha)$$
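The dueling combination itself is a few lines once the value and advantage heads have been evaluated; this sketch abstracts the networks away and shows only the aggregation with the mean-advantage baseline:

```python
def dueling_q(V, A):
    """Combine the scalar state value V and per-action advantages A into
    Q-values, subtracting the mean advantage as the identifiability baseline."""
    mean_A = sum(A) / len(A)
    return [V + a - mean_A for a in A]
```

Subtracting the mean makes the decomposition identifiable: shifting every advantage by a constant leaves the resulting Q-values unchanged.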
The target Q-value follows double Q-learning: the greedy action $a_{\max} = \arg\max_{a} Q(s[n+1], a; \theta, \alpha, \beta)$ is selected by the online network and evaluated by the target network with parameters $(\theta^{-}, \alpha^{-}, \beta^{-})$:
$$Q^{\text{target}}_n = \begin{cases}
r[n] + \gamma Q(s[n+1], a_{\max}; \theta^{-},\alpha^{-},\beta^{-}) & d[n]=0 \\
r[n] & d[n]=1
\end{cases}$$
where $d[n]$ is the episode-termination flag.
Loss function for network updates:
$$\mathcal{L}(\theta,\alpha,\beta) = \mathbb{E}\left[ \left( Q^{\text{target}}_n – Q(s[n],a[n];\theta,\alpha,\beta) \right)^2 \right]$$
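The target and loss computations above can be sketched independently of any neural-network library; Q-values are represented here as plain per-action lists:

```python
def d3qn_target(r, q_next_online, q_next_target, done, gamma=0.9):
    """Double-DQN target: the online network picks the greedy next action,
    the target network evaluates it; no bootstrapping on terminal steps."""
    if done:
        return r
    a_max = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return r + gamma * q_next_target[a_max]

def td_loss(targets, predictions):
    """Mean squared TD error over a sampled minibatch."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)
```

Decoupling action selection (online net) from action evaluation (target net) is what removes the overestimation bias of vanilla DQN targets.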
| Component | Specification |
|---|---|
| Neural Network | 4 hidden layers (128 neurons each) |
| Activation Function | LeakyReLU |
| Optimizer | Adam (LR=0.001) |
| Experience Replay Size | 1,000,000 |
| Discount Factor ($\gamma$) | 0.9 |
Simulation Results and Analysis
We compare our D3QN-based approach against three baselines: Fixed Trajectory with D3QN Scheduling, User-Sequential with D3QN Scheduling, and DDQN Joint Optimization.
Convergence Performance
The D3QN algorithm demonstrates superior convergence with higher cumulative rewards compared to DDQN, achieving stability after 12,000 training episodes. This indicates effective policy learning for interference-aware trajectory planning:
$$R_{\text{cumulative}} = \sum_{n=1}^N \gamma^{n-1} r[n]$$
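The discounted return used as the convergence metric is a one-liner (with `enumerate` starting at $n{=}1$ of the formula, i.e., the first reward is undiscounted):

```python
def cumulative_reward(rewards, gamma=0.9):
    """Discounted return: sum over n of gamma^(n-1) * r[n], n starting at 1."""
    return sum(gamma ** n * r for n, r in enumerate(rewards))
```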
Trajectory Optimization
The D3QN-based UAV trajectory shows intelligent avoidance of interference sources while minimizing flight distance to users. Compared with the baseline paths, it eliminates redundant movements and maintains larger distances from interferers, significantly improving communication quality.
Energy Efficiency Comparison
Cumulative Distribution Function (CDF) analysis confirms our approach’s superiority:
$$F_{EE}(x) = P(EE \leq x)$$
The D3QN-based system achieves a 38% higher median EE than DDQN and a 123% improvement over the fixed-trajectory baseline, demonstrating the benefit of jointly learned, interference-aware trajectory and scheduling decisions.
| Scheme | Median EE (Mbits/J) | Improvement |
|---|---|---|
| Fixed Trajectory + D3QN | 1.82 | Baseline |
| User-Sequential + D3QN | 2.15 | +18% |
| DDQN Joint Optimization | 2.94 | +62% |
| D3QN Joint Optimization | 4.06 | +123% |
Conclusion
This work presents a D3QN-based optimization framework for UAV-assisted MEC systems operating in ground-interference environments. By jointly optimizing the UAV trajectory and user scheduling through deep reinforcement learning, the proposed approach significantly improves system energy efficiency: the UAV learns to perceive and avoid interference sources while limiting flight energy expenditure. Simulation results show a 123% EE improvement over the fixed-trajectory baseline. Future work will explore multi-UAV coordination and dynamic interference prediction for enhanced robustness.
