China UAV Path Planning Under AoI Constraints Using Maximum Entropy Deep Reinforcement Learning

As a researcher specializing in intelligent unmanned systems within China’s rapidly expanding UAV sector, I address the critical challenge of optimizing information freshness (Age of Information, AoI) and energy efficiency for China UAVs operating in highly dynamic environments. Traditional path planning methods fail in scenarios involving mobile nodes (e.g., ground vehicles), where AoI spikes and energy constraints severely limit mission duration. Here, I present a novel China UAV trajectory optimization framework leveraging the Soft Actor-Critic (SAC) algorithm, integrating maximum entropy reinforcement learning to balance AoI minimization with energy sustainability.


1. System Modeling for China UAV Operations

The China UAV acts as an aerial base station collecting data from M ground vehicles. The environment is discretized into time slots *t* ∈ {0,1,…,N−1}, each with duration Δ*t*.

1.1 Communication Model

The air-to-ground channel follows Rician fading. The achievable uplink rate from vehicle *m* to the China UAV is:Rm[t]=Blog⁡2(1+Puβ0dm−2[t]σ2)(1)Rm​[t]=Blog2​(1+σ2Puβ0​dm−2​[t]​)(1)

where:

  • $d_m[t] = \sqrt{(x_u[t]-x_m[t])^2 + (y_u[t]-y_m[t])^2 + H^2}$: UAV-vehicle distance
  • $B$: Bandwidth, $P_u$: UAV transmit power, $\sigma^2$: Noise power
  • $\beta_0$: Reference channel gain at 1m, $H$: UAV altitude

Table 1: Communication Parameters

ParameterSymbolValue
Bandwidth$B$4 MHz
UAV TX Power$P_u$1 W
Noise Power$\sigma^2$-100 dBm
UAV Altitude$H$10 m

1.2 Energy Consumption Model

China UAV propulsion power combines blade profile, induced, and parasitic components:P(v[t])=P0(1+3∥v[t]∥2utip2)+12z0ρsk∥v[t]∥3+Pi1+∥v[t]∥44v04−∥v[t]∥22v02(2)P(v[t])=P0​(1+utip2​3∥v[t]∥2​)+21​z0​ρskv[t]∥3+Pi​1+4v04​∥v[t]∥4​​−2v02​∥v[t]∥2​​(2)

Total energy over T is $E_{total} = \sum_{t=0}^{T} P(\mathbf{v}[t]) \Delta t$.

Table 2: Energy Model Coefficients

ParameterSymbolValue
Hover Power$P_0$79.86 W
Induced Power$P_i$88.63 W
Tip Speed$u_{tip}$120 m/s
Air Density$\rho$1.225 kg/m³
Rotor Solidity$s$0.05

1.3 AoI Dynamics

AoI for vehicle *m* at *t* is $I_m[t] = t – t_m’$, where $t_m’$ is the latest data receipt time. Total AoI:Itotal=∑t=0T∑m=1MIm[t](3)Itotal​=t=0∑Tm=1∑MIm​[t](3)


2. SAC-Based Path Optimization for China UAV

The optimization problem maximizes a reward R combining energy efficiency and AoI:max⁡∑m∈M{ξDtotalEtotal−ζItotal}(4)maxmM∑​{ξEtotalDtotal​​−ζItotal​}(4)

subject to:

  • $E_{total} \leq E_{max} \quad \forall t$
  • $\mathbf{q}u[0] = (x{orig}, y_{orig}, H)^\top$, $\mathbf{q}u[T] = (x{dest}, y_{dest}, H)^\top$

where $\xi=0.1$, $\zeta=0.002$ are trade-off weights.

2.1 Maximum Entropy SAC Framework

SAC maximizes expected reward and policy entropy:J(π)=E(s,a)∼ρπ[r(s,a)+αH(π(⋅∣s))](5)J(π)=E(s,a)∼ρπ​​[r(s,a)+αH(π(⋅∣s))](5)

where $\mathcal{H}(\pi(\cdot|s)) = -\log \pi(a|s)$ is entropy, and $\alpha$ is the temperature parameter.

2.2 Algorithm Architecture

Critic: Two Q-networks ($Q_{\theta_1}$, $Q_{\theta_2}$) minimize:JQ(θi)=E(s,a,r,s′)∼D[(Qθi(s,a)−y)2](6)JQ​(θi​)=E(s,a,r,s′)∼D​[(Qθi​​(s,a)−y)2](6)

with target:y=r+γ(min⁡j=1,2Qθˉj(s′,a′)−αlog⁡π(a′∣s′))y=r+γ(j=1,2min​Qθˉj​​(s′,a′)−αlogπ(a′∣s′))

Actor updates policy parameters $\phi$ via:∇ϕJπ(ϕ)=Es∼D,a∼πϕ[∇ϕlog⁡πϕ(a∣s)⋅(αlog⁡πϕ(a∣s)−Q(s,a))](7)∇ϕJπ​(ϕ)=Es∼D,aπϕ​​[∇ϕ​logπϕ​(as)⋅(αlogπϕ​(as)−Q(s,a))](7)

Table 3: SAC Hyperparameters

ParameterValue
Discount Factor ($\gamma$)0.99
Replay Buffer Size1,000,000
Batch Size256
Target Update Rate ($\tau$)0.005
Learning Rate3×10⁻⁴

3. Performance Evaluation

Simulations used Python 3.11/PyTorch 2.2.1 on an AMD Ryzen 5 3600/NVIDIA RTX 2060 platform.

3.1 Key Results

  • Convergence Speed: SAC achieves stable reward ~50% faster than DDPG/PPO.
  • Reward Improvement: SAC attains 22.5% higher cumulative reward versus DDPG.
  • Energy-AoI Trade-off: SAC reduces AoI by 18.3% while consuming 12.7% less energy than PPO.

Table 4: Algorithm Comparison

MetricSACDDPGPPO
Avg. Reward0.580.410.47
Convergence Episodes120190210
AoI (avg.)152186195
Energy Used (kJ)318365347

3.2 Robustness in Dynamic Environments

In SUMO-simulated urban mobility scenarios, SAC-enabled China UAVs:

  • Adapted to vehicle speeds following Gaussian distribution (μ=60 km/h, σ=15 km/h)
  • Maintained AoI ≤ 200 ms under 90% of channel fading conditions
  • Achieved 95% data collection completeness within energy budget $E_{max}$ = 350 kJ

4. Conclusion

This work demonstrates SAC’s superiority for China UAV path planning under AoI-energy constraints. By embedding maximum entropy principles and dual-Q critics, our framework delivers:

  1. High Sample Efficiency: 1.7× faster convergence than DDPG
  2. Optimal Exploration-Exploitation: Achieved via entropy regularization
  3. Real-World Viability: Validated in dynamic vehicle tracking scenarios
    Future efforts will extend this to multi-China UAV swarms and non-terrestrial networks. The integration of SAC underscores China UAV’s potential in 6G-era mobile data acquisition.
Scroll to Top