China UAV Path Planning Under AoI Constraints Using Maximum Entropy Deep Reinforcement Learning

As a researcher specializing in intelligent unmanned systems within China’s rapidly expanding UAV sector, I address the critical challenge of optimizing information freshness (Age of Information, AoI) and energy efficiency for China UAVs operating in highly dynamic environments. Traditional path planning methods fail in scenarios involving mobile nodes (e.g., ground vehicles), where AoI spikes and energy constraints severely limit mission duration. Here, I present a novel China UAV trajectory optimization framework leveraging the Soft Actor-Critic (SAC) algorithm, integrating maximum entropy reinforcement learning to balance AoI minimization with energy sustainability.

1. System Modeling for China UAV Operations

The China UAV acts as an aerial base station collecting data from M ground vehicles. The environment is discretized into time slots *t* ∈ {0,1,…,N−1}, each with duration Δ*t*.

1.1 Communication Model

The air-to-ground channel follows Rician fading. The achievable uplink rate from vehicle *m* to the China UAV is:Rm[t]=Blog⁡2(1+Puβ0dm−2[t]σ2)(1)Rm[t]=Blog2(1+σ2Puβ0dm−2[t])(1)

where:

$d_m[t] = \sqrt{(x_u[t]-x_m[t])^2 + (y_u[t]-y_m[t])^2 + H^2}$: UAV-vehicle distance
$B$: Bandwidth, $P_u$: UAV transmit power, $\sigma^2$: Noise power
$\beta_0$: Reference channel gain at 1m, $H$: UAV altitude

Table 1: Communication Parameters

Parameter	Symbol	Value
Bandwidth	$B$	4 MHz
UAV TX Power	$P_u$	1 W
Noise Power	$\sigma^2$	-100 dBm
UAV Altitude	$H$	10 m

1.2 Energy Consumption Model

China UAV propulsion power combines blade profile, induced, and parasitic components:P(v[t])=P0(1+3∥v[t]∥2utip2)+12z0ρsk∥v[t]∥3+Pi1+∥v[t]∥44v04−∥v[t]∥22v02(2)P(v[t])=P0(1+utip23∥v[t]∥2)+21z0ρsk∥v[t]∥3+Pi1+4v04∥v[t]∥4−2v02∥v[t]∥2(2)

Total energy over T is $E_{total} = \sum_{t=0}^{T} P(\mathbf{v}[t]) \Delta t$.

Table 2: Energy Model Coefficients

Parameter	Symbol	Value
Hover Power	$P_0$	79.86 W
Induced Power	$P_i$	88.63 W
Tip Speed	$u_{tip}$	120 m/s
Air Density	$\rho$	1.225 kg/m³
Rotor Solidity	$s$	0.05

1.3 AoI Dynamics

AoI for vehicle *m* at *t* is $I_m[t] = t – t_m’$, where $t_m’$ is the latest data receipt time. Total AoI:Itotal=∑t=0T∑m=1MIm[t](3)Itotal=t=0∑Tm=1∑MIm[t](3)

2. SAC-Based Path Optimization for China UAV

The optimization problem maximizes a reward R combining energy efficiency and AoI:max⁡∑m∈M{ξDtotalEtotal−ζItotal}(4)maxm∈M∑{ξEtotalDtotal−ζItotal}(4)

subject to:

$E_{total} \leq E_{max} \quad \forall t$
$\mathbf{q}u[0] = (x{orig}, y_{orig}, H)^\top$, $\mathbf{q}u[T] = (x{dest}, y_{dest}, H)^\top$

where $\xi=0.1$, $\zeta=0.002$ are trade-off weights.

2.1 Maximum Entropy SAC Framework

SAC maximizes expected reward and policy entropy:J(π)=E(s,a)∼ρπ[r(s,a)+αH(π(⋅∣s))](5)J(π)=E(s,a)∼ρπ[r(s,a)+αH(π(⋅∣s))](5)

where $\mathcal{H}(\pi(\cdot|s)) = -\log \pi(a|s)$ is entropy, and $\alpha$ is the temperature parameter.

2.2 Algorithm Architecture

Critic: Two Q-networks ($Q_{\theta_1}$, $Q_{\theta_2}$) minimize:JQ(θi)=E(s,a,r,s′)∼D[(Qθi(s,a)−y)2](6)JQ(θi)=E(s,a,r,s′)∼D[(Qθi(s,a)−y)2](6)

with target:y=r+γ(min⁡j=1,2Qθˉj(s′,a′)−αlog⁡π(a′∣s′))y=r+γ(j=1,2minQθˉj(s′,a′)−αlogπ(a′∣s′))

Actor updates policy parameters $\phi$ via:∇ϕJπ(ϕ)=Es∼D,a∼πϕ[∇ϕlog⁡πϕ(a∣s)⋅(αlog⁡πϕ(a∣s)−Q(s,a))](7)∇ϕJπ(ϕ)=Es∼D,a∼πϕ[∇ϕlogπϕ(a∣s)⋅(αlogπϕ(a∣s)−Q(s,a))](7)

Table 3: SAC Hyperparameters

Parameter	Value
Discount Factor ($\gamma$)	0.99
Replay Buffer Size	1,000,000
Batch Size	256
Target Update Rate ($\tau$)	0.005
Learning Rate	3×10⁻⁴

3. Performance Evaluation

Simulations used Python 3.11/PyTorch 2.2.1 on an AMD Ryzen 5 3600/NVIDIA RTX 2060 platform.

3.1 Key Results

Convergence Speed: SAC achieves stable reward ~50% faster than DDPG/PPO.
Reward Improvement: SAC attains 22.5% higher cumulative reward versus DDPG.
Energy-AoI Trade-off: SAC reduces AoI by 18.3% while consuming 12.7% less energy than PPO.

Table 4: Algorithm Comparison

Metric	SAC	DDPG	PPO
Avg. Reward	0.58	0.41	0.47
Convergence Episodes	120	190	210
AoI (avg.)	152	186	195
Energy Used (kJ)	318	365	347

3.2 Robustness in Dynamic Environments

In SUMO-simulated urban mobility scenarios, SAC-enabled China UAVs:

Adapted to vehicle speeds following Gaussian distribution (μ=60 km/h, σ=15 km/h)
Maintained AoI ≤ 200 ms under 90% of channel fading conditions
Achieved 95% data collection completeness within energy budget $E_{max}$ = 350 kJ

4. Conclusion

This work demonstrates SAC’s superiority for China UAV path planning under AoI-energy constraints. By embedding maximum entropy principles and dual-Q critics, our framework delivers:

High Sample Efficiency: 1.7× faster convergence than DDPG
Optimal Exploration-Exploitation: Achieved via entropy regularization
Real-World Viability: Validated in dynamic vehicle tracking scenarios
Future efforts will extend this to multi-China UAV swarms and non-terrestrial networks. The integration of SAC underscores China UAV’s potential in 6G-era mobile data acquisition.