Energy-Constrained Unmanned Aerial Vehicle Data Collection System Trajectory Optimization

Drone technology has revolutionized data acquisition across various domains, offering flexible deployment and enhanced efficiency compared to traditional ground-based methods. This research addresses the critical challenge of optimizing flight trajectories for Unmanned Aerial Vehicles with limited energy resources. We present a comprehensive framework where UAVs serve as aerial base stations collecting data from ground-based sources while constrained by onboard energy capacity. The trajectory optimization problem is formulated as a Markov Decision Process (MDP) and solved through reinforcement learning techniques, specifically Q-Learning, to maximize system throughput under energy constraints.

System Architecture and Mathematical Modeling

Consider a rectangular data collection area measuring $X_{\max} \times Y_{\max}$ meters containing $K$ randomly distributed data sources. The Unmanned Aerial Vehicle maintains constant altitude $H$ and executes one of five possible actions per time slot: eastward, westward, southward, or northward movement, or hovering. Flight velocity remains constant at $V_0$ m/s, with time discretized into $\Delta t$-second slots, resulting in displacement $\Delta l = V_0 \Delta t$ per slot. The ground area is partitioned into $I_x \times I_y$ grid cells where $I_x = X_{\max}/\Delta l$ and $I_y = Y_{\max}/\Delta l$.

Communication Model

The channel gain between the UAV and data source $k$ at time $t$ is given by:

$$g_k(t) = \frac{\beta_0}{H^2 + \|\mathbf{z}(t) - \mathbf{z}_{S,k}\|^2}$$

where $\beta_0$ is reference channel gain at 1 meter, $\mathbf{z}_{S,k} = [x_{S,k}, y_{S,k}]^T$ denotes source coordinates, and $\mathbf{z}(t) = [x(t), y(t)]^T$ represents UAV ground projection. The communication rate with source $k$ using bandwidth $B$ is:

$$R_k(t) = B \log_2\left(1 + \frac{P_T g_k(t)}{\sigma_0^2}\right)$$

Total system throughput becomes:

$$R_{\sum}(t) = \sum_{k=1}^K R_k(t)$$

Data collected during $[t, t+\Delta t]$ is calculated as:

$$\Delta d(t) = \begin{cases}
R_{\sum}(t) \Delta t & \text{(hovering)} \\
\frac{\Delta t}{2} \left[R_{\sum}(t) + R_{\sum}(t+\Delta t)\right] & \text{(moving)}
\end{cases}$$
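
As a concrete illustration, the following Python sketch evaluates the channel gain, the sum rate, and the per-slot data increment. The altitude, bandwidth, reference gain, and transmit SNR are taken from the simulation section later in this article; the sketch is illustrative, not the authors' implementation.

```python
import numpy as np

# Sketch of the communication model, using the altitude, bandwidth,
# reference gain, and transmit SNR from the simulation section.
H = 120.0                        # UAV altitude (m)
B = 1e6                          # bandwidth per source (Hz)
beta_0 = 10.0 ** (41.6 / 10)     # reference channel gain at 1 m (41.6 dB, linear)
snr_ref = 10.0 ** (10.0 / 10)    # P_T / sigma_0^2 = 10 dB (linear)

def channel_gain(z_uav, z_src):
    """g_k(t) = beta_0 / (H^2 + ||z(t) - z_{S,k}||^2)."""
    d2 = float(np.sum((np.asarray(z_uav) - np.asarray(z_src)) ** 2))
    return beta_0 / (H ** 2 + d2)

def total_rate(z_uav, sources):
    """R_sum(t) = sum_k B log2(1 + (P_T / sigma_0^2) g_k(t))."""
    return sum(B * np.log2(1.0 + snr_ref * channel_gain(z_uav, s))
               for s in sources)

def data_increment(z_now, z_next, sources, dt=0.5, hovering=False):
    """Delta d over one slot: constant rate when hovering, trapezoidal rule when moving."""
    if hovering:
        return total_rate(z_now, sources) * dt
    return 0.5 * dt * (total_rate(z_now, sources) + total_rate(z_next, sources))
```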

Energy Consumption Model

For rotary-wing Unmanned Aerial Vehicles, propulsion power dominates energy consumption. Flight power is modeled as:

$$P_{\text{fly}}(\mathbf{v}) = P_0 \left(1 + \frac{3\|\mathbf{v}\|^2}{V_{\text{tip}}^2}\right) + P_1 \left(\sqrt{1 + \frac{\|\mathbf{v}\|^4}{4v_0^4}} - \frac{\|\mathbf{v}\|^2}{2v_0^2}\right)^{\frac{1}{2}} + \frac{1}{2}d_0\rho s A \|\mathbf{v}\|^3$$

where $P_0$ and $P_1$ represent blade profile and induced powers during hover, $V_{\text{tip}}$ is tip speed, $v_0$ is induced velocity, $d_0$ is fuselage drag ratio, $s$ is rotor solidity, $\rho$ is air density, and $A$ is rotor disc area. Energy consumed during $\Delta t$ is:

$$\Delta e(t) = P_{\text{fly}}(v(t)) \Delta t$$

where $v(t) \in \{0, V_0\}$ for hovering or flight respectively.

| Parameter | Symbol | Value |
| --- | --- | --- |
| UAV weight | - | 2 N (≈ 204 g) |
| Rotor radius | - | 0.1 m |
| Fuselage drag area | $d_0$ | 0.018 m² |
| Hover power | $P_{\text{fly}}(0)$ | 12 W |
| Cruise power | $P_{\text{fly}}(V_0)$ | 25 W |
| Flight speed | $V_0$ | 12 m/s |
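
The propulsion model can be sketched as below. The rotor constants ($P_0$, $P_1$, $V_{\text{tip}}$, $v_0$, $d_0$, $s$, $A$) are placeholder assumptions chosen so that the hover power matches the 12 W in the table; they will not reproduce the 25 W cruise figure exactly.

```python
import numpy as np

# Sketch of the rotary-wing propulsion power model. The rotor constants
# below are placeholder assumptions: P_fly(0) comes out to 12 W, but
# the cruise value at 12 m/s will differ from the table's 25 W.
P0, P1 = 8.0, 4.0        # blade profile / induced powers in hover (W), assumed
V_TIP = 60.0             # rotor tip speed (m/s), assumed
V_IND = 4.0              # mean induced velocity in hover (m/s), assumed
D0 = 0.6                 # fuselage drag ratio, assumed
RHO, S, A = 1.225, 0.05, 0.0314   # air density, rotor solidity, disc area (r = 0.1 m)

def p_fly(v):
    """Propulsion power (W) at horizontal speed v (m/s)."""
    blade = P0 * (1.0 + 3.0 * v ** 2 / V_TIP ** 2)
    induced = P1 * np.sqrt(np.sqrt(1.0 + v ** 4 / (4.0 * V_IND ** 4))
                           - v ** 2 / (2.0 * V_IND ** 2))
    parasite = 0.5 * D0 * RHO * S * A * v ** 3
    return blade + induced + parasite

def energy_per_slot(v, dt=0.5):
    """Delta e(t) = P_fly(v) * dt for one slot of dt seconds."""
    return p_fly(v) * dt
```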

Problem Formulation

With initial energy $E_0$ and start position $\mathbf{z}_0 = [x_0, y_0]^T$, maximize total collected data before energy depletion:

$$\max_{\mathbf{z}(t)} \int_0^T R_{\sum}(t) dt$$
$$\text{subject to} \quad \int_0^T P_{\text{fly}}(v(t)) dt \leq E_0$$
$$x(t) \in [0, X_{\max}], y(t) \in [0, Y_{\max}]$$

The discretized version, where $N \leq N_{\max} = T_{\max}/\Delta t$ denotes the number of time slots flown before the energy is depleted, is:

$$\max_{\{\mathbf{z}_n\}} \sum_{n=0}^N \Delta d_n$$
$$\text{subject to} \quad \sum_{n=0}^N \Delta e_n \leq E_0$$
$$x_n \in [0, X_{\max}], y_n \in [0, Y_{\max}], \quad n=0,1,\dots,N$$

Reinforcement Learning Framework

We model the trajectory optimization as a Markov Decision Process with the following components:

State Space

$\mathcal{S} = \{(i\Delta l, j\Delta l) | i=0,1,\dots,I_x; j=0,1,\dots,I_y\}$ representing UAV ground coordinates

Action Space

$\mathcal{A} = \{\text{East}, \text{West}, \text{South}, \text{North}, \text{Hover}\}$ with boundary constraints
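
A minimal sketch of the grid state space and the five actions, with boundary clipping, is given below; the grid dimensions assume the 600 m × 600 m area and $\Delta l = 6$ m used later in the simulations.

```python
# Minimal sketch of the MDP state and action spaces.
DELTA_L = 6.0
I_X, I_Y = 100, 100   # I_x = X_max / Delta_l, I_y = Y_max / Delta_l

# Action index -> grid displacement (di, dj)
ACTIONS = {
    0: (+1, 0),   # East
    1: (-1, 0),   # West
    2: (0, -1),   # South
    3: (0, +1),   # North
    4: (0, 0),    # Hover
}

def step_state(state, action):
    """Apply an action, clipping at the area boundary (the boundary constraint)."""
    i, j = state
    di, dj = ACTIONS[action]
    return (min(max(i + di, 0), I_X), min(max(j + dj, 0), I_Y))

def to_coords(state):
    """Map a grid state (i, j) to ground coordinates (i * Delta_l, j * Delta_l)."""
    return (state[0] * DELTA_L, state[1] * DELTA_L)
```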

Reward Function

Designed to balance throughput and energy consumption:

$$r_{n+1} = w \left[ \frac{R_{\sum}(n\Delta t) + R_{\sum}((n+1)\Delta t)}{2c_R} \right] - (1-w) \left[ \frac{P_{\text{fly}}(v(n\Delta t)) - P_{\text{fly}}(0)}{c_E} \right] \Delta t$$

where $w \in [0,1]$ is the weighting factor, and $c_R = \max(R_{\sum})$ and $c_E = P_{\text{fly}}(V_0) - P_{\text{fly}}(0)$ normalize the two terms.

| Component | Description | Design Principle |
| --- | --- | --- |
| Throughput term | $\frac{R_{\sum}(t) + R_{\sum}(t+\Delta t)}{2c_R}$ | Encourages movement toward high-rate regions |
| Energy term | $-\frac{P_{\text{fly}}(v) - P_{\text{fly}}(0)}{c_E}\Delta t$ | Penalizes movement energy overhead |
| Weight $w$ | Balancing parameter | Critical for the energy-throughput tradeoff |
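
The reward can be computed as in the sketch below, reusing `total_rate` and `p_fly` from the earlier sketches; `c_R` must be supplied as the peak sum rate, and the $\Delta t$ factor is applied to the energy penalty exactly as written in the expression above.

```python
# Sketch of the reward computation, assuming total_rate and p_fly from
# the earlier sketches are available; c_R is the peak sum rate.
def reward(z_now, z_next, v, sources, w, c_R, dt=0.5, V0=12.0):
    """r_{n+1}: weighted normalized throughput minus weighted movement-energy penalty."""
    c_E = p_fly(V0) - p_fly(0.0)                        # normalizes the energy term
    throughput_term = (total_rate(z_now, sources) +
                       total_rate(z_next, sources)) / (2.0 * c_R)
    energy_term = (p_fly(v) - p_fly(0.0)) / c_E         # 0 when hovering, 1 when flying
    return w * throughput_term - (1.0 - w) * energy_term * dt
```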

Q-Learning Trajectory Optimization Algorithm

We implement two algorithms: Algorithm 1 searches for the optimal weight $w$, while Algorithm 2 learns the trajectory for a given $w$.

Algorithm 1: Weight Optimization

  1. Initialize step size $c_t \in (0,1)$, $w=0$
  2. Call Algorithm 2 to obtain throughput $T_h(w)$, store in array $b$
  3. Update $w := w + c_t$
  4. If $w > 1$, exit loop; else return to Step 2
  5. Find $w^{\text{OPT}} = \arg\max b$ and $T_h^{\text{OPT}}$
  6. Execute Algorithm 2 with $w^{\text{OPT}}$ for optimal trajectory
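
A sketch of Algorithm 1 is shown below. The callable `learn_trajectory` stands in for Algorithm 2 and is assumed to return the pair $(T_h, \text{trajectory})$; the step size $c_t = 0.04$ is an illustrative choice, not the authors' setting.

```python
import numpy as np

# Sketch of Algorithm 1: sweep the weight w in steps of c_t and keep the best.
def optimize_weight(learn_trajectory, c_t=0.04):
    weights = np.arange(0.0, 1.0 + 1e-9, c_t)                 # w = 0, c_t, ..., 1
    results = [learn_trajectory(w)[0] for w in weights]       # store T_h(w) in array b
    w_opt = weights[int(np.argmax(results))]                  # w_OPT = argmax b
    # Re-run Algorithm 2 with the optimal weight to obtain the final trajectory.
    T_h_opt, trajectory = learn_trajectory(w_opt)
    return w_opt, T_h_opt, trajectory
```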

Algorithm 2: Trajectory Learning (for fixed $w$)

  1. Initialize exploration rate $\epsilon \in (0.5,1)$, decay $\beta\in(0.9,1)$, threshold $\epsilon_{\text{th}}$, discount $\gamma$, learning rate $\alpha$, $Q(\mathbf{z},a)=0 \ \forall \mathbf{z},a$
  2. Initialize $n=0$, $\mathbf{z}_0 = [x_0,y_0]^T$, $E=E_0$, $T_h=0$
  3. Choose action $a_n$ using $\epsilon$-greedy: $a_n = \begin{cases} \text{random action} & \text{prob } \epsilon \\ \arg\max_{a'} Q(\mathbf{z}_n,a') & \text{otherwise} \end{cases}$
  4. Execute $a_n$, observe $\mathbf{z}_{n+1}$, $r_{n+1}$, record trajectory
  5. Update $Q(\mathbf{z}_n,a_n) \leftarrow (1-\alpha)Q(\mathbf{z}_n,a_n) + \alpha\left[r_{n+1} + \gamma \max_{a’} Q(\mathbf{z}_{n+1},a’)\right]$
  6. Update $T_h \leftarrow T_h + \Delta d_n$, $E \leftarrow E – \Delta e_n$
  7. If $E \leq 0$, terminate; else $n \leftarrow n+1$, go to Step 3
  8. Decay exploration: $\epsilon \leftarrow \beta \epsilon$
  9. If $\epsilon \leq \epsilon_{\text{th}}$, output $T_h$ and trajectory; else go to Step 2
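
The following sketch implements Algorithm 2 with $\epsilon$-greedy Q-learning over the grid states. It reuses the helpers from the earlier sketches (`step_state`, `to_coords`, `total_rate`, `p_fly`, `data_increment`, `reward`); parameter defaults follow the simulation section where given, and boundary-clipped moves are treated as hovering for simplicity, which is an assumption of this sketch rather than a detail from the paper.

```python
import numpy as np

# Sketch of Algorithm 2 (epsilon-greedy Q-learning) for a fixed weight w.
def learn_trajectory(w, sources, c_R, E0=1500.0, start=(0, 0),
                     eps=0.9, beta=0.99, eps_th=0.1,
                     gamma=0.5, alpha=0.8, dt=0.5, V0=12.0):
    Q = {}                                     # Q[(state, action)], default 0
    q = lambda s, a: Q.get((s, a), 0.0)

    while eps > eps_th:                        # one training episode per pass
        state, E, T_h, traj = start, E0, 0.0, [start]
        while E > 0.0:
            # epsilon-greedy action selection
            if np.random.rand() < eps:
                a = np.random.randint(5)
            else:
                a = int(np.argmax([q(state, i) for i in range(5)]))
            nxt = step_state(state, a)
            hovering = (nxt == state)          # hover action or boundary-clipped move
            v = 0.0 if hovering else V0
            z_now, z_nxt = to_coords(state), to_coords(nxt)
            r = reward(z_now, z_nxt, v, sources, w, c_R, dt, V0)
            # Q-learning update (Step 5)
            best_next = max(q(nxt, i) for i in range(5))
            Q[(state, a)] = (1 - alpha) * q(state, a) + alpha * (r + gamma * best_next)
            # bookkeeping: collected data and remaining energy (Step 6)
            T_h += data_increment(z_now, z_nxt, sources, dt, hovering)
            E -= p_fly(v) * dt
            state = nxt
            traj.append(state)
        eps *= beta                            # decay exploration after each episode
    return T_h, traj
```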

Simulation Analysis

Experimental setup: $600\text{m} \times 600\text{m}$ area, $H=120\text{m}$, $\Delta t=0.5\text{s}$, $\Delta l=6\text{m}$, $K=6$ data sources, $B=1$ MHz/source, $P_T/\sigma_0^2=10$ dB, $\beta_0=41.6$ dB. Reinforcement learning parameters: $\gamma=0.5$, $\alpha=0.8$, $\beta=0.99$, $\epsilon_{\text{th}}=0.1$.
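
A hypothetical end-to-end run tying the two sketches together might look as follows; the random source placement, the peak-rate estimate for $c_R$, and the weight step are assumptions for illustration, not the authors' exact experiment.

```python
import numpy as np

# Hypothetical usage: 6 sources dropped uniformly in the 600 m x 600 m area.
np.random.seed(0)
sources = [tuple(np.random.uniform(0.0, 600.0, size=2)) for _ in range(6)]

# Approximate c_R = max(R_sum) by the sum rate when hovering over each source.
c_R = max(total_rate(s, sources) for s in sources)

run = lambda w: learn_trajectory(w, sources, c_R, E0=1500.0)
w_opt, T_h_opt, trajectory = optimize_weight(run, c_t=0.04)
print(f"best weight {w_opt:.2f}, collected {T_h_opt / 1e6:.0f} Mbits")
```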

Algorithm Convergence

The proposed algorithm demonstrates robust convergence properties across different weight configurations. Cumulative rewards increase monotonically during training episodes, stabilizing after approximately 200 episodes. This convergence behavior validates our MDP formulation and reward design for drone technology applications.

Weight Parameter Impact

Figure 4 illustrates how the weight $w$ affects the amount of collected data ($E_0=1500$ J). For $w \leq 0.2$, throughput remains constant at 1066 Mbits. As $w$ increases from 0.2 to 0.36, throughput rises to its optimum of 1317 Mbits. Increasing $w$ further reduces throughput, which falls to 978 Mbits at $w=1$. This confirms that $w$ critically controls the energy-throughput tradeoff in Unmanned Aerial Vehicle systems.

Trajectory Comparison

Figure 5 compares the trajectories under the optimal weighting ($w=0.36$) and the greedy approach ($w=1$). Both follow the same initial path from (0, 0) to (54 m, 156 m). The optimized Unmanned Aerial Vehicle then hovers to conserve energy, while the greedy UAV continues on to (216 m, 342 m). This demonstrates how energy-aware trajectory optimization outperforms throughput-only approaches in energy-constrained drone technology applications.

Energy Capacity Analysis

Figure 6 examines collected data as a function of the initial energy $E_0$. Throughput increases with $E_0$ for all methods. The proposed algorithm consistently outperforms the benchmarks: at $E_0=1500$ J it achieves 1317 Mbits versus 978 Mbits for the greedy approach (a 34.7% improvement), and at $E_0=2000$ J the improvement remains significant at 15.8%. Static hovering falls furthest behind as the energy budget grows, highlighting the need for intelligent trajectory planning in drone technology.

| Energy $E_0$ (J) | Proposed (Mbits) | Greedy (Mbits) | Static (Mbits) | Improvement over Greedy |
| --- | --- | --- | --- | --- |
| 500 | 527 | 410 | 512 | 28.5% |
| 1000 | 905 | 692 | 887 | 30.8% |
| 1500 | 1317 | 978 | 1066 | 34.7% |
| 2000 | 1624 | 1402 | 1288 | 15.8% |

Conclusion

This research presents a reinforcement learning framework for energy-constrained Unmanned Aerial Vehicle trajectory optimization in data collection systems. By formulating the problem as a Markov Decision Process with a balanced reward function, our Q-Learning approach effectively navigates the energy-throughput tradeoff. Simulations demonstrate significant throughput improvements over benchmark strategies, including a 34.7% gain over the greedy approach at a 1500 J energy capacity. The optimal weighting parameter ($w \approx 0.36$) proves critical for maximizing performance in drone technology applications. Future work will extend this framework to multi-UAV cooperative systems and dynamic environments. These advancements will further establish Unmanned Aerial Vehicles as indispensable tools for efficient data acquisition across scientific, commercial, and industrial domains.
