Collaborative Deployment and Energy Optimization for UAV-Enabled Visible Light Communication Networks

Unmanned Aerial Vehicles (UAVs) have emerged as pivotal platforms for executing complex low-altitude missions due to their inherent mobility and dynamic deployment capabilities. However, traditional radio frequency (RF) links face limitations such as electromagnetic interference and scarce spectrum resources, which hinder their ability to meet the demands of high-speed communication. Visible Light Communication (VLC) technology, leveraging Light Emitting Diodes (LEDs), offers a promising green communication solution for short-range wireless access, thanks to its wide spectrum availability and cost-effectiveness. Integrating VLC nodes onto UAV platforms enables the creation of spatially adaptive optical communication environments, effectively supporting service requirements in scenarios like nighttime rescue operations, disaster response, and large-scale event coverage. Nonetheless, challenges such as UAV flight altitude constraints, energy limitations, line-of-sight (LoS) link visibility, and dynamic environmental changes pose significant hurdles to achieving efficient, stable, and service-differentiated network deployments. Particularly in multi-user access scenarios, simultaneously addressing the diverse Quality of Service (QoS) demands of User Equipment (UE) remains a critical challenge for enhancing system performance.

Existing research has explored various dimensions to tackle these issues. For instance, some studies propose fairness-based three-dimensional UAV deployment methods that optimize UAV positions while accounting for energy efficiency and user mobility, improving spatial coverage in UAV-VLC systems. Others introduce non-orthogonal multiple access (NOMA) techniques and jointly optimize UAV deployment locations, QoS guarantees, and power allocation with algorithms such as Harris Hawks Optimization to enhance the overall system rate. Additionally, reconfigurable intelligent surfaces (RIS) have been integrated to jointly optimize UAV deployment, RIS phase control, and user association, with the aim of reducing energy consumption and the number of required UAVs. Despite these advances, many approaches focus primarily on static deployment scenarios and overlook the dynamic nature of UAVs during continuous flight and the resulting interference variations. Consequently, recent research has increasingly combined resource allocation with trajectory planning to improve system robustness and adaptability in dynamic environments. However, these optimization problems typically involve high-dimensional coupled variables and complex nonlinear constraints, leading to high computational complexity. Moreover, in multi-UAV cooperative deployment, resource competition and inter-cell interference among UAVs evolve with the UE positions, which further complicates system modeling and optimization.

In recent years, Deep Reinforcement Learning (DRL) has gained significant attention in wireless communication and edge computing domains due to its exceptional online decision-making capabilities. DRL agents can learn optimal policies from reward feedback through environmental interactions without prior knowledge of channel models or user movement patterns, offering adaptive and efficient solutions to complex system optimization problems. In UAV-assisted mobile edge computing systems, DRL has been applied to optimize UAV deployment and content caching strategies, as well as to jointly optimize trajectory planning, task scheduling, and service deployment. Furthermore, DRL frameworks have been designed for trajectory control to achieve high-precision target tracking by minimizing the Cramér-Rao lower bound of joint measurement likelihood functions. Although DRL demonstrates strong potential in handling complex resource optimization tasks, its direct application to UAV-VLC systems presents notable challenges. Specifically, the state and action spaces expand dramatically when performing multi-dimensional resource allocation and trajectory planning over continuous time slots. Coupled with frequently changing UE distributions and mutual interference effects in multi-UAV cooperative deployments, the training convergence and decision stability of agents face immense pressure, adversely affecting the final optimization outcomes.

This paper addresses the need for energy-efficient UAV-VLC network deployment by proposing a system energy consumption optimization strategy that jointly considers user association, UAV trajectory planning, and power allocation. Because the original optimization problem is highly non-convex with high-dimensional coupled variables, an efficient two-stage sequential optimization framework is designed. This framework combines the K-means clustering algorithm and DRL to solve the optimization subproblems sequentially within each time window. Through multiple iterative updates, it obtains a low-energy suboptimal solution for complex scenarios while ensuring system performance, with good scalability and computational efficiency. Decomposing the original problem in this way makes it more tractable and improves adaptability to dynamic user distributions, coupled interference, and high-dimensional decision spaces. On the one hand, K-means clustering reduces the dimensionality of user association, limiting state space expansion; on the other hand, DRL optimizes the local subproblems over time, improving decision efficiency and convergence in dynamic scenarios.

The transmission model of the UAV-VLC network comprises a set of UAVs, denoted as $\mathcal{J} = \{1, 2, \dots, J\}$, and a set of ground users, denoted as $\mathcal{U} = \{1, 2, \dots, U\}$. Each lighting UAV flies at a fixed altitude $H$ and is equipped with high-power LEDs to provide communication and illumination services to ground users. UEs are randomly distributed within a geographical area and are equipped with photodetectors (PDs) to receive optical signals. The UAV flight mission is divided into time slots, represented by $\mathcal{T} = \{1, 2, \dots, T\}$, where $T$ is the total number of time slots.

At time slot $t$, the positions of UAV $j$ and user $u$ are given by $\mathbf{w}_{j,t} = \{x_{j,t}, y_{j,t}, H\}$ and $\mathbf{v}_{u,t} = \{x_{u,t}, y_{u,t}, 0\}$, respectively. The distance between user $u$ and UAV $j$ is calculated as:

$$d_{u,j,t} = \sqrt{(x_{j,t} - x_{u,t})^2 + (y_{j,t} - y_{u,t})^2 + H^2}$$

Let $\mathbf{W}_t = \{\mathbf{w}_{1,t}, \mathbf{w}_{2,t}, \dots, \mathbf{w}_{J,t}\}$ represent the coordinates of all UAVs at time $t$, and $\mathbf{W} = \{\mathbf{W}_1, \mathbf{W}_2, \dots, \mathbf{W}_T\}$ denote the coordinates over the entire flight mission. Similarly, $\mathbf{V}_t = \{\mathbf{v}_{1,t}, \mathbf{v}_{2,t}, \dots, \mathbf{v}_{U,t}\}$ and $\mathbf{V} = \{\mathbf{V}_1, \mathbf{V}_2, \dots, \mathbf{V}_T\}$ represent the UE coordinates.
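As a minimal illustration of the geometry above, the following Python sketch evaluates $d_{u,j,t}$ for every user-UAV pair in a single time slot. The function name `uav_user_distances` and the sample coordinates are illustrative assumptions and are not taken from the simulations.

```python
import math

def uav_user_distances(uav_xy, user_xy, height):
    """Return d[u][j], the 3-D distance between ground user u and UAV j.

    uav_xy and user_xy are lists of (x, y) pairs for one time slot;
    all UAVs fly at the same fixed altitude `height` (H in the text).
    """
    return [
        [math.sqrt((xj - xu) ** 2 + (yj - yu) ** 2 + height ** 2)
         for (xj, yj) in uav_xy]
        for (xu, yu) in user_xy
    ]

# Illustrative values only (hypothetical positions, H = 10 m).
uavs = [(0.0, 0.0), (30.0, 40.0)]
users = [(0.0, 0.0), (3.0, 4.0)]
d = uav_user_distances(uavs, users, height=10.0)
```

A user directly below a UAV is at distance $H$, which is the smallest value $d_{u,j,t}$ can take.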

In outdoor environments, non-line-of-sight (NLoS) paths in VLC are relatively weak; thus, only LoS links are considered. Let $A_{PD}$ denote the physical area of the PD, $R_{PD}$ the responsivity, $\psi_{u,j,t}$ the incidence angle, and $\phi_{u,j,t}$ the irradiance angle. When $0 \leq \psi_{u,j,t} \leq \Psi_C$, the LoS channel gain between UAV $j$ and user $u$ at time $t$ is expressed as:

$$h_{u,j,t} = \frac{(m+1) R_{PD} A_{PD} g_f(\psi_{u,j,t})}{2\pi d_{u,j,t}^2} \cos^m(\phi_{u,j,t}) \cos(\psi_{u,j,t})$$

where $m = -\frac{1}{\log_2(\cos(\Phi_{1/2}))}$ is the Lambertian order, $\Phi_{1/2}$ is the LED semi-angle at half power, $\Psi_C$ is the field of view (FOV) semi-angle of the UE, and $g_f(\psi_{u,j,t})$ is the gain of the optical concentrator. Assuming the LED points vertically downward and the PD points vertically upward, we have $\cos(\phi_{u,j,t}) = \cos(\psi_{u,j,t}) = \frac{H}{d_{u,j,t}}$. Thus, the signal-to-interference-plus-noise ratio (SINR) for the link between UAV $j$ and user $u$ is given by:

$$\gamma_{u,j,t} = \frac{\left( \frac{\zeta P_{j,t} h_{u,j,t}}{\pi e} \right)^2}{\omega_n^2 + I_{u,t}^2}$$

where $e$ is Euler’s number, $\zeta$ is the illumination response factor, $\omega_n$ accounts for the additive noise and the background illumination noise, $P_{j,t}$ is the transmit power of UAV $j$ at time $t$, and $I_{u,t}$ is the interference received by user $u$ at time $t$ from the other UAVs $i \neq j$. According to the Shannon formula, the achievable data rate provided by UAV $j$ to user $u$ is:

$$R_{u,j,t} = \frac{1}{2} \log_2(1 + \gamma_{u,j,t})$$

The interference $I_{u,t}$ is calculated as:

$$I_{u,t} = \sum_{i=1, i \neq j}^{J} \zeta P_{i,t} h_{u,i,t}$$
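The channel model above can be sketched in a few lines of Python. With $\cos(\phi_{u,j,t}) = \cos(\psi_{u,j,t}) = H/d_{u,j,t}$, the gain reduces to $h_{u,j,t} = \frac{(m+1) R_{PD} A_{PD} g_f H^{m+1}}{2\pi d_{u,j,t}^{m+3}}$, which is what `los_gain` evaluates. The function names, the choice to return zero gain outside the FOV, and the numerical values of `r_pd`, `g_f`, `zeta`, and `noise` are assumptions made for illustration only.

```python
import math

def lambertian_order(semi_angle_deg):
    """Lambertian order m = -1 / log2(cos(Phi_1/2))."""
    return -1.0 / math.log2(math.cos(math.radians(semi_angle_deg)))

def los_gain(d, height, m, r_pd, a_pd, g_f, fov_deg):
    """LoS gain h for an LED facing down and a PD facing up, so that
    cos(phi) = cos(psi) = height / d."""
    cos_psi = min(1.0, height / d)
    if math.degrees(math.acos(cos_psi)) > fov_deg:
        return 0.0  # assumption: no LoS gain outside the receiver FOV
    return ((m + 1) * r_pd * a_pd * g_f / (2 * math.pi * d ** 2)
            * cos_psi ** (m + 1))

def sinr_and_rate(j, powers, gains, zeta, noise):
    """SINR and rate of one user served by UAV j.

    powers[i] is P_i and gains[i] is the gain from UAV i to this user.
    """
    signal = zeta * powers[j] * gains[j] / (math.pi * math.e)
    interference = sum(zeta * powers[i] * gains[i]
                       for i in range(len(powers)) if i != j)
    sinr = signal ** 2 / (noise ** 2 + interference ** 2)
    return sinr, 0.5 * math.log2(1.0 + sinr)

# Illustrative values only; r_pd, g_f, zeta and noise are assumed.
m = lambertian_order(45.0)
gains = [los_gain(d, 10.0, m, r_pd=0.5, a_pd=1e-4, g_f=2.0, fov_deg=45.0)
         for d in (10.0, 12.0)]
sinr, rate = sinr_and_rate(0, powers=[0.2, 0.2], gains=gains,
                           zeta=1.0, noise=1e-9)
```

Because the interference term enters the denominator squared, raising the power of a neighboring UAV lowers the rate of every user it does not serve, which is the coupling that the joint optimization has to manage.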

The problem formulation aims to minimize the total energy consumption while satisfying user illumination and communication requirements. Each UE is associated with only one UAV, and the association, trajectory planning, and power allocation need to be optimized. The binary variable $\beta_{u,j,t} \in \{0,1\}$ indicates the association, where $\beta_{u,j,t} = 1$ if UAV $j$ serves user $u$ at time $t$, and $0$ otherwise. The total power consumption at time $t$ is:

$$P_t(\mathbf{P}, \mathbf{W}, \boldsymbol{\beta}) = \sum_{j=1}^{J} P_{j,t}$$

where $\boldsymbol{\beta} \in \{0,1\}^{U \times J \times T}$ and $\mathbf{P} \in \mathbb{R}^{J \times T}$ represent the user association matrix and power allocation matrix, respectively. The problem of minimizing the average total power over the entire mission is formulated as:

$$\min_{\mathbf{P}, \mathbf{W}, \boldsymbol{\beta}} \frac{1}{T} \sum_{t=1}^{T} P_t(\mathbf{P}, \mathbf{W}, \boldsymbol{\beta})$$

subject to:

C1: $0 \leq P_{j,t} \leq P_{\max}, \quad \forall j \in \mathcal{J}, t \in \mathcal{T}$

C2: $P_{j,t} h_{u,j,t} \zeta + I_{u,t} \geq \Theta_1 d_{u,j,t}^{m+3}, \quad \forall j \in \mathcal{J}, u \in \mathcal{U}, t \in \mathcal{T}$

C3: $P_{j,t} h_{u,j,t} \zeta \geq \Theta_2 d_{u,j,t}^{m+3}, \quad \forall j \in \mathcal{J}, u \in \mathcal{U}, t \in \mathcal{T}$

C4: $d_{i,j,t} \geq d_{\text{th}}, \quad \forall i,j \in \mathcal{J}, i \neq j, t \in \mathcal{T}$

C5: $\beta_{u,j,t} \in \{0,1\}, \quad \forall j \in \mathcal{J}, u \in \mathcal{U}, t \in \mathcal{T}$

C6: $\sum_{j=1}^{J} \beta_{u,j,t} = 1, \quad \forall u \in \mathcal{U}, t \in \mathcal{T}$

where $\Theta_1$ and $\Theta_2$ are derived from the illumination and communication thresholds, and $d_{\text{th}}$ in C4 is the minimum allowed separation between any two UAVs. This problem is a mixed-integer non-convex optimization problem due to the coupling of continuous and discrete variables.
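To make the structure of the feasible set concrete, the sketch below checks the constraints that do not depend on the channel (C1 and C4 to C6) for a single time slot. It is an illustrative helper under stated assumptions: the function name `violated_constraints` is hypothetical, $d_{i,j,t}$ is taken to be the horizontal distance between UAVs (they share the altitude $H$), and C2 and C3 are omitted because the derivation of $\Theta_1$ and $\Theta_2$ is not given here.

```python
import math

def violated_constraints(powers, uav_xy, beta, p_max, d_th):
    """Return the labels among C1, C4, C5, C6 that are violated in one slot.

    powers[j] is P_j, uav_xy[j] is the (x, y) position of UAV j, and
    beta[u][j] is the association of user u with UAV j.
    """
    violated = []
    # C1: every transmit power lies in [0, p_max].
    if any(p < 0 or p > p_max for p in powers):
        violated.append("C1")
    # C4: every pair of UAVs is at least d_th apart (same altitude assumed).
    num_uavs = len(uav_xy)
    too_close = any(
        math.hypot(uav_xy[i][0] - uav_xy[j][0],
                   uav_xy[i][1] - uav_xy[j][1]) < d_th
        for i in range(num_uavs) for j in range(i + 1, num_uavs)
    )
    if too_close:
        violated.append("C4")
    # C5: association variables are binary.
    if any(b not in (0, 1) for row in beta for b in row):
        violated.append("C5")
    # C6: each user is served by exactly one UAV.
    if any(sum(row) != 1 for row in beta):
        violated.append("C6")
    return violated
```

A check of this kind could, for example, feed a penalty term in a reward function; whether the original implementation does so is not stated.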

The proposed sequential collaborative optimization algorithm decomposes the problem into two stages using a sliding window approach. The total mission of $T$ slots is divided into $L$ overlapping time windows of length $T_L$ with sliding step $\tau$, and the flight mission is redefined as $\mathcal{T} = \{T_1, T_2, \dots, T_L\}$, where $T_l$ denotes the set of time slots in the $l$-th window. In each time window, user association is first determined using K-means clustering, and UAV trajectory and power allocation are then jointly optimized via DRL.
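The sliding-window partition can be sketched as follows. This is a minimal illustration that assumes 1-based slot indices and that $T - T_L$ is a multiple of $\tau$, so that the $L = (T - T_L)/\tau + 1$ windows cover the whole mission; how leftover slots are handled otherwise is not specified here, and the function name `sliding_windows` is hypothetical.

```python
def sliding_windows(total_slots, window_len, step):
    """Return the windows T_1, ..., T_L as lists of 1-based slot indices.

    Consecutive windows start `step` slots apart, so they overlap
    whenever step < window_len.
    """
    if not 0 < step <= window_len <= total_slots:
        raise ValueError("need 0 < step <= window_len <= total_slots")
    windows = []
    start = 1
    while start + window_len - 1 <= total_slots:
        windows.append(list(range(start, start + window_len)))
        start += step
    return windows

# Illustrative values: T = 10 slots, T_L = 4, tau = 2.
windows = sliding_windows(10, 4, 2)
```

With these values the windows are $\{1,\dots,4\}$, $\{3,\dots,6\}$, $\{5,\dots,8\}$, and $\{7,\dots,10\}$, so every interior slot is revisited by two consecutive windows.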

For user association in the $l$-th time window, the problem is:

$$\min_{\boldsymbol{\beta}^l} \frac{1}{T_L} \sum_{t \in T_l} \sum_{j=1}^{J} P_{j,t}$$

subject to constraints similar to C2-C6. The K-means clustering algorithm groups users into clusters based on their spatial distribution, with the number of clusters equal to the number of UAVs. The cluster centers are initialized randomly, each user is assigned to the nearest center, and the centers are then recomputed as the mean of their assigned users; these two steps repeat until the assignments stop changing. The association variable $\beta_{u,j,t}^l$ is set to 1 if user $u$ is in the cluster of UAV $j$, and 0 otherwise. This reduces the dimensionality of the association problem.
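The clustering-based association can be sketched with a plain Lloyd-style K-means in Python. This is a sketch under assumptions rather than the implementation used in the paper: it clusters a single snapshot of user positions, draws the initial centers from the user positions with a fixed seed, and lets an empty cluster keep its previous center. The function name `kmeans_association` is hypothetical.

```python
import random

def kmeans_association(user_xy, num_uavs, max_iters=100, seed=0):
    """Cluster users with K-means and return (beta, centers), where
    beta[u][j] = 1 if user u belongs to the cluster of UAV j."""
    rng = random.Random(seed)
    centers = rng.sample(user_xy, num_uavs)  # random initial centers
    labels = None
    for _ in range(max_iters):
        # Assignment step: each user joins its nearest center.
        new_labels = [
            min(range(num_uavs),
                key=lambda j: (x - centers[j][0]) ** 2
                              + (y - centers[j][1]) ** 2)
            for (x, y) in user_xy
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(num_uavs):
            members = [p for p, lab in zip(user_xy, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    beta = [[1 if labels[u] == j else 0 for j in range(num_uavs)]
            for u in range(len(user_xy))]
    return beta, centers

# Illustrative layout: two well-separated user groups, two UAVs.
users = [(0.0, 0.0), (1.0, 0.0), (50.0, 50.0), (51.0, 50.0)]
beta, centers = kmeans_association(users, num_uavs=2)
```

Each row of `beta` contains exactly one 1, so constraints C5 and C6 hold by construction, and the association search shrinks from $J^U$ candidate assignments to a single clustering run per window.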

After obtaining the user association $\boldsymbol{\beta}^l$, the joint optimization of UAV deployment and power allocation is formulated as:

$$\min_{\mathbf{P}^l, \mathbf{W}^l} \frac{1}{T_L} \sum_{t \in T_l} \sum_{j=1}^{J} P_{j,t}$$

subject to constraints C1-C4. This is solved using a DRL framework, where each UAV acts as an agent. The state space includes the UAV coordinates, the action space defines movement directions and power adjustments, and the reward function encourages energy efficiency and constraint satisfaction. The DRL network consists of evaluation and target networks, with experience replay and an $\epsilon$-greedy policy for exploration. The Q-value is updated as:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$

where $\alpha$ is the learning rate and $\gamma$ is the discount factor (not to be confused with the SINR $\gamma_{u,j,t}$). The loss function for network parameter updates is:

$$L(\theta) = \mathbb{E} \left[ (Q_{\text{target}} - Q(s,a;\theta))^2 \right]$$

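Here $Q_{\text{target}} = r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})$ is the bootstrapped target value, evaluated with the target network, and $\theta$ denotes the parameters of the evaluation network.

The sequential optimization algorithm iterates over the time windows, updating the variables of the current window while the variables of the other windows are held fixed, which reduces complexity.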
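The Q-value update and $\epsilon$-greedy exploration described above can be illustrated with a tabular stand-in for the network. This sketch is not the DRL framework of the paper: it replaces the evaluation and target networks with a lookup table, omits experience replay, and uses abstract state labels. The names `epsilon_greedy` and `q_update` are hypothetical, and the three actions mirror the $\{+1, -1, 0\}$ action space used in the simulations.

```python
import random
from collections import defaultdict

ACTIONS = (+1, -1, 0)  # increase, decrease, or keep a coordinate or power level

def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def q_update(q, s, a, r, s_next, alpha, gamma):
    """Apply Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    and return the bootstrapped target."""
    target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return target

# Illustrative single step with made-up reward and state labels.
q = defaultdict(float)          # all Q-values start at zero
rng = random.Random(0)
action = epsilon_greedy(q, state=0, epsilon=0.1, rng=rng)
q_update(q, s=0, a=action, r=1.0, s_next=1, alpha=0.5, gamma=0.95)
```

In the deep variant, the table lookup is replaced by a network $Q(s,a;\theta)$ and the squared difference between the target and the prediction becomes the loss $L(\theta)$.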

Simulations are conducted in a $60 \text{ m} \times 60 \text{ m}$ area with $U=12$ users and $J=4$ UAVs. The rate threshold $R_{\text{th}}$ and illumination threshold $\eta_{\text{th}}$ are set to 3.75 bps/Hz and $1.55 \times 10^{-6}$, respectively. The DRL network uses two hidden layers with 256 and 128 neurons, learning rate $\alpha=0.0006$, batch size $B=256$, and discount factor $\gamma=0.95$. Two benchmark schemes are considered: fixed UAV positions at cluster centers (Benchmark 1) and fixed UAV positions at sub-region centers (Benchmark 2).

The convergence performance of the proposed algorithm under different action spaces and learning rates is evaluated. With action space $\{+1, -1, 0\}$ and $\alpha=0.0006$, the algorithm achieves fast convergence and stable rewards, outperforming other configurations. The inclusion of a “no change” action allows UAVs to maintain optimal positions, enhancing stability.

The total power consumption under varying UAV hover heights and FOVs is analyzed. For $\Psi_C = 45^\circ$ and $\Psi_C = 55^\circ$, power consumption increases with height due to longer transmission distances. The proposed scheme consistently consumes less power than benchmarks, saving at least 74.39% and 67.62% in total transmit power. Wider FOVs lead to higher power consumption as they require more power to meet communication rates over increased distances.

Table 1 summarizes the simulation parameters.

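Table 1: Simulation Parameters

| Parameter | Value |
| --- | --- |
| PD detection area $A_{PD}$ | 1 cm² |
| Optical filter gain | 0.9 |
| Euler’s number $e$ | ≈ 2.718 |
| FOV semi-angle $\Psi_C$ | 45° |
| Maximum power $P_{\max}$ | 5 W |
| Refractive index | 1.5 |
| LED semi-angle at half power $\Phi_{1/2}$ | 45° |
| Data rate threshold $R_{\text{th}}$ | 3.75 bps/Hz |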

Power consumption versus rate threshold $R_{\text{th}}$ is examined for $H=10$ m and $\Psi_C=45^\circ$. As $R_{\text{th}}$ increases, total power rises, but the proposed scheme maintains lower consumption than benchmarks. Benchmark 1 exhibits higher power usage due to fixed deployment near users, while Benchmark 2 suffers from incomplete coverage in random user distributions.

UAV deployment positions under different schemes are compared. The proposed scheme adaptively positions the lighting UAVs with allocated powers of 0.13 W, 0.23 W, 0.24 W, and 0.26 W, achieving a balance between communication and illumination needs. Trajectory planning over a fixed time period shows the UAVs moving toward their best positions with low power consumption.

In conclusion, this paper presents a two-stage sequential optimization scheme for UAV-VLC networks that integrates user association, trajectory planning, and power allocation to minimize energy consumption. The combination of K-means clustering and DRL efficiently handles dynamic user distributions and interference coupling. Simulation results demonstrate significant savings in transmit power compared to the fixed-deployment benchmarks, indicating the scheme’s potential for practical deployment. Future work could explore multi-objective optimization and real-time adaptability in more complex environments.