Collaborative Content Caching Optimization in UAV-Assisted Internet of Vehicle Based on NOMA

The rapid evolution of Intelligent Transportation Systems (ITS) and connected vehicles has ushered in an era of unprecedented convenience and advanced services. However, this progress is accompanied by significant challenges within the Internet of Vehicles (IoV) ecosystem. The proliferation of computation-intensive and latency-sensitive applications, such as high-definition navigation, in-vehicle entertainment, and autonomous driving, generates massive data traffic and imposes stringent requirements on network resources and responsiveness. Traditional cloud-centric architectures often struggle to meet these demands due to the inherent latency of long-haul transmissions to remote data centers.

Vehicular Edge Computing (VEC) has emerged as a promising paradigm to address these issues by decentralizing computation and storage resources to the network edge, closer to the vehicles. By leveraging roadside units (RSUs) and other edge devices, VEC aims to reduce latency and backhaul load. Nevertheless, the highly dynamic and unpredictable nature of vehicular networks, coupled with the high cost and inflexibility of deploying permanent roadside infrastructure, limits the effectiveness of ground-based VEC solutions.

In this context, unmanned aerial vehicles (UAVs), or drones, present a revolutionary opportunity. Their inherent mobility, rapid deployment capability, and line-of-sight (LoS) propagation advantages make them ideal candidates for serving as agile aerial edge nodes. An unmanned drone can dynamically position itself to provide coverage where it is most needed, effectively complementing or even replacing fixed infrastructure in complex urban environments. By deploying a swarm of these unmanned drones, we can create a flexible and resilient aerial network layer capable of providing computational offloading and, critically, content caching services directly to vehicles on the ground.

However, merely using an unmanned drone as a simple relay or cache is insufficient. The radio spectrum is a scarce resource, and traditional Orthogonal Multiple Access (OMA) techniques, which assign exclusive resource blocks to users, become highly inefficient in dense vehicular scenarios with sporadic traffic. To overcome this fundamental limitation, we integrate Non-Orthogonal Multiple Access (NOMA) technology into our unmanned drone-assisted framework. NOMA allows multiple vehicles to share the same time/frequency resource block simultaneously through power-domain multiplexing, significantly improving spectral efficiency and system throughput. This is particularly powerful when an unmanned drone serves a cluster of vehicles.

Furthermore, effective caching in such a dynamic system requires intelligent coordination. Each unmanned drone in the swarm operates with a limited cache capacity and only a partial observation of the global environment (e.g., local vehicle requests and neighbor states). Independent, myopic caching decisions lead to redundancy (multiple drones caching the same popular content) and missed opportunities for collaboration (failing to cache complementary content). The core challenge is therefore to design a collaborative caching strategy that enables the unmanned drone swarm to learn and adapt its collective behavior over time, maximizing the long-term cache hit rate and minimizing content retrieval latency.

This article presents a comprehensive solution to this challenge. We propose a novel collaborative content caching framework for NOMA-based unmanned drone-assisted IoV. Our contributions are threefold. First, we design a dynamic vehicle clustering mechanism using K-Means++ to periodically group vehicles and optimize the deployment location of each unmanned drone serving as a cluster head. Second, we model the unmanned drone swarm as a graph and employ a Graph Convolutional Network (GCN) to aggregate topological relationships, caching states, and content popularity features across nodes, enhancing cross-node information awareness. Third, we formulate the cooperative caching problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and introduce an attention mechanism into the multi-agent deep reinforcement learning algorithm QMIX. The resulting GCN- and attention-enhanced QMIX, which we term GCQM, allows each unmanned drone to make caching decisions by attentively weighting the states of its neighboring drones, with the goal of maximizing the long-term system-wide cache hit rate and thereby reducing content retrieval latency. Extensive simulations show that the proposed scheme outperforms traditional and state-of-the-art caching strategies across several performance metrics under different network conditions.

System Model and Problem Formulation

We consider an urban IoV scenario with a hierarchical network architecture. A High Altitude Platform (HAP), such as a satellite or a high-altitude unmanned drone, provides overarching network management and serves as a content source. A swarm of $U$ unmanned drones, denoted by the set $\mathcal{U} = \{1, 2, \dots, u, \dots, U\}$, operates at a fixed altitude $h$. These unmanned drones function as mobile edge nodes with both caching and computational capabilities. A set of $V$ vehicles, $\mathcal{V} = \{1, 2, \dots, v, \dots, V\}$, moves on the ground according to realistic mobility patterns. The vehicles are dynamically clustered, and each cluster is associated with one unmanned drone that acts as its serving node. The system operates on two timescales. A large timescale $T_B$ governs the deployment and trajectory adjustment of the unmanned drones. Each large-timescale interval is divided into shorter caching periods indexed by $t_c$, each of duration $T_c$, during which the caching strategy is evaluated and potentially updated. Vehicle re-clustering occurs at the end of each $T_c$ period based on updated spatial distributions.

Mobility and Dynamic Clustering Model

Vehicle mobility is modeled using a combination of random waypoint and directed movement patterns to simulate complex urban traffic. The position of vehicle $v$ at time $t_c$ is $\phi_v(t_c) = (x_v(t_c), y_v(t_c), 0)$. The movement of each unmanned drone $u$ follows a Gauss-Markov mobility model to ensure smooth and realistic trajectories, with its 3D position given by $\omega_u(t_c) = (x_u(t_c), y_u(t_c), h)$. The velocity $v_u$ and heading $\theta_u$ are updated as:

$$
v_u(t_c+1) = \alpha \cdot v_u(t_c) + (1-\alpha) \cdot \bar{v}_u + \epsilon(t_c),
$$
$$
\theta_u(t_c+1) = \alpha \cdot \theta_u(t_c) + (1-\alpha) \cdot \bar{\theta}_u + \epsilon(t_c),
$$

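where $\alpha \in [0, 1]$ is the inertia factor ($\alpha = 1$ keeps the previous value up to noise, while $\alpha = 0$ gives memoryless fluctuation around the mean), $\bar{v}_u$ and $\bar{\theta}_u$ are the mean velocity and heading, and $\epsilon(t_c)$ is zero-mean Gaussian noise.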
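As an illustration of the update rule above, the following minimal sketch simulates the velocity and heading of one unmanned drone. The function names, the noise standard deviations, and the choice to draw independent noise samples for velocity and heading are assumptions made for this example; the text only specifies the general form of the recursion.

```python
import random


def gauss_markov_step(value, mean, alpha, noise_std, rng):
    """One update: x(t+1) = alpha * x(t) + (1 - alpha) * mean + eps."""
    return alpha * value + (1.0 - alpha) * mean + rng.gauss(0.0, noise_std)


def simulate_uav(steps, v0, theta0, v_mean, theta_mean, alpha,
                 v_std, theta_std, seed=0):
    """Return the list of (velocity, heading) pairs over `steps` updates."""
    rng = random.Random(seed)
    v, theta = v0, theta0
    track = [(v, theta)]
    for _ in range(steps):
        # Assumption: velocity and heading receive independent noise samples.
        v = gauss_markov_step(v, v_mean, alpha, v_std, rng)
        theta = gauss_markov_step(theta, theta_mean, alpha, theta_std, rng)
        track.append((v, theta))
    return track
```

With the noise switched off, the recursion contracts geometrically toward the mean at rate $\alpha$, which is a convenient sanity check before adding randomness.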

At the beginning of each caching period $t_c$, the ground vehicles are partitioned into $U$ clusters using a dynamic K-Means++ algorithm, where the number of clusters equals the number of unmanned drones. The clustering minimizes the total squared Euclidean distance between vehicles and the centroid of their assigned cluster, which gives the horizontal position of the cluster head (unmanned drone), thereby improving the initial U2V link quality. The set of vehicles served by unmanned drone $u$ in period $t_c$ is denoted as $\mathcal{C}_u^{t_c}$.

Communication and NOMA Transmission Model

The communication model involves three types of links: HAP-to-Unmanned drone (H2U), Unmanned drone-to-Unmanned drone (U2U), and Unmanned drone-to-Vehicle (U2V). The H2U and U2U links are modeled with free-space path loss, while the U2V link follows the 3GPP UMi (Urban Micro) channel model, incorporating probabilistic LoS/NLoS conditions and small-scale fading.

The downlink transmission from an unmanned drone $u$ to its associated vehicle cluster $\mathcal{C}_u^{t_c}$ employs NOMA. Let $m_u = |\mathcal{C}_u^{t_c}|$ be the number of vehicles in the cluster. The unmanned drone transmits a superposition coded signal:

$$
d_u(t_c) = \sum_{v=1}^{m_u} \alpha_{u,v}(t_c) \sqrt{P_{u,v}(t_c)} d_{u,v}(t_c),
$$

where $\alpha_{u,v}(t_c) \in \{0,1\}$ is the association indicator, $P_{u,v}(t_c)$ is the power allocated to vehicle $v$, and $d_{u,v}(t_c)$ is the message signal. The received signal at vehicle $v$ is:

$$
y_{u,v}(t_c) = g_{u,v}(t_c) d_u(t_c) + I_{\text{intra}}^v(t_c) + I_{\text{inter}}^{u,v}(t_c) + \sigma(t_c),
$$

where $g_{u,v}(t_c)$ is the composite channel gain, $I_{\text{intra}}^v$ is intra-cluster interference from other vehicles served by the same unmanned drone, $I_{\text{inter}}^{u,v}$ is inter-cluster interference from other unmanned drones, and $\sigma(t_c)$ is additive white Gaussian noise.

Successive Interference Cancellation (SIC) is applied at the vehicle receivers. The unmanned drone, possessing global channel state information (CSI) for its cluster, determines the decoding order. Vehicles decode and subtract signals from others with weaker channel gains before decoding their own. The Signal-to-Interference-plus-Noise Ratio (SINR) for a vehicle $\varepsilon(v)$ (the $v$-th vehicle in the decoding order) is:

$$
\gamma_{u}^{\varepsilon(v)}(t_c) = \frac{\alpha_{u,\varepsilon(v)} P_{u,\varepsilon(v)}(t_c) |g_{u,\varepsilon(v)}(t_c)|^2}{I_{\text{intra}}^{\varepsilon(v)}(t_c) + I_{\text{inter}}^{u,\varepsilon(v)}(t_c) + \sigma^2}.
$$

The achievable data rate for the link between unmanned drone $u$ and vehicle $v$ is then:

$$
R_{u,v}(t_c) = B \log_2\left(1 + \gamma_{u}^{\varepsilon(v)}(t_c)\right),
$$

where $B$ is the system bandwidth allocated to the unmanned drone.
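The SINR and rate expressions can be illustrated with a short sketch for a single cluster. The text does not spell out $I_{\text{intra}}$, so the code assumes the usual power-domain NOMA convention: after SIC, a vehicle still sees the signals intended for vehicles with stronger channels, scaled by its own channel gain. The function name and argument layout are hypothetical.

```python
import math


def noma_rates(gains, powers, bandwidth_hz, noise_power, inter_interference=0.0):
    """Per-vehicle rates (bit/s) for one downlink NOMA cluster.

    gains[i]  : channel power gain |g_i|^2 of vehicle i
    powers[i] : transmit power allocated to vehicle i (watts)
    """
    # Decoding order: weakest channel first, strongest last.
    order = sorted(range(len(gains)), key=lambda i: gains[i])
    rates = [0.0] * len(gains)
    for pos, i in enumerate(order):
        # Assumption: signals of weaker vehicles are cancelled by SIC, so only
        # the power meant for stronger vehicles remains as intra-cluster interference.
        residual_power = sum(powers[j] for j in order[pos + 1:])
        intra = gains[i] * residual_power
        sinr = powers[i] * gains[i] / (intra + inter_interference + noise_power)
        rates[i] = bandwidth_hz * math.log2(1.0 + sinr)
    return rates
```

For example, `noma_rates([1.0, 4.0], [0.8, 0.2], 1.0, 1.0)` gives the weak vehicle an SINR of 0.8 / (0.2 + 1) and the strong vehicle an interference-free SINR of 0.8, which shows why the weaker channel is normally given the larger power share.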

Collaborative Content Caching Model

Consider a library of $W$ popular content items, $\mathcal{F} = \{1, \dots, w, \dots, W\}$, each of size $\lambda_w$. Each unmanned drone $u$ has a finite cache capacity $\Psi_u$. The binary caching decision variable for content $w$ on unmanned drone $u$ at period $t_c$ is $r_u^w(t_c) \in \{0,1\}$, subject to the capacity constraint $\sum_{w \in \mathcal{F}} r_u^w(t_c) \lambda_w \leq \Psi_u$.

Content requests from vehicles follow a Zipf popularity distribution. The local popularity of content $w$ for unmanned drone $u$ in a period is estimated as $P_{u,w} = \kappa_{u,w} / \sum_{w'} \kappa_{u,w'}$, where $\kappa_{u,w}$ is the historical request count.
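A minimal sketch of the two popularity quantities follows: the Zipf probability mass function that generates requests, and the empirical estimate $P_{u,w}$ computed from request counts. Content items are assumed to be indexed $1, \dots, W$ in decreasing order of popularity, and the function names are illustrative only.

```python
from collections import Counter


def zipf_pmf(num_items, exponent):
    """Zipf probabilities: p(rank) proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def local_popularity(request_log, num_items):
    """Estimate P_{u,w} as the request count of w over the total count."""
    counts = Counter(request_log)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_items  # no history yet
    return [counts[w] / total for w in range(1, num_items + 1)]
```

With an exponent of 1.0 and three items, the probabilities are 6/11, 3/11, and 2/11, so the most popular item already attracts more than half of the requests.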

When a vehicle requests content $w$, it can be retrieved via one of three paths, each with associated latency:

Local Cache Hit ($\chi_1$): Retrieved directly from the serving unmanned drone $u$. Latency: $\tau^1 = \lambda_w / R_{u,v}$.
Neighbor Cache Hit ($\chi_2$): Retrieved from a neighboring unmanned drone $u'$ via a U2U link, then to the vehicle. Latency: $\tau^2 = \lambda_w / R_{u' \to u} + \lambda_w / R_{u,v}$.
Remote Miss ($\chi_3$): Retrieved from the HAP via H2U link, then to the vehicle. Latency: $\tau^3 = \lambda_w / R_{HAP \to u} + \lambda_w / R_{u,v}$.

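Because the U2V term $\lambda_w / R_{u,v}$ is common to all three paths, $\tau^1$ is always the smallest. In addition, $\tau^2 < \tau^3$ whenever the U2U rate exceeds the H2U rate, which is assumed here, so that $\tau^1 < \tau^2 < \tau^3$. The system therefore aims to maximize local and neighbor hits to minimize average content retrieval latency.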
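The three delivery paths can be summarized in a small lookup routine. This is a sketch under simplifying assumptions: caches are represented as Python sets of content identifiers, a single U2U rate is used for every neighbor, and sizes and rates are given in consistent units (for example bits and bit/s). All names are hypothetical.

```python
def retrieval_latency(content, serving_cache, neighbor_caches,
                      size, r_u2v, r_u2u, r_h2u):
    """Return (path, latency) for one request, following the three cases."""
    access = size / r_u2v  # final hop, common to every path
    if content in serving_cache:
        return "local", access
    if any(content in cache for cache in neighbor_caches):
        return "neighbor", size / r_u2u + access
    return "remote", size / r_h2u + access
```

For a content of size 100 with rates 10 (U2V), 50 (U2U), and 20 (H2U), the latencies are 10, 12, and 15 respectively, which reproduces the ordering of the three paths when the U2U link is faster than the H2U link.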

Problem Formulation

Our objective is to minimize the long-term average content retrieval latency for all vehicles by jointly optimizing vehicle clustering, unmanned drone deployment, power allocation for NOMA, and collaborative caching decisions. The optimization problem can be summarized as:

$$
\min_{\{\alpha_{u,v}\},\{P_{u,v}\},\{r_u^w\}} \lim_{T \to \infty} \frac{1}{T} \sum_{t_c=1}^{T} \sum_{u \in \mathcal{U}} \sum_{v \in \mathcal{C}_u^{t_c}} \left( \chi_1 \tau^1 + \chi_2 \tau^2 + \chi_3 \tau^3 \right)
$$

Subject to:

$$
\begin{aligned}
&\text{(C1): } \sum_{u \in \mathcal{U}} \alpha_{u,v}(t_c) = 1, \quad \forall v, t_c. \quad \text{(Vehicle association)} \\
&\text{(C2): } \sum_{v \in \mathcal{C}_u^{t_c}} P_{u,v}(t_c) \leq P_u^{max}, \quad \forall u, t_c. \quad \text{(Power constraint)} \\
&\text{(C3): } g_{u}^{\varepsilon(v)} \leq g_{u}^{\varepsilon(o)} \text{ for } o > v. \quad \text{(SIC decoding order)} \\
&\text{(C4): } \sum_{w} r_u^w(t_c) \lambda_w \leq \Psi_u, \quad \forall u, t_c. \quad \text{(Cache capacity)} \\
&\text{(C5): } r_u^w(t_c) \in \{0,1\}, \quad \forall u, w, t_c. \quad \text{(Binary decision)} \\
&\text{(C6): } \chi_1 + \chi_2 + \chi_3 = 1. \quad \text{(Delivery mode)}
\end{aligned}
$$

This is a complex mixed-integer non-linear programming (MINLP) problem involving coupled decisions across timescales. The dynamic nature of the network makes finding a centralized optimal solution intractable. Therefore, we decompose the problem and propose a data-driven, learning-based solution.

Proposed Solution: GCQM Framework

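Our proposed solution, the Graph Convolutional network and attention-enhanced QMIX (GCQM) framework, tackles the problem in two phases: dynamic vehicle clustering with unmanned drone deployment, followed by graph-based collaborative caching learned through multi-agent deep reinforcement learning.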

Phase 1: Dynamic Vehicle Clustering and Unmanned Drone Deployment

At the start of each large timescale $T_B$, or when the vehicle distribution changes significantly, we execute a K-Means++ based clustering algorithm. The algorithm takes the set of vehicle locations $\{\phi_v\}$ and the number of unmanned drones $U$ as input. Unlike standard K-Means, which draws its initial centroids uniformly at random, K-Means++ selects each new initial centroid (a candidate unmanned drone position) with probability proportional to its squared distance from the nearest centroid already chosen. This spreads the initial centroids apart, which reduces the risk of poor local optima and typically speeds up convergence. The algorithm outputs $U$ vehicle clusters $\{\mathcal{C}_1, \dots, \mathcal{C}_U\}$ and the centroid coordinates $(x_u, y_u)$ at which each unmanned drone is placed. The unmanned drone then flies to this location at altitude $h$. This process positions each unmanned drone centrally relative to its served vehicles, which reduces the average path loss of the U2V links.

Clustering Quality (Silhouette Coefficient) vs. Number of Vehicles
Number of Vehicles	Silhouette Coefficient
20	0.68
40	0.64
60	0.63
80	0.55
100	0.54
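The K-Means++ seeding and centroid refinement used in Phase 1 can be sketched in plain Python as follows. This is an illustrative implementation, not the one used to produce the table above: the function names, the iteration cap, and the handling of empty clusters are assumptions, and vehicle positions are taken as 2D tuples.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeanspp_seeds(points, k, rng):
    """First seed uniform; later seeds with probability proportional to D(x)^2."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        weights = [min(sq_dist(p, s) for s in seeds) for p in points]
        if sum(weights) == 0:  # every point coincides with a seed
            seeds.append(rng.choice(points))
        else:
            seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def cluster_vehicles(points, k, iters=100, seed=0):
    """Return (centroids, labels); centroids act as 2D drone positions."""
    rng = random.Random(seed)
    centroids = kmeanspp_seeds(points, k, rng)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:
                updated.append(centroids[j])  # keep centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)
```

Calling `cluster_vehicles(positions, U)` with the current vehicle positions returns one centroid per unmanned drone together with the cluster index of every vehicle.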

Phase 2: Graph-Based Collaborative Caching with Attention-Enhanced Deep Reinforcement Learning

Once unmanned drones are deployed, the focus shifts to the real-time collaborative caching decision problem within each caching period $t_c$. We model this as a Dec-POMDP.

1. State, Observation, and Action: The global state $s_{t_c}$ is partially observable by each unmanned drone agent. The local observation $o_u^{t_c}$ for unmanned drone $u$ includes: its own cache state vector $\mathbf{E}_u(t_c)$, the local content popularity vector $\mathbf{P}_u(t_c)$ aggregated over a time window, its remaining cache capacity $\psi_u^{cap}(t_c)$, and the cached content identifiers from its neighboring unmanned drones $\mathcal{N}_u(t_c)$. The action $a_u^{t_c}$ is a binary vector representing the new caching decision for all contents in the library.

2. Reward Design: The immediate reward for unmanned drone $u$ encourages both local hits and cooperative hits via neighbors:
$$
r_u^{t_c} = \omega_1 \cdot r_u^{\text{self}, t_c} + \omega_2 \cdot r_u^{\text{neig}, t_c},
$$
where $r_u^{\text{self}, t_c}$ is the hit rate within its own cluster, $r_u^{\text{neig}, t_c}$ is the hit rate achieved by its neighbors for its cluster’s requests (encouraging content diversity), and $\omega_1, \omega_2$ are weighting coefficients. The global reward is $r^{t_c} = \sum_{u} r_u^{t_c}$.
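The reward can be computed from a period's request log as in the sketch below. It assumes that a request counts as a neighbor hit only when the serving unmanned drone misses and at least one neighbor holds the content; the text does not state how overlapping hits are counted, so this is one reasonable reading. Caches are sets of content identifiers and the names are hypothetical.

```python
def cache_reward(requests, own_cache, neighbor_caches, w_self, w_neigh):
    """r_u = w_self * local hit rate + w_neigh * neighbor hit rate."""
    if not requests:
        return 0.0
    self_hits = sum(1 for w in requests if w in own_cache)
    # Assumption: neighbor hits are counted only for local misses.
    neigh_hits = sum(
        1 for w in requests
        if w not in own_cache and any(w in cache for cache in neighbor_caches)
    )
    total = len(requests)
    return w_self * self_hits / total + w_neigh * neigh_hits / total


def global_reward(per_agent_rewards):
    """Global reward as the sum of the individual rewards."""
    return sum(per_agent_rewards)
```

With requests `[1, 2, 3, 4]`, a local cache `{1, 2}`, one neighbor caching `{3}`, and weights 1.0 and 0.5, the reward is 0.5 + 0.5 × 0.25 = 0.625.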

3. Network Architecture with Graph Convolution: The unmanned drone swarm is naturally represented as a dynamic graph $\mathcal{G}_{t_c} = (\mathcal{U}, \mathcal{E}_{t_c})$, where nodes are unmanned drones and edges represent potential communication/collaboration links. An adjacency matrix $\mathbf{A}_{t_c}$ is constructed based on a composite score $S_{u,j}$ that combines physical proximity (the inverse of the distance $d_{u,j}$) and cache complementarity (one minus the Jaccard similarity of the cached content sets) between unmanned drones $u$ and $j$, weighted by $\omega_d$ and $\omega_c$:
$$
S_{u,j}(t_c) = \omega_d \cdot \frac{1}{d_{u,j}(t_c)} + \omega_c \cdot \left(1 - \frac{|\mathbf{E}_u \cap \mathbf{E}_j|}{|\mathbf{E}_u \cup \mathbf{E}_j|}\right).
$$

This allows the graph to encode both physical topology and semantic caching relationships.
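The composite score can be evaluated directly from the pairwise distance and the two cached content sets, as in the following sketch. The treatment of two empty caches (Jaccard similarity taken as 1, so complementarity is 0) is an assumption, since the ratio is undefined in that case.

```python
def composite_score(distance, cache_u, cache_j, w_d, w_c):
    """S_{u,j} = w_d / d_{u,j} + w_c * (1 - Jaccard(E_u, E_j))."""
    union = cache_u | cache_j
    # Assumption: two empty caches are treated as identical (similarity 1).
    jaccard = len(cache_u & cache_j) / len(union) if union else 1.0
    return w_d / distance + w_c * (1.0 - jaccard)
```

For two unmanned drones 100 m apart caching `{1, 2, 3}` and `{3, 4}`, the Jaccard similarity is 1/4, so with unit weights the score is 0.01 + 0.75 = 0.76. Because the two terms live on different scales, the weights $\omega_d$ and $\omega_c$ need to be chosen with the distance unit in mind.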

Each unmanned drone’s observation $o_u^{t_c}$ is first encoded into a feature vector $\mathbf{h}_u^{(0)}$. A two-layer Graph Convolutional Network (GCN) then aggregates features from the node’s neighborhood:

$$
\mathbf{H}^{(l+1)} = \sigma\left( \tilde{\mathbf{A}}_{t_c} \mathbf{H}^{(l)} \mathbf{W}^{(l)} \right), \quad l \in \{0, 1\},
$$

where $\tilde{\mathbf{A}}_{t_c}$ is the normalized adjacency matrix with self-loops, $\mathbf{H}^{(l)}$ is the matrix of node features at layer $l$, $\mathbf{W}^{(l)}$ is a trainable weight matrix, and $\sigma$ is a non-linear activation function (e.g., ReLU). With two layers, each unmanned drone incorporates features from neighbors up to two hops away, creating a rich representation $\mathbf{h}_u^{\prime\prime}$ that captures the swarm’s caching context.
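A single graph-convolution layer can be written in a few lines of dependency-free Python. The sketch assumes the common symmetric normalization $\tilde{\mathbf{A}} = \mathbf{D}^{-1/2}(\mathbf{A} + \mathbf{I})\mathbf{D}^{-1/2}$ and a ReLU activation; the text only states that the adjacency matrix is normalized with self-loops, so the exact normalization is an assumption.

```python
import math


def normalize_adjacency(adj):
    """Add self-loops, then apply D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # positive thanks to the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(a_norm, h, w):
    """H_next = ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, x) for x in row] for row in z]
```

Applying `gcn_layer` twice, with separate weight matrices, yields the two-layer aggregation; in practice the same computation would be expressed with a tensor library so that the weights can be trained by backpropagation.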

4. Attention-Enhanced QMIX Learning: The refined features $\mathbf{h}_u^{\prime\prime}$ are fed into a multi-agent reinforcement learning framework based on QMIX. Each unmanned drone has a local Q-network that outputs Q-values for its possible actions based on its enhanced feature vector. To further improve coordination, we introduce a multi-head attention mechanism on top of the GCN outputs. For each attention head, the importance of neighbor $j$ to unmanned drone $i$ is computed as:

$$
\alpha_{i,j}^{h} = \frac{\exp\left(\tau \cdot (\mathbf{W}_Q^h \mathbf{h}_i^{\prime\prime})^\top (\mathbf{W}_K^h \mathbf{h}_j^{\prime\prime}) \right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\tau \cdot (\mathbf{W}_Q^h \mathbf{h}_i^{\prime\prime})^\top (\mathbf{W}_K^h \mathbf{h}_k^{\prime\prime}) \right)},
$$

where $\mathbf{W}_Q^h, \mathbf{W}_K^h$ are learnable projection matrices for query and key, and $\tau$ is a scaling factor. The final aggregated feature for node $i$ concatenates the outputs of all $H$ heads, each computed with a learnable value projection $\mathbf{W}_V^h$:
$$
\mathbf{h}_i^{\prime\prime\prime} = \sigma\left( \text{Concat}\left[ \sum_{j \in \mathcal{N}_i} \alpha_{i,j}^{h} \mathbf{W}_V^h \mathbf{h}_j^{\prime\prime} \right]_{h=1}^{H} \right).
$$
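The attention weights and the aggregation for a single head can be sketched as follows. Features are plain Python lists, projection matrices are nested lists, and the concatenation across heads and the outer activation are omitted for brevity; these simplifications and all names are assumptions for illustration.

```python
import math


def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attention_weights(h_i, neighbor_feats, w_q, w_k, scale):
    """Softmax over scaled query-key dot products, one weight per neighbor."""
    query = matvec(w_q, h_i)
    scores = [scale * sum(q * k for q, k in zip(query, matvec(w_k, h_j)))
              for h_j in neighbor_feats]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(h_i, neighbor_feats, w_q, w_k, w_v, scale):
    """Attention-weighted sum of the value-projected neighbor features."""
    alphas = attention_weights(h_i, neighbor_feats, w_q, w_k, scale)
    values = [matvec(w_v, h_j) for h_j in neighbor_feats]
    return [sum(a * v[d] for a, v in zip(alphas, values))
            for d in range(len(values[0]))]
```

With identity projections, a query `[1, 0]`, and neighbors `[1, 0]` and `[0, 1]`, the aligned neighbor receives weight $1/(1 + e^{-1}) \approx 0.73$, which shows how neighbors with similar features dominate the aggregate.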

This feature $\mathbf{h}_i^{\prime\prime\prime}$, now containing attentively-weighted information from relevant neighbors, is used as input to the local Q-network. A central mixing network, whose weights are generated by a hypernetwork conditioned on the global state $s_{t_c}$, combines the local Q-values $Q_u(o_u, a_u)$ into a joint action-value function $Q_{tot}(\mathbf{s}, \mathbf{a})$ that satisfies the monotonicity constraint ($\partial Q_{tot}/\partial Q_u \geq 0$). This enables centralized training of decentralized policies. The networks are trained end-to-end to minimize the temporal-difference (TD) error loss:

$$
\mathcal{L}(\theta) = \mathbb{E}_{(\mathbf{s}, \mathbf{a}, r, \mathbf{s}') \sim \mathcal{D}} \left[ \left( r + \gamma \max_{\mathbf{a}'} Q_{tot}(\mathbf{s}', \mathbf{a}'; \theta^-) - Q_{tot}(\mathbf{s}, \mathbf{a}; \theta) \right)^2 \right],
$$

where $\mathcal{D}$ is a replay buffer, $\gamma$ is the discount factor, and $\theta^-$ are the parameters of a target network.
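The monotonic mixing and the TD loss can be illustrated with a deliberately simplified sketch. It uses a single linear mixing layer whose weights are made non-negative with an absolute value, whereas a full QMIX mixer is a small multi-layer network whose weights are produced by hypernetworks from the global state; the function names are hypothetical.

```python
def mix_q_values(agent_qs, hyper_weights, bias):
    """Q_tot = sum_u |w_u| * Q_u + b, so dQ_tot/dQ_u >= 0 by construction."""
    return sum(abs(w) * q for w, q in zip(hyper_weights, agent_qs)) + bias


def greedy_q_tot(agent_q_tables, hyper_weights, bias):
    """Monotonicity lets the joint maximum decompose into per-agent maxima."""
    return mix_q_values([max(q) for q in agent_q_tables], hyper_weights, bias)


def td_loss(q_tot, reward, next_q_tot_target, gamma):
    """Squared TD error for one transition."""
    target = reward + gamma * next_q_tot_target
    return (target - q_tot) ** 2
```

The `greedy_q_tot` helper shows why the monotonicity constraint matters: the maximization over joint actions in the TD target reduces to independent per-agent maximizations, which is what makes decentralized execution consistent with centralized training.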

Performance Evaluation and Analysis

We conduct extensive simulations in a Manhattan grid scenario following 3GPP standards to validate the performance of our proposed GCQM scheme. The simulation parameters are summarized below.

Simulation Parameters for UAV-Assisted Caching Network
Parameter	Value	Parameter	Value
Area Size	1 km × 1 km	Number of UAVs ($U$)	5
Number of Vehicles ($V$)	20 – 100	UAV Altitude ($h$)	100 m
UAV Cache Capacity ($\Psi_u$)	100 – 300 MB	Content Library Size ($W$)	30
Content Size ($\lambda_w$)	8 – 20 MB	Zipf Parameter ($\alpha$)	1.0
UAV Max Power ($P_u^{max}$)	30 dBm	System Bandwidth ($B$)	20 MHz
Carrier Frequency (U2V)	2 GHz	Caching Period ($T_c$)	10 s

Benchmark Schemes: We compare GCQM against several baseline strategies:
Random Caching, Least Frequently Used (LFU), First-In-First-Out (FIFO), a single-agent Dueling DQN (DDQN) where each unmanned drone learns independently, a game-theoretic approach GT-SSA, and a multi-agent deep deterministic policy gradient method MADDPG.

Key Results and Discussion

1. Benefit of NOMA over OMA: The integration of NOMA is fundamental to our framework. The table below demonstrates its superiority over traditional OMA in terms of aggregate system throughput and average latency per content delivery. NOMA’s ability to serve multiple vehicles simultaneously within the same resource block yields significant gains, especially as vehicle density increases.

Performance Comparison: NOMA vs. OMA
Number of Vehicles	Throughput (Gbps) – NOMA	Throughput (Gbps) – OMA	Latency (ms) – NOMA	Latency (ms) – OMA
20	8.1	6.2	120	181
60	12.3	7.5	145	382
100	15.6	8.9	195	620
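To make the trend in the table easier to read, the relative gains can be computed directly from the reported values. The snippet below only performs arithmetic on the numbers in the table; the helper name is illustrative.

```python
# (vehicles, NOMA Gbps, OMA Gbps, NOMA ms, OMA ms), copied from the table
rows = [
    (20, 8.1, 6.2, 120, 181),
    (60, 12.3, 7.5, 145, 382),
    (100, 15.6, 8.9, 195, 620),
]


def relative_gains(row):
    """Return (throughput gain, latency reduction) of NOMA relative to OMA."""
    _, noma_tp, oma_tp, noma_lat, oma_lat = row
    return noma_tp / oma_tp - 1.0, 1.0 - noma_lat / oma_lat


for row in rows:
    tp_gain, lat_cut = relative_gains(row)
    print(f"{row[0]:>3} vehicles: throughput +{tp_gain:.0%}, latency -{lat_cut:.0%}")
```

The throughput advantage of NOMA grows from roughly 31% at 20 vehicles to roughly 75% at 100 vehicles, while the latency reduction grows from about 34% to about 69%, consistent with the claim that the gains increase with vehicle density.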

2. Impact of Vehicle Density: As the number of vehicles scales from 20 to 100, network congestion increases. Our GCQM scheme consistently outperforms all benchmarks in cache hit rate, maintaining a hit rate above 35% even at high density, whereas traditional policies like Random and FIFO drop below 15%. This high hit rate directly translates to lower average latency (GCQM: ~275 ms at 100 vehicles vs. >500 ms for others) and significantly reduced backhaul load to the HAP, saving core network bandwidth. The efficient collaboration learned by GCQM ensures content diversity across the unmanned drone swarm, preventing redundant caching and maximizing the utility of the collective cache space.

3. Impact of Unmanned Drone Cache Capacity: Increasing the cache capacity $\Psi_u$ from 100 MB to 300 MB naturally improves performance for all schemes. However, GCQM exhibits the most substantial gain, with its hit rate increasing by over 60%, compared to ~44% for DDQN. This highlights the effectiveness of the GCN and attention mechanism in understanding global content popularity trends and strategically placing content. The gap between GCQM and frequency-based LFU widens with capacity, showing that learning dynamic popularity is superior to relying on simple historical counts.

4. Ablation Study on GCN and Attention: To validate the core components of our learning architecture, we compare the full GCQM model against two variants: GCQM-MeanPool (replaces GCN with simple mean-pooling of neighbor features) and GCQM-NoGCN (removes neighbor feature aggregation entirely). The learning curves show that the full GCQM model converges faster and reaches a higher, more stable average reward than both variants. This indicates that structured feature aggregation through the GCN, combined with adaptive weighting through attention, is important for learning effective collaborative policies in the dynamic graph environment of an unmanned drone swarm.

Conclusion and Future Work

In this work, we have presented a holistic framework for optimizing content caching in dynamic vehicular networks using a swarm of unmanned drones empowered by NOMA. By addressing the intertwined challenges of dynamic topology, spectral efficiency, and decentralized decision-making, our proposed GCQM scheme demonstrates significant performance improvements. The key innovation lies in the integration of a K-Means++ based dynamic clustering mechanism for efficient unmanned drone deployment, a Graph Convolutional Network for aggregating swarm-wide caching context, and an attention-enhanced multi-agent deep reinforcement learning algorithm for making collaborative caching decisions. Simulation results confirm that our approach enhances cache hit rates, reduces content retrieval latency, and lowers backhaul load compared to existing schemes, supporting its efficacy in high-density IoV scenarios.

For future work, several promising directions exist. First, the joint optimization of unmanned drone trajectory and caching decisions in real-time could be explored to adapt to fast-changing request hotspots. Second, integrating heterogeneous networks, where unmanned drones coexist with terrestrial RSUs and macro base stations, would provide a more comprehensive solution. Third, investigating dynamic adjustment of the unmanned drone swarm size based on real-time traffic demand could lead to more efficient resource utilization. Finally, addressing security and privacy concerns in such a collaborative caching system, possibly through blockchain or federated learning techniques, is an important area for practical deployment.