AI-Enhanced Routing for Unmanned Drone Swarm Networks

The proliferation of unmanned drone technology has catalyzed a paradigm shift across numerous civilian and industrial sectors, driving the nascent field of the low-altitude economy. Individual unmanned drones, however, are inherently constrained by limited energy, payload capacity, and operational scope. To execute complex, large-scale missions—such as wide-area surveillance, precision agriculture, infrastructure inspection, and emergency response—coordinated unmanned drone swarms are essential. The efficacy of such swarms is fundamentally dependent on robust, low-latency, and self-organizing communication networks. This necessity has given rise to Flying Ad-Hoc Networks (FANETs), a distributed networking architecture where each unmanned drone acts simultaneously as a communication node and a router, enabling multi-hop data relay without reliance on fixed infrastructure.

Designing efficient routing protocols for FANETs presents unique and formidable challenges that distinguish them from their terrestrial counterparts like Mobile Ad-Hoc Networks (MANETs) or Vehicular Ad-Hoc Networks (VANETs). The defining characteristics of unmanned drone swarms—namely, high mobility in three-dimensional space, rapidly changing network topology, stringent latency requirements for real-time coordination, and severe energy constraints due to battery-powered flight—render traditional routing protocols largely ineffective. Proactive protocols incur excessive control overhead from frequent topology updates, while reactive protocols suffer from high route discovery latency. Moreover, traditional methods often optimize for a single metric, lacking the adaptability to perform dynamic trade-offs among conflicting objectives like delay, reliability, and energy consumption.

The rapid advancement of Artificial Intelligence (AI) offers a promising avenue to overcome these hurdles. AI algorithms, with their inherent capabilities for learning, adaptation, and multi-objective optimization, are increasingly being leveraged to design intelligent, resilient, and efficient routing strategies for unmanned drone swarms. This paper provides a comprehensive review of the application of AI in FANET routing. We begin by outlining FANET architectures and their associated routing challenges. Subsequently, we construct a dual-dimensional taxonomy—centered on AI algorithms and underpinned by routing architecture—to systematically categorize and analyze state-of-the-art methods. These are broadly classified into topology-based and cluster-based routing, with detailed examinations of applied techniques such as Q-learning, Fuzzy Logic (FL), Reinforcement Learning (RL), and Bio-Inspired Algorithms (BIA). We summarize their optimization effects on key performance indicators. Following this analysis, we discuss critical technical challenges from multiple perspectives: algorithmic deployment, communication security, network resource management, routing scalability, and cross-layer optimization. Finally, we conclude by envisioning future research trajectories, including cross-layer collaborative routing, edge AI deployment, multi-objective specialization for unmanned drone networks, and the potential of Graph Neural Networks (GNNs).

FANET Architectures and Foundational Routing Challenges

Network Organizational Structures

FANETs organize unmanned drones into decentralized structures where communication occurs via single or multi-hop links. A critical element in many architectures is the “backbone” unmanned drone, which serves as a gateway to ground infrastructure or the internet. This approach alleviates bandwidth pressure on base stations, enabling support for larger unmanned drone swarms. As swarm size increases, single-backbone architectures face bottlenecks, necessitating more sophisticated hierarchical designs. Consequently, FANET structures can be categorized into flat and clustered architectures.

1. Flat Architecture: In this simple structure, one or a few designated backbone unmanned drones handle all communication with the ground station. All other unmanned drones are peers, communicating only with each other or through the backbone node. This architecture offers robustness, simplicity, and lower hardware requirements for non-backbone nodes, making it suitable for cost-effective, small to medium-scale unmanned drone deployments (typically under 40-50 nodes) for tasks like localized reconnaissance or environmental monitoring.

2. Clustered Architecture: To manage scalability, the network is partitioned into clusters. This can be single-layer or multi-layer.

Single-Layer Clustering: Each cluster elects a Cluster Head (CH) that acts as a local backbone. CHs communicate directly with the base station, but all inter-cluster traffic must be relayed through the base station, which can increase latency and base-station load.
Multi-Layer Clustering: This hierarchical extension allows CHs from different clusters to communicate directly, forming a higher-level network. A top-tier CH then connects to the base station. This significantly enhances scalability for very large unmanned drone swarms (theoretically supporting hundreds of nodes) for applications like wide-area coverage or emergency communication, albeit at the cost of increased system complexity and potential routing path length.

The choice between flat and clustered architectures depends on specific mission parameters: swarm size, node density, latency tolerance, and task duration.

Core Routing Challenges in FANETs

While sharing the ad-hoc principle with MANETs and VANETs, FANETs introduce distinct challenges that preclude the direct adoption of protocols from these domains:

High and 3D Mobility: Unmanned drones operate in 3D space at speeds that can exceed 100 km/h (and up to ~460 km/h for certain platforms), far surpassing typical MANET or road-constrained VANET node speeds. This extreme agility directly impacts link stability and end-to-end delay.
Highly Dynamic Topology: The combination of high speed and 3D movement leads to frequent and unpredictable topological changes. Links are established and broken rapidly, and nodes may join or leave the swarm dynamically. This volatility poses a severe challenge to routing stability and necessitates constant route maintenance.
Stringent Latency Constraints: Many unmanned drone swarm applications, such as collision avoidance, coordinated maneuvering, and real-time data fusion for disaster response, demand ultra-low latency communication. Delays can lead to mission failure or catastrophic collisions.
Severe Energy Limitations: Unlike many MANET/VANET devices, unmanned drones are primarily powered by onboard batteries, with a significant portion of energy dedicated to propulsion. The energy available for communication and computation is severely limited, making energy-efficient routing and minimal control overhead paramount for extended mission endurance.

These challenges collectively demand routing solutions that are adaptive to a highly dynamic environment and capable of multi-objective optimization. Traditional protocols fall short, creating a compelling need for AI-driven approaches.

A Dual-Dimensional Taxonomy of AI-Driven FANET Routing Methods

To systematically review the landscape, we propose a taxonomy based on two primary dimensions: the underlying routing architecture and the core AI algorithm employed. The architectural dimension divides methods into Topology-Based Routing and Cluster-Based Routing. Within each category, specific AI families have found predominant application, as visualized in the classification tree and detailed in the following sections.

AI in Topology-Based Routing for Unmanned Drone Swarms

Topology-based protocols use network addresses (e.g., IP) to identify nodes and make forwarding decisions based on the perceived network graph. The primary challenge here is making intelligent, adaptive next-hop decisions amidst rapid topology changes. Reinforcement Learning (RL), particularly Q-learning, has emerged as the cornerstone AI technique for this problem due to its model-free nature and ability to learn optimal policies through interaction with the dynamic FANET environment.

Foundational Q-Learning for Routing

In the context of an unmanned drone swarm, routing can be modeled as a Markov Decision Process (MDP). Each data packet (or the forwarding node itself) acts as an agent. At a given state $s_i$ (e.g., the current unmanned drone node), the agent takes an action $a_i$ (selecting a neighboring unmanned drone as the next hop). The environment (the network) then transitions to a new state $s_{i+1}$ and provides a reward $r_i$. The agent’s goal is to learn a policy that maximizes the cumulative discounted reward, represented by the Q-value. The standard Q-learning update rule, central to many routing protocols, is given by:

$$
Q(s_i, a_i) \leftarrow Q(s_i, a_i) + \alpha \left[ r_i + \gamma \max_{a'} Q(s_{i+1}, a') - Q(s_i, a_i) \right]
$$

where $\alpha$ is the learning rate and $\gamma$ is the discount factor. Each unmanned drone maintains a Q-table that estimates the long-term utility of taking action $a$ (choosing a neighbor) from state $s$ (its own ID).
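The update rule can be sketched as a per-node Q-table keyed by next-hop neighbor. This is a minimal illustration, not any specific protocol from the literature: the function names, the node IDs, and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value for illustration)
GAMMA = 0.9  # discount factor (assumed value for illustration)

def new_q_table():
    """q[node][neighbor] -> estimated long-term utility of forwarding via neighbor."""
    return defaultdict(lambda: defaultdict(float))

def q_update(q, state, action, reward, next_state, next_actions,
             alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update after forwarding from `state` via `action`."""
    # Best value reachable from the next hop; 0.0 if it knows no neighbors yet.
    best_next = max((q[next_state][a] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

def best_next_hop(q, state, neighbors):
    """Greedy choice: the neighbor with the highest Q-value."""
    return max(neighbors, key=lambda n: q[state][n])

# Hypothetical usage: node "A" forwards via "B" and receives reward 1.0.
q = new_q_table()
q_update(q, "A", "B", reward=1.0, next_state="B", next_actions=["C"])
print(best_next_hop(q, "A", ["B", "D"]))
```

In practice a protocol would add an exploration policy (for example epsilon-greedy) on top of the greedy choice, since a purely greedy node never tries unvisited neighbors.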

State Space Optimized Q-Learning

A key issue in applying Q-learning to large unmanned drone swarms is the potentially large state-action space, which can slow convergence and increase decision latency. Recent research focuses on intelligently filtering this space. For instance, some protocols use metrics like geographical proximity, link expiration time, or residual energy to pre-filter candidate next-hop neighbors before applying the Q-learning decision, thereby reducing computational overhead and improving route discovery speed.
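A pre-filtering step of this kind might look like the sketch below. The `Neighbor` fields, the thresholds `min_let_s` and `min_energy`, and the rule that a candidate must be closer to the destination are illustrative assumptions, not parameters taken from a specific protocol.

```python
from dataclasses import dataclass

@dataclass
class Neighbor:
    node_id: str
    distance_to_dest: float   # metres from this neighbor to the destination
    link_expiration_s: float  # predicted seconds until the link breaks
    residual_energy: float    # fraction of battery remaining, 0..1

def filter_candidates(neighbors, own_distance_to_dest,
                      min_let_s=2.0, min_energy=0.2):
    """Keep neighbors that make geographic progress, have a sufficiently
    long-lived link, and have enough remaining energy."""
    return [
        n for n in neighbors
        if n.distance_to_dest < own_distance_to_dest
        and n.link_expiration_s >= min_let_s
        and n.residual_energy >= min_energy
    ]

# Hypothetical neighborhood of a node that is 500 m from the destination.
neighbors = [
    Neighbor("B", 400.0, 5.0, 0.8),  # passes all three checks
    Neighbor("C", 600.0, 9.0, 0.9),  # farther from the destination
    Neighbor("D", 300.0, 0.5, 0.9),  # link about to expire
    Neighbor("E", 350.0, 4.0, 0.1),  # battery too low
]
print([n.node_id for n in filter_candidates(neighbors, 500.0)])
```

Only the surviving candidates would then be passed to the Q-learning decision, so the action set shrinks before any Q-values are compared.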

Multi-Objective and Advanced Q-Learning Variants

Early Q-learning routing protocols often optimized for a single metric (e.g., hop count). Subsequent work has evolved to incorporate multiple objectives into the reward function $r_i$. Reward functions now commonly combine terms for delay, link quality, node energy, and traffic load. To address the overestimation bias inherent in standard Q-learning, Double Q-learning and its variants have been adopted. These methods maintain two separate Q-value estimators to decouple action selection from value estimation, leading to more stable and reliable learning in the complex environment of an unmanned drone network.
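The two ideas in this paragraph can be sketched together: a weighted multi-objective reward and a Double Q-learning step. The weights, the normalization of inputs to the range 0..1, and all function names are assumptions made for illustration; published protocols define their own reward terms.

```python
import random
from collections import defaultdict

def new_q_table():
    return defaultdict(lambda: defaultdict(float))

def reward(delay_s, link_quality, energy, load,
           w=(0.4, 0.3, 0.2, 0.1), max_delay_s=1.0):
    """Weighted reward in [0, 1]. link_quality, energy and load are assumed
    to be normalized to 0..1; the weights are illustrative."""
    delay_term = 1.0 - min(delay_s / max_delay_s, 1.0)
    return (w[0] * delay_term + w[1] * link_quality
            + w[2] * energy + w[3] * (1.0 - load))

def double_q_update(qa, qb, s, a, r, s_next, next_actions,
                    alpha=0.5, gamma=0.9, rng=random):
    """One Double Q-learning step: the table being updated selects the
    greedy next action, and the other table evaluates it."""
    if rng.random() < 0.5:
        select, evaluate = qa, qb
    else:
        select, evaluate = qb, qa
    if next_actions:
        a_star = max(next_actions, key=lambda x: select[s_next][x])
        target = r + gamma * evaluate[s_next][a_star]
    else:
        target = r  # no known neighbors at the next hop
    select[s][a] += alpha * (target - select[s][a])

def combined_q(qa, qb, s, a):
    """Forwarding decisions typically use the mean of both estimators."""
    return 0.5 * (qa[s][a] + qb[s][a])
```

Because the evaluating table did not choose the action, a value that is overestimated in one table is less likely to be propagated into the target, which is the source of the stability gain described above.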

Integration with Fuzzy Logic (FL)

FL is highly effective at handling imprecise and uncertain information, such as “good” link quality or “high” node mobility. Hybrid approaches integrate FL with Q-learning, where FL is used to fuzzify inputs (e.g., signal strength, neighbor density) and compute composite metrics that feed into the Q-learning reward function or state representation. This synergy allows the routing protocol to make robust decisions based on linguistic variables that are natural for the dynamic unmanned drone environment.
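As a rough illustration of the fuzzification step, the sketch below maps a normalized signal strength and a normalized mobility level to a single link-quality score using three rules and weighted-average defuzzification. The membership shapes, the rule base, and the output levels are assumptions made for this example; the resulting score could serve as one term of a Q-learning reward.

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), peak of 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def low(x):
    return max(0.0, 1.0 - 2.0 * x)   # 1 at x=0, 0 from x=0.5 upward

def medium(x):
    return tri(x, 0.0, 0.5, 1.0)

def high(x):
    return max(0.0, 2.0 * x - 1.0)   # 0 up to x=0.5, 1 at x=1

def fuzzy_link_quality(signal, mobility):
    """Inputs are assumed normalized to 0..1. AND is min, OR is max."""
    rules = [
        # (firing strength, crisp output level)
        (min(high(signal), low(mobility)), 1.0),       # good link
        (max(medium(signal), medium(mobility)), 0.5),  # fair link
        (max(low(signal), high(mobility)), 0.0),       # poor link
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.0
    return sum(strength * level for strength, level in rules) / total

print(fuzzy_link_quality(0.8, 0.3))
```

The intermediate output varies smoothly between the rule levels, so small measurement noise in signal strength or speed does not flip the routing decision abruptly.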

Deep Reinforcement Learning (DRL) and Multi-Agent Systems

For more complex scenarios, DRL uses deep neural networks to approximate the Q-function (Deep Q-Networks, DQN) or the policy, enabling handling of high-dimensional state spaces. Multi-Agent RL (MARL) frameworks model each unmanned drone as an independent agent learning a cooperative routing policy. Agents learn to anticipate the actions of others to avoid congestion and optimize global network performance, which is crucial for collaborative tasks in an unmanned drone swarm.

The following table summarizes representative AI algorithms in topology-based routing and their reported performance improvements.

Routing Method	Core AI Algorithm	Key Optimized Metrics	Primary Contribution
TARRAQ	Adaptive Q-learning	Reliability, Delay, Overhead	Adaptive topology sensing interval using queueing theory.
QFAN / QRF	Q-learning with State Filtering	Delay, Overhead, Energy	Intelligent filtering of state space to improve efficiency.
DQMR / 2k-adaDQL	Double / Stochastic Double Q-learning	Reliability, Delay	Reduces overestimation bias; improves convergence and stability.
QFRP	Q-learning + Fuzzy Logic	Delay, Packet Delivery, Energy	FL enhances exploration and handles uncertainty in metrics.
MARL-based Routing	Multi-Agent Reinforcement Learning	Throughput, Delay	Global network optimization through cooperative agent learning.
DRL-MLsA	Deep Reinforcement Learning	Link Stability, Overhead	Dynamically adjusts control message intervals for link maintenance.

AI in Cluster-Based Routing for Unmanned Drone Swarms

As the scale of the unmanned drone swarm increases, flat architectures scale poorly. Cluster-based routing introduces hierarchy, improving manageability and reducing broadcast overhead. The core problems here are Cluster Head (CH) election and cluster formation. AI techniques are extensively used to solve these optimization problems.

Bio-Inspired Algorithms (BIA) for CH Optimization

BIAs, which mimic natural processes like foraging, flocking, or hunting, are exceptionally well-suited for the distributed, self-organizing nature of cluster formation in unmanned drone swarms. The CH election problem is formulated as an optimization problem where a fitness function—incorporating factors like node centrality, residual energy, mobility, and neighbor count—must be maximized or minimized. Algorithms such as Whale Optimization Algorithm (WOA), Moth-Flame Optimization (MFO), Grey Wolf Optimizer (GWO), and Harris Hawks Optimization (HHO) are employed to iteratively search for the optimal set of CHs. A growing trend is the use of hybrid BIAs (e.g., combining two algorithms) to overcome limitations like premature convergence and to achieve a better balance between exploration and exploitation in the search space.
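The fitness function at the center of this formulation can be sketched as a weighted sum of normalized factors. Note the simplification: the search below is a plain greedy top-k ranking, used as a stand-in for the metaheuristic search that WOA, MFO, GWO, or HHO would perform. The field names, weights, and normalization constants are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Drone:
    node_id: str
    residual_energy: float         # fraction of battery remaining, 0..1
    neighbor_count: int
    mean_neighbor_distance: float  # metres; lower means more central
    relative_speed: float          # m/s relative to neighbors; lower is steadier

def ch_fitness(d, max_neighbors, comm_range, max_speed,
               w=(0.4, 0.2, 0.2, 0.2)):
    """Higher is better. Each factor is normalized to 0..1 before weighting."""
    energy = d.residual_energy
    degree = min(d.neighbor_count / max_neighbors, 1.0)
    centrality = 1.0 - min(d.mean_neighbor_distance / comm_range, 1.0)
    stability = 1.0 - min(d.relative_speed / max_speed, 1.0)
    return w[0] * energy + w[1] * degree + w[2] * centrality + w[3] * stability

def elect_cluster_heads(drones, k, **params):
    """Greedy stand-in for a bio-inspired search: rank by fitness, take top k."""
    ranked = sorted(drones, key=lambda d: ch_fitness(d, **params), reverse=True)
    return [d.node_id for d in ranked[:k]]

swarm = [
    Drone("A", 0.9, 5, 50.0, 2.0),
    Drone("B", 0.3, 8, 120.0, 10.0),
    Drone("C", 0.7, 6, 60.0, 4.0),
]
print(elect_cluster_heads(swarm, 2, max_neighbors=10,
                          comm_range=200.0, max_speed=20.0))
```

A real BIA would evaluate whole candidate CH sets rather than ranking nodes independently, which lets it account for interactions such as two strong candidates sitting next to each other.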

K-Means and Unsupervised Clustering

Geographic proximity is a fundamental principle for cluster formation. The K-Means algorithm and its variants provide a straightforward, computationally lightweight method to partition unmanned drones into clusters based on their spatial coordinates. The number of clusters (K) may be predetermined or estimated. While simple, K-Means can serve as an efficient preprocessing step or form the basis of a clustering protocol, especially when integrated with a second step for CH selection based on other metrics like energy. Its main drawbacks are the need to specify K in advance and lower adaptability to highly dynamic movement compared with learning-based methods.
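The two-step scheme described here (spatial K-Means followed by energy-based CH selection) can be sketched in plain Python. The deterministic seeding with the first K points, the sample coordinates, and the energy values are assumptions for illustration; practical implementations usually seed centroids randomly or with k-means++.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on 3D positions. Centroids are seeded with the
    first k points so the example is deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each drone joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def select_heads(labels, energies):
    """Per cluster, pick the member index with the highest residual energy."""
    heads = {}
    for idx, (label, energy) in enumerate(zip(labels, energies)):
        if label not in heads or energy > energies[heads[label]]:
            heads[label] = idx
    return heads

# Hypothetical positions (x, y, altitude in metres) and residual energies.
positions = [(0, 0, 10), (100, 100, 50), (1, 1, 10),
             (101, 99, 50), (0, 2, 12), (99, 101, 48)]
energies = [0.5, 0.9, 0.8, 0.4, 0.6, 0.7]
labels, _ = kmeans(positions, k=2)
print(labels, select_heads(labels, energies))
```

Because positions change continuously in a swarm, the clustering would need to be re-run or incrementally updated, which is where the adaptability limitation noted above shows up.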

Learning-Based Clustering Decisions

RL and DL approaches can learn adaptive clustering policies. Q-learning can be used where the action is to declare oneself a CH or join a specific cluster, with rewards promoting stable, balanced, and energy-efficient clusters. Deep learning models, particularly when combined with other technologies like blockchain for security, can be trained to identify optimal CHs or detect malicious nodes attempting to become CHs. These methods offer high adaptability but often require more computational resources for training and inference.

The table below highlights key AI-driven approaches in cluster-based routing for unmanned drone networks.

Routing Method	Core AI Algorithm	Key Optimized Metrics	Primary Contribution
EECP-MFO / ICW	MFO / WOA (BIA)	Energy, Cluster Stability, Lifetime	Uses bio-inspired optimization to select energy-efficient and stable CHs.
DCFH / HMGOC	FHO / Hybrid MGO-JAYA (BIA)	Load Balancing, Lifetime, Overhead	Employs advanced or hybrid BIAs for robust and balanced clustering.
IKAD	K-Means + Density Peak Clustering	Reliability, Energy, Overhead	Combines simple K-Means with a density-based method for refined CH selection.
QSCR / RLC	Q-learning / Reinforcement Learning	Cluster Stability, Energy, Delay	Uses RL to dynamically form and maintain clusters based on real-time network states.
DLSMR	Deep Learning	Security, Multicast Efficiency	Uses DL for secure cluster formation and routing, mitigating wormhole attacks.

Technical Challenges in Deploying AI for Unmanned Drone Swarm Routing

Despite promising results in simulations, the practical deployment of AI-enhanced routing in real unmanned drone swarms faces significant hurdles.

Algorithmic Deployment and Computational Overhead: There is a stark trade-off between algorithm sophistication and the limited computational resources (CPU, memory) on an unmanned drone. Lightweight algorithms like table-based Q-learning and FL are more deployable. In contrast, complex DRL models or large-population BIAs may require computational offloading or specialized hardware, challenging real-time operation on small unmanned drones.
Communication Security and Robustness: Most AI routing research focuses on performance, often overlooking security. Unmanned drone FANETs are vulnerable to unique attacks (e.g., jamming, spoofing, wormhole, sleep deprivation). AI itself can be used for intrusion detection, but designing routing protocols inherently resilient to malicious nodes within the learning process remains a critical challenge. The integration of AI with security primitives like blockchain and federated learning is an emerging but complex area.
Network Resource and Traffic Management: AI routing decisions must consider overall network health beyond a single path. Issues like load balancing, congestion avoidance, and fair bandwidth allocation are essential for large-scale unmanned drone swarms. Most protocols are evaluated in moderate traffic scenarios; their behavior under heavy, heterogeneous data flows (e.g., from high-resolution sensors on multiple unmanned drones) requires further investigation. Cross-layer integration with traffic engineering is necessary.
Routing Scalability: While cluster-based methods aim for scalability, the AI algorithms themselves must scale efficiently with the number of unmanned drones. The convergence time and message complexity of BIAs or MARL can increase substantially with swarm size. Designing AI protocols that maintain low overhead and quick adaptation in ultra-large-scale unmanned drone networks (100+ nodes) is a non-trivial challenge.
Cross-Layer Joint Optimization: Current AI routing largely operates at the network layer, using simplified abstractions of lower layers (e.g., binary link status). True optimization requires a cross-layer approach. AI models need to incorporate and adapt to physical-layer conditions (e.g., rapid channel fading, Doppler shift, interference) and MAC-layer dynamics (contention, scheduling) in real-time. Developing such tightly integrated, cross-layer AI optimization frameworks is highly complex but essential for peak performance.

Future Trends and Research Directions

The evolution of AI-driven routing for unmanned drone swarms is poised to follow several key trajectories:

Cross-Layer Collaborative AI Frameworks: Future protocols will tightly couple AI decision-making across the protocol stack. A unified AI agent or a coordinated set of agents will jointly optimize physical layer parameters (power, modulation), MAC layer scheduling, and network layer routing based on a holistic view of the unmanned drone network state, dramatically improving resource efficiency and QoS.
Edge AI and Model Lightweighting: To enable deployment on resource-constrained unmanned drones, there will be a strong push towards ultra-efficient AI models. Techniques like model pruning, quantization, distillation, and the use of tiny neural networks will be crucial. Furthermore, edge computing paradigms, where heavier models run on ground stations or powerful leader unmanned drones, will coordinate with lighter models on follower drones.
Mission-Specific and Multi-Objective Specialization: Rather than seeking a universal protocol, future research will focus on tailoring AI routing solutions for specific unmanned drone mission profiles (e.g., search-and-rescue, precision agriculture, persistent surveillance). The reward functions and state representations will be carefully crafted to optimize the particular trade-offs (e.g., coverage vs. latency, data fidelity vs. energy) demanded by that mission.
Graph Neural Networks (GNNs) for Topology Reasoning: GNNs are a natural fit for FANETs, which are inherently graph-structured. GNNs can explicitly model the topological relationships between unmanned drones, learning to predict link stability, network connectivity, and optimal paths directly from the graph structure. Combining GNNs with RL (Graph-based RL) holds immense potential for learning highly efficient and context-aware routing policies that generalize across different network topologies.

Conclusion

The integration of Artificial Intelligence into routing protocols for Flying Ad-Hoc Networks represents a transformative approach to managing the inherent complexities of unmanned drone swarm communication. By moving beyond static, rule-based protocols, AI enables adaptive, resilient, multi-objective data forwarding in highly dynamic 3D environments. This review has structured the field through a dual-dimensional taxonomy, examining how algorithms like Q-learning, Fuzzy Logic, Bio-Inspired Optimization, and Deep Reinforcement Learning are applied to both topology-based and cluster-based routing architectures. While these methods outperform traditional protocols in simulation on metrics like delivery ratio, end-to-end delay, and network lifetime, significant challenges remain in practical deployment, security, scalability, and cross-layer design. The future of unmanned drone swarm networking lies in overcoming these hurdles through the development of collaborative cross-layer AI, edge-deployable lightweight models, mission-specialized solutions, and advanced graph-based learning techniques. As the low-altitude economy expands, continued innovation in AI-powered FANET routing will be fundamental to unlocking the full potential of autonomous, collaborative unmanned drone swarms.