Research Progress on Key Technologies of Hierarchical Cooperation for Low-Altitude Logistics UAV Drones

The hierarchical cooperative technology for low-altitude logistics UAV drones is a core paradigm for overcoming bottlenecks in logistics distribution. Its essence lies in the hierarchical decoupling and cross-layer coordination of a three-tier architecture: “collaborative task allocation → trajectory planning → dynamic adjustment.” In recent years, driven by evolving scenario demands and core pain points, each task layer has shifted from “single-task optimization” through “collaborative optimization exploration” to “large-scale hierarchical coordination.” Each generation of technology addresses the core contradictions left unresolved by the previous one, forming an upward spiral of “demand upgrade → bottleneck emergence → technological response.” This evolution has progressively expanded delivery scale from single-UAV drone to multi-UAV drone to swarm operations, with each stage marked by breakthroughs in core models and deepening scenario requirements.

1. Analysis of the Paradigm Shift in Hierarchical Framework

The evolution of logistics UAV drone hierarchical cooperation can be divided into distinct stages, as summarized in Table 1, highlighting the shifting focus from simple pathfinding to complex, coordinated operations.

Table 1. Analysis of Algorithm Comparisons and Paradigm Shifts in the Evolutionary Stages of Logistics UAV Drones
| Evolution Stage | Complexity Level | Applicable Scenarios | Core Optimization Objective | Representative Algorithms/Models |
| --- | --- | --- | --- | --- |
| Single-Task Optimization | Polynomial | Point-to-point, single UAV drone, low frequency | Shortest path, minimal energy for a single delivery | Dijkstra, A*, basic heuristic algorithms |
| Collaborative Exploration | Sub-exponential to Exponential | Multi-node delivery within a district, heterogeneous UAV drone fleets | Matching heterogeneous models, coordinated path planning for multiple UAV drones | Improved A*+TSP, GWO+Bayes, MAHACO, ES-mGA |
| Hierarchical Coordination | Polynomial to Sub-exponential | Large-scale clusters, highly dynamic and complex urban environments | System-wide optimization via decoupled layers (allocation, planning, replanning), balancing global and local search | C-SPPO, POMCP-GO, MSGOA, EMSDBO, MONRL+LSTM |

The core framework for hierarchical cooperation in logistics UAV drones is illustrated logically in Figure 1, comprising three main layers.

  • Collaborative Task Allocation Layer: This layer focuses on order data, efficiency, and priority-based objective functions to cluster orders and match them optimally to the available UAV drone fleet.
  • Collaborative Trajectory Planning Layer: Based on geographic information and operational constraints (e.g., no-fly zones, obstacles), this layer generates safe and efficient three-dimensional flight paths for each UAV drone.
  • Dynamic Trajectory Replanning Layer: Utilizing onboard sensors for environmental perception, this layer enables real-time adjustment of flight paths in response to unexpected obstacles, weather changes, or new task instructions.

The interaction between these layers is iterative. The task allocation layer issues instructions. The planning layer receives these and generates initial flight paths. During execution, if the replanning layer encounters local obstacles, it sends adjusted trajectory requirements back to the planning layer. If a delivery order itself changes, this feedback is sent back to the task allocation layer for a possible re-allocation.

2. Collaborative Task Allocation Layer

Collaborative task allocation is the core decision-making stage for efficient resource scheduling and task execution in UAV drone logistics systems. Its technological evolution follows a main thread from “static constraint solving” to “dynamic intelligent adaptation.”

2.1 Task Allocation Based on Optimization Models

Optimization models form the technical foundation for task allocation. Their evolution stems from the continuous trade-off between “allocation accuracy” and “scenario complexity.”

2.1.1 Exact Optimization Models

Targeting early scenarios characterized by “single batch, few constraints, small scale,” exact models like Integer Programming and Dynamic Programming ensured optimal allocation. For instance, to manage computational complexity in multi-UAV drone settings, a Partial Differential Equation (PDE)-based collision-free trajectory method was introduced. The core formulation often involves a Mixed-Integer Linear Programming (MILP) model to minimize total cost or time:

$$ \min \sum_{i \in \mathcal{T}} \sum_{j \in \mathcal{V}} c_{ij} x_{ij} $$
subject to:
$$ \sum_{j \in \mathcal{V}} x_{ij} = 1 \quad \forall i \in \mathcal{T} $$
$$ \sum_{i \in \mathcal{T}} w_i x_{ij} \leq W_j \quad \forall j \in \mathcal{V} $$
$$ x_{ij} \in \{0,1\} $$
where $\mathcal{T}$ is the set of tasks, $\mathcal{V}$ is the set of UAV drones, $c_{ij}$ is the cost for UAV drone $j$ to perform task $i$, $x_{ij}$ is the binary decision variable, $w_i$ is the task weight/demand, and $W_j$ is the capacity of UAV drone $j$.
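As a toy illustration of this formulation, the sketch below solves very small instances by exhaustive enumeration rather than with a real MILP solver; the function name and data layout are illustrative assumptions, not a published implementation:

```python
from itertools import product

def exact_assignment(cost, weight, capacity):
    """Exhaustively solve the small assignment MILP above: every task gets
    exactly one UAV, capacities are respected, total cost is minimized.
    cost[i][j]: cost of UAV j doing task i; weight[i]: task demand w_i;
    capacity[j]: payload limit W_j. Tractable only for tiny instances."""
    n_tasks, n_uavs = len(cost), len(capacity)
    best_cost, best_assign = float("inf"), None
    # Each candidate maps task i -> UAV assign[i] (the x_ij = 1 entries),
    # which automatically satisfies sum_j x_ij = 1.
    for assign in product(range(n_uavs), repeat=n_tasks):
        load = [0.0] * n_uavs
        for i, j in enumerate(assign):
            load[j] += weight[i]
        if any(load[j] > capacity[j] for j in range(n_uavs)):
            continue  # violates sum_i w_i x_ij <= W_j
        total = sum(cost[i][assign[i]] for i in range(n_tasks))
        if total < best_cost:
            best_cost, best_assign = total, assign
    return best_cost, best_assign
```

Enumerating all $|\mathcal{V}|^{|\mathcal{T}|}$ assignments makes the exponential growth discussed next directly visible: three tasks and two UAV drones give 8 candidates, but 20 tasks and 10 UAV drones already give $10^{20}$.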

However, as scenarios shifted to high-density urban environments, the computational burden of seeking exact optimal solutions became prohibitive for real-time scheduling.

2.1.2 Heuristic and Meta-heuristic Algorithms

To overcome the efficiency bottleneck of exact models, heuristic and meta-heuristic algorithms emerged with the core advantage of “near-optimal + fast convergence.” Their evolution progressed from single-algorithm optimization to multi-algorithm fusion and multi-objective synergy. For example, an improved Genetic Algorithm (GA) might integrate greedy strategies to solve resource coupling and deadlock problems. A common meta-heuristic approach involves iterative improvement of a solution population $P$:

Initialize population $P$
While termination condition not met:
   Evaluate fitness $f(s)$ for each solution $s \in P$
   Select parents from $P$ based on fitness
   Apply crossover and mutation to create offspring $O$
   $P \leftarrow$ Select survivors from $P \cup O$

Multi-objective versions, like the Multi-objective Cat Swarm Optimization based on a Two-Archive mechanism (MOCSO_TA), aim to find a Pareto-optimal front balancing conflicting objectives like cost, time, and resource utilization.
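The loop above can be instantiated as a minimal single-objective GA for task-to-UAV drone assignment. This is an illustrative sketch, not any of the cited algorithms: capacity constraints are omitted for brevity, and the population size, mutation rate, and tournament selection are assumed defaults:

```python
import random

def ga_assign(cost, pop_size=30, generations=60, seed=0):
    """Minimal GA instantiating the pseudocode above. A solution s is a
    list with s[i] = UAV assigned to task i; fitness is the negative
    total cost, so higher fitness means cheaper assignment."""
    rng = random.Random(seed)
    n_tasks, n_uavs = len(cost), len(cost[0])
    fitness = lambda s: -sum(cost[i][s[i]] for i in range(n_tasks))
    pop = [[rng.randrange(n_uavs) for _ in range(n_tasks)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def pick():  # parent selection: binary tournament on fitness
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_tasks)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.2:                 # bit-flip style mutation
                child[rng.randrange(n_tasks)] = rng.randrange(n_uavs)
            offspring.append(child)
        # Survivor selection: keep the best pop_size of parents + offspring.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    best = max(pop, key=fitness)
    return -fitness(best), best
```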

2.2 Task Allocation Based on Market Mechanisms

While optimization models achieve technical “task-resource” matching, they often fail to balance the dynamic interests of multiple autonomous agents. Market mechanisms address this gap.

Auction and Bidding Mechanisms: In environments with unstable network connectivity, a “bidirectional selection negotiation mechanism” can be used. Each UAV drone $j$ bids on a task $i$ based on its perceived cost $b_{ij}$. The task is awarded to the UAV drone with the most favorable bid (e.g., lowest cost), fostering fair and efficient matching in dynamic networks.
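A minimal sketch of one sealed-bid round under these assumptions (lowest-cost bid wins each task; `None` marks a declined bid; the function and data layout are hypothetical, not a specific published protocol):

```python
def auction_round(bids):
    """One sealed-bid round of the mechanism above: bids[i][j] is UAV j's
    cost bid b_ij for task i, and each task goes to the lowest bidder.
    A bid of None means UAV j declines task i (e.g., out of range)."""
    awards = {}
    for i, row in enumerate(bids):
        candidates = [(b, j) for j, b in enumerate(row) if b is not None]
        if candidates:
            awards[i] = min(candidates)[1]  # index of the lowest-cost UAV
    return awards
```

Ties break toward the lower UAV index; a real negotiation mechanism would iterate rounds and let losing bidders revise their bids.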

Game Theory and Coalition Optimization: These methods mathematically model conflicts and cooperation among multiple agents. For instance, a Compromised Dynamic Performance Impact (CDPI) model can be used to adapt task allocation when new dynamic tasks appear, balancing individual UAV drone utility with the global system performance, preventing resource wastage.

2.3 Task Allocation Based on Swarm Intelligence and Learning

The static nature of optimization models and the complexity limits of market mechanisms pushed task allocation towards “dynamic intelligent adaptation.”

Swarm Intelligence Algorithms: Inspired by biological swarm behavior, algorithms like Particle Swarm Optimization (PSO) or improved Ripple-Spreading Algorithms (RSA) handle complex coupling through local interactions. The velocity update in a Binary Hybrid PSO (BHPSO) for task assignment might be:

$$ v_{jd}^{t+1} = w v_{jd}^{t} + c_1 r_1 (pbest_{jd} - x_{jd}^t) + c_2 r_2 (gbest_d - x_{jd}^t) $$
where $x_{jd}^t$ represents the position (task assignment decision) of particle UAV drone $j$ in dimension $d$ at iteration $t$, and $pbest$/$gbest$ are personal and global best positions.
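One update of this rule can be sketched as below. The sigmoid position rule is a common binary-PSO convention, assumed here rather than taken from a specific BHPSO paper, and all coefficient values are illustrative:

```python
import math
import random

def bhpso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One velocity-and-position update of the BHPSO equation above for a
    binary assignment vector x: velocities follow the standard PSO rule,
    and each bit is resampled with P(bit = 1) = sigmoid(velocity)."""
    rng = rng or random.Random(0)
    new_v, new_x = [], []
    for xd, vd, pd, gd in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        vd2 = w * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd)
        new_v.append(vd2)
        new_x.append(1 if rng.random() < 1 / (1 + math.exp(-vd2)) else 0)
    return new_x, new_v
```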

Reinforcement Learning (RL) for Dynamic Allocation: RL frames allocation as a Markov Decision Process (MDP). A UAV drone agent interacts with the environment (state $s_t$), takes an action $a_t$ (e.g., selecting a task), receives a reward $r_t$, and transitions to a new state $s_{t+1}$. The goal is to learn a policy $\pi(a|s)$ that maximizes the cumulative reward. Deep RL algorithms like Deep Deterministic Policy Gradient (DDPG) are suited for continuous action spaces (e.g., precise trajectory coordinates), while Multi-Agent RL (MARL) frameworks like those using a centralized critic enable coordination among multiple UAV drones, effectively resolving action conflicts and scaling to large-scale scenarios.
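The MDP framing can be made concrete with a deliberately tiny tabular Q-learning example: the state is the set of unassigned tasks, an action picks the next task for a single UAV drone, and the reward is the negative task cost. This toy setup and its hyperparameters are assumptions for illustration only; real systems use deep RL over far richer state spaces:

```python
import random
from collections import defaultdict

def q_learning_allocate(costs, episodes=500, alpha=0.5, gamma=0.9,
                        epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy allocation MDP. With gamma < 1 the
    learned policy front-loads cheap tasks, since later (discounted)
    steps penalize expensive tasks less. Returns the greedy task order."""
    rng = random.Random(seed)
    Q = defaultdict(float)                    # Q[(state, action)]
    tasks = tuple(range(len(costs)))
    for _ in range(episodes):
        remaining = set(tasks)
        while remaining:
            s = frozenset(remaining)
            if rng.random() < epsilon:        # epsilon-greedy exploration
                a = rng.choice(sorted(remaining))
            else:
                a = max(sorted(remaining), key=lambda t: Q[(s, t)])
            r = -costs[a]                     # reward r_t
            remaining.discard(a)              # transition to s_{t+1}
            target = r + gamma * max(
                (Q[(frozenset(remaining), t)] for t in remaining),
                default=0.0)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
    order, remaining = [], set(tasks)         # greedy rollout of the policy
    while remaining:
        s = frozenset(remaining)
        a = max(sorted(remaining), key=lambda t: Q[(s, t)])
        order.append(a)
        remaining.discard(a)
    return order
```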

Table 2. Summary of the Collaborative Task Allocation Layer
| Technical Approach | Core Principle | Strengths | Limitations / Evolution Driver |
| --- | --- | --- | --- |
| Exact Optimization | Mathematical guarantee of optimal solution under clear constraints. | High precision for well-defined, small-scale problems. | Exponential time complexity; inefficient for large-scale, real-time scenarios. |
| Heuristic/Meta-heuristic | Approximate optimal solutions through guided search. | Fast convergence; handles multiple objectives and larger scales. | Risk of local optima; led to multi-algorithm fusion for robustness. |
| Auction/Bidding | Market-based matching via cost/utility bids. | Fair, efficient in unstable networks; good for simple dynamic matching. | May waste system resources if only individual utility is maximized. |
| Game Theory/Coalition | Balances individual agent interests with system-wide performance. | Solves complex multi-agent conflicts; promotes stable cooperation. | Modeling complexity; may not adapt well to highly uncertain environments. |
| Swarm Intelligence | Decentralized coordination mimicking biological swarm behavior. | Robust; handles complex coupling; good population diversity. | Passive adaptation; weak in highly dynamic, unseen scenarios. |
| Reinforcement Learning | Learns optimal policy through trial-and-error interaction. | Excellent dynamic adaptation; handles high-dimensional state spaces; active learning. | Requires extensive training; convergence can be slow in large MARL settings. |

3. Collaborative Trajectory Planning Layer

This layer is responsible for generating executable flight paths. Its evolution centers on progressing from “static path generation” to “dynamic collaborative optimization.”

3.1 Trajectory Planning Based on Precise Optimization

Early work focused on ensuring collision-free, shortest paths in static environments for single or a few UAV drones.

Precise Algorithms: Algorithms like A* search were extended to 3D, using a cost function $f(n) = g(n) + h(n)$, where $g(n)$ is the cost from the start node to node $n$, and $h(n)$ is a heuristic estimate to the goal. Probabilistic Roadmaps (PRM) were used to sample the configuration space and build a graph for feasible path search, reducing node count.
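A compact A* implementation illustrates the cost function $f(n) = g(n) + h(n)$. This sketch works on a 2D occupancy grid for brevity (the 3D extension enlarges the neighbor set and heuristic); the grid encoding is an assumption:

```python
import heapq

def astar(grid, start, goal):
    """Minimal A* on a 2D occupancy grid (1 = obstacle) with unit step
    costs. h is the Manhattan distance, which is admissible for
    4-connected motion, so the returned path is shortest."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {}
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in best_g and best_g[node] <= g:
            continue                              # stale queue entry
        best_g[node] = g
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(open_set, (g + 1 + h((nr, nc)), g + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None                                   # goal unreachable
```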

Multi-Constraint Planning: As scenarios became more complex, planning had to satisfy multiple constraints like flight time, smoothness, and threat avoidance simultaneously. This is often formulated as a Multi-Objective Optimization Problem (MOOP):

$$ \min_{\mathbf{p}} \quad [f_1(\mathbf{p}), f_2(\mathbf{p}), \ldots, f_k(\mathbf{p})]^T $$
subject to:
$$ g_j(\mathbf{p}) \leq 0, \quad j=1,\ldots,m $$
where $\mathbf{p}$ is the path variable, $f_i$ are objective functions (e.g., length $f_L$, threat cost $f_T$, energy $f_E$), and $g_j$ are constraints. Multi-objective evolutionary algorithms (MOEAs) like NSGA-II are commonly used to find the Pareto front of non-dominated solutions.
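The non-dominated sorting at the heart of such MOEAs reduces to a Pareto dominance test, sketched below under the assumption that all objectives are minimized (the function name is illustrative):

```python
def pareto_front(solutions):
    """Return the non-dominated subset of a list of objective tuples,
    all objectives to be minimized: s dominates t if s is no worse in
    every objective and strictly better in at least one."""
    dominates = lambda s, t: all(a <= b for a, b in zip(s, t)) and s != t
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions)]
```

NSGA-II applies this test repeatedly to rank a population into fronts, then uses crowding distance within each front to preserve diversity.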

3.2 Trajectory Planning Based on Swarm Intelligence

To address the limitations of centralized planning for大规模, dynamic UAV drone swarms, decentralized swarm intelligence algorithms gained prominence. These algorithms, inspired by flocks, schools, or colonies, enable robust, scalable planning. For example, an improved Whale Optimization Algorithm (WOA) might simulate the bubble-net feeding behavior for local search, combined with random search for exploration. The position update for a “whale” (representing a candidate path solution) can be modeled as:

$$ \vec{X}(t+1) = \begin{cases}
\vec{X}^*(t) - \vec{A} \cdot \vec{D} & \text{if } p < 0.5 \\
\vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^*(t) & \text{if } p \geq 0.5
\end{cases} $$
where $\vec{X}^*$ is the current best solution, $\vec{A}$, $\vec{D}$ are coefficient vectors, $p$ and $l$ are random numbers, and $b$ is a constant. Hybrid algorithms, like MAHACO which fuses Ant Colony Optimization (ACO) with Differential Evolution (DE), further enhance search efficiency and path quality in 3D environments.
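A single update of this rule can be sketched as follows, using the standard WOA conventions (the branch choice $p$ drawn once per candidate, and $a$ decaying linearly from 2 to 0 over the run) as assumptions:

```python
import math
import random

def woa_update(X, X_best, t, t_max, rng, b=1.0):
    """One WOA position update for a candidate solution vector X per the
    equation above: shrinking encirclement when p < 0.5, logarithmic
    spiral ("bubble-net") move toward the best solution when p >= 0.5."""
    a = 2.0 * (1 - t / t_max)                  # exploration weight: 2 -> 0
    p = rng.random()
    new = []
    for xd, xbd in zip(X, X_best):
        if p < 0.5:
            A = 2 * a * rng.random() - a       # component of vector A
            C = 2 * rng.random()               # component of vector C
            D = abs(C * xbd - xd)
            new.append(xbd - A * D)
        else:
            l = rng.uniform(-1, 1)
            D = abs(xbd - xd)                  # D' in the equation
            new.append(D * math.exp(b * l) * math.cos(2 * math.pi * l) + xbd)
    return new
```

Note that at the final iteration $a = 0$, so the encirclement branch collapses onto the best solution, which is how WOA shifts from exploration to exploitation.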

3.3 Trajectory Planning Based on Intelligent Fusion

Swarm intelligence, while decentralized, often lacks strong dynamic perception and struggles with highly uncertain environments. This led to the development of intelligent fusion frameworks combining learning and planning.

Reinforcement Learning for Dynamic Planning: RL agents can learn to navigate complex dynamic environments directly from sensor inputs. A common framework is the Actor-Critic, where the Actor network $\pi_\theta(a|s)$ selects actions (e.g., heading, velocity changes), and the Critic network $Q_\phi(s,a)$ estimates the value of those actions. The policy is updated to maximize the expected cumulative reward $J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}[\sum_t \gamma^t r_t]$. For multi-UAV drone scenarios, Multi-Agent RL (MARL) with a centralized training / decentralized execution paradigm is effective. An example is optimizing the parameters of a Rotational Artificial Potential Field (RAPF) using MARL, where the reward function might combine terms for goal reaching, collision avoidance, and energy efficiency: $r_t = r_{goal} + \alpha r_{collision} + \beta r_{energy}$.

Hierarchical Hybrid-Driven Planning: This approach decomposes the complex planning problem. A high-level planner (e.g., using a Temporal Graph Convolution Network, T-GCN, for trajectory prediction) sets coarse waypoints, while a low-level planner (e.g., using a modified Sparrow Search Algorithm, LASSA, with a logarithmic spiral strategy) performs fine-grained, real-time adjustment satisfying dynamics constraints such as the maximum curvature $\kappa_{max}$:

$$ \kappa(\mathbf{p}(t)) = \frac{|\dot{x}\ddot{y} – \dot{y}\ddot{x}|}{(\dot{x}^2+\dot{y}^2)^{3/2}} \leq \kappa_{max} $$
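In discrete implementations this bound is typically checked on a sampled trajectory. A small finite-difference sketch (function name and sampling assumptions are illustrative):

```python
import math

def max_curvature(xs, ys, dt):
    """Evaluate the curvature formula above along a uniformly sampled
    path (xs[i], ys[i]) via central finite differences; return the
    maximum so a candidate trajectory can be tested against kappa_max."""
    kmax = 0.0
    for i in range(1, len(xs) - 1):
        xd = (xs[i + 1] - xs[i - 1]) / (2 * dt)       # x-dot
        yd = (ys[i + 1] - ys[i - 1]) / (2 * dt)       # y-dot
        xdd = (xs[i + 1] - 2 * xs[i] + xs[i - 1]) / dt ** 2
        ydd = (ys[i + 1] - 2 * ys[i] + ys[i - 1]) / dt ** 2
        denom = math.hypot(xd, yd) ** 3               # (xd^2 + yd^2)^{3/2}
        if denom > 1e-12:
            kmax = max(kmax, abs(xd * ydd - yd * xdd) / denom)
    return kmax
```

On a circle of radius $R$ the result converges to $1/R$, a quick sanity check for the implementation.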

Table 3. Summary of the Collaborative Trajectory Planning Layer
| Technical Stage | Core Methodology | Typical Scenario | Key Contribution / Solved Pain Point |
| --- | --- | --- | --- |
| Precise Optimization | Graph search (A*, PRM), mathematical programming. | Static/small-scale, known environment. | Guarantees collision-free, optimal/short paths under clear constraints. |
| Multi-Constraint Planning | Multi-Objective Evolutionary Algorithms (MOEAs). | Heterogeneous UAV drones, multiple competing objectives. | Balances convergence and solution diversity; handles smoothness, threat, energy trade-offs. |
| Swarm Intelligence | PSO, GWO, ACO, and their hybrids. | Suburban, semi-dynamic, medium-to-large swarms. | Decentralized coordination; reduces wasted search; scalable planning. |
| RL-based Planning | Deep RL, MARL (e.g., PPO, DDPG, MADDPG). | Large-scale, highly dynamic, position-uncertain environments. | Active adaptation; learns complex policies from interaction; improves robustness. |
| Hierarchical Hybrid | Fusion of high-level (coarse) and low-level (fine) planners. | Full dynamic distribution for large-scale UAV drone clusters. | Balances global exploration with local optimization; integrates prediction and execution. |

4. Dynamic Trajectory Replanning Layer

This layer ensures real-time operational safety and efficiency by adjusting plans in response to unforeseen changes.

4.1 Dynamic Replanning Based on Search and Traversal

These algorithms quickly find alternative paths when the original plan is blocked.

Deterministic Graph Search: Algorithms like D* Lite or focused D* dynamically update path costs on a graph as new obstacle information is received, allowing efficient re-planning from the current state.

Random Sampling (Hierarchical Search): The Rapidly-exploring Random Tree (RRT) algorithm and its variants (like RRT-Connect) are highly effective in high-dimensional spaces. They incrementally build a search tree by randomly sampling the state space and connecting samples to the nearest tree node. The process can be summarized as:
1. $T$.init(start)
2. for $i=1$ to $N$ do
3.    $q_{rand} \leftarrow$ RandomSample()
4.    $q_{near} \leftarrow$ NearestNeighbor($T$, $q_{rand}$)
5.    $q_{new} \leftarrow$ Steer($q_{near}$, $q_{rand}$)
6.    if CollisionFree($q_{near}$, $q_{new}$) then
7.       $T$.addVertex($q_{new}$); $T$.addEdge($q_{near}$, $q_{new}$)
8.       if Distance($q_{new}$, goal) $< \epsilon$ then return Path($T$, $q_{new}$)
9. return Failure
Hybrid methods like PSO-RRT combine the exploratory power of RRT with the optimizing ability of PSO to smooth the resulting path.
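The loop above can be implemented compactly in 2D. The sketch below adds a common practical assumption not in the pseudocode, a 10% goal-sampling bias, and uses disc obstacles; all parameter defaults are illustrative:

```python
import math
import random

def rrt(start, goal, obstacles, x_max=10.0, y_max=10.0,
        step=0.5, max_iter=2000, seed=0):
    """Minimal 2D RRT. obstacles is a list of (cx, cy, r) discs; Steer
    moves at most `step` toward the sample; the tree is stored as a
    child -> parent map from which the final path is reconstructed."""
    rng = random.Random(seed)
    parent = {start: None}

    def collision_free(p):
        return all(math.dist(p, (cx, cy)) > r for cx, cy, r in obstacles)

    for _ in range(max_iter):
        q_rand = goal if rng.random() < 0.1 else (
            rng.uniform(0, x_max), rng.uniform(0, y_max))
        q_near = min(parent, key=lambda q: math.dist(q, q_rand))
        d = math.dist(q_near, q_rand)
        if d == 0:
            continue
        s = min(step, d) / d                      # Steer: clamp step length
        q_new = (q_near[0] + s * (q_rand[0] - q_near[0]),
                 q_near[1] + s * (q_rand[1] - q_near[1]))
        if collision_free(q_new):
            parent[q_new] = q_near
            if math.dist(q_new, goal) < step:     # within goal tolerance
                path, q = [goal], q_new
                while q is not None:
                    path.append(q)
                    q = parent[q]
                return path[::-1]
    return None
```

Only node-wise collision checks are done here; a production planner would also check the edge between $q_{near}$ and $q_{new}$, then smooth the jagged result (e.g., with the PSO step of PSO-RRT).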

4.2 Dynamic Replanning Based on Physical Modeling

These methods incorporate the UAV drone's physical dynamics for locally smooth and feasible adjustments.

Artificial Potential Field (APF): The UAV drone moves under the influence of an attractive potential $U_{att}(\mathbf{q})$ towards the goal and repulsive potentials $U_{rep,i}(\mathbf{q})$ from obstacles. The total force is the negative gradient: $\mathbf{F}_{total}(\mathbf{q}) = -\nabla U_{att}(\mathbf{q}) - \sum_i \nabla U_{rep,i}(\mathbf{q})$. The UAV drone's motion is governed by $\dot{\mathbf{q}} \propto \mathbf{F}_{total}$. Improved APF methods address local minima issues by modifying the potential functions or adding rotational components.
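A minimal sketch of the force computation, assuming the textbook quadratic attractive and hyperbolic repulsive potentials (gains and the influence radius $d_0$ are illustrative):

```python
import math

def apf_force(q, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """Total 2D APF force at position q: the negative gradient of
    U_att = 0.5*k_att*|q - goal|^2 plus, for each point obstacle within
    influence radius d0, that of U_rep = 0.5*k_rep*(1/d - 1/d0)^2."""
    fx = k_att * (goal[0] - q[0])          # -grad U_att
    fy = k_att * (goal[1] - q[1])
    for ox, oy in obstacles:
        d = math.dist(q, (ox, oy))
        if 0 < d < d0:
            # -grad U_rep = k_rep * (1/d - 1/d0) / d^3 * (q - obstacle)
            mag = k_rep * (1.0 / d - 1.0 / d0) / d ** 3
            fx += mag * (q[0] - ox)
            fy += mag * (q[1] - oy)
    return fx, fy
```

Integrating $\dot{\mathbf{q}} \propto \mathbf{F}_{total}$ with such a force field yields smooth avoidance, but also exhibits the local-minimum stalls that the improved variants above are designed to escape.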

Model Predictive Control (MPC): This is a powerful framework for dynamic replanning under constraints. At each time step $t$, it solves a finite-horizon optimal control problem online:
$$ \min_{\mathbf{u}_{t:t+N-1}} \sum_{k=t}^{t+N-1} \ell(\mathbf{x}_k, \mathbf{u}_k) + V_f(\mathbf{x}_{t+N}) $$
subject to:
$$ \mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k), \quad \mathbf{x}_k \in \mathcal{X}, \quad \mathbf{u}_k \in \mathcal{U} $$
where $\mathbf{x}_k$ is the state (position, velocity), $\mathbf{u}_k$ is the control input, $f$ is the UAV drone dynamics model, $\ell$ is the stage cost (e.g., tracking error, control effort), $V_f$ is the terminal cost, and $\mathcal{X}, \mathcal{U}$ are state and input constraints (e.g., obstacle avoidance, actuator limits). Only the first control input is applied before re-solving at $t+1$.
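The receding-horizon loop can be illustrated by brute force on a 1D double integrator, enumerating a small discrete control set instead of calling a QP/NLP solver as real MPC would; the cost weights, horizon, and obstacle margin are assumptions:

```python
from itertools import product

def mpc_step(x0, goal, horizon=3, u_set=(-1.0, 0.0, 1.0), obstacle=None):
    """One MPC iteration for a 1D double integrator x = (position,
    velocity): enumerate all control sequences over the horizon, score
    them with a quadratic stage cost, reject sequences that violate the
    state constraint, and return only the first input of the best one."""
    def dynamics(x, u, dt=0.5):            # x_{k+1} = f(x_k, u_k)
        p, v = x
        return (p + v * dt, v + u * dt)

    def stage_cost(x, u):                  # l(x_k, u_k)
        return (x[0] - goal) ** 2 + 0.1 * u ** 2

    best_u, best_cost = None, float("inf")
    for seq in product(u_set, repeat=horizon):
        x, cost, feasible = x0, 0.0, True
        for u in seq:
            x = dynamics(x, u)
            if obstacle is not None and abs(x[0] - obstacle) < 0.3:
                feasible = False           # x_k not in X: discard sequence
                break
            cost += stage_cost(x, u)
        if feasible and cost < best_cost:
            best_cost, best_u = cost, seq[0]
    return best_u
```

At the next time step the optimization is re-solved from the newly measured state, which is what lets MPC absorb disturbances and moving obstacles.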

4.3 Dynamic Replanning Based on Intelligent Optimization and Learning

These are state-of-the-art approaches for handling high uncertainty and complex global re-optimization.

Intelligent Optimization: Algorithms like enhanced Dung Beetle Optimization (EMSDBO) are used for global path re-optimization when a significant portion of the path is invalidated. They can optimize high-level waypoints considering new obstacle configurations.

Reinforcement and Deep Learning: RL agents can be trained specifically for local obstacle avoidance, reacting in milliseconds to dynamic obstacles. For example, a Soft Actor-Critic (SAC) agent with Hindsight Experience Replay (HER) can learn robust avoidance policies even in sparse reward settings. Deep learning models, particularly Convolutional Neural Networks (CNNs) or Graph Neural Networks (GNNs), can be used to predict collision risks or directly output steering commands from raw sensor data (LiDAR, camera), enabling extremely fast reactive replanning.

Table 4. Summary of the Dynamic Trajectory Replanning Layer
| Technical Approach | Core Mechanism | Response Time | Best For |
| --- | --- | --- | --- |
| Graph Search (D*, A*) | Recomputes optimal path on an updated cost map. | Medium | Global re-routing when map changes are significant but infrequent. |
| Random Sampling (RRT*) | Rapidly explores state space to find a new feasible path. | Fast | High-dimensional spaces, cluttered environments, real-time obstacle discovery. |
| Artificial Potential Field (APF) | Local force-based reactive control. | Very Fast | Immediate, smooth local obstacle avoidance; simple dynamic environments. |
| Model Predictive Control (MPC) | Online solution of a constrained finite-horizon optimization. | Medium to Fast (depends on model complexity) | Dynamic replanning with explicit handling of dynamics and input constraints. |
| Learning-based (RL/DL) | Direct policy or command output from perception data. | Very Fast (after training) | Complex, unpredictable environments; end-to-end perception-to-action pipeline. |

5. Core Challenges and Future Outlook

5.1 Core Algorithmic Challenges

Despite significant progress, key challenges remain for the widespread application of hierarchical cooperative UAV drone logistics systems.

  1. Strong Coupling of Multiple Constraints: Real-world constraints like battery energy $E$, payload $L$, time windows $T$, and airspace regulations are deeply intertwined. For instance, energy consumption is a nonlinear function of payload, speed, and wind: $E \approx \int (c_1 \cdot v^3 + c_2 \cdot \frac{L}{v} + c_3) dt$. Decoupling these constraints across the optimization layers often leads to suboptimal or infeasible global solutions.
  2. Insufficient Dynamic Environment Adaptability: Many algorithms struggle with the “reality gap” between simulation and real-world unpredictability (e.g., sudden gusts, moving obstacles like birds, transient communication dropouts). The trade-off between computational speed for real-time response and the optimality/reliability of the replanned path is difficult to balance.
  3. Low Efficiency in Large-Scale Swarm Coordination: Centralized algorithms face exponential complexity growth $O(n^k)$. Distributed approaches suffer from communication overhead and consensus delays. Efficiently managing hundreds of UAV drones in shared urban airspace, avoiding mutual interference while meeting individual mission goals, is a major scalability challenge.
  4. Disconnection from Practical Deployment Scenarios: Algorithms often assume perfect localization, communication, and known dynamics. In practice, UAV drone failures, GPS-denied environments, battery degradation, and variable payloads degrade performance. Robustness to these real-world imperfections is often lacking.

5.2 Future Innovation Directions

Future research should focus on integrative and practical advancements to transition UAV drone logistics from pilot projects to large-scale commercialization.

  1. Cross-layer Co-design and Optimization: Moving beyond serial hierarchical decomposition, future frameworks should feature tight coupling between layers. For example, the task allocation could output not just task-UAV drone assignments but also initial energy-time profiles, which directly inform the trajectory planner’s cost function. End-to-end learning architectures that jointly optimize allocation and continuous control policies are a promising direction.
  2. Scenario-Specific Customization and Digital Twin Integration: Developing specialized algorithms for niche applications (e.g., medical delivery, agricultural surveying) is crucial. Integrating with Digital Twin technology will enable high-fidelity simulation, testing, and continuous optimization of algorithms against a virtual replica of the real-world operating environment, accelerating deployment and improving safety.
  3. Advanced Swarm Intelligence for Large-Scale Operations: Research into bio-inspired emergent coordination mechanisms, possibly combined with lightweight blockchain or consensus protocols for secure and efficient task and airspace negotiation among massive UAV drone swarms, will be key. Investigating hybrid centralized-distributed architectures where a central controller manages airspace flow (like air traffic control) while swarms self-organize for local path planning is another viable path.
  4. Robustness and Resilient Design: Algorithms must be designed with robustness as a first principle. This includes adversarial training for RL agents, robust MPC formulations that account for bounded uncertainties, and graceful degradation strategies where the swarm can reconfigure itself in case of individual UAV drone failures.
  5. Green and Energy-Aware Optimization: With sustainability as a global priority, future algorithms must deeply integrate energy models. This involves planning energy-efficient trajectories considering wind fields, optimizing charging station placement and visitation schedules (Electric UAV drone Routing Problem), and developing load-balancing task allocation to minimize the total energy consumption of the UAV drone fleet.

In conclusion, the hierarchical cooperation technology for low-altitude logistics UAV drones has evolved through distinct paradigms, each addressing the limitations of the last. The current state-of-the-art combines elements of optimization, market mechanisms, swarm intelligence, and machine learning across the allocation, planning, and replanning layers. Overcoming the remaining challenges in multi-constraint coupling, dynamic adaptability, large-scale coordination, and practical robustness will require interdisciplinary innovations. The future lies in tightly integrated, scenario-aware, and resiliently designed systems that can safely and efficiently unlock the full potential of autonomous UAV drone logistics on a massive scale.
