The evolution of modern warfare, as evidenced by conflicts from Nagorno-Karabakh to Ukraine, has fundamentally altered the role of UAVs. They have transitioned from auxiliary assets to indispensable components of the battlespace. Their mission profiles have expanded from simple, pre-programmed tasks to complex, dynamic operations, while their operational mode is rapidly evolving from single-platform “sensor-to-shooter” loops to coordinated multi-agent systems. Confronted with diverse targets, heterogeneous task requirements, and stringent temporal constraints, traditional manual task allocation methods are increasingly inadequate for the demands of swarm warfare. Consequently, developing intelligent task allocation methodologies for UAV swarms that satisfy sequential constraints, rationally assign targets and missions, and scientifically plan flight paths has become a critical enabler for the intelligent upgrade of unmanned swarm combat systems, holding significant theoretical and practical value.

The problem of task allocation for UAV swarms is a well-known NP-hard challenge. Its core difficulties lie in the high uncertainty of the environment, tasks, and platforms, as well as the tight coupling between task assignment and path planning. To address this complex issue, researchers have proposed various solutions, including hierarchical planning architectures, metaheuristic algorithms such as improved Ant Colony Optimization or Grey Wolf Optimizer variants adapted for dynamic scenarios, and methods based on cooperative-competitive mechanisms. A common observation from the existing literature is that many approaches ultimately decompose the complex mapping between a UAV swarm and multiple targets into a set of “one drone to one target” assignments. However, for swarms equipped to perform composite reconnaissance-strike missions, scenarios involving “one drone to multiple targets” or “multiple drones cooperatively engaging a single target” are prevalent and underexplored.
Therefore, this work addresses this gap by first constructing a comprehensive task allocation model that incorporates target classification, capability constraints, and temporal linkage. Subsequently, an improved hybrid optimization algorithm based on Discrete Particle Swarm Optimization (DPSO) and the Genetic Algorithm (GA), referred to as DPSO-GA, is proposed. Finally, comparative simulations against the standard DPSO and GA algorithms are conducted to validate the superiority of the proposed method in terms of global exploration capability, convergence speed, and convergence accuracy.
1. Task Allocation Modeling for a Reconnaissance-Strike Integrated UAV Swarm
1.1 Problem Description
Consider a mission area where a swarm of N UAVs (U = {U1, U2, …, Ui, …, UN}) is tasked to execute reconnaissance and strike missions against M ground targets (T = {T1, T2, …, Tj, …, TM}). Each drone Ui carries a payload resource vector Ri = < Rzci, Rdji >, where Rzci and Rdji denote the number of reconnaissance and strike payloads (e.g., cameras and missiles) carried by Ui, respectively. Targets are categorized into four classes based on operational importance: core, key, important, and regular. The task requirement for target Tj is represented by a vector Tj = < Rzcj, Rdjj >, where Rzcj and Rdjj indicate the required number of reconnaissance and strike actions on Tj. Based on typical kill-chain operations, a strict temporal constraint is enforced: for any target requiring both reconnaissance and strike, the reconnaissance mission must be completed before the strike mission can begin. Leveraging efficient information-sharing mechanisms within the swarm, the reconnaissance and strike tasks on the same target can be performed by different UAVs to minimize the sensor-to-shooter timeline, as long as the “reconnaissance-first” sequence is obeyed. A central command node intelligently schedules the swarm’s resources to devise an optimal task allocation plan.
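For concreteness, the platform and target descriptions above can be captured with simple data records. The following Python sketch uses illustrative field names that are not part of the original formulation; it is only one possible way to represent the vectors Ri and Tj.

```python
from dataclasses import dataclass

@dataclass
class Drone:
    """One reconnaissance-strike UAV U_i (illustrative field names)."""
    recon_payloads: int    # R^zc_i: number of reconnaissance payloads (reusable)
    strike_payloads: int   # R^dj_i: number of strike payloads (expendable)
    position: tuple        # initial (x, y) position in the mission area
    max_range: float       # D_max: maximum flight distance
    speed: float           # V_u: cruising speed

@dataclass
class Target:
    """One ground target T_j (illustrative field names)."""
    recon_demand: int      # R^zc_j: required number of reconnaissance actions
    strike_demand: int     # R^dj_j: required number of strike actions
    position: tuple        # (x, y) location of the target
    importance: str        # "core", "key", "important", or "regular"
```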
1.2 Objective Function
The timeliness of mission completion is a direct metric for evaluating the quality of a task allocation scheme. The ideal outcome is for the swarm to finish all assigned tasks in the shortest possible time. Therefore, the objective is to minimize the maximum mission completion time among all UAV drones in the swarm. The objective function F is defined as follows:
$$ \min F = F_1 + \lambda \sum_{g=1}^{4} r_g $$
where F1 is the flight-time cost term, λ is a penalty coefficient, and rg (g = 1, 2, 3, 4) are penalty flags for the four constraint categories (temporal, task completion, range, and payload): rg = 0 if the corresponding constraints are satisfied and rg = 1 otherwise. The mission-time cost of an individual drone Ui is calculated as:
$$ F_1 = \frac{D_{u_i}}{V_u} + \alpha_1 \delta_1 + \alpha_2 \delta_2 $$
Here, Dui is the total flight distance of drone Ui, Vu is its cruising speed, α1 and α2 are the numbers of reconnaissance and strike tasks it performs, and δ1 and δ2 are the durations of a single reconnaissance and strike action, respectively. Consistent with the makespan objective stated above and the fitness evaluation in Section 2.4, F1 is taken as the largest of these per-drone mission times across the swarm.
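As a minimal illustration of the per-drone cost expression, the hypothetical Python helper below evaluates it directly; the default action durations are the 5 min and 3 min values from the simulation parameter table in Section 3.1, and all other names are assumptions for illustration.

```python
def drone_cost(total_distance, speed, n_recon, n_strike,
               delta_recon=5.0, delta_strike=3.0):
    """Per-drone cost D_ui / V_u + alpha1 * delta1 + alpha2 * delta2.

    delta_recon and delta_strike default to the 5 min and 3 min action
    durations from the parameter table; distance and speed units are
    left to the caller and should be mutually consistent.
    """
    return total_distance / speed + n_recon * delta_recon + n_strike * delta_strike
```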
1.3 Constraint Formulation
The optimization is subject to the following constraints:
1.3.1 Temporal Constraint: For any target Tj requiring both reconnaissance and strike, the start time of its reconnaissance mission must precede the start time of its strike mission.
$$ t^{zc}_{T_j} < t^{dj}_{T_j} $$
The execution times are defined as:
$$ t^{zc}_{T_j} = x^{zc}_{ij} \left( \frac{D^{1 \to j}_{i}}{V_u} + \sum_{i=1}^{N} \sum_{k=1}^{j-1} t^{zc}_{M_{ik}} + \sum_{i=1}^{N} \sum_{k=1}^{j-1} t^{dj}_{M_{ik}} \right) $$
$$ t^{dj}_{T_j} = x^{dj}_{ij} \left( \frac{D^{1 \to j}_{i}}{V_u} + \sum_{i=1}^{N} \sum_{k=1}^{j-1} t^{zc}_{M_{ik}} + \sum_{i=1}^{N} \sum_{k=1}^{j-1} t^{dj}_{M_{ik}} \right) $$
where xzcij and xdjij are binary decision variables (1 if drone Ui performs the respective task on Tj, 0 otherwise), and D1→ji is the path length for Ui to reach the location of target Tj.
1.3.2 Task Completion Constraints:
a) Reconnaissance Task Fulfillment: All required reconnaissance tasks must be assigned.
$$ \sum_{i=1}^{N} \sum_{j=1}^{M} x^{zc}_{ij} = M_{zc} $$
b) Strike Task Fulfillment: The required damage level Hi for each target class must be achieved. Given a single-strike damage probability α, the minimum required number of strikes Cti for a class-i target is derived from the inequality below (a worked example follows these constraints):
$$ 1 - (1 - \alpha)^{C^t_i} \ge H_i $$
The assignment must satisfy:
$$ \sum_{i=1}^{N} x^{dj}_{ij} \cdot U^{dj}_{ij} \ge C^t_i \quad \text{(for target class i)} $$
$$ \sum_{i=1}^{N} \sum_{j=1}^{M} x^{dj}_{ij} \cdot U^{dj}_{ij} = M_{dj} $$
where Udjij is the number of strike tasks drone Ui performs on target Tj.
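Solving this inequality for Cti gives a closed form. For example, with the single-strike damage probability α = 0.75 and the core-class requirement Hi = 0.9 from the parameter table, two strikes are required:
$$ C^t_i \ge \left\lceil \frac{\ln(1 - H_i)}{\ln(1 - \alpha)} \right\rceil, \qquad \left\lceil \frac{\ln(1 - 0.9)}{\ln(1 - 0.75)} \right\rceil = \lceil 1.66 \rceil = 2 $$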
1.3.3 Range Constraint: Each drone’s flight distance cannot exceed its maximum range.
$$ D_{u_i} \le D_{max} $$
1.3.4 Payload Constraint: The number of strike tasks assigned to a drone cannot exceed its carried strike payloads (e.g., missiles). Reconnaissance payloads are assumed reusable.
$$ \sum_{j=1}^{M} x^{dj}_{ij} \cdot U^{dj}_{ij} \le R^{dj}_i $$
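The penalty flags rg in the objective can be evaluated directly from a decoded allocation. The Python sketch below illustrates only the range and payload checks of Sections 1.3.3 and 1.3.4, using hypothetical function names and the data layout assumed in the earlier sketches; the temporal and completion checks follow the same pattern.

```python
def range_violated(flight_distances, d_max):
    """r3 = 1 if any drone's total flight distance D_ui exceeds D_max."""
    return int(any(d > d_max for d in flight_distances))

def payload_violated(strike_counts, strike_payloads):
    """r4 = 1 if any drone is assigned more strikes than missiles carried.

    strike_counts   -- strikes assigned to each drone (sum_j x^dj_ij * U^dj_ij)
    strike_payloads -- R^dj_i for each drone
    """
    return int(any(c > p for c, p in zip(strike_counts, strike_payloads)))
```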
2. The Proposed DPSO-GA Hybrid Optimization Algorithm
Task planning for UAV swarms presents distinct challenges: 1) Symmetry in homogeneous swarms leads to solution-space degeneracy, causing traditional algorithms to become trapped in local optima. 2) Evaluation shifts towards system-wide efficiency and load balancing, requiring multi-objective optimization capability. 3) The need for real-time, dynamic response to changes in the number of UAVs demands algorithms with good scalability and fast convergence. 4) The presence of diverse and complex constraints necessitates robust constraint-handling mechanisms. To address these challenges, we propose a DPSO-GA hybrid optimizer that combines the global exploration strength of GA with the rapid convergence of DPSO.
The standard PSO update equations for velocity and position are:
$$ v^{t+1}_i = \omega v^t_i + c_1 r_1 (P_i(t) - x^t_i) + c_2 r_2 (P_g(t) - x^t_i) $$
$$ x^{t+1}_i = x^t_i + v^{t+1}_i $$
where ω is the inertia weight, c1 and c2 are the cognitive and social learning factors, r1 and r2 are random numbers uniformly drawn from (0,1), and Pi(t) and Pg(t) are the personal and global best positions.
2.1 Particle Encoding Scheme
To model the complex “UAV-to-Task” mappings, a one-dimensional, two-layer particle encoding is employed. The first layer is the Task Sequence Code, an ordered list of all reconnaissance and strike tasks. The second layer is the Executor Code, indicating the index of the UAV assigned to each corresponding task in the first layer. This encoding naturally represents “one drone to multiple tasks” and allows multiple drones to be assigned to tasks on the same target.
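A minimal sketch of this two-layer encoding, assuming Python lists and illustrative task labels (the labels and drone indices below are hypothetical, not taken from the experiments):

```python
# Hypothetical example with 3 targets and 4 drones (indices 0-3).
# Layer 1: Task Sequence Code, one entry per required action ("zc" = recon, "dj" = strike).
task_sequence = [("T1", "zc"), ("T2", "zc"), ("T1", "dj"),
                 ("T3", "zc"), ("T2", "dj"), ("T3", "dj")]
# Layer 2: Executor Code, the drone index assigned to each task position above.
executor_code = [2, 0, 3, 1, 3, 0]

# Drone 3 handles two tasks ("one drone, multiple targets"), while target T1 is
# reconnoitred by drone 2 and struck by drone 3 ("multiple drones, one target").
particle = (task_sequence, executor_code)
```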
2.2 Proposed Particle Update Mechanism
Drawing inspiration from PSO’s velocity-position update, we design a discrete update formula comprising three probabilistic operations:
$$ x^{t+1}_i = c_2 \otimes f_3 \left( c_1 \otimes f_2 \left( \omega \otimes f_1(x^t_i), P_i(t) \right), P_g(t) \right) $$
The update is executed sequentially as follows:
$$ E^t_i = \omega \otimes f_1(x^t_i) = \begin{cases} f_1(x^t_i), & \text{if } r < \omega \\ x^t_i, & \text{otherwise} \end{cases} $$
$$ F^t_i = c_1 \otimes f_2(E^t_i, P_i(t)) = \begin{cases} f_2(E^t_i, P_i(t)), & \text{if } r < c_1 \\ E^t_i, & \text{otherwise} \end{cases} $$
$$ x^{t+1}_i = c_2 \otimes f_3(F^t_i, P_g(t)) = \begin{cases} f_3(F^t_i, P_g(t)), & \text{if } r < c_2 \\ F^t_i, & \text{otherwise} \end{cases} $$
Here, f1, f2, f3 represent mutation, learning from personal best, and learning from global best operations, respectively. The symbol ⊗ denotes a conditional application based on a comparison with a random number r ∈ (0,1).
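The composition of the three operations can be read as the following sketch (illustrative Python, with f1, f2, f3 passed in as callables whose possible forms are sketched in the subsections below); it is one interpretation of the update equations, not the authors' exact implementation.

```python
import random

def update_particle(x, p_best, g_best, w, c1, c2, f1, f2, f3):
    """One discrete update: mutation, personal-best learning, and global-best
    learning, each applied with its own acceptance probability."""
    e = f1(x) if random.random() < w else x              # omega (x) f1
    f = f2(e, p_best) if random.random() < c1 else e     # c1 (x) f2
    return f3(f, g_best) if random.random() < c2 else f  # c2 (x) f3
```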
2.2.1 Mutation Operator (f1): With probability ω, a mutation occurs. It has two modes chosen with equal probability: 1) Task Sequence Mutation: Randomly reshuffle the Task Sequence Code while keeping the Executor Code fixed. This facilitates large-scale global search. 2) Executor Mutation: Randomly select and change one value in the Executor Code while keeping the Task Sequence fixed. This enables local fine-tuning.
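A possible realization of f1 under the (task sequence, executor code) layout of Section 2.1; num_drones is a hypothetical parameter and the 0.5 mode split reflects the equal-probability choice described above.

```python
import random

def f1_mutation(particle, num_drones):
    """Mutation: with equal probability, either reshuffle the task sequence
    (executors fixed, coarse global search) or reassign a single executor
    (sequence fixed, local fine-tuning)."""
    tasks, execs = list(particle[0]), list(particle[1])
    if random.random() < 0.5:
        random.shuffle(tasks)                    # task-sequence mutation
    else:
        k = random.randrange(len(execs))         # executor mutation
        execs[k] = random.randrange(num_drones)
    return (tasks, execs)
```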
2.2.2 Learning from Personal Best Operator (f2): With probability c1, a crossover-like operation is performed between the current particle Eti and its personal best Pi(t). A subset of task positions is randomly selected. The corresponding Task-Executor pairs from Eti are copied to the offspring. The remaining task positions are filled in order with the Task-Executor pairs from Pi(t) that were not in the selected subset.
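One possible reading of f2 under the same assumptions; matching fill pairs by task label is an illustrative choice made here to keep the offspring's task sequence a valid permutation of the required tasks, not necessarily the authors' exact rule.

```python
import random

def f2_personal(particle, p_best):
    """Learn from the personal best: keep the task-executor pairs of the
    current particle at a random subset of positions, then fill the remaining
    positions, in order, with the not-yet-used pairs of p_best."""
    own = list(zip(particle[0], particle[1]))
    best = list(zip(p_best[0], p_best[1]))
    n = len(own)
    keep = set(random.sample(range(n), k=random.randint(1, n - 1)))
    remaining = list(best)
    for k in keep:                       # drop one pair of p_best per kept task
        for j, (task, _) in enumerate(remaining):
            if task == own[k][0]:
                del remaining[j]
                break
    fill = iter(remaining)
    child = [own[k] if k in keep else next(fill) for k in range(n)]
    tasks, execs = zip(*child)
    return (list(tasks), list(execs))
```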
2.2.3 Learning from Global Best Operator (f3): With probability c2, the particle learns from the global best Pg(t). The Task Sequence Code is copied from the current particle Fti. A threshold probability ρ, which increases with iterations, determines which executor assignments from Pg(t) are copied. For each task position, if a random number is below ρ, the executor from Pg(t) is adopted; otherwise, the executor from Fti is retained. This adaptive ρ ensures initial exploration (low ρ, less copying) and later exploitation (high ρ, strong convergence guidance).
$$ \rho = \rho_{min} + (\rho_{max} - \rho_{min}) \frac{t}{T} $$
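A corresponding sketch of f3 together with the adaptive threshold ρ, under the same illustrative encoding assumptions:

```python
import random

def rho(t, t_max, rho_min=0.1, rho_max=0.9):
    """Adaptive copy threshold: low early (exploration), high late (exploitation)."""
    return rho_min + (rho_max - rho_min) * t / t_max

def f3_global(particle, g_best, rho_t):
    """Learn from the global best: keep the current task sequence, and adopt
    g_best's executor at each position with probability rho_t."""
    tasks = list(particle[0])
    execs = [g if random.random() < rho_t else e
             for e, g in zip(particle[1], g_best[1])]
    return (tasks, execs)
```

When plugged into the update sketch of Section 2.2, rho(t, T) would be computed once per iteration and bound to f3_global, for example via functools.partial.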
2.3 Adaptive Parameter Adjustment
2.3.1 Inertia Weight (ω): To balance global exploration and local exploitation, ω is non-linearly adjusted using a cosine function:
$$ \omega = \omega_{min} + (\omega_{max} - \omega_{min}) \times \cos^2\left( \frac{\pi}{2} \times \frac{t}{T} \right) $$
This keeps ω relatively high in early iterations for exploration and allows it to decay rapidly later for convergence.
2.3.2 Learning Factors (c1, c2): The cognitive and social factors are linearly adapted:
$$ c_1 = c_{1_{max}} - (c_{1_{max}} - c_{1_{min}}) \times \frac{t}{T} $$
$$ c_2 = c_{2_{min}} + (c_{2_{max}} - c_{2_{min}}) \times \frac{t}{T} $$
This strategy prioritizes individual particle experience (higher c1) early on and shifts emphasis to swarm knowledge (higher c2) as the search progresses.
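Taken together, the adaptive schedules translate directly into code. The sketch below uses the bounds listed in the algorithm parameter table of Section 3.1; the function name and signature are assumptions.

```python
import math

def schedules(t, t_max,
              w_min=0.3, w_max=0.95,
              c1_min=0.3, c1_max=0.9,
              c2_min=0.3, c2_max=0.9):
    """Return (omega, c1, c2) at iteration t of t_max, per Section 2.3."""
    frac = t / t_max
    w = w_min + (w_max - w_min) * math.cos(math.pi / 2 * frac) ** 2  # nonlinear decay
    c1 = c1_max - (c1_max - c1_min) * frac                           # cognitive: high -> low
    c2 = c2_min + (c2_max - c2_min) * frac                           # social:   low -> high
    return w, c1, c2
```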
2.4 Fitness Evaluation
Each particle (allocation scheme) is decoded to assign tasks to UAVs. For each drone, its assigned tasks are clustered by target location. A greedy path-planning strategy then sequences these task locations, and the total mission time (flight time plus action time) is calculated. The fitness is the maximum mission time among all UAVs. Solutions violating constraints are penalized with a large fitness value via the λ coefficient in the objective function, effectively filtering them out.
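A compact sketch of this decoding-and-evaluation step under simplified assumptions (nearest-neighbour greedy sequencing, fixed action durations, and precomputed penalty flags); it is illustrative rather than the exact procedure used in the experiments, and all names are hypothetical.

```python
import math

def drone_mission_time(start, task_points, speed, t_recon, t_strike):
    """Greedy nearest-neighbour tour over a drone's assigned task locations,
    adding the fixed duration of each reconnaissance or strike action."""
    pos, total, todo = start, 0.0, list(task_points)  # task_points: (location, kind)
    while todo:
        nxt = min(todo, key=lambda p: math.dist(pos, p[0]))
        total += math.dist(pos, nxt[0]) / speed
        total += t_recon if nxt[1] == "zc" else t_strike
        pos = nxt[0]
        todo.remove(nxt)
    return total

def fitness(per_drone_tasks, starts, speed, t_recon, t_strike, violations, lam=1000.0):
    """Makespan (longest single-drone mission time) plus constraint penalties."""
    times = [drone_mission_time(s, tasks, speed, t_recon, t_strike)
             for s, tasks in zip(starts, per_drone_tasks)]
    return max(times) + lam * sum(violations)
```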
3. Simulation Experiments and Analysis
3.1 Simulation Setup and Parameters
The scenario involves 8 reconnaissance-strike UAVs with randomly distributed initial positions and differing payload configurations, operating against 12 randomly located ground targets whose categories and task demands are predefined. The scenario parameters are summarized below.
| Parameter | Description | Value / Range |
|---|---|---|
| Mission Area | Operational Airspace | 300 km × 300 km |
| Number of UAVs (N) | Homogeneous reconnaissance-strike platforms | 8 |
| Number of Targets (M) | Ground targets of varying importance | 12 |
| Target Classes | Core, Key, Important, Regular | 2, 3, 3, 4 targets |
| UAV Speed (Vu) | Cruising speed | Constant value |
| Max Drone Range (Dmax) | Operational endurance limit | Predefined distance |
| Single Strike Damage Prob. (α) | Weapon effectiveness | 0.75 |
| Damage Requirements (Hi) | For Core, Key, Important, Regular classes | 0.9, 0.8, 0.6, 0.4 |
| Action Durations (δ1, δ2) | Time for single recon / strike action | 5 min, 3 min |
The parameters of the DPSO-GA algorithm are set as follows.
| Parameter | Symbol | Value |
|---|---|---|
| Population Size | – | 200 |
| Maximum Iterations | T | 1000 |
| Inertia Weight Bounds | ωmin, ωmax | 0.3, 0.95 |
| Cognitive Factor Bounds | c1min, c1max | 0.3, 0.9 |
| Social Factor Bounds | c2min, c2max | 0.3, 0.9 |
| Global Learn Threshold Bounds | ρmin, ρmax | 0.1, 0.9 |
| Penalty Coefficient | λ | 1000 |
3.2 Results and Discussion
Executing the proposed DPSO-GA algorithm yielded a viable and efficient task allocation plan. Analysis of the best solution confirms its validity: 1) The complex mappings were captured, with most UAVs handling multiple tasks and several targets being engaged cooperatively by multiple drones. 2) All hard constraints, including the critical “reconnaissance-before-strike” sequence, were strictly satisfied. 3) The workload was balanced across the swarm, and the primary objective of minimizing the makespan (the longest individual mission time) was effectively achieved.
To rigorously evaluate the performance of the DPSO-GA algorithm, it was compared against the standard Discrete PSO (DPSO) and Genetic Algorithm (GA) over 50 independent simulation runs. The key statistical metrics of the final best fitness values (lower is better) are summarized below.
| Metric | DPSO-GA (Proposed) | Standard DPSO | Standard GA |
|---|---|---|---|
| Average Fitness | 1.399 | 2.797 | 1.566 |
| Maximum Fitness | 1.533 | 3.791 | 2.279 |
| Minimum Fitness | 1.223 | 2.036 | 1.200 |
| Variance | 0.0069 | 0.1618 | 0.0345 |
| Standard Deviation | 0.083 | 0.402 | 0.186 |
| 95% Confidence Interval Width | ~0.047 | ~0.229 | ~0.105 |
The comparative data clearly demonstrates the superiority of the hybrid DPSO-GA approach:
Convergence Accuracy: The proposed algorithm achieved an average fitness value 50.0% lower than DPSO and 10.7% lower than GA, indicating it consistently finds higher-quality solutions.
Stability and Robustness: The variance of DPSO-GA results is 95.7% lower than DPSO and 79.9% lower than GA. Similarly, its standard deviation is drastically reduced by 79.3% and 55.2% compared to DPSO and GA, respectively. This signifies remarkably lower result dispersion and higher run-to-run consistency.
Reliability: The width of the 95% confidence interval for the mean fitness of DPSO-GA is only 20.7% of DPSO’s and 44.8% of GA’s, reflecting a much more precise and reliable performance estimate.
In summary, the DPSO-GA hybrid algorithm achieves both high convergence accuracy and high stability. It overcomes the poor solution quality and high variability of standard DPSO while also mitigating the performance fluctuations associated with GA. This makes it particularly suitable for solving the complex, constrained task allocation problems inherent to UAV swarms.
4. Conclusion
This paper addressed the challenging problem of task allocation for reconnaissance-strike integrated UAV swarms under realistic operational constraints. A detailed mathematical model was formulated, incorporating target prioritization, resource limitations, and, crucially, the temporal dependency between reconnaissance and strike actions. To solve this model effectively, a novel hybrid DPSO-GA optimization algorithm was proposed. Its key innovations include a two-layer particle encoding scheme that naturally represents complex multi-drone, multi-task mappings; a discrete update mechanism blending mutation and crossover operators from GA with the guided search of PSO; and adaptive parameter-tuning strategies for the inertia weight and learning factors.
Simulation results confirm that the algorithm can generate feasible and efficient allocation plans that satisfy all hard constraints. More importantly, comparative statistical analysis against the standard DPSO and GA algorithms demonstrates that the DPSO-GA hybrid achieves significantly superior performance in terms of convergence accuracy, solution stability, and operational reliability. It provides a robust and effective methodological reference for solving multi-objective, multi-constraint task planning problems in intelligent UAV swarm operations. Future work may extend the model to handle dynamic target appearances, heterogeneous drone capabilities, and tighter integration with path planning under threat environments.
