The evolution of modern warfare, as evidenced by conflicts from Nagorno-Karabakh to Ukraine, has fundamentally altered the role of UAVs. They have transitioned from auxiliary assets to indispensable components of the battlespace. Their mission profiles have expanded from simple, pre-programmed tasks to complex, dynamic operations, while their operational mode is rapidly evolving from single-platform “sensor-to-shooter” loops to coordinated multi-agent systems. Confronted with diverse targets, heterogeneous task requirements, and stringent temporal constraints, traditional manual task allocation methods are increasingly inadequate for the demands of swarm warfare. Consequently, developing intelligent task allocation methodologies for UAV swarms that satisfy sequential constraints, rationally assign targets and missions, and scientifically plan flight paths has become a critical enabler for the intelligent upgrade of unmanned swarm combat systems, holding significant theoretical and practical value.

The problem of task allocation for UAV swarms is a well-known NP-hard challenge. Its core difficulties lie in the high uncertainty of the environment, tasks, and platforms, as well as the tight coupling between task assignment and path planning. To address this complex issue, researchers have proposed various solutions, including hierarchical planning architectures, metaheuristic algorithms such as improved Ant Colony Optimization or Grey Wolf Optimizer variants adapted for dynamic scenarios, and methods based on cooperative-competitive mechanisms. A common observation from the existing literature is that many approaches ultimately decompose the complex mapping between a UAV swarm and multiple targets into a set of “one drone to one target” assignments. However, for swarms equipped to perform composite reconnaissance-strike missions, scenarios involving “one drone to multiple targets” or “multiple drones cooperatively engaging a single target” are prevalent and underexplored.
Therefore, this work addresses this gap by first constructing a comprehensive task allocation model that incorporates target classification, capability constraints, and temporal linkage. Subsequently, an improved hybrid optimization algorithm based on Discrete Particle Swarm Optimization (DPSO) and the Genetic Algorithm (GA), referred to as DPSO-GA, is proposed. Finally, comparative simulations against the standard DPSO and GA algorithms are conducted to validate the superiority of the proposed method in terms of global exploration capability, convergence speed, and convergence accuracy.
1. Task Allocation Modeling for a Reconnaissance-Strike Integrated UAV Swarm
1.1 Problem Description
Consider a mission area where a swarm of N UAVs (U = {U1, U2, …, Ui, …, UN}) is tasked to execute reconnaissance and strike missions against M ground targets (T = {T1, T2, …, Tj, …, TM}). Each drone Ui carries a payload resource vector Ri = < Rzci, Rdji >, where Rzci and Rdji denote the number of reconnaissance and strike payloads (e.g., cameras and missiles) carried by Ui, respectively. Targets are categorized into four classes based on operational importance: core, key, important, and regular. The task requirement for target Tj is represented by a vector Tj = < Rzcj, Rdjj >, where Rzcj and Rdjj indicate the required number of reconnaissance and strike actions on Tj. Based on typical kill-chain operations, a strict temporal constraint is enforced: for any target requiring both reconnaissance and strike, the reconnaissance mission must be completed before the strike mission can begin. Leveraging efficient information-sharing mechanisms within the swarm, the reconnaissance and strike tasks on the same target can be performed by different UAVs to minimize the sensor-to-shooter timeline, as long as the “reconnaissance-first” sequence is obeyed. A central command node intelligently schedules the swarm’s resources to devise an optimal task allocation plan.
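For concreteness, the platform and target descriptions above can be captured with simple data records. The following Python sketch uses illustrative field names that are not part of the original formulation; it is only one possible way to represent the vectors Ri and Tj.

```python
from dataclasses import dataclass

@dataclass
class Drone:
    """One reconnaissance-strike UAV U_i (illustrative field names)."""
    recon_payloads: int    # R^zc_i: number of reconnaissance payloads (reusable)
    strike_payloads: int   # R^dj_i: number of strike payloads (expendable)
    position: tuple        # initial (x, y) position in the mission area
    max_range: float       # D_max: maximum flight distance
    speed: float           # V_u: cruising speed

@dataclass
class Target:
    """One ground target T_j (illustrative field names)."""
    recon_demand: int      # R^zc_j: required number of reconnaissance actions
    strike_demand: int     # R^dj_j: required number of strike actions
    position: tuple        # (x, y) location of the target
    importance: str        # "core", "key", "important", or "regular"
```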
1.2 Objective Function
The timeliness of mission completion is a direct metric for evaluating the quality of a task allocation scheme. The ideal outcome is for the swarm to finish all assigned tasks in the shortest possible time. Therefore, the objective is to minimize the maximum mission completion time among all UAV drones in the swarm. The objective function F is defined as follows:
$$ \min F = F_1 + \lambda \sum_{g=1}^{4} r_g $$
where F1 is the flight-time cost term, λ is a penalty coefficient, and rg (g = 1, 2, 3, 4) are penalty flags for the four constraint categories (temporal, task completion, range, and payload): rg = 0 if the corresponding constraints are satisfied and rg = 1 otherwise. The mission-time cost of an individual drone Ui is calculated as:
$$ F_1 = \frac{D_{u_i}}{V_u} + \alpha_1 \delta_1 + \alpha_2 \delta_2 $$
Here, Dui is the total flight distance of drone Ui, Vu is its cruising speed, α1 and α2 are the numbers of reconnaissance and strike tasks it performs, and δ1 and δ2 are the durations of a single reconnaissance and strike action, respectively. Consistent with the makespan objective stated above and the fitness evaluation in Section 2.4, F1 is taken as the largest of these per-drone mission times across the swarm.
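As a minimal illustration of the per-drone cost expression, the hypothetical Python helper below evaluates it directly; the default action durations are the 5 min and 3 min values from the simulation parameter table in Section 3.1, and all other names are assumptions for illustration.

```python
def drone_cost(total_distance, speed, n_recon, n_strike,
               delta_recon=5.0, delta_strike=3.0):
    """Per-drone cost D_ui / V_u + alpha1 * delta1 + alpha2 * delta2.

    delta_recon and delta_strike default to the 5 min and 3 min action
    durations from the parameter table; distance and speed units are
    left to the caller and should be mutually consistent.
    """
    return total_distance / speed + n_recon * delta_recon + n_strike * delta_strike
```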
1.3 Constraint Formulation
The optimization is subject to the following constraints:
1.3.1 Temporal Constraint: For any target Tj requiring both reconnaissance and strike, the start time of its reconnaissance mission must precede the start time of its strike mission.
$$ t^{zc}_{T_j} < t^{dj}_{T_j} $$
The execution times are defined as:
$$ t^{zc}_{T_j} = x^{zc}_{ij} \left( \frac{D^{1 \to j}_{i}}{V_u} + \sum_{i=1}^{N} \sum_{k=1}^{j-1} t^{zc}_{M_{ik}} + \sum_{i=1}^{N} \sum_{k=1}^{j-1} t^{dj}_{M_{ik}} \right) $$
$$ t^{dj}_{T_j} = x^{dj}_{ij} \left( \frac{D^{1 \to j}_{i}}{V_u} + \sum_{i=1}^{N} \sum_{k=1}^{j-1} t^{zc}_{M_{ik}} + \sum_{i=1}^{N} \sum_{k=1}^{j-1} t^{dj}_{M_{ik}} \right) $$
where xzcij and xdjij are binary decision variables (1 if drone Ui performs the respective task on Tj, 0 otherwise), and D1→ji is the path length for Ui to reach the location of target Tj.
1.3.2 Task Completion Constraints:
a) Reconnaissance Task Fulfillment: All required reconnaissance tasks must be assigned.
$$ \sum_{i=1}^{N} \sum_{j=1}^{M} x^{zc}_{ij} = M_{zc} $$
b) Strike Task Fulfillment: The required damage level Hi for each target class must be achieved. Given a single-strike damage probability α, the minimum required number of strikes Cti for a class-i target is derived from the inequality below (a worked example follows these constraints):
$$ 1 - (1 - \alpha)^{C^t_i} \ge H_i $$
The assignment must satisfy:
$$ \sum_{i=1}^{N} x^{dj}_{ij} \cdot U^{dj}_{ij} \ge C^t_i \quad \text{(for target class i)} $$
$$ \sum_{i=1}^{N} \sum_{j=1}^{M} x^{dj}_{ij} \cdot U^{dj}_{ij} = M_{dj} $$
where Udjij is the number of strike tasks drone Ui performs on target Tj.
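Solving this inequality for Cti gives a closed form. For example, with the single-strike damage probability α = 0.75 and the core-class requirement Hi = 0.9 from the parameter table, two strikes are required:
$$ C^t_i \ge \left\lceil \frac{\ln(1 - H_i)}{\ln(1 - \alpha)} \right\rceil, \qquad \left\lceil \frac{\ln(1 - 0.9)}{\ln(1 - 0.75)} \right\rceil = \lceil 1.66 \rceil = 2 $$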
1.3.3 Range Constraint: Each drone’s flight distance cannot exceed its maximum range.
$$ D_{u_i} \le D_{max} $$
1.3.4 Payload Constraint: The number of strike tasks assigned to a drone cannot exceed its carried strike payloads (e.g., missiles). Reconnaissance payloads are assumed reusable.
$$ \sum_{j=1}^{M} x^{dj}_{ij} \cdot U^{dj}_{ij} \le R^{dj}_i $$
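The penalty flags rg in the objective can be evaluated directly from a decoded allocation. The Python sketch below illustrates only the range and payload checks of Sections 1.3.3 and 1.3.4, using hypothetical function names and the data layout assumed in the earlier sketches; the temporal and completion checks follow the same pattern.

```python
def range_violated(flight_distances, d_max):
    """r3 = 1 if any drone's total flight distance D_ui exceeds D_max."""
    return int(any(d > d_max for d in flight_distances))

def payload_violated(strike_counts, strike_payloads):
    """r4 = 1 if any drone is assigned more strikes than missiles carried.

    strike_counts   -- strikes assigned to each drone (sum_j x^dj_ij * U^dj_ij)
    strike_payloads -- R^dj_i for each drone
    """
    return int(any(c > p for c, p in zip(strike_counts, strike_payloads)))
```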
2. The Proposed DPSO-GA Hybrid Optimization Algorithm
Task planning for UAV swarms presents distinct challenges: 1) Symmetry in homogeneous swarms leads to solution-space degeneracy, causing traditional algorithms to become trapped in local optima. 2) Evaluation shifts towards system-wide efficiency and load balancing, requiring multi-objective optimization capability. 3) The need for real-time, dynamic response to changes in the number of UAVs demands algorithms with good scalability and fast convergence. 4) The presence of diverse and complex constraints necessitates robust constraint-handling mechanisms. To address these challenges, we propose a DPSO-GA hybrid optimizer that combines the global exploration strength of GA with the rapid convergence of DPSO.
The standard PSO update equations for velocity and position are:
$$ v^{t+1}_i = \omega v^t_i + c_1 r_1 (P_i(t) - x^t_i) + c_2 r_2 (P_g(t) - x^t_i) $$
$$ x^{t+1}_i = x^t_i + v^{t+1}_i $$
where ω is the inertia weight, c1 and c2 are the cognitive and social learning factors, r1 and r2 are random numbers uniformly drawn from (0,1), and Pi(t) and Pg(t) are the personal and global best positions.
2.1 Particle Encoding Scheme
To model the complex “UAV-to-Task” mappings, a one-dimensional, two-layer particle encoding is employed. The first layer is the Task Sequence Code, an ordered list of all reconnaissance and strike tasks. The second layer is the Executor Code, indicating the index of the UAV assigned to each corresponding task in the first layer. This encoding naturally represents “one drone to multiple tasks” and allows multiple drones to be assigned to tasks on the same target.
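A minimal sketch of this two-layer encoding, assuming Python lists and illustrative task labels (the labels and drone indices below are hypothetical, not taken from the experiments):

```python
# Hypothetical example with 3 targets and 4 drones (indices 0-3).
# Layer 1: Task Sequence Code, one entry per required action ("zc" = recon, "dj" = strike).
task_sequence = [("T1", "zc"), ("T2", "zc"), ("T1", "dj"),
                 ("T3", "zc"), ("T2", "dj"), ("T3", "dj")]
# Layer 2: Executor Code, the drone index assigned to each task position above.
executor_code = [2, 0, 3, 1, 3, 0]

# Drone 3 handles two tasks ("one drone, multiple targets"), while target T1 is
# reconnoitred by drone 2 and struck by drone 3 ("multiple drones, one target").
particle = (task_sequence, executor_code)
```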
2.2 Proposed Particle Update Mechanism
Drawing inspiration from PSO’s velocity-position update, we design a discrete update formula comprising three probabilistic operations:
$$ x^{t+1}_i = c_2 \otimes f_3 \left( c_1 \otimes f_2 \left( \omega \otimes f_1(x^t_i), P_i(t) \right), P_g(t) \right) $$
The update is executed sequentially as follows:
$$ E^t_i = \omega \otimes f_1(x^t_i) = \begin{cases} f_1(x^t_i), & \text{if } r < \omega \\ x^t_i, & \text{otherwise} \end{cases} $$
$$ F^t_i = c_1 \otimes f_2(E^t_i, P_i(t)) = \begin{cases} f_2(E^t_i, P_i(t)), & \text{if } r < c_1 \\ E^t_i, & \text{otherwise} \end{cases} $$
$$ x^{t+1}_i = c_2 \otimes f_3(F^t_i, P_g(t)) = \begin{cases} f_3(F^t_i, P_g(t)), & \text{if } r < c_2 \\ F^t_i, & \text{otherwise} \end{cases} $$
Here, f1, f2, f3 represent mutation, learning from personal best, and learning from global best operations, respectively. The symbol ⊗ denotes a conditional application based on a comparison with a random number r ∈ (0,1).
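The composition of the three operations can be read as the following sketch (illustrative Python, with f1, f2, f3 passed in as callables whose possible forms are sketched in the subsections below); it is one interpretation of the update equations, not the authors' exact implementation.

```python
import random

def update_particle(x, p_best, g_best, w, c1, c2, f1, f2, f3):
    """One discrete update: mutation, personal-best learning, and global-best
    learning, each applied with its own acceptance probability."""
    e = f1(x) if random.random() < w else x              # omega (x) f1
    f = f2(e, p_best) if random.random() < c1 else e     # c1 (x) f2
    return f3(f, g_best) if random.random() < c2 else f  # c2 (x) f3
```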
2.2.1 Mutation Operator (f1): With probability ω, a mutation occurs. It has two modes chosen with equal probability: 1) Task Sequence Mutation: Randomly reshuffle the Task Sequence Code while keeping the Executor Code fixed. This facilitates large-scale global search. 2) Executor Mutation: Randomly select and change one value in the Executor Code while keeping the Task Sequence fixed. This enables local fine-tuning.
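A possible realization of f1 under the (task sequence, executor code) layout of Section 2.1; num_drones is a hypothetical parameter and the 0.5 mode split reflects the equal-probability choice described above.

```python
import random

def f1_mutation(particle, num_drones):
    """Mutation: with equal probability, either reshuffle the task sequence
    (executors fixed, coarse global search) or reassign a single executor
    (sequence fixed, local fine-tuning)."""
    tasks, execs = list(particle[0]), list(particle[1])
    if random.random() < 0.5:
        random.shuffle(tasks)                    # task-sequence mutation
    else:
        k = random.randrange(len(execs))         # executor mutation
        execs[k] = random.randrange(num_drones)
    return (tasks, execs)
```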
2.2.2 Learning from Personal Best Operator (f2): With probability c1, a crossover-like operation is performed between the current particle Eti and its personal best Pi(t). A subset of task positions is randomly selected. The corresponding Task-Executor pairs from Eti are copied to the offspring. The remaining task positions are filled in order with the Task-Executor pairs from Pi(t) that were not in the selected subset.
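One possible reading of f2 under the same assumptions; matching fill pairs by task label is an illustrative choice made here to keep the offspring's task sequence a valid permutation of the required tasks, not necessarily the authors' exact rule.

```python
import random

def f2_personal(particle, p_best):
    """Learn from the personal best: keep the task-executor pairs of the
    current particle at a random subset of positions, then fill the remaining
    positions, in order, with the not-yet-used pairs of p_best."""
    own = list(zip(particle[0], particle[1]))
    best = list(zip(p_best[0], p_best[1]))
    n = len(own)
    keep = set(random.sample(range(n), k=random.randint(1, n - 1)))
    remaining = list(best)
    for k in keep:                       # drop one pair of p_best per kept task
        for j, (task, _) in enumerate(remaining):
            if task == own[k][0]:
                del remaining[j]
                break
    fill = iter(remaining)
    child = [own[k] if k in keep else next(fill) for k in range(n)]
    tasks, execs = zip(*child)
    return (list(tasks), list(execs))
```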
2.2.3 Learning from Global Best Operator (f3): With probability c2, the particle learns from the global best Pg(t). The Task Sequence Code is copied from the current particle Fti. A threshold probability ρ, which increases with iterations, determines which executor assignments from Pg(t) are copied. For each task position, if a random number is below ρ, the executor from Pg(t) is adopted; otherwise, the executor from Fti is retained. This adaptive ρ ensures initial exploration (low ρ, less copying) and later exploitation (high ρ, strong convergence guidance).
$$ \rho = \rho_{min} + (\rho_{max} - \rho_{min}) \frac{t}{T} $$
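A corresponding sketch of f3 together with the adaptive threshold ρ, under the same illustrative encoding assumptions:

```python
import random

def rho(t, t_max, rho_min=0.1, rho_max=0.9):
    """Adaptive copy threshold: low early (exploration), high late (exploitation)."""
    return rho_min + (rho_max - rho_min) * t / t_max

def f3_global(particle, g_best, rho_t):
    """Learn from the global best: keep the current task sequence, and adopt
    g_best's executor at each position with probability rho_t."""
    tasks = list(particle[0])
    execs = [g if random.random() < rho_t else e
             for e, g in zip(particle[1], g_best[1])]
    return (tasks, execs)
```

When plugged into the update sketch of Section 2.2, rho(t, T) would be computed once per iteration and bound to f3_global, for example via functools.partial.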
2.3 Adaptive Parameter Adjustment
2.3.1 Inertia Weight (ω): To balance global exploration and local exploitation, ω is non-linearly adjusted using a cosine function:
$$ \omega = \omega_{min} + (\omega_{max} - \omega_{min}) \times \cos^2\left( \frac{\pi}{2} \times \frac{t}{T} \right) $$
This keeps ω relatively high in early iterations for exploration and allows it to decay rapidly later for convergence.
2.3.2 Learning Factors (c1, c2): The cognitive and social factors are linearly adapted:
$$ c_1 = c_{1_{max}} - (c_{1_{max}} - c_{1_{min}}) \times \frac{t}{T} $$
$$ c_2 = c_{2_{min}} + (c_{2_{max}} - c_{2_{min}}) \times \frac{t}{T} $$
This strategy prioritizes individual particle experience (higher c1) early on and shifts emphasis to swarm knowledge (higher c2) as the search progresses.
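Taken together, the adaptive schedules translate directly into code. The sketch below uses the bounds listed in the algorithm parameter table of Section 3.1; the function name and signature are assumptions.

```python
import math

def schedules(t, t_max,
              w_min=0.3, w_max=0.95,
              c1_min=0.3, c1_max=0.9,
              c2_min=0.3, c2_max=0.9):
    """Return (omega, c1, c2) at iteration t of t_max, per Section 2.3."""
    frac = t / t_max
    w = w_min + (w_max - w_min) * math.cos(math.pi / 2 * frac) ** 2  # nonlinear decay
    c1 = c1_max - (c1_max - c1_min) * frac                           # cognitive: high -> low
    c2 = c2_min + (c2_max - c2_min) * frac                           # social:   low -> high
    return w, c1, c2
```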
2.4 Fitness Evaluation
Each particle (allocation scheme) is decoded to assign tasks to UAVs. For each drone, its assigned tasks are clustered by target location. A greedy path-planning strategy then sequences these task locations, and the total mission time (flight time plus action time) is calculated. The fitness is the maximum mission time among all UAVs. Solutions violating constraints are penalized with a large fitness value via the λ coefficient in the objective function, effectively filtering them out.
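A compact sketch of this decoding-and-evaluation step under simplified assumptions (nearest-neighbour greedy sequencing, fixed action durations, and precomputed penalty flags); it is illustrative rather than the exact procedure used in the experiments, and all names are hypothetical.

```python
import math

def drone_mission_time(start, task_points, speed, t_recon, t_strike):
    """Greedy nearest-neighbour tour over a drone's assigned task locations,
    adding the fixed duration of each reconnaissance or strike action."""
    pos, total, todo = start, 0.0, list(task_points)  # task_points: (location, kind)
    while todo:
        nxt = min(todo, key=lambda p: math.dist(pos, p[0]))
        total += math.dist(pos, nxt[0]) / speed
        total += t_recon if nxt[1] == "zc" else t_strike
        pos = nxt[0]
        todo.remove(nxt)
    return total

def fitness(per_drone_tasks, starts, speed, t_recon, t_strike, violations, lam=1000.0):
    """Makespan (longest single-drone mission time) plus constraint penalties."""
    times = [drone_mission_time(s, tasks, speed, t_recon, t_strike)
             for s, tasks in zip(starts, per_drone_tasks)]
    return max(times) + lam * sum(violations)
```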
3. Simulation Experiments and Analysis
3.1 Simulation Setup and Parameters
The scenario involves 8 reconnaissance-strike UAVs with randomly distributed initial positions and differing payload configurations, operating against 12 randomly located ground targets whose categories and task demands are predefined. The scenario parameters are summarized below.
| Parameter | Description | Value / Range |
|---|---|---|
| Mission Area | Operational Airspace | 300 km × 300 km |
| Number of UAVs (N) | Homogeneous reconnaissance-strike platforms | 8 |
| Number of Targets (M) | Ground targets of varying importance | 12 |
| Target Classes | Core, Key, Important, Regular | 2, 3, 3, 4 targets |
| UAV Speed (Vu) | Cruising speed | Constant value |
| Max Drone Range (Dmax) | Operational endurance limit | Predefined distance |
| Single Strike Damage Prob. (α) | Weapon effectiveness | 0.75 |
| Damage Requirements (Hi) | For Core, Key, Important, Regular classes | 0.9, 0.8, 0.6, 0.4 |
| Action Durations (δ1, δ2) | Time for single recon / strike action | 5 min, 3 min |
The parameters of the DPSO-GA algorithm are set as follows.
| Parameter | Symbol | Value |
|---|---|---|
| Population Size | – | 200 |
| Maximum Iterations | T | 1000 |
| Inertia Weight Bounds | ωmin, ωmax | 0.3, 0.95 |
| Cognitive Factor Bounds | c1min, c1max | 0.3, 0.9 |
| Social Factor Bounds | c2min, c2max | 0.3, 0.9 |
| Global Learn Threshold Bounds | ρmin, ρmax | 0.1, 0.9 |
| Penalty Coefficient | λ | 1000 |
3.2 Results and Discussion
Executing the proposed DPSO-GA algorithm yielded a viable and efficient task allocation plan. Analysis of the best solution confirms its validity: 1) The complex mappings were captured, with most UAVs handling multiple tasks and several targets being engaged cooperatively by multiple drones. 2) All hard constraints, including the critical “reconnaissance-before-strike” sequence, were strictly satisfied. 3) The workload was balanced across the swarm, and the primary objective of minimizing the makespan (the longest individual mission time) was effectively achieved.
To rigorously evaluate the performance of the DPSO-GA algorithm, it was compared against the standard Discrete PSO (DPSO) and Genetic Algorithm (GA) over 50 independent simulation runs. The key statistical metrics of the final best fitness values (lower is better) are summarized below.
| Metric | DPSO-GA (Proposed) | Standard DPSO | Standard GA |
|---|---|---|---|
| Average Fitness | 1.399 | 2.797 | 1.566 |
| Maximum Fitness | 1.533 | 3.791 | 2.279 |
| Minimum Fitness | 1.223 | 2.036 | 1.200 |
| Variance | 0.0069 | 0.1618 | 0.0345 |
| Standard Deviation | 0.083 | 0.402 | 0.186 |
| 95% Confidence Interval Width | ~0.047 | ~0.229 | ~0.105 |
The comparative data clearly demonstrates the superiority of the hybrid DPSO-GA approach:
Convergence Accuracy: The proposed algorithm achieved an average fitness value 50.0% lower than DPSO and 10.7% lower than GA, indicating it consistently finds higher-quality solutions.
Stability and Robustness: The variance of DPSO-GA results is 95.7% lower than DPSO and 79.9% lower than GA. Similarly, its standard deviation is drastically reduced by 79.3% and 55.2% compared to DPSO and GA, respectively. This signifies remarkably lower result dispersion and higher run-to-run consistency.
Reliability: The width of the 95% confidence interval for the mean fitness of DPSO-GA is only 20.7% of DPSO’s and 44.8% of GA’s, reflecting a much more precise and reliable performance estimate.
In summary, the DPSO-GA hybrid algorithm achieves both high convergence accuracy and high stability. It overcomes the poor solution quality and high variability of standard DPSO while also mitigating the performance fluctuations associated with GA. This makes it particularly suitable for solving the complex, constrained task allocation problems inherent to UAV swarms.
4. Conclusion
This paper addressed the challenging problem of task allocation for reconnaissance-strike integrated UAV swarms under realistic operational constraints. A detailed mathematical model was formulated, incorporating target prioritization, resource limitations, and, crucially, the temporal dependency between reconnaissance and strike actions. To solve this model effectively, a novel hybrid DPSO-GA optimization algorithm was proposed. Its key innovations include a two-layer particle encoding scheme that naturally represents complex multi-drone, multi-task mappings; a discrete update mechanism blending mutation and crossover operators from GA with the guided search of PSO; and adaptive parameter-tuning strategies for the inertia weight and learning factors.
Simulation results confirm that the algorithm can generate feasible and efficient allocation plans that satisfy all hard constraints. More importantly, comparative statistical analysis against the standard DPSO and GA algorithms demonstrates that the DPSO-GA hybrid achieves significantly superior performance in terms of convergence accuracy, solution stability, and operational reliability. It provides a robust and effective methodological reference for solving multi-objective, multi-constraint task planning problems in intelligent UAV swarm operations. Future work may extend the model to handle dynamic target appearances, heterogeneous drone capabilities, and tighter integration with path planning under threat environments.
