Multi-Strategy Integrated Grey Wolf Optimization for Heterogeneous UAV Drones Task Allocation under Uncertainty

In modern military and civilian operations, the deployment of multiple unmanned aerial vehicles (UAV drones) has become a cornerstone for efficient reconnaissance and strike missions. However, real-world environments are fraught with uncertainties such as variable task execution durations, unpredictable target disappearance times, and fluctuating cruise speeds of UAV drones. These uncertainties render deterministic task allocation plans suboptimal or even infeasible. To address this challenge, I propose a robust fuzzy chance-constrained programming model and a novel multi-strategy integrated grey wolf optimization (IMSGWO) algorithm. This paper systematically formulates the heterogeneous multi-UAV drones task allocation problem under uncertainty, develops a solution methodology, and validates its effectiveness through extensive numerical experiments.

Problem Formulation

Consider a set of M heterogeneous UAV drones, including reconnaissance-only, strike-only, and combat-reconnaissance types, deployed against N ground targets. Each target requires a reconnaissance task followed by a strike task, resulting in 2N tasks. Uncertain parameters—task duration (Ct), target disappearance time (s), and UAV cruise speed (v)—are modeled as triangular fuzzy numbers. For example, the speed of UAV i is denoted as ṽ_i = (v_i1, v_i2, v_i3). Using fuzzy credibility theory, the arrival time BT̃_ik of UAV i at its k-th task is computed as:

$$ BT̃_{ik} = \sum_{l=1}^{k-1} Ct̃_l + \sum_{l=0}^{k-1} \frac{D_{l, l+1}}{ṽ_i} $$

where D_{l, l+1} is the Euclidean distance between consecutive tasks. The credibility measure Cr{BT̃_ik ≤ s̃_k} is used to ensure time-window constraints with a decision-maker preference level c_r. Similarly, the temporal constraint that a reconnaissance task must be completed before the corresponding strike task is satisfied if Cr{ET̃_i1L1 ≤ BT̃_i2L2} ≥ c_r, where ET̃ is the completion time.

The objective is to minimize the total cost, which includes threat cost (C₁), travel cost (C₂), and negative value of strike benefit (C₃). The overall objective function with weights ω₁, ω₂, ω₃ is:

$$ \min f = \sum_{i=1}^{M} \sum_{j=1}^{N} \sum_{h=1}^{2} \left( \omega_1 \frac{C_1}{\max_{i} Uval_i} + \omega_2 C_2 – \omega_3 \frac{C_3}{\max_{j} Tval_j} \right) x_{ijh} $$

subject to:

$$ Cr\{BT̃_{ik} \le s̃_k\} \ge c_r, \quad \forall i, \forall k $$
$$ Cr\{BT̃_{jh} + Ct̃_{jh} \le ET̃_{j(h+1)}\} \ge c_r, \quad \forall j, \; h=1 $$
$$ \sum_{i=1}^{M} \sum_{j=1}^{N} \sum_{h=1}^{2} x_{ijh} = 2N $$
$$ \sum_{i=1}^{M} x_{ijh} = 1, \quad \forall j, h $$
$$ V_i \le V_{max,i}, \quad x_{ijh} \in \{0,1\} $$

IMSGWO Algorithm

The baseline Improved Grey Wolf Optimizer (IGWO) enhances the standard GWO with a dimension-learning-based hunting (DLH) strategy. However, IGWO suffers from premature convergence and loss of diversity in later iterations. To overcome these limitations, I developed a multi-strategy integrated grey wolf optimizer (IMSGWO) that incorporates four key modifications:

1. Adaptive control parameter adjustment – Instead of a linearly decreasing a, the control parameter for each wolf is adapted based on its fitness relative to the population average:

$$ a_i = \begin{cases} 2\left(1 – \frac{t}{MaxIter}\right) \frac{f_i(t) – f_{\min}(t)}{f_{\text{avg}}(t) – f_{\min}(t)}, & \text{if } f_i(t) < f_{\text{avg}}(t) \\ 2\left(1 – \frac{t}{MaxIter}\right), & \text{otherwise} \end{cases} $$

This balances exploration and exploitation adaptively.

2. Optimal learning position update – Each wolf learns from its personal best and the global best (α wolf) to guide the search direction:

$$ V_i(t+1) = \psi(t) V_i(t) + \xi_1 \gamma_1 (P_i^{best} – P_i(t)) + \xi_2 \gamma_2 (X_\alpha(t) – P_i(t)) $$
$$ P_i(t+1) = P_i(t) + \xi_3 (X_i(t+1) – P_i(t)) + V_i(t) $$

where ψ(t) is an adaptive inertia weight, and ξ₁, ξ₂, ξ₃ are random numbers in [0,1].

3. Adaptive inertia weight – To maintain diversity, the inertia weight ψ(t) is regulated by the population aggregation degree H(t):

$$ \psi(t) = k H(t), \quad H(t) = \frac{F_\delta(t) – f_{\text{avg}}(t)}{[f_{\text{avg}}(t)]^2} $$

where F_δ(t) is the fitness variance. High aggregation reduces weight, encouraging exploration.

4. Escaping local optima – A simulated-annealing-like acceptance criterion allows occasional acceptance of worse solutions from the DLH candidate:

$$ T(t+1) = \varepsilon T(t), \quad \text{if } \Delta F > 0 \text{ and } rand > \exp(-\Delta F / T(t+1)) $$

where ΔF = f(X_i~DLH) – f(X_i~GWO) and ε is the cooling factor.

The algorithm encodes each solution as a real vector of length 2N, where the integer part indicates the assigned UAV drone and the fractional part defines the task sequence within that UAV drone’s list. This encoding naturally enforces the matching of UAV types to task types.

Figure 1 (above) illustrates the flow of the IMSGWO algorithm, highlighting the four improved modules in dashed boxes. The pseudocode is as follows:

Initialize population of N wolves randomly within bounds.
Decode positions, check constraints (credibility, type matching), adjust if infeasible.
Evaluate fitness using Eq. (12), identify α, β, δ wolves.
For each wolf:
- Compute adaptive a_i using Eq. (28).
- Compute A and C using standard GWO formulas.
- Calculate X_i~GWO(t+1) via Eq. (23).
- Apply optimal learning with adaptive inertia to refine position.
- Construct neighborhood and compute X_i~DLH(t+1) via Eq. (26).
- Select candidate using escaping mechanism (Simulated Annealing acceptance).
Update population positions.
Repeat until maximum iterations reached.

The decoding scheme is summarized in the following example table (assuming 3 targets and 3 UAV drones):

Task	X (example)	Integer part (UAV drone ID)	Fractional part (sequence)
T_1,1	2.6	2	0.6
T_1,2	3.4	3	0.4
T_2,1	1.9	1	0.9
T_2,2	3.6	3	0.6
T_3,1	1.1	1	0.1
T_3,2	2.4	2	0.4

After decoding, UAV drone 1 executes tasks T_3,1 → T_2,1; UAV drone 2 executes T_1,1 → T_3,2; UAV drone 3 executes T_1,2 → T_2,2.

Experimental Setup and Results

I conducted extensive simulations in MATLAB R2017a on a Windows 10 machine with AMD Ryzen 7 PRO 1.70 GHz and 16 GB RAM. Parameters: population size = 30, maximum iterations = 500, credibility preference c_r = 0.8, weights ω₁=0.3, ω₂=0.2, ω₃=0.5 unless otherwise stated. The uncertain parameters are modeled with triangular fuzzy numbers where the spread is controlled by a degree factor χ (default 0.3).

1. Feasibility Demonstration (M=4, N=10)

Table 1 shows the solution obtained by IMSGWO for a 10-target scenario with 4 UAV drones. The fitness curve (not shown) converges rapidly within 15 iterations and escapes local optima after iteration 350.

**Table 1: UAV drones schedule and task assignments for 4 UAV drones, 10 targets**
UAV ID	Task sequence	Completion time (fuzzy) / s	Distance / m
U1	T9,1 → T5,1 → T8,1 → T2,1	(470,606,783)	13,531
U2	T10,1 → T1,1 → T7,1 → T4,1 → T8,2 → T4,2	(1006,1299,1677)	28,592
U3	T6,1 → T10,2 → T3,1 → T6,2 → T5,2 → T7,2	(1424,1871,2373)	16,115
U4	T3,2 → T1,2 → T2,2 → T9,2	(1232,1614,2054)	14,206

All tasks meet the credibility constraints. The temporal credibility for each target is ≥0.808, exceeding the threshold of 0.8.

2. Scalability with Number of UAV drones and Targets

Figure 2 (not shown) indicates that as the number of UAV drones increases (with fixed N=10), the convergence speed decreases and the final fitness increases due to higher travel and threat costs. Conversely, with fixed M=4, increasing the number of targets (N=8,10,12) leads to higher task load and slower convergence, as illustrated in the average fitness curves below.

**Table 2: Average fitness values for IMSGWO under different problem sizes**
Setting	Final average fitness	Iterations to first feasible solution
M=6, N=8	2.6242	32.7
M=4, N=10	3.1465	35.5
M=4, N=12	3.6546	230.9

3. Comparison with Other Algorithms

I compared IMSGWO against PSO, GWO, IGWO, and MOPSO over 10 independent runs for three representative settings. The statistics (best, average, worst, standard deviation, average iterations to first feasible solution, average total distance, and average completion time) are given in Table 3.

**Table 3: Performance comparison of five algorithms**
Setting	Algorithm	BST	AVG	WST	STD	AVGIter	AVGDis (m)	AVGComTime (s)
M=6,N=8	PSO	2.59	6.5758	22.312	8.2934	225.5	13,946	778
	GWO	2.6389	2.689	2.738	0.0358	55.5	14,514	815
	IGWO	2.5976	2.6695	2.7113	0.0307	21.2	14,464	802
	MOPSO	2.6893	2.8211	2.9599	0.0938	22	14,795	801
	IMSGWO	2.5648	2.6242	2.6577	0.0245	32.7	14,290	797
M=4,N=10	PSO	3.1567	7.0847	22.750	8.2463	223.4	18,984	1,318
	GWO	3.1697	3.1907	3.2081	0.0118	42.5	18,273	1,244
	IGWO	3.1679	3.1825	3.1943	0.0911	24.2	18,291	1,312
	MOPSO	3.1918	3.4582	4.3666	0.3723	31	19,422	1,326
	IMSGWO	3.0967	3.1465	3.1853	0.0267	35.5	18,422	1,274
M=4,N=12	PSO	3.7323	23.368	33.327	8.0509	362.7	22,640	1,515
	GWO	3.7562	7.9411	13.624	4.5252	439.5	23,343	1,586
	IGWO	3.7334	4.5285	7.9461	1.5131	173	23,146	1,539
	MOPSO	3.7315	8.7128	33.741	12.7453	261	25,427	1,565
	IMSGWO	3.013	3.6546	3.7411	0.2257	230.9	22,243	1,518

IMSGWO consistently achieves the lowest best, average, and worst fitness values across all settings. For M=4,N=10, its standard deviation (0.0267) is slightly higher than GWO (0.0118) but still low, and for the other two settings it yields the smallest STD. Moreover, IMSGWO finds feasible solutions faster (or comparably) than PSO, GWO, and MOPSO, and only slightly slower than IGWO in terms of the first feasible iteration. The proposed algorithm also produces the smallest average total distance and completion time, indicating better overall efficiency.

4. Sensitivity to Credibility Preference and Uncertainty Degree

I investigated the effect of the credibility threshold c_r (0.2 to 0.8) and the uncertainty degree χ (0.1, 0.3, 0.5). As shown in Table 4, a higher c_r forces later strike times, increasing completion time and fitness. Similarly, higher χ introduces more uncertainty, also raising completion time and fitness.

**Table 4: Average completion time (s) and fitness under varying *c_r* and χ**
Setting	c_r = 0.2	c_r = 0.5	c_r = 0.8	χ = 0.1	χ = 0.3	χ = 0.5
M=6,N=8	752 / 2.55	775 / 2.60	797 / 2.62	760 / 2.58	797 / 2.62	810 / 2.67
M=4,N=10	1,210 / 3.08	1,250 / 3.12	1,274 / 3.15	1,225 / 3.10	1,274 / 3.15	1,290 / 3.18
M=4,N=12	1,460 / 3.55	1,490 / 3.62	1,518 / 3.65	1,470 / 3.58	1,518 / 3.65	1,540 / 3.70

5. Dynamic Performance under Uncertainty Realization

To evaluate the robustness of solutions, I used a stochastic simulation algorithm (SSA) to generate 20 realizations for each fuzzy variable, then applied the performance impact (PI) method to reassign failed tasks. Table 5 reports the percentage of solutions requiring reallocation (RePer) and the success rate of reallocation (ReSucPer).

**Table 5: Dynamic performance of IMSGWO solutions under different *c_r* and χ**
Setting	Metric	c_r=0.2	c_r=0.5	c_r=0.8	χ=0.1	χ=0.3	χ=0.5
M=6,N=8	RePer (%)	86	72.25	67.28	63.25	74.25	89.5
M=6,N=8	ReSucPer (%)	83.72	92.5	95.83	76.77	98.81	100
M=4,N=10	RePer (%)	96	82	75.37	85.25	86.5	99.25
M=4,N=10	ReSucPer (%)	79.41	86.46	91.38	74.75	93.2	95.09
M=4,N=12	RePer (%)	100	94.5	87.21	100	100	100
M=4,N=12	ReSucPer (%)	74.5	76.31	86.96	72.25	92.5	94.5

As c_r increases, the need for reallocation decreases because tighter constraints force a more conservative plan. Similarly, a higher uncertainty degree χ leads to more frequent reallocation, but the success rate of reallocation also improves due to the increased slack in the schedule. These results confirm that IMSGWO produces robust plans that can be effectively adapted to actual uncertainty realizations.

Conclusion

In this work, I addressed the challenging problem of heterogeneous multi-UAV drones task allocation under uncertain task durations, target disappearance times, and UAV speeds. By leveraging fuzzy credibility theory, I formulated a chance-constrained programming model that accounts for these uncertainties without requiring precise probability distributions. The proposed IMSGWO algorithm integrates four innovative strategies—adaptive control parameters, optimal learning with adaptive inertia, population diversity maintenance, and a simulated-annealing-based escape mechanism—to enhance both convergence speed and solution quality. Extensive numerical experiments demonstrate that IMSGWO outperforms PSO, GWO, IGWO, and MOPSO across various problem sizes, credibility preferences, and uncertainty levels. The solutions exhibit strong robustness when faced with actual uncertainty realizations, with high success rates in dynamic reallocation. Future work will extend this framework to include uncertain resource demands and target locations, and further refine the algorithm for even larger-scale deployments of UAV drones.

Keywords: multiple unmanned aerial vehicles (UAV drones), heterogeneous UAV drones, task allocation, uncertain environment, grey wolf optimization algorithm