A Game Task Assignment Algorithm for Heterogeneous UAV Swarms

We propose a novel task assignment algorithm for heterogeneous drone swarms, specifically designed to address the challenges of multi-UAV cooperation in complex and resource‑limited airspace environments. Our method integrates task region screening, preference computation, and stable matching theory, and is crafted with the growing needs of China drone operations in mind. Through systematic analysis and extensive simulations, we demonstrate that our approach significantly improves allocation efficiency, coalition stability, and overall mission benefits compared to traditional methods.

1. Introduction

The rapid advancement of unmanned aerial vehicle (UAV) technology has enabled a wide range of applications, from environmental monitoring to military reconnaissance. In many practical scenarios, especially within China drone ecosystems, single‑UAV systems often fall short due to limited coverage and low fault tolerance. Multi‑UAV systems, on the other hand, can collaborate through dynamic coalition formation to achieve complex tasks. However, the allocation of heterogeneous UAVs to tasks remains a critical and challenging problem. Existing methods, such as auction‑based or marginal‑benefit approaches, often suffer from high computational complexity, frequent coalition restructuring, and suboptimal global reward.

In this work, we propose a game‑theoretic task assignment method that combines task‑region screening, preference calculation, and stable matching based on the Gale‑Shapley algorithm. Our contributions are threefold:

  • We design a task‑region screening mechanism that filters candidate UAVs based on environmental difficulty and task requirements, reducing the problem size and improving computational efficiency.
  • We introduce a preference‑based initial assignment followed by a Gale‑Shapley stable matching to ensure stability, and then optimize coalition composition using Shapley value‑based pruning and secondary reallocation.
  • We evaluate our method against two benchmark algorithms (random marginal benefit and greedy auction) under various scales. The results show that our method achieves over 90% of the optimal coalition reward, with fewer iterations and shorter runtime, making it well suited for large‑scale China drone applications.

2. Problem Model

Consider a three‑dimensional airspace containing a set of heterogeneous UAVs \(U = \{u_1, u_2, \dots, u_n\}\) and a set of tasks \(R = \{R_1, R_2, \dots, R_m\}\), with \(n > m\). Each task \(R_i\) is defined as a spherical region centered at \((x_i, y_i, z_i)\) with radius \(r_i\):

$$
R_i = \{(x, y, z) \ | \ \|(x,y,z)-(x_i,y_i,z_i)\| \le r_i,\ z \ge 0\}.
$$

The environment may contain obstacles; their quantity and positions influence the task difficulty. Let the environmental cost function be:

$$
H(x,y,z) = \sum_{i=1}^{N} \frac{\kappa_i}{(x-x_i)^2 + (y-y_i)^2 + (z-z_i)^2},
$$

where \(\kappa_i\) is the intensity factor of the \(i\)-th obstacle and \(N\) is the number of obstacles.

2.1 Task Region Screening

For each task \(R_j\), we compute a dynamic screening radius:

$$
d_j = D(d_j) = G(f_j)\cdot t, \qquad G(f_j) = \frac{1}{1+H(P_j)},
$$

where \(t\) is an initial radius parameter and \(P_j\) is the task center. UAVs within distance \(d_j\) form an initial candidate set \(U_j = \{u_i \in U \ |\ \|u_i – P_j\| \le d_j\}\). If \(|U_j|\) exceeds the maximum capacity \(C_{\max}(R_j)\), we select the top \(k\) UAVs according to their capability function \(p(u_i)\):

$$
U’_j = \text{top}_k\{p(u_i)\ |\ u_i \in U_j\}.
$$

If \(|U_j|\) is too small, we expand the radius by a factor \(\varepsilon\):

$$
d_j = (1+\varepsilon)\, d_j.
$$

The global initial UAV set is the union over all tasks:

$$
U_{\text{int}} = \bigcup_{j=1}^{m} U’_j.
$$

2.2 Preference Calculation

Each UAV \(u_i\) in \(U_{\text{int}}\) computes a preference value for each task \(R_j\):

$$
Y(u_i, R_j) = T\big(P(u_i,R_j) + H(p_j) + F(u_i,R_j)\big),
$$

where \(T\) denotes normalization, and

$$
P(u_i,R_j) = w_1\frac{E(u_i)}{E_{\max}} + w_2\frac{v(u_i)}{v_{\max}} + w_3 A(u_i,R_j),
$$
$$
F(u_i,R_j) = F_{\text{reward}}(R_j) – F_{\text{cost}}(u_i,R_j),
$$
$$
F_{\text{cost}}(u_i,R_j) = c_1\,d(u_i,R_j) + c_2\frac{d(u_i,R_j)}{E(u_i)}.
$$

Here \(E(u_i)\) is the endurance, \(v(u_i)\) velocity, \(A(u_i,R_j)\) an adaptability factor (function of historical assignment), and \(F_{\text{reward}}(R_j)\) the task reward. The coefficients satisfy \(w_1 \ge w_2 \ge w_3\).

2.3 Coalition Utility Function

The total utility of a coalition \(C_j\) assigned to task \(R_j\) is:

$$
S(R_j) = Q(R_j) + V(R_j) – K(C_j,R_j) – \gamma H(p_j).
$$

The cooperation reward term is:

$$
V(R_j) =
\begin{cases}
\alpha\cdot\beta(R_j)\cdot(\text{Cap}(C_j) – Q(R_j)), & \text{Cap}(C_j) > Q(R_j),\\
0, & \text{otherwise},
\end{cases}
$$
$$
\beta(R_j) = \lambda_1 Q(R_j) + \lambda_2 H(p_j),
$$

and the penalty for excessively large capacity is:

$$
K(C_j,R_j) = \mu\cdot\max(0,\ \text{Cap}(C_j) – \lambda_3 Q(R_j))^2.
$$

The total coalition capability is:

$$
\text{Cap}(C_i) = \sum_{u_j \in C_i} P(u_j,R_i) – H(p_i).
$$

Combining all, we obtain the final utility function:

$$
S(R_j) = Q(R_j) + \alpha\max(0,\beta(R_j)\cdot\text{Cap}(C_j)-Q(R_j)) – \mu\max(0,\text{Cap}(C_j)-\lambda_3 Q(R_j))^2 + \gamma H(p_j).
$$

3. Algorithm

3.1 Task Coalition Screening

Algorithm 1 performs the screening phase, as described in Section 2.1.

Table 1: Task Coalition Screening Algorithm
Step Description
1 Initialize task set \(T\) and UAV set \(U\).
2 For each task \(s=1\) to \(T_n\):
3 Compute screening radius \(d_s\) using (4) and (5). Obtain \(U_s = \{u_i \in D_s\}\).
4 If \(|U_s| > U_{\max}\): select top \(k\) UAVs via (6) → \(U’_s\).
5 Else if \(|U_s| \le U_{\min}\): expand radius using (7), update set \(U’_s\).
6 \(U_{\text{int}} = \bigcup_{s=1}^m U’_s\).

3.2 Stable Matching Allocation

Algorithm 2 performs the preference‑based assignment and coalition optimization.

Table 2: Stable Matching Allocation Algorithm
Step Description
1 Input \(U_{\text{int}}\) and tasks \(T_{\text{int}}\).
2 For each UAV \(u_i\) in \(U_{\text{int}}\): compute preference list \(Y_i\) using (9); assign to task with highest preference.
3 For each task \(R_j\): evaluate coalition utility \(S(R_j)\); if constraints violated, compute Shapley values \(\phi(u_i)\) for all members.
4 Remove UAV with smallest \(\phi(u_i)\) and add to leisure list \(U_{\text{leisure}}\).
5 While \(U_{\text{leisure}} \ne \emptyset\): pick a random UAV, try to assign it to the next preferred task; accept if coalition utility increases; update coalitions.

The Shapley value of a UAV \(u_i\) in coalition \(C_j\) is defined as:

$$
\phi(u_i) = \sum_{S \subseteq C_j \setminus \{u_i\}} \frac{|S|!\,(|C_j|-|S|-1)!}{|C_j|!} \cdot \Delta S(u_i,S),
$$

with \(\Delta S(u_i,S) = S(R_j,\,S\cup\{u_i\}) – S(R_j,S)\).

3.3 Stability Analysis

We prove that under strict preference lists, the Gale‑Shapley algorithm always yields a stable matching even in the many‑to‑one setting with capacity constraints. UAVs propose to tasks in decreasing order of preference; tasks temporarily hold the best proposals. The iterative removal of low‑contribution UAVs (via Shapley value) and reallocation preserves stability. The process terminates at a Nash equilibrium where no UAV can unilaterally improve its payoff by switching tasks.

4. Experimental Results

We conducted simulations using a Python environment. The airspace contained 10–120 UAVs and 5–60 tasks. Parameters were set as: coalition reward coefficient \(\alpha = 0.5\), penalty coefficient \(\mu = 0.2\), screening radius scaling \(\varepsilon = 0.1\). Three algorithms were compared: our proposed Optimized Stable Matching (OSM), Random Marginal Benefit (RMB), and Greedy Auction (GA).

4.1 Small‑Scale Scenario

We first tested with 10 UAVs and 3 tasks. The optimal coalition structure was \(C_1 = \{u_1,u_5\},\ C_2 = \{u_3,u_4\},\ C_3 = \{u_6,u_8\}\) yielding total reward 123.69. Table 3 and Table 4 show initial and final allocations.

Table 3: Initial Distribution (Small Scale)
Algorithm Task Initial Coalition Reward
OSM 1 {1,2,4,5,6,7,8,9,10} -18.38
OSM 2 {2} 18.53
OSM 3 0
RMB 1 {4,5} 27.8
RMB 2 {3,6,7,8,10} 12.96
RMB 3 {1,2,9} 30.20
GA 1 0
GA 2 0
GA 3 0
Table 4: Final Distribution (Small Scale)
Algorithm Task Final Coalition Reward
OSM 1 {4,5} 29.32
OSM 2 {3,8} 47.30
OSM 3 {6,7} 42.13
RMB 1 {1,5} 35.83
RMB 2 {3,4} 47.25
RMB 3 {6,8} 40.61
GA 1 {1} 11.75
GA 2 {3} 15.81
GA 3 {6,7} 42.07

Our OSM algorithm achieved total reward 118.75 (96% of optimal), RMB reached 123.69 (optimal) but required many iterations, while GA only obtained 69.63. OSM also had the lowest structural change count (5 changes) versus RMB (14) and GA (4). This demonstrates OSM’s stability and near‑optimal performance.

4.2 Large‑Scale Experiments

We tested with 25–120 UAVs and 5–60 tasks. Table 5 summarizes average coalition changes and runtime over 20 runs per configuration.

Table 5: Multi‑Scale Performance
Scenario Algorithm Changes Runtime (s)
U25‑T5 OSM 7 0.31
RMB 261.89 0.31
GA 8 0.27
U40‑T10 OSM 22 1.37
RMB 54 6.31
GA 10 1.38
U90‑T30 OSM 31 3.67
RMB 61 12.27
GA 22 1.91
U120‑T60 OSM 43 4.61
RMB 105 29.34
GA 38 2.66

OSM consistently required fewer coalition changes and maintained low runtime, especially in large‑scale scenarios. RMB’s change count grew dramatically, and GA, although fast, produced much lower total rewards. Figure 1 (below) illustrates the total reward comparison across multiple scales.

The figure confirms that OSM’s total reward remains above 90% of the optimal (computed by exhaustive search for small cases). For large problems with 120 UAVs and 60 tasks, OSM outperformed both GA and RMB in reward, while maintaining the best stability and runtime.

5. Conclusion

We have presented a novel game‑theoretic task assignment algorithm for heterogeneous UAV swarms. By combining task‑region screening, preference‑based Gale‑Shapley matching, and Shapley‑value coalition optimization, our method achieves efficient, stable, and near‑optimal allocations. Extensive experiments demonstrate that our algorithm significantly reduces computational overhead and coalition restructuring, making it highly suitable for large‑scale dynamic environments. As China drone technologies continue to evolve, such intelligent allocation mechanisms will be crucial for real‑world applications including disaster response, surveillance, and logistics. Future work will incorporate deep learning for online preference learning and consider communication constraints to further enhance practicality.

Scroll to Top