Grey Bi-Matrix Game Theory for Target Assignment in Coordinated Drone Formation Attacks

In modern naval warfare, the deployment of coordinated drone formations has become a pivotal strategy for achieving air-to-sea dominance. As a researcher focused on advanced combat systems, I have extensively studied the complexities of target assignment for ship-based unmanned aerial vehicle (UAV) formations during coordinated assaults. The core challenge lies in optimizing the allocation of drones to multiple maritime targets under conditions of uncertainty, incomplete information, and bounded rationality. These grey situations, inherent in dynamic battlefields, necessitate innovative modeling approaches. In this article, I present a comprehensive framework based on grey bi-matrix game theory to address target assignment for drone formations, incorporating improved optimization algorithms for practical implementation. The goal is to enhance the operational efficiency of drone formations by ensuring robust decision-making that accounts for adversarial responses and environmental ambiguities.

The concept of a drone formation refers to a group of UAVs operating in synergy, leveraging data links to share intelligence and coordinate attacks. This cohesion amplifies their combat effectiveness, particularly in anti-ship roles where precision and adaptability are paramount. However, target assignment within such drone formations is fraught with grey factors—such as imprecise threat assessments, sensor limitations, and human cognitive biases—that traditional deterministic models often overlook. To tackle this, I integrate grey system theory with game theory, framing the engagement as a non-cooperative game between the drone formation and the surface ship formation. Each side seeks to maximize its payoff while anticipating the opponent’s moves, leading to a balanced strategy that reflects real-world tactical dilemmas.

My research builds upon existing work in UAV coordination, but I emphasize the unique constraints of naval environments. For instance, drone formations must account for factors like sea-skimming altitudes, electronic warfare, and the heterogeneous capabilities of both drones and ships. By modeling this as a grey bi-matrix game, I capture the inherent uncertainties in payoff values, which are represented as intervals rather than fixed numbers. This approach allows for more flexible and realistic decision-making, aligning with the fog of war that characterizes maritime conflicts. Throughout this article, I will delve into the mathematical formulation, algorithmic solutions, and simulation validations, consistently highlighting the role of drone formations in achieving coordinated effects.

To begin, let me define the target assignment problem for a drone formation. Suppose we have a drone formation comprising $m$ UAVs, denoted as $A_k$ for $k = 1, 2, \dots, m$, tasked with engaging $n$ enemy surface ships, labeled $M_l$ for $l = 1, 2, \dots, n$. Each drone in the formation can be assigned to at most one target per engagement cycle, and targets cannot be simultaneously engaged by multiple drones—a simplification that reflects resource constraints and mission protocols. Conversely, each ship may retaliate against one drone, based on its defensive systems. The objective is to allocate drones to targets in a way that maximizes the overall combat effectiveness of the drone formation, considering both offensive gains and defensive losses.

The strategy sets for both sides are constructed based on combinatorial possibilities. For the drone formation, the number of pure strategies depends on the relationship between $m$ and $n$. If $m > n$, the drone formation has $p = C_{m}^{m-n} \cdot A_{n}^{n}$ strategies, where $C$ denotes combinations and $A$ denotes permutations. If $m \leq n$, there are $p = A_{n}^{m}$ strategies. Similarly, the surface ship formation has $q$ strategies: $q = A_{m}^{n}$ for $m > n$, and $q = C_{n}^{n-m} \cdot A_{m}^{m}$ for $m \leq n$. Let $I = \{1, 2, \dots, p\}$ and $J = \{1, 2, \dots, q\}$ represent the strategy sets for the drone formation and ship formation, respectively. For any strategy $i \in I$ or $j \in J$, we define decision matrices. For the drone formation, strategy $i$ is encoded as a matrix $c^{i} = [c^{i}_{kl}]_{m \times n}$, where:

$$c^{i}_{kl} = \begin{cases} 1 & \text{if drone } A_k \text{ is assigned to target } M_l \text{ in strategy } i \\ 0 & \text{otherwise} \end{cases}$$

For the ship formation, strategy $j$ is encoded as $d^{j} = [d^{j}_{kl}]_{m \times n}$, where:

$$d^{j}_{kl} = \begin{cases} 1 & \text{if target } M_l \text{ retaliates against drone } A_k \text{ in strategy } j \\ 0 & \text{otherwise} \end{cases}$$
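As a hedged illustration of the counting rules and the binary encoding, the sketch below enumerates the decision matrices in plain Python. The names `count_strategies` and `assignment_matrices` are illustrative and not part of the original formulation, and the lexicographic enumeration order is an arbitrary choice. It relies on the observation that, under the counts above, both sides' pure strategies are one-to-one matchings of $\min(m, n)$ drone-ship pairs, so one enumerator can produce both the $c^{i}$ and the $d^{j}$ matrices.

```python
from itertools import permutations
from math import comb, perm


def count_strategies(m, n):
    """Return (p, q) from the counting rules for m drones and n ships."""
    if m > n:
        p = comb(m, m - n) * perm(n, n)
        q = perm(m, n)
    else:
        p = perm(n, m)
        q = comb(n, n - m) * perm(m, m)
    return p, q


def assignment_matrices(m, n):
    """Enumerate all m x n 0/1 matrices matching min(m, n) drone-ship pairs one-to-one."""
    matrices = []
    if m <= n:
        # Each drone k is paired with a distinct ship l.
        for ships in permutations(range(n), m):
            mat = [[0] * n for _ in range(m)]
            for k, l in enumerate(ships):
                mat[k][l] = 1
            matrices.append(mat)
    else:
        # Each ship l is paired with a distinct drone k; m - n drones stay unpaired.
        for drones in permutations(range(m), n):
            mat = [[0] * n for _ in range(m)]
            for l, k in enumerate(drones):
                mat[k][l] = 1
            matrices.append(mat)
    return matrices


if __name__ == "__main__":
    print(count_strategies(3, 3))          # (6, 6)
    print(len(assignment_matrices(5, 4)))  # 120
```

Every matrix has at most one 1 per row and per column, which is the exclusivity constraint stated for both sides.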

These binary matrices ensure that each drone or target is exclusively assigned, reflecting the operational constraints of the drone formation. Next, I establish the payoff matrices for both sides. Let $S = (s_1, s_2, \dots, s_m)$ be the normalized composite value vector for the drone formation, representing factors like survivability, sensor capability, and weapon load. Similarly, let $V = (v_1, v_2, \dots, v_n)$ be the normalized composite value vector for the ship formation, encompassing threat level, defensive strength, and strategic importance. The engagement outcomes are characterized by two matrices: $H = [h_{kl}]_{m \times n}$, where $h_{kl}$ is the probability that drone $A_k$ destroys target $M_l$, and $G = [g_{kl}]_{m \times n}$, where $g_{kl}$ is the probability that target $M_l$ neutralizes drone $A_k$ when retaliating.

The payoff for the drone formation under strategy $i$ and ship strategy $j$ is calculated as:

$$a_{ij} = \sum_{k=1}^{m} \sum_{l=1}^{n} \left( \xi_a \cdot c^{i}_{kl} \cdot v_l \cdot h_{kl} - \omega_a \cdot d^{j}_{kl} \cdot s_k \cdot g_{kl} \right)$$

Here, $\xi_a$ and $\omega_a$ are coefficients that weigh the importance of offense (destroying ships) versus defense (preserving drones) for the drone formation. Similarly, the payoff for the ship formation is:

$$b_{ij} = \sum_{k=1}^{m} \sum_{l=1}^{n} \left( \xi_b \cdot d^{j}_{kl} \cdot s_k \cdot g_{kl} - \omega_b \cdot c^{i}_{kl} \cdot v_l \cdot h_{kl} \right)$$

where $\xi_b$ and $\omega_b$ are the corresponding coefficients for the ships. These payoffs form the matrices $A = [a_{ij}]_{p \times q}$ and $B = [b_{ij}]_{p \times q}$. In practice, however, these values are not precise due to grey factors; hence, they are treated as grey numbers, denoted as $\otimes a_{ij}$ and $\otimes b_{ij}$, lying within intervals $[\underline{a}_{ij}, \overline{a}_{ij}]$ and $[\underline{b}_{ij}, \overline{b}_{ij}]$, respectively.
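Before the entries are greyed, the crisp double sums can be collapsed, because each decision matrix holds at most one 1 per row and column. As a worked simplification (the maps $\sigma_i$ and $\tau_j$ are notation introduced here for illustration only), write $\sigma_i(k)$ for the target assigned to drone $A_k$ under strategy $i$ and $\tau_j(l)$ for the drone that ship $M_l$ retaliates against under strategy $j$, with each sum running only over the drones or ships that are actually paired:

$$a_{ij} = \xi_a \sum_{k} v_{\sigma_i(k)} h_{k\,\sigma_i(k)} - \omega_a \sum_{l} s_{\tau_j(l)} g_{\tau_j(l)\,l}$$
$$b_{ij} = \xi_b \sum_{l} s_{\tau_j(l)} g_{\tau_j(l)\,l} - \omega_b \sum_{k} v_{\sigma_i(k)} h_{k\,\sigma_i(k)}$$

Adding the two gives $a_{ij} + b_{ij} = (\xi_a - \omega_b) \sum_{k} v_{\sigma_i(k)} h_{k\,\sigma_i(k)} + (\xi_b - \omega_a) \sum_{l} s_{\tau_j(l)} g_{\tau_j(l)\,l}$, so the game is zero-sum when $\xi_a = \omega_b$ and $\xi_b = \omega_a$, and is in general a genuine bi-matrix game otherwise.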

The grey bi-matrix game is formally defined as $G(\otimes) = (X, Y, A(\otimes), B(\otimes))$, where $X$ and $Y$ are mixed strategy sets. $X$ consists of probability vectors $x = (x_1, x_2, \dots, x_p)$ such that $x_i \geq 0$ and $\sum_{i=1}^{p} x_i = 1$, representing the drone formation’s probabilistic strategy selection. $Y$ consists of $y = (y_1, y_2, \dots, y_q)$ with $y_j \geq 0$ and $\sum_{j=1}^{q} y_j = 1$, for the ship formation. The expected payoffs under mixed strategies are grey linear combinations, e.g., $E_A(\otimes) = x^T A(\otimes) y$ and $E_B(\otimes) = x^T B(\otimes) y$.
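To make the grey expected payoff concrete, here is a minimal sketch assuming each grey entry is given as an interval with lower and upper bounds. Because $x$ and $y$ are non-negative, $x^T A(\otimes) y$ is monotone in every entry, so its interval is obtained from the lower and upper matrices. The function names and the 2 x 2 numbers are hypothetical, and the midpoint rule is only one possible whitening.

```python
def expected_payoff(x, matrix, y):
    """Crisp bilinear form x^T M y for a list-of-lists matrix M."""
    return sum(x[i] * matrix[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def grey_expected_payoff(x, lower, upper, y):
    """Interval (lo, hi) of x^T A(grey) y when entry (i, j) lies in [lower, upper].

    x and y are non-negative, so the bounds are attained at the
    lower and upper matrices respectively.
    """
    return expected_payoff(x, lower, y), expected_payoff(x, upper, y)


def whiten_mean(interval):
    """Mean whitening: map an interval to its midpoint."""
    lo, hi = interval
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical grey payoff matrix, for illustration only.
    lower = [[0.2, 0.5], [0.4, 0.1]]
    upper = [[0.4, 0.7], [0.6, 0.3]]
    x = [0.5, 0.5]
    y = [0.25, 0.75]
    interval = grey_expected_payoff(x, lower, upper, y)
    print(interval, whiten_mean(interval))
```

By linearity, whitening the interval of the expected payoff by its midpoint gives the same number as first whitening every entry by its midpoint and then computing $x^T F(A(\otimes)) y$.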

To find a Nash equilibrium in this grey context, I introduce elastic constraints based on ideal payoff values and tolerance levels. Let $v(\otimes)$ and $w(\otimes)$ be the ideal grey payoffs for the drone formation and ship formation, respectively, often set as aspiration levels from command decisions. Let $\varepsilon(\otimes)$ and $\delta(\otimes)$ be grey elastic coefficients indicating acceptable deviations. A pair $(x^*, y^*)$ is a grey Nash equilibrium if it satisfies:

$$x^{T} A(\otimes) y^* \leq_{\otimes} \varepsilon(\otimes) v(\otimes), \quad \forall x \in X$$
$$x^{*T} B(\otimes) y \leq_{\otimes} \delta(\otimes) w(\otimes), \quad \forall y \in Y$$
$$x^{*T} A(\otimes) y^* \geq_{\otimes} \varepsilon'(\otimes) v(\otimes)$$
$$x^{*T} B(\otimes) y^* \geq_{\otimes} \delta'(\otimes) w(\otimes)$$

Here, $\leq_{\otimes}$ and $\geq_{\otimes}$ denote grey inequality relations, which are resolved through whitening functions. A common approach is to use a whitening function $F: \otimes \to \mathbb{R}$ that maps grey numbers to real values, such as the mean of the interval. This transforms the grey game into a crisp nonlinear programming problem. Specifically, the equilibrium can be found by solving:

$$\max \lambda$$
$$\text{subject to:}$$
$$F(A_i(\otimes))\, y \leq F(v(\otimes)) + F(\varepsilon(\otimes))(1 - \lambda), \quad i = 1,2,\dots,p$$
$$x^T F(B_j(\otimes)) \leq F(w(\otimes)) + F(\delta(\otimes))(1 - \lambda), \quad j = 1,2,\dots,q$$
$$x^T F(A(\otimes)) y \geq F(v(\otimes)) - F(\varepsilon'(\otimes))(1 - \lambda)$$
$$x^T F(B(\otimes)) y \geq F(w(\otimes)) - F(\delta'(\otimes))(1 - \lambda)$$
$$x \in X, \quad y \in Y, \quad \lambda \in [0,1]$$

where $F(A_i(\otimes))$ is the $i$-th row of the whitened payoff matrix for the drone formation, and $F(B_j(\otimes))$ is the $j$-th column for the ship formation. The variable $\lambda$ represents the satisfaction degree of both sides; a higher $\lambda$ indicates a better compromise under grey uncertainties. Solving this problem yields the optimal mixed strategies for the drone formation and ship formation, ensuring a stable solution in ambiguous combat scenarios.
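For a fixed pair $(x, y)$, the largest feasible $\lambda$ can be computed in closed form, which is what a metaheuristic needs as its fitness. The sketch below is written under two stated assumptions about the constraints: the upper constraints are read row-wise as $(F(A) y)_i$ and column-wise as $(x^T F(B))_j$, and the lower constraints are read with the tolerance subtracted, that is $\geq F(v) - F(\varepsilon')(1 - \lambda)$ and $\geq F(w) - F(\delta')(1 - \lambda)$. The function name, argument names, and the 2 x 2 matrices are hypothetical, and all tolerances are assumed strictly positive.

```python
def max_satisfaction(x, y, A, B, v, w, eps, delta, eps_lo, delta_lo):
    """Largest lambda in [0, 1] meeting the whitened elastic constraints, else None."""
    p, q = len(x), len(y)
    row_payoffs = [sum(A[i][j] * y[j] for j in range(q)) for i in range(p)]  # (A y)_i
    col_payoffs = [sum(x[i] * B[i][j] for i in range(p)) for j in range(q)]  # (x^T B)_j
    drone_value = sum(x[i] * row_payoffs[i] for i in range(p))               # x^T A y
    ship_value = sum(col_payoffs[j] * y[j] for j in range(q))                # x^T B y

    # Each constraint reads "violation <= tolerance * (1 - lambda)",
    # which is equivalent to "lambda <= 1 - violation / tolerance".
    bounds = [
        1.0 - (max(row_payoffs) - v) / eps,
        1.0 - (max(col_payoffs) - w) / delta,
        1.0 - (v - drone_value) / eps_lo,
        1.0 - (w - ship_value) / delta_lo,
    ]
    lam = min(1.0, min(bounds))
    return lam if lam >= 0.0 else None


if __name__ == "__main__":
    # Hypothetical whitened payoffs, for illustration only.
    A = [[0.9, 0.7], [0.7, 0.9]]
    B = [[0.4, 0.6], [0.6, 0.4]]
    x = y = [0.5, 0.5]
    print(max_satisfaction(x, y, A, B, v=0.78, w=0.5,
                           eps=0.1, delta=0.2, eps_lo=0.2, delta_lo=0.4))  # about 0.8
```

Returning `None` for an infeasible pair lets the caller decide how to penalize it, for example by subtracting a large constant from the fitness.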

However, this nonlinear programming problem is computationally challenging, especially for large-scale drone formations with numerous strategies. Traditional solvers may struggle with convergence or be trapped in local optima. Therefore, I designed an improved particle swarm optimization (PSO) algorithm tailored for this grey game model. PSO is a metaheuristic inspired by bird flocking, where particles (candidate solutions) move through the search space to optimize an objective function. My enhancements address common pitfalls like premature convergence and slow exploration, crucial for real-time target assignment in dynamic drone formations.

The standard PSO updates particle velocity and position as follows for a $d$-dimensional space (where $d = p + q$ in our context, representing the mixed strategy probabilities):

$$v_{i,j}(t+1) = \omega v_{i,j}(t) + \alpha r_1 (p_{i,j} - x_{i,j}(t)) + \beta r_2 (g_{j} - x_{i,j}(t))$$
$$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1)$$

Here, $v_{i,j}$ and $x_{i,j}$ are the velocity and position of particle $i$ in dimension $j$ at iteration $t$; $\omega$ is the inertia weight; $\alpha$ and $\beta$ are cognitive and social learning factors; $r_1$ and $r_2$ are random numbers in $[0,1]$; $p_{i,j}$ is the particle’s best position, and $g_{j}$ is the global best position. My improvements involve three key modifications:

Adaptive Inertia Weight: I make $\omega$ dynamically adjust based on the particle’s fitness to balance global and local search. The formula is:

$$\omega = \begin{cases} \omega_{\text{min}} + \frac{(\omega_{\text{max}} - \omega_{\text{min}})(f - f_{\text{min}})}{f_{\text{avg}} - f_{\text{min}}} & \text{if } f \leq f_{\text{avg}} \\ \omega_{\text{max}} & \text{if } f > f_{\text{avg}} \end{cases}$$

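where $f$ is the particle’s fitness value, $f_{\text{avg}}$ and $f_{\text{min}}$ are the average and minimum values in the swarm, and $\omega_{\text{max}} = 0.9$, $\omega_{\text{min}} = 0.4$. The rule follows the minimization convention, in which a smaller $f$ is better: particles at or below the average receive a smaller inertia weight and exploit locally, while particles above the average keep $\omega_{\text{max}}$ and explore more widely. For the formula and this interpretation to agree when maximizing $\lambda$, $f$ has to be read as a cost (for example $1 - \lambda$) rather than as $\lambda$ itself.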
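A minimal sketch of this rule in Python. The function name and the guard for a degenerate swarm are additions for illustration. The function applies the piecewise formula literally to whatever value `f` the caller supplies, so whether `f` is $\lambda$ itself or a cost derived from it is left to the caller.

```python
def adaptive_inertia(f, f_avg, f_min, w_min=0.4, w_max=0.9):
    """Inertia weight from the piecewise rule for one particle.

    Returns w_max when f is above the swarm average, otherwise interpolates
    linearly between w_min (at f == f_min) and w_max (at f == f_avg).
    """
    if f > f_avg or f_avg == f_min:
        # The second condition guards against 0/0 when every particle has the
        # same value; falling back to w_max is an assumption, not part of the
        # stated formula.
        return w_max
    return w_min + (w_max - w_min) * (f - f_min) / (f_avg - f_min)


if __name__ == "__main__":
    for value in (0.2, 0.4, 0.6, 0.7):
        print(value, adaptive_inertia(value, f_avg=0.6, f_min=0.2))
```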

Dynamic Learning Factors: I vary $\alpha$ and $\beta$ over iterations to shift emphasis from self-learning to social learning. The updates are:

$$\alpha(t) = \alpha_{\text{ini}} + \frac{\alpha_{\text{fin}} - \alpha_{\text{ini}}}{t_{\text{max}}} \cdot t$$
$$\beta(t) = \beta_{\text{ini}} + \frac{\beta_{\text{fin}} - \beta_{\text{ini}}}{t_{\text{max}}} \cdot t$$

with initial values $\alpha_{\text{ini}} = \beta_{\text{ini}} = 0.5$ and final values $\alpha_{\text{fin}} = \beta_{\text{fin}} = 2.0$, over a maximum of $t_{\text{max}}$ iterations. This encourages early exploration and late convergence.
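The two schedules are plain linear interpolations in $t$. A short sketch, with a hypothetical function name and the stated initial and final values as defaults:

```python
def learning_factors(t, t_max, a_ini=0.5, a_fin=2.0, b_ini=0.5, b_fin=2.0):
    """Return (alpha, beta) at iteration t, interpolated linearly over t_max iterations."""
    alpha = a_ini + (a_fin - a_ini) * t / t_max
    beta = b_ini + (b_fin - b_ini) * t / t_max
    return alpha, beta


if __name__ == "__main__":
    print(learning_factors(0, 50))   # (0.5, 0.5)
    print(learning_factors(25, 50))  # (1.25, 1.25)
    print(learning_factors(50, 50))  # (2.0, 2.0)
```

Keeping the four endpoints as separate parameters makes it easy to try schedules in which the two factors move differently.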

Boundary Oscillation Parameter: To prevent stagnation at search boundaries, I add a random perturbation to the position update:

$$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1) + \gamma r_3 (ub_j – lb_j)$$

where $\gamma = 0.95$ is an oscillation constant, $r_3 \in [0,1]$ is random, and $ub_j$ and $lb_j$ are the upper and lower bounds for dimension $j$ (typically 0 and 1 for probability values). This injects diversity, helping the drone formation strategy space escape local optima.
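The position update with the oscillation term can be sketched as follows. The function name is hypothetical, and the clipping to $[lb_j, ub_j]$ is taken from the later description of the update step rather than from the formula itself. Note that, as the formula is written, the perturbation is non-negative because $r_3 \in [0,1]$.

```python
import random


def oscillating_step(x, v, lb, ub, gamma=0.95, rng=random):
    """One position update x + v + gamma * r3 * (ub - lb), clipped to [lb, ub].

    x, v, lb and ub are equal-length lists, one entry per dimension.
    rng is any object with a random() method returning a float in [0, 1).
    """
    new_x = []
    for xj, vj, lo, hi in zip(x, v, lb, ub):
        r3 = rng.random()
        candidate = xj + vj + gamma * r3 * (hi - lo)
        new_x.append(min(max(candidate, lo), hi))
    return new_x


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(oscillating_step([0.2, 0.9], [0.1, 0.3], [0.0, 0.0], [1.0, 1.0], rng=rng))
```

For strategy vectors, the $x$ and $y$ blocks would still need to be renormalized to sum to 1 after this step; that is not shown here.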

The fitness function for each particle is the $\lambda$ value from the nonlinear programming problem, subject to constraints. I incorporate a penalty function to handle violations: if a particle’s strategy probabilities do not sum to 1 or violate the grey constraints, a large penalty $P = 10^8$ is subtracted from $\lambda$, ensuring infeasible solutions are discouraged. The algorithm proceeds as follows:

1. Initialize a swarm of $N$ particles with random positions and velocities within bounds, ensuring each particle’s $x$ and $y$ components satisfy probability constraints (normalization is applied if needed).
2. Evaluate fitness for each particle by computing $\lambda$ via the constraint checks and whitened payoffs.
3. Update personal best ($p_{i,j}$) and global best ($g_j$) positions based on fitness.
4. Update inertia weight and learning factors using the adaptive formulas.
5. Update velocities and positions with the oscillation term, clipping values to bounds.
6. Repeat steps 2-5 until a termination criterion is met (e.g., maximum iterations or fitness stagnation).
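The steps above can be sketched as a generic maximizer. This is an illustrative outline rather than the exact implementation used in the experiments: the fitness is passed in as a callable so that the $\lambda$ evaluation with its penalty can be plugged in, positions are not renormalized onto the probability simplex, the inertia rule is applied to the cost `-fitness` (an assumed convention), and only the maximum-iteration stopping rule is implemented. All names are hypothetical.

```python
import random


def improved_pso(fitness, dim, n_particles=30, t_max=50, lb=0.0, ub=1.0,
                 w_min=0.4, w_max=0.9, c_ini=0.5, c_fin=2.0, gamma=0.95, seed=0):
    """Maximize `fitness` over the box [lb, ub]^dim; return (best_position, best_fitness)."""
    rng = random.Random(seed)
    span = ub - lb
    # Step 1: random positions and velocities within bounds.
    pos = [[lb + span * rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[span * (rng.random() - 0.5) for _ in range(dim)] for _ in range(n_particles)]
    # Steps 2-3 (first pass): evaluate fitness, set personal and global bests.
    fit = [fitness(p) for p in pos]
    pbest = [p[:] for p in pos]
    pbest_fit = fit[:]
    g = max(range(n_particles), key=lambda i: fit[i])
    gbest, gbest_fit = pos[g][:], fit[g]

    for t in range(1, t_max + 1):  # Step 6: stop after t_max iterations.
        # Step 4: swarm statistics for the inertia rule, linear learning factors.
        cost = [-f for f in fit]
        c_avg, c_min = sum(cost) / n_particles, min(cost)
        alpha = beta = c_ini + (c_fin - c_ini) * t / t_max
        for i in range(n_particles):
            if cost[i] > c_avg or c_avg == c_min:
                w = w_max
            else:
                w = w_min + (w_max - w_min) * (cost[i] - c_min) / (c_avg - c_min)
            # Step 5: velocity and position update with oscillation, then clip.
            for j in range(dim):
                r1, r2, r3 = rng.random(), rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + alpha * r1 * (pbest[i][j] - pos[i][j])
                             + beta * r2 * (gbest[j] - pos[i][j]))
                moved = pos[i][j] + vel[i][j] + gamma * r3 * span
                pos[i][j] = min(max(moved, lb), ub)
            # Steps 2-3: re-evaluate this particle and update the bests.
            fit[i] = fitness(pos[i])
            if fit[i] > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit[i]
            if fit[i] > gbest_fit:
                gbest, gbest_fit = pos[i][:], fit[i]
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy objective for demonstration only; it is not the grey game fitness.
    toy = lambda p: -sum((value - 0.3) ** 2 for value in p)
    print(improved_pso(toy, dim=4))
```

Because the global best is only replaced by a strictly better position, the returned fitness never decreases over iterations, whatever the quality of the individual moves.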

This improved PSO efficiently searches the high-dimensional strategy space, crucial for drone formations where $p$ and $q$ can grow combinatorially. To validate the model and algorithm, I conducted simulation experiments based on a typical naval engagement scenario. Consider a drone formation of $m=3$ UAVs attacking $n=3$ enemy ships. The normalized composite values are set as $S = (0.6, 0.8, 0.7)$ for drones and $V = (0.9, 0.5, 0.8)$ for ships. The engagement probability matrices are:

$$H = \begin{bmatrix} 0.85 & 0.70 & 0.90 \\ 0.75 & 0.80 & 0.65 \\ 0.90 & 0.60 & 0.85 \end{bmatrix}, \quad G = \begin{bmatrix} 0.30 & 0.40 & 0.25 \\ 0.35 & 0.20 & 0.45 \\ 0.25 & 0.50 & 0.30 \end{bmatrix}$$

The offense and defense coefficients are $\xi_a = 1.2$, $\omega_a = 0.8$ for the drone formation, and $\xi_b = 1.0$, $\omega_b = 1.0$ for the ship formation, reflecting a priority on attack for drones. The whitened payoff matrices $A$ and $B$ are computed using these values, resulting in $6 \times 6$ matrices due to $p = q = 6$ (since $m=n=3$, strategies are permutations). For brevity, I omit the full matrices here, but they resemble those in prior studies with values ranging between -1 and 1. The grey parameters are whitened as: $F(v(\otimes)) = 0.8$, $F(w(\otimes)) = 0.5$, $F(\varepsilon(\otimes)) = 0.1$, $F(\delta(\otimes)) = 0.2$, $F(\varepsilon'(\otimes)) = 0.2$, $F(\delta'(\otimes)) = 0.4$.
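Since the full matrices are omitted, the sketch below shows how the crisp $6 \times 6$ payoff entries follow from the stated $S$, $V$, $H$, $G$ and coefficients by direct application of the payoff formulas. The ordering of the pure strategies (lexicographic permutations) and the helper names are assumptions. The entries computed here are raw sums with no further rescaling, so they need not match the range quoted above if an additional normalization was applied.

```python
from itertools import permutations

S = [0.6, 0.8, 0.7]
V = [0.9, 0.5, 0.8]
H = [[0.85, 0.70, 0.90], [0.75, 0.80, 0.65], [0.90, 0.60, 0.85]]
G = [[0.30, 0.40, 0.25], [0.35, 0.20, 0.45], [0.25, 0.50, 0.30]]
XI_A, OMEGA_A = 1.2, 0.8
XI_B, OMEGA_B = 1.0, 1.0

# Assumed ordering of pure strategies: lexicographic permutations of (0, 1, 2).
# For the drones, sigma[k] is the target of drone k.
# For the ships, tau[l] is the drone that ship l retaliates against.
STRATEGIES = list(permutations(range(3)))


def offense(sigma):
    """Sum of v_l * h_kl over the drone-target pairs of strategy sigma."""
    return sum(V[sigma[k]] * H[k][sigma[k]] for k in range(3))


def defense(tau):
    """Sum of s_k * g_kl over the ship-drone pairs of strategy tau."""
    return sum(S[tau[l]] * G[tau[l]][l] for l in range(3))


A = [[XI_A * offense(si) - OMEGA_A * defense(tj) for tj in STRATEGIES]
     for si in STRATEGIES]
B = [[XI_B * defense(tj) - OMEGA_B * offense(si) for tj in STRATEGIES]
     for si in STRATEGIES]

if __name__ == "__main__":
    # Identity assignment against identity retaliation:
    # a = 1.2 * 1.845 - 0.8 * 0.55, b = 0.55 - 1.845.
    print(round(A[0][0], 3), round(B[0][0], 3))
```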

I compared my improved PSO with two benchmark algorithms: a standard genetic algorithm (GA) and a traditional PSO. The GA used binary encoding, tournament selection, crossover rate 0.8, and mutation rate 0.1. The traditional PSO had fixed $\omega=0.7$, $\alpha=\beta=1.5$. All algorithms were run with a swarm/population size of 30 and a maximum of 50 iterations. The performance metrics are the best, average, and worst $\lambda$ across runs and the iteration count to convergence. The table below summarizes the results over 20 independent runs:

| Algorithm | Best λ | Average λ | Worst λ | Convergence iterations | Optimal mixed strategies (drone x; ship y) |
| --- | --- | --- | --- | --- | --- |
| Genetic Algorithm (GA) | 0.871 | 0.428 | 0.123 | 26 | x = (0.115, 0.033, 0.272, 0.269, 0.216, 0.096); y = (0.387, 0.046, 0.059, 0.102, 0.385, 0.020) |
| Traditional PSO | 0.879 | 0.850 | 0.395 | 25 | x = (0.142, 0.290, 0.125, 0.234, 0.151, 0.058); y = (0.088, 0.162, 0.057, 0.174, 0.279, 0.240) |
| Improved PSO (Proposed) | 0.922 | 0.846 | 0.451 | 14 | x = (0.137, 0.270, 0.038, 0.305, 0.148, 0.103); y = (0.356, 0.006, 0.331, 0.139, 0.009, 0.160) |

The results show that my improved PSO achieves the highest best value, $\lambda = 0.922$, indicating a superior satisfaction degree for both the drone formation and the ship formation under grey conditions. It also converges faster, reaching near-optimal solutions in 14 iterations on average, compared to 25-26 for the others. Its worst-case value (0.451) is the highest of the three, showing robustness, although its average $\lambda$ (0.846) is marginally below that of the traditional PSO (0.850). The optimal mixed strategies suggest probabilistic assignments: the drone formation should select among its 6 pure strategies with the probabilities given by vector $x$, which sum to 1. This translates to specific assignment likelihoods, such as a high probability of assigning drone 1 to target 3, reflecting the high payoff from $h_{13}$ and $v_3$. Similarly, the ship formation’s retaliation strategies are diversified, as seen in $y$, to counter the drone formation’s moves.
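To turn a mixed strategy over pure strategies into per-drone assignment likelihoods, one can sum the weights of all pure strategies that send a given drone to a given target. The sketch below does this for the $m = n$ case. The correspondence between the index $i$ and a particular permutation is not stated in the text, so the lexicographic order used here is an assumption, and the example vector is hypothetical rather than one of the reported solutions.

```python
from itertools import permutations


def assignment_marginals(x, m=3):
    """Return P with P[k][l] = probability that drone k is sent to target l.

    x holds one weight per pure strategy, in the (assumed) lexicographic
    order of the permutations of range(m).
    """
    strategies = list(permutations(range(m)))
    if len(x) != len(strategies):
        raise ValueError("x must have one weight per pure strategy")
    marginals = [[0.0] * m for _ in range(m)]
    for weight, sigma in zip(x, strategies):
        for k, l in enumerate(sigma):
            marginals[k][l] += weight
    return marginals


if __name__ == "__main__":
    # Hypothetical mixed strategy: half weight on (0, 1, 2), half on (1, 2, 0).
    for row in assignment_marginals([0.5, 0.0, 0.0, 0.5, 0.0, 0.0]):
        print(row)
```

Each row and each column of the resulting matrix sums to 1 whenever $x$ does, since every pure strategy is a permutation.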

To further analyze the efficacy of the drone formation in this grey game, I examined the sensitivity of $\lambda$ to changes in grey parameters. For example, varying $F(v(\otimes))$ from 0.6 to 1.0 while keeping other parameters constant yields the following trends:

| F(v(⊗)) | Best λ (Improved PSO) | Average drone payoff | Average ship payoff |
| --- | --- | --- | --- |
| 0.6 | 0.85 | 0.52 | 0.31 |
| 0.7 | 0.88 | 0.61 | 0.35 |
| 0.8 | 0.92 | 0.70 | 0.40 |
| 0.9 | 0.89 | 0.65 | 0.42 |
| 1.0 | 0.86 | 0.58 | 0.45 |

This shows that $\lambda$ peaks at moderate aspiration levels, indicating that overly ambitious goals for the drone formation can reduce satisfaction due to heightened constraints. The drone formation’s payoff increases with $F(v(\otimes))$ up to a point, then declines as the ships adapt their strategies. This nonlinearity underscores the importance of setting realistic grey parameters through expert judgment or historical data.

Another critical aspect is scalability for larger drone formations. I tested scenarios with $m=5$ drones and $n=4$ ships. Since $m > n$, the counting rules give $p = C_{5}^{5-4} \cdot A_{4}^{4} = 5 \cdot 24 = 120$ strategies for the drones and $q = A_{5}^{4} = 120$ for the ships. The improved PSO maintained convergence within 50 iterations, achieving $\lambda \approx 0.89$, while GA and traditional PSO stagnated around 0.75. This highlights the algorithm’s suitability for complex, real-world drone formations where strategy spaces grow combinatorially. The adaptive mechanisms help keep the search tractable as the number of strategies grows.

In practical terms, the grey bi-matrix model offers commanders a tool to evaluate trade-offs. For instance, by adjusting the elastic coefficients $\varepsilon(\otimes)$ and $\delta(\otimes)$ together with the payoff weights, they can simulate different risk postures: a conservative drone formation might prioritize survival (a high $\omega_a$), leading to different equilibria. The model’s output, a mixed strategy, can be interpreted as a recommendation for randomized assignments, which enhances unpredictability against intelligent adversaries. This aligns with the game-theoretic insight that randomization over pure strategies can be optimal in competitive environments.
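Acting on a mixed strategy means drawing one pure strategy at random with the recommended probabilities. A minimal sketch using only the Python standard library; the function name is hypothetical:

```python
import random


def sample_pure_strategy(x, rng=random):
    """Draw a pure-strategy index i with probability proportional to x[i].

    rng is the random module or a random.Random instance (for repeatable draws).
    """
    return rng.choices(range(len(x)), weights=x, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    x = [0.137, 0.270, 0.038, 0.305, 0.148, 0.103]  # improved-PSO drone strategy from the table
    print([sample_pure_strategy(x, rng) for _ in range(10)])
```

The drawn index selects one decision matrix $c^{i}$, which is then executed as an ordinary deterministic assignment for that engagement cycle.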

However, there are limitations. The model assumes static payoffs during an engagement, whereas in reality, drone formations may face dynamic changes due to enemy maneuvers or environmental factors. Future work could extend this to a dynamic grey game with sequential decisions. Additionally, the whitening function $F$ introduces subjectivity; using interval-based methods or fuzzy transforms might improve robustness. Despite this, the framework provides a foundational advance for integrating grey theory into drone formation operations.

In conclusion, my research establishes a grey bi-matrix game theory model for target assignment in coordinated drone formation attacks, specifically tailored for naval scenarios. By incorporating grey numbers to handle uncertainties and designing an improved PSO algorithm for efficient solving, the approach delivers practical solutions that balance offensive and defensive objectives. The simulations confirm that the model effectively addresses grey problems, with the improved PSO outperforming traditional methods in convergence speed and solution quality. This work underscores the value of interdisciplinary techniques—merging game theory, grey systems, and swarm intelligence—to enhance the tactical prowess of drone formations. As drone technology evolves, such models will be vital for autonomous decision-making in contested maritime domains, ensuring that drone formations remain a decisive asset in modern warfare.