A Two-Stage Multi-Objective Collaborative Optimization Method for Heterogeneous Multi-Unmanned Drone Cooperative Detection

Unmanned Aerial Vehicles (UAVs), commonly known as drones, have become indispensable assets in modern operations, including battlefield surveillance, target detection, and environmental mapping, owing to their exceptional flexibility, low operational cost, and capacity to mitigate human risk. However, the increasing scale and complexity of modern missions often surpass the capabilities of a single unmanned drone, which is typically constrained by its specific sensor payloads, endurance, and flight performance. To overcome these limitations, the coordination of a heterogeneous fleet of unmanned drone platforms has emerged as a critical research frontier. By leveraging the complementary strengths of different drone types—such as pairing long-endurance, low-speed scouts with fast, high-payload carriers—the system’s overall coverage, mission efficiency, and robustness can be significantly enhanced. Consequently, developing effective mission planning methodologies for heterogeneous multi-unmanned drone cooperative detection is of paramount theoretical and practical importance for maximizing the operational effectiveness and survivability of drone swarm systems.

Research in this domain has traditionally focused on two core sub-problems: task allocation and path planning. Task allocation deals with assigning specific targets or areas to each unmanned drone in the fleet, considering factors like target value, drone capability, and resource constraints. Path planning focuses on determining a collision-free, efficient flight path for each drone to visit its assigned targets. A prevalent approach is to decouple and solve these problems sequentially. While methods like the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) or Improved Multi-objective Gray Wolf Optimization (IMOGWO) have shown success in task allocation, and advanced search or evolutionary algorithms excel in path planning, this sequential paradigm often fails to capture their inherent interdependency. The assignment of targets directly influences the feasibility and cost of the resulting paths, and conversely, path-related constraints (like energy consumption or conflict risk) should inform the allocation process. Although some integrated frameworks have been proposed, they frequently lack a deep, iterative feedback mechanism between the two stages, leading to suboptimal global solutions.

To address this gap, this work proposes a novel Two-Stage Multi-Objective Collaborative Optimization method. The core innovation lies in a tightly coupled architecture featuring a bidirectional feedback loop, enabling iterative co-optimization of task allocation and path planning. For the task allocation stage, we develop an Enhanced NSGA-II (E-NSGA-II) algorithm incorporating a sector-based initialization strategy, adaptive mutation, and Simulated Annealing (SA)-driven local search to accelerate convergence and avoid local optima. For the path planning stage, we propose an Enhanced Multi-Objective A* (E-MOA*) algorithm, which integrates a weighted nearest-neighbor heuristic and explicit penalty mechanisms for inter-drone conflicts and sharp turns. Comprehensive simulation experiments demonstrate that our method can generate a well-distributed, high-quality Pareto-optimal front under multiple heterogeneous constraints, outperforming several state-of-the-art multi-objective optimization algorithms in terms of solution quality, frontier coverage, and convergence stability.

1. Problem Formulation and Modeling

We consider a two-dimensional operational area where a heterogeneous fleet of $M$ unmanned drone platforms is deployed from a common base $p_B = (x_B, y_B)$. The fleet must cooperatively detect $N$ stationary targets dispersed throughout the region. Each unmanned drone $u_i$, where $i \in \{1, 2, …, M\}$, is characterized by a distinct set of capabilities: a cruise speed $v_i$, initial energy $E^0_i$, flight energy consumption rate $e^f_i$, and hover energy consumption rate $e^h_i$. Each target $t_j$, where $j \in \{1, 2, …, N\}$, is defined by its coordinates $(x_j, y_j)$, detection reward $r_j$, tactical priority level $p^T_j$, and required hover time for sensor operation $\tau_j$.

The decision variables include a binary assignment matrix $X \in \{0, 1\}^{N \times M}$, where $x_{ij}=1$ if target $t_j$ is assigned to drone $u_i$, and a set of ordered visitation sequences $P = \{P_1, P_2, …, P_M\}$. For drone $u_i$, its path is $P_i = [p_{i,0}, p_{i,1}, …, p_{i,L_i}, p_{i,L_i+1}]$, where $p_{i,k}$ denotes the $k$-th node visited (either a target or the base), $L_i$ is the number of assigned targets, and by convention $p_{i,0} = p_{i,L_i+1} = p_B$.

1.1 Objective Functions

We formulate a bi-objective optimization problem balancing mission effectiveness and operational efficiency.

1) Maximize Total Weighted Detection Reward: This objective prioritizes the collective intelligence gain, combining the intrinsic reward and tactical priority of each detected target.
$$ \max f_1(X, P) = \sum_{i=1}^{M} \sum_{j=1}^{N} x_{ij} \cdot r_j \cdot p^T_j $$

2) Minimize Composite Time Cost: This objective seeks to reduce the total mission time while promoting balanced workload distribution among the drones to enhance coordination and survivability.
$$ \min f_2(X, P) = \alpha \cdot \frac{1}{M}\sum_{i=1}^{M} T_i + (1 – \alpha) \cdot \sigma_T $$
where $\alpha \in [0, 1]$ is a weighting coefficient balancing efficiency and equity. $T_i$ is the total mission time for drone $u_i$, and $\sigma_T$ is the standard deviation of the $T_i$ values across the fleet, penalizing large disparities. The mission time for a single drone comprises flight time and hover time:
$$ T^f_i = \sum_{k=0}^{L_i} \frac{d(p_{i,k}, p_{i,k+1})}{v_i}, \quad T^h_i = \sum_{j=1}^{N} x_{ij} \cdot \tau_j $$
$$ T_i = T^f_i + T^h_i $$
Here, $d(a,b)$ denotes the Euclidean distance between two nodes $a$ and $b$.

1.2 Constraints

The optimization is subject to the following realistic operational constraints:

1) Target Uniqueness Constraint: Each target can be assigned to at most one unmanned drone.
$$ \sum_{i=1}^{M} x_{ij} \le 1, \quad \forall j \in \{1, 2, …, N\} $$

2) Base Departure and Return Constraint: Every drone’s path must start and end at the common base.
$$ p_{i,0} = p_{i, L_i+1} = p_B, \quad \forall i \in \{1, 2, …, M\} $$

3) Energy Constraint: The total energy consumed by each unmanned drone must not exceed its initial energy.
$$ E_i = E^f_i + E^h_i \le E^0_i, \quad \forall i \in \{1, 2, …, M\} $$
where $E^f_i = e^f_i \cdot T^f_i$ is the flight energy and $E^h_i = e^h_i \cdot T^h_i$ is the hover energy.

2. The Two-Stage Collaborative Optimization Method

The formulated problem is a high-dimensional, nonlinear, and constrained multi-objective optimization. To tackle its complexity, we propose a Two-Stage Collaborative Optimization architecture with a bidirectional feedback mechanism, as illustrated below conceptually. This architecture breaks down the problem while maintaining crucial interdependencies.

2.1 Architecture and Bidirectional Feedback Mechanism

The core idea is to iteratively refine solutions by passing information between the task allocation and path planning stages.

Forward Pass (Allocation to Planning): The Task Allocation stage (employing E-NSGA-II) generates a candidate target assignment matrix $X$ optimized for the objectives $f_1$ and an estimated $f_2$. This assignment, along with target data, is passed to the Path Planning stage.

Backward Pass (Planning to Allocation): The Path Planning stage (employing E-MOA*) computes the exact, feasible paths $P$ for the given $X$, accurately calculating the true time costs $T_i$, energy consumption $E_i$, and identifying any inter-drone conflicts or path inefficiencies. This detailed cost and feasibility feedback is sent back to the Task Allocation stage.

Iterative Refinement: The allocation stage uses this feedback to adjust its search, penalizing assignments that lead to infeasible paths, high energy use, or severe conflicts. This “allocate-plan-feedback-revise” loop continues across generations/iterations, allowing both stages to co-adapt, ultimately converging to a set of Pareto-optimal solutions where both assignment and path are jointly optimized.

2.2 Stage 1: Enhanced Task Allocation with E-NSGA-II

We enhance the standard NSGA-II framework to better suit the heterogeneous unmanned drone task allocation problem.

1) Sector-Based Initialization: Instead of random initialization, we sort all targets by their azimuth angle relative to the base and partition them into $M$ sectors. Initial population chromosomes are created by assigning targets within a sector primarily to the corresponding drone. This provides a spatially coherent starting point, reducing initial path crossovers and accelerating convergence.

2) Adaptive Mutation Probability: The mutation probability $p_m$ is adjusted dynamically based on population diversity:
$$ p_m = p_{m\_base} + \beta \cdot (1 – SP_{norm}) – \gamma \cdot HD_{norm} $$
where $p_{m\_base}$ is a baseline rate, $SP_{norm}$ is the normalized Spacing metric (measuring objective space distribution), $HD_{norm}$ is the normalized average Hamming distance between assignment matrices (measuring decision space diversity), and $\beta, \gamma$ are control parameters. This promotes exploration early on and exploitation later.

3) Local Search Augmentation:

Simulated Annealing Micro-adjustment: Periodically, we apply the SA metaheuristic to the current Pareto set. Neighbor solutions are generated by swapping a few target assignments between drones. The acceptance probability follows the standard SA rule, allowing temporary acceptance of worse solutions to escape local optima.
Angle-Guided Repair: To maintain path coherence, we impose a constraint on the maximum angular span of targets assigned to a single unmanned drone. If violated, outliers are reassigned to drones operating in adjacent angular sectors, directly utilizing feedback from the path planner about poor geometric feasibility.

2.3 Stage 2: Enhanced Path Planning with E-MOA*

For a given task assignment $X$, we plan detailed paths using an enhanced multi-objective A* algorithm.

1) Weighted Nearest-Neighbor Heuristic: To guide the search, we employ a heuristic function $h(n)$ for a node $n$ (current target) to estimate the cost to the goal (return to base). It considers both distance and the value of remaining assigned targets:
$$ h(n) = \omega_1 \cdot d(n, p_B) + \omega_2 \cdot \sum_{t \in R} \frac{p^T_t}{r_t} $$
where $R$ is the set of remaining targets for the current drone, and $\omega_1, \omega_2$ are weights. This encourages the drone to prioritize high-value, high-priority targets earlier in the route.

2) Path Optimization and Multi-Dimensional Penalty:

2-opt Local Optimization: After a initial path is generated for a drone, a 2-opt procedure is applied to eliminate path crossovers, shortening the total flight distance without changing the visitation order.
Conflict and Maneuver Penalty: We augment the cost function $g(n)$ during the A* search with penalty terms:
$$ g'(n) = g(n) + \lambda_c \cdot C_{count} + \lambda_a \cdot \Theta_{span} $$
Here, $C_{count}$ estimates potential future inter-drone conflicts based on assigned sectors and timing, and $\Theta_{span}$ penalizes the cumulative absolute change in heading angle along the path, discouraging sharp turns. $\lambda_c$ and $\lambda_a$ are penalty coefficients informed by the feedback loop from previous iterations.

3. Experimental Results and Analysis

We conduct simulations to validate the feasibility and performance of the proposed method. The operational area is a 10 km × 10 km region with the base at the center. We consider $N=90$ targets with random locations, rewards $r_j=1$, hover times $\tau_j \in [5, 15]$ seconds, and three priority levels $p^T_j \in \{1, 1.2, 1.5\}$ distributed among the targets. A heterogeneous fleet of $M=3$ unmanned drone platforms is configured with parameters as shown in Table 1.

Table 1: Configuration of Heterogeneous Unmanned Drone Parameters
Parameter	UAV1	UAV2	UAV3
Cruise Speed (m/s)	50	60	40
Flight Energy Rate (unit/s)	1.0	1.2	0.8
Hover Energy Rate (unit/s)	0.5	0.6	0.45
Initial Energy (units)	800	800	800

Algorithm parameters were tuned via sensitivity analysis. Key final parameters are listed in Table 2. For the time cost objective, we set $\alpha=0.7$, prioritizing efficiency while moderately balancing workload.

Table 2: Key Algorithm Parameters
Parameter	Value	Parameter	Value
Population Size	30	SA Initial Temperature	1.0
Max Generations	50	SA Cooling Coefficient	0.85
Crossover Probability	0.8	Angle Repair Rate	0.05
Base Mutation Probability	0.1	Time Cost Weight ($\alpha$)	0.7

3.1 Feasibility and Pareto Front Analysis

The proposed method successfully generated a set of 12 feasible Pareto-optimal solutions. The Pareto front shows a smooth, continuous trade-off curve between total weighted reward (ranging from 67.9 to 101.3) and composite time cost (ranging from 571.7 to 730.3). All solutions strictly satisfy the base-return and energy constraints. The average energy remaining across drones for all solutions was 16.61%, indicating effective energy budget management. The iterative convergence curves for both best reward and best time cost show rapid improvement within the first 20 generations and stabilization near generation 40, confirming the efficacy of the initialization and adaptive mechanisms.

3.2 Ablation Study

To dissect the contribution of each algorithmic enhancement, we conducted an ablation study comparing the full E-NSGA-II + E-MOA* method against several variants, each with one key module removed (e.g., no sector initialization, no SA micro-adjustment, no adaptive mutation). The full method consistently achieved the highest Hypervolume (HV = 4490.41), indicating the best overall Pareto front quality. It also reached the highest best reward (103.96) while maintaining a competitive best time cost (522.98). Metrics like Generational Distance (GD=1.5) and Inverted Generational Distance (IGD=2.5) were significantly lower for the full method, demonstrating superior convergence and proximity to the true Pareto front. The results confirm that the modules work synergistically: initialization provides a good starting point, adaptive mutation balances exploration/exploitation, and SA with angle repair refines solution quality and feasibility.

3.3 Comparative Performance Evaluation

We compared our integrated two-stage method against five prominent multi-objective optimization algorithms applied to the coupled problem: NSGA-II, NSGA-III, MOEA/D, SPEA2, and IMOGWO. Each algorithm was adapted to handle the same constraints and objectives. The comparative results, averaged over multiple runs, are summarized in Table 3.

Table 3: Comparative Performance of Multi-Objective Algorithms
Algorithm	Best Reward ($f_1 \uparrow$)	Best Time Cost ($f_2 \downarrow$)	Hypervolume (HV $\uparrow$)	GD ($\downarrow$)	IGD ($\downarrow$)
*E-NSGA-II + E-MOA (Proposed)**	93.4	471.47	5396.27	2.47	5.33
NSGA-II	86.5	685.87	5108.94	152.87	210.72
NSGA-III	89.5	480.14	3085.44	8.48	14.77
MOEA/D	80.6	499.32	2270.18	12.84	18.30
SPEA2	92.4	483.98	4259.74	11.78	18.08
IMOGWO	64.9	566.74	2467.21	46.17	83.56

The proposed method demonstrates comprehensive superiority. It achieves the highest Best Reward and the lowest Best Time Cost simultaneously, showcasing its ability to find superior extremes of the trade-off spectrum. Its Hypervolume score is the highest, indicating the most comprehensive and high-quality Pareto front coverage. Furthermore, its GD and IGD values are an order of magnitude better than most competitors, signifying excellent convergence toward the true Pareto-optimal set. The convergence curves for our method are also smoother and more stable. These results validate that the two-stage collaborative optimization with bidirectional feedback is highly effective for the complex, constrained problem of heterogeneous multi-unmanned drone cooperative detection planning.

4. Conclusion

This paper addressed the critical challenge of insufficient coupling between task allocation and path planning in heterogeneous multi-unmanned drone cooperative detection missions. We proposed a novel Two-Stage Multi-Objective Collaborative Optimization method featuring a bidirectional feedback mechanism that enables iterative co-adaptation between the allocation and planning stages. The Enhanced NSGA-II algorithm incorporates strategic initialization and local search techniques for effective task assignment, while the Enhanced MOA* algorithm integrates advanced heuristics and penalty mechanisms for generating efficient, conflict-aware paths. Extensive simulation experiments confirm that the method can generate a well-distributed set of Pareto-optimal solutions under multiple real-world constraints, significantly outperforming several established multi-objective optimization algorithms in terms of solution quality, Pareto front characteristics, and convergence behavior. Future work will focus on extending the framework to dynamic environments with pop-up targets or threats and incorporating more granular models of unmanned drone sensor and communication heterogeneity.