The rapid advancement of unmanned aerial vehicle (UAV) technology has positioned multi-drone cooperative operations as a pivotal form of future aerial combat. As a core technology within this domain, cooperative path planning directly determines mission effectiveness. However, the need to account simultaneously for multiple factors (threat avoidance, formation maintenance, and energy constraints, among others) renders the planning problem high-dimensional, nonlinear, and heavily constrained. Traditional optimization algorithms often struggle to solve such problems effectively. While intelligent optimization algorithms can tackle complex optimization problems, they frequently fall short in convergence speed and solution quality. The Dung Beetle Optimizer (DBO), an emerging swarm intelligence algorithm, shows considerable potential for solving complex optimization problems. Nevertheless, its simplistic position update mechanism can easily lead to convergence to local optima. This research therefore proposes an improved Decision Learning Dung Beetle Optimizer (DL-DBO) and applies it to the multi-UAV cooperative path planning problem to achieve superior planning outcomes.

The strategic application of intelligent path planning algorithms is crucial for enhancing the operational capabilities of UAV swarms in complex, contested environments. Effective coordination allows the drones to perform reconnaissance, surveillance, and other missions with greater efficiency and survivability.
1. The Improved DL-DBO Algorithm
1.1 Algorithm Enhancement Strategies
The DL-DBO algorithm is an intelligent optimization algorithm developed from the traditional DBO. The traditional DBO simulates the behavior of dung beetles in nature as they search for, transport, and store dung balls. This process is mapped to an optimization problem, where the movement of individual beetles in three-dimensional space represents the search for a solution. The position of the dung ball corresponds to a feasible solution, its quality to the objective function value, and the beetle’s trajectory reflects the search path within the solution space. In traditional DBO, the position of an individual beetle is updated as follows:
$$X_i(t+1)=X_i(t)+r_1 \cdot (X_{best}-X_i(t))+r_2 \cdot (X_{rand}-X_i(t))$$
where \( X_i(t) \) is the position of the \( i \)-th beetle at iteration \( t \); \( r_1 \), \( r_2 \) are random numbers in [0, 1]; \( X_{best} \) is the current global best position; and \( X_{rand} \) is a randomly selected reference point.
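As a minimal sketch, the traditional DBO update above can be written as follows (Python with NumPy; the function name and the vector representation of a beetle's position are illustrative, not from the original):

```python
import numpy as np

def dbo_update(x_i, x_best, x_rand, rng=None):
    """Traditional DBO position update: move toward the global best and a
    random reference point, each scaled by a fresh random factor in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(), rng.random()
    return x_i + r1 * (x_best - x_i) + r2 * (x_rand - x_i)
```

Because both attraction terms vanish when the beetle already sits at the best and reference points, the update leaves such a beetle in place, which is one reason the basic mechanism stagnates near local optima.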
This simple update mechanism, however, is prone to premature convergence and converges slowly. To address these limitations, the following enhancement strategies are proposed for multi-UAV path optimization.
1) Adaptive Weight Factor: An adaptive weight factor is introduced to balance the global exploration and local exploitation capabilities of the algorithm. A larger weight facilitates global search during early iterations. As iterations progress, the weight gradually decreases to enhance local fine-tuning. The adaptive weight \( \omega \) is updated as:
$$ \omega = \omega_{max} - (\omega_{max} - \omega_{min}) \times \left( \frac{t}{T_{max}} \right)^2 $$
where \( \omega_{max} \) and \( \omega_{min} \) are the upper and lower bounds of the weight, respectively; \( t \) is the current iteration number; and \( T_{max} \) is the maximum number of iterations.
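The weight schedule transcribes directly into code; the default bounds [0.4, 0.9] below are taken from the parameter settings in Section 3.1:

```python
def adaptive_weight(t, t_max, w_min=0.4, w_max=0.9):
    """Quadratically decaying weight: equals w_max at t = 0 (global search)
    and w_min at t = t_max (local fine-tuning)."""
    return w_max - (w_max - w_min) * (t / t_max) ** 2
```

The quadratic exponent keeps the weight high for longer than a linear decay would, preserving exploration in the early and middle iterations.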
2) Decision Learning-Based Position Update Strategy: A decision experience pool is established to record historically superior solutions. Individuals adaptively learn from this pool combined with current population information. The new position update equation is:
$$ X_i^{new} = X_i + \alpha \cdot (X_{best} - X_i) + \beta \cdot (X_{exp} - X_i) $$
where \( X_i^{new} \) is the updated position; \( X_i \) is the current individual position; \( \alpha \) is the individual learning factor, representing reliance on personal best experience; \( X_{exp} \) is an excellent solution selected from the experience pool; and \( \beta \) is the social learning factor, representing the degree of learning from the group’s best experience.
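The decision-learning update can be sketched directly from the equation; the default learning factors 0.6 and 0.3 follow the parameter table in Section 3.1:

```python
import numpy as np

def dl_update(x_i, x_best, x_exp, alpha=0.6, beta=0.3):
    """Decision-learning position update: blend attraction toward the global
    best (individual learning, alpha) with attraction toward an exemplar
    drawn from the experience pool (social learning, beta)."""
    return x_i + alpha * (x_best - x_i) + beta * (x_exp - x_i)
```

Unlike the original DBO update, the deterministic coefficients and the experience-pool exemplar give the search a second, historically informed direction rather than a purely random one.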
1.2 Improved Algorithm Flow Design
Incorporating the aforementioned strategies, the detailed workflow of the DL-DBO algorithm is designed. The process includes key steps such as initialization, fitness evaluation, decision learning, position update, and population optimization.
Initialization: Parameters are initialized, including population size \( N \), maximum iterations \( T_{max} \), adaptive weight range \( [\omega_{min}, \omega_{max}] \), learning factors \( \alpha \) and \( \beta \), initial crossover probability \( P_{c0} \), and initial mutation probability \( P_{m0} \). An initial population is randomly generated, and a decision experience pool with a capacity of \( N/2 \) is established to store historical elite solutions using an elitist preservation strategy.
Fitness Evaluation: The fitness value of each individual is calculated by comprehensively considering the objective function value and the satisfaction degree of constraints. The individual best and global best positions are updated accordingly.
Dynamic Parameter Adjustment: The weight factor \( \omega \) is updated using Eq. (2) to transition smoothly from global exploration to local exploitation. Learning factors are adjusted based on population diversity, increasing perturbation when the population becomes too concentrated. Crossover and mutation probabilities are also adapted to maintain diversity.
Decision Learning Update: A roulette wheel selection is used to choose a learning exemplar \( X_{exp} \) from the experience pool. The new position for an individual is calculated using the enhanced update equation (3). A boundary handling mechanism is applied to correct any out-of-bounds positions.
Hybrid Crossover and Mutation Operation: This is a critical step for maintaining genetic diversity and refining solutions for complex multi-UAV path planning.
- Crossover: Three strategies are employed:
  - Single-point Crossover: Information exchange at a random position.
  - Arithmetic Crossover: Produces offspring via a linear combination of parent features: \( Child = \mu \cdot Parent_1 + (1-\mu) \cdot Parent_2 \), where \( \mu \) is a random number in [0, 1].
  - Heuristic Crossover: The crossover direction is determined by the fitness difference between the parents.
- Mutation: Three strategies are utilized:
  - Gaussian Mutation: Introduces a random perturbation following a normal distribution: \( X' = X + \mathcal{N}(0, \sigma^2) \).
  - Uniform Mutation: Replaces a gene with a random value from a specified range.
  - Non-uniform Mutation: The mutation magnitude decreases gradually over iterations.
The algorithm iterates through the evaluation, update, and variation steps until the termination criterion (e.g., reaching \( T_{max} \)) is met, finally outputting the optimal solution for the multi-UAV cooperative path planning problem.
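Two representative operators, arithmetic crossover and Gaussian mutation, can be sketched as below, together with one possible decay schedule for the non-uniform mutation magnitude. The text does not specify the decay law, so the linear form here is an assumption:

```python
import numpy as np

def arithmetic_crossover(p1, p2, rng):
    """Child = mu * p1 + (1 - mu) * p2 with mu drawn uniformly from [0, 1]."""
    mu = rng.random()
    return mu * p1 + (1.0 - mu) * p2

def gaussian_mutation(x, sigma, rng):
    """Add an N(0, sigma^2) perturbation to every gene."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def nonuniform_sigma(t, t_max, sigma0):
    """Illustrative linear decay of the mutation magnitude over iterations,
    so late-stage mutations only fine-tune the solution."""
    return sigma0 * (1.0 - t / t_max)
```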
1.3 Algorithm Performance Analysis
The performance of the improved DL-DBO algorithm is analyzed theoretically in terms of convergence, computational complexity, and stability.
Convergence: The introduction of adaptive weight and decision learning mechanisms enhances global search capability. Let the population size be \( N \). At iteration \( t \), the position update for individual \( i \) can be described probabilistically. According to the algorithm design, as \( t \to \infty \), \( \omega \to \omega_{min} \), shifting the algorithm toward fine-grained local search, with position updates gradually converging to the vicinity of the global optimum.
Computational Complexity: The time complexity per iteration of DL-DBO is analyzed and compared with other algorithms. While the decision learning and hybrid operations add some overhead, they significantly improve solution quality, yielding better overall efficiency on the multi-UAV path planning problem.
| Algorithm | Average Time Complexity per Iteration | Key Operations |
|---|---|---|
| Standard PSO | \( O(N \cdot D) \) | Velocity & position update |
| Standard DBO | \( O(N \cdot D) \) | Simple position update |
| Proposed DL-DBO | \( O(N \cdot D + N \cdot E) \) | Decision learning, hybrid crossover/mutation (E: cost of experience pool ops) |
Stability: The use of an experience pool and elite preservation helps retain good genetic material across generations, improving the robustness and stability of the optimization process across different multi-UAV mission scenarios.
2. Multi-UAV Cooperative Path Planning Model and Its Solution
2.1 Problem Modeling
A mathematical model for cooperative path planning is established based on the improved DL-DBO algorithm. Consider a cooperative formation of \( M \) UAVs that must plan optimal trajectories from start points to target points in a 3D space containing threat zones and obstacles.
The state vector for a drone is defined as:
$$ S_i(t)=[x_i(t), y_i(t), z_i(t), v_i(t), \theta_i(t), \psi_i(t)] $$
where \( S_i(t) \) is the complete motion state of the \( i \)-th drone at time \( t \); \( x_i(t), y_i(t), z_i(t) \) are its coordinates; \( v_i(t) \) is its speed; \( \theta_i(t) \) is the pitch angle; and \( \psi_i(t) \) is the yaw angle.
Considering motion constraints, the state transition equations are:
$$
\begin{aligned}
x_i(t+1) &= x_i(t) + v_i(t)\cos\theta_i(t)\cos\psi_i(t)\Delta t \\
y_i(t+1) &= y_i(t) + v_i(t)\cos\theta_i(t)\sin\psi_i(t)\Delta t \\
z_i(t+1) &= z_i(t) + v_i(t)\sin\theta_i(t)\Delta t
\end{aligned}
$$
where \( \Delta t \) is the time step increment.
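The state transition equations translate directly into a single-step kinematic update (function and argument names are illustrative):

```python
import math

def step_state(x, y, z, v, theta, psi, dt):
    """Advance a UAV point-mass state by one time step: theta is the pitch
    angle, psi the yaw angle, v the speed, dt the time step increment."""
    x += v * math.cos(theta) * math.cos(psi) * dt
    y += v * math.cos(theta) * math.sin(psi) * dt
    z += v * math.sin(theta) * dt
    return x, y, z
```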
The objective function for path planning integrates multiple aspects critical to multi-UAV missions:
$$ F = \omega_1 f_{len} + \omega_2 f_{threat} + \omega_3 f_{form} + \omega_4 f_{energy} $$
where \( F \) is the comprehensive objective value; \( \omega_1, \omega_2, \omega_3, \omega_4 \) are weight coefficients; \( f_{len} \) is the path length cost; \( f_{threat} \) is the threat exposure cost; \( f_{form} \) is the formation keeping cost; and \( f_{energy} \) is the energy consumption cost.
These cost functions are detailed below:
- Path Length Cost: \( f_{len} = \sum_{i=1}^{M} L_i \), where \( L_i \) is the total length of the path for drone \( i \).
- Threat Exposure Cost: Modeled using a Gaussian threat field. For a threat center at \( (x_t, y_t, z_t) \) with intensity \( I_t \) and spread \( \sigma_t \), the threat cost for a path point \( (x,y,z) \) is \( I_t \cdot \exp\left(-\frac{(x-x_t)^2+(y-y_t)^2+(z-z_t)^2}{2\sigma_t^2}\right) \). \( f_{threat} \) is the sum of these costs along all paths.
- Formation Keeping Cost: \( f_{form} = \sum_{t} \sum_{i \neq j} \left( \| \mathbf{p}_i(t) - \mathbf{p}_j(t) \| - d_{desired} \right)^2 \), where \( \mathbf{p}_i(t) \) is the position of drone \( i \) at time \( t \) and \( d_{desired} \) is the desired inter-drone distance.
- Energy Consumption Cost: Approximated as \( f_{energy} = \sum_{i=1}^{M} \int_{0}^{T_i} P(v_i(t), a_i(t)) dt \), where \( P \) is a power function dependent on velocity and acceleration.
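Two of the cost terms, the Gaussian threat field and the formation-keeping penalty, can be sketched as follows. The double sum over \( i \neq j \) is kept as written in the formula, so each unordered pair of drones is counted twice:

```python
import numpy as np

def threat_cost(point, centers, intensities, sigmas):
    """Gaussian threat field: sum over threats of
    I_t * exp(-d^2 / (2 * sigma_t^2)) at a single path point."""
    d2 = np.sum((centers - point) ** 2, axis=1)
    return float(np.sum(intensities * np.exp(-d2 / (2.0 * sigmas ** 2))))

def formation_cost(positions, d_desired):
    """Squared deviation of all ordered pairwise distances (i != j)
    from the desired inter-drone distance, at one time step."""
    cost = 0.0
    for i in range(len(positions)):
        for j in range(len(positions)):
            if i == j:
                continue
            d = np.linalg.norm(positions[i] - positions[j])
            cost += (d - d_desired) ** 2
    return cost
```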
The model incorporates the following constraints for practical deployment:
- Speed Constraint: \( v_{min} \leq v_i(t) \leq v_{max} \)
- Turning Angle Constraint: \( |\psi_i(t+1) – \psi_i(t)| \leq \Delta \psi_{max} \)
- Climb Angle Constraint: \( |\theta_i(t)| \leq \theta_{max} \)
- Collision Avoidance: \( \| \mathbf{p}_i(t) – \mathbf{p}_j(t) \| \geq d_{safe}, \quad \forall i \neq j \)
- Obstacle/Threat Zone Avoidance: Maintain a minimum distance from all defined obstacles and high-threat centers.
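One possible way to expose these constraints as violation degrees \( g_k \geq 0 \) for the penalty-based fitness in Section 2.2. The limit names and dictionary layout are illustrative, with units matching the parameter table (km/h for speed, degrees for angles, meters for spacing):

```python
def violations(v, dpsi, theta, d_pair, limits):
    """Return max(0, g_k) for each kinematic/safety constraint, ready to
    feed the quadratic penalty term of the fitness function."""
    return [
        max(0.0, limits["v_min"] - v),               # speed lower bound
        max(0.0, v - limits["v_max"]),               # speed upper bound
        max(0.0, abs(dpsi) - limits["dpsi_max"]),    # turning-angle limit
        max(0.0, abs(theta) - limits["theta_max"]),  # climb-angle limit
        max(0.0, limits["d_safe"] - d_pair),         # collision avoidance
    ]
```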
2.2 Solution Strategy Design
A solution strategy based on the improved DL-DBO algorithm is designed for the aforementioned model, focusing on path encoding, fitness evaluation, and constraint handling.
Path Encoding: A real-number encoding based on path waypoints is used. The flight route for each drone consists of \( n \) waypoints, with each point containing position information. The solution vector \( X \) for a formation of \( M \) UAVs is:
$$ X = [P_1^1, P_2^1 \ldots P_n^1, \; P_1^2, P_2^2 \ldots P_n^2, \; \ldots, \; P_1^M, P_2^M \ldots P_n^M] $$
where \( P_n^M \) represents the \( n \)-th waypoint of the \( M \)-th drone.
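Decoding the flat solution vector back into per-drone waypoints is a simple reshape; the layout (drone-major, then waypoint, then coordinate) mirrors the encoding above:

```python
import numpy as np

def decode(x, m, n, dim=3):
    """Reshape a flat solution vector into an (m drones, n waypoints,
    dim coordinates) array of waypoints."""
    return np.asarray(x, dtype=float).reshape(m, n, dim)
```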
Fitness Evaluation: A penalty function method handles constraints. The comprehensive evaluation function is:
$$ \text{Fitness}(X) = F(X) + \sum_{k=1}^{K} \lambda_k \cdot [\max(0, g_k(X))]^2 $$
where \( \text{Fitness}(X) \) is the fitness value; \( F(X) \) is the original objective function value from Eq. (7); \( \lambda_k \) is the penalty factor for the \( k \)-th constraint; \( K \) is the total number of constraints; and \( g_k(X) \) is the violation degree of the \( k \)-th constraint.
An adaptive penalty factor update mechanism is designed to improve efficiency:
$$ \lambda(t+1) = \lambda(t) \cdot \left( 1 + \beta \cdot \frac{N_v}{N} \right) $$
where \( \lambda(t+1) \) and \( \lambda(t) \) are the penalty factors at iterations \( t+1 \) and \( t \), respectively; \( \beta \) is a tuning coefficient; \( N_v \) is the number of individuals violating constraints in the current population; and \( N \) is the population size.
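The penalty fitness and the adaptive penalty-factor update transcribe directly. Per-constraint factors are kept in a list for clarity; this is an assumption, since the update rule above writes a single \( \lambda \):

```python
def penalized_fitness(f_obj, g_values, lam):
    """Fitness(X) = F(X) + sum_k lam_k * max(0, g_k)^2."""
    return f_obj + sum(l * max(0.0, g) ** 2 for l, g in zip(lam, g_values))

def update_lambda(lam, n_violating, n_pop, beta=0.3):
    """Grow each penalty factor in proportion to the fraction of the
    population currently violating constraints."""
    return [l * (1.0 + beta * n_violating / n_pop) for l in lam]
```

When many individuals are infeasible the penalties rise, pushing the search back toward the feasible region; when the population is feasible the factors stay put.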
2.3 Algorithm Application Implementation
The implementation of the improved DL-DBO for multi-UAV cooperative path planning involves three key steps: environment construction, algorithm execution, and result optimization.
Environment Construction: A 3D digital map is created. Threat and obstacle areas are discretized using a grid-based method, where each cell is assigned a threat value. The initial and target configurations for the drone formation are established.
Algorithm Execution & Functional Modules: The DL-DBO algorithm operates within a framework containing several specialized modules:
- Path Smoothing Module: Ensures the planned path satisfies kinematic constraints using techniques like spline interpolation.
- Formation Control Module: Calculates the formation keeping cost and provides corrective guidance to maintain the desired formation geometry.
- Threat Avoidance Module: Computes the proximity of path segments to threat zones and feeds the threat cost into the fitness evaluation.
Data interfaces facilitate information exchange and cooperative optimization among these modules during the DL-DBO’s iterative search.
Result Optimization: A post-processing stage refines the best-found path. Cubic spline interpolation smooths the discrete waypoints. A priority-based conflict detection and resolution mechanism is applied to handle potential inter-drone collisions during flight.
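One way to realize the cubic-spline smoothing step, here using `scipy.interpolate.CubicSpline` parameterized by cumulative chord length. The chord-length parameterization is an assumption; the text only specifies cubic spline interpolation:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def smooth_path(waypoints, samples=100):
    """Smooth discrete 3D waypoints with a cubic spline parameterized by
    cumulative chord length, so unevenly spaced waypoints are handled."""
    wp = np.asarray(waypoints, dtype=float)
    seg = np.linalg.norm(np.diff(wp, axis=0), axis=1)
    s = np.concatenate(([0.0], np.cumsum(seg)))  # arc-length-like parameter
    spline = CubicSpline(s, wp, axis=0)
    return spline(np.linspace(0.0, s[-1], samples))
```

The smoothed path still passes through every original waypoint, so the threat and collision costs evaluated on the waypoints remain meaningful for the refined trajectory.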
3. Simulation Experiments and Result Analysis
3.1 Experimental Environment and Parameter Settings
To validate the effectiveness of the improved DL-DBO algorithm for multi-UAV cooperative path planning, a simulation platform was constructed. Experiments were conducted on a computer with an Intel Core i7-12700K processor and 32 GB RAM, using MATLAB R2023a. The task space was a 3D region of 100 km × 100 km × 10 km, containing 5 threat zones modeled with Gaussian decay and 3 polyhedral obstacle zones. A formation of 4 drones was deployed, starting from the southwest corner and targeting the northeast corner of the space. Key parameters are listed below:
| Parameter Category | Parameter Name | Parameter Value |
|---|---|---|
| Algorithm Parameters | Population Size \( N \) | 50 |
| | Maximum Iterations \( T_{max} \) | 200 |
| | Adaptive Weight Range \( [\omega_{min}, \omega_{max}] \) | [0.4, 0.9] |
| | Individual Learning Factor \( \alpha \) | 0.6 |
| | Social Learning Factor \( \beta \) | 0.3 |
| Formation Constraints | Minimum Spacing (m) | 500 |
| | Maximum Spacing (m) | 2000 |
| Performance Constraints | Max Speed (km/h) | 180 |
| | Min Speed (km/h) | 80 |
| | Max Climb Angle (°) | 30 |
| | Max Turn Angle (°) | 45 |
| Weight Coefficients | Path Length \( \omega_1 \) | 0.3 |
| | Threat Cost \( \omega_2 \) | 0.3 |
| | Formation Cost \( \omega_3 \) | 0.2 |
| | Energy Cost \( \omega_4 \) | 0.2 |
3.2 Result Analysis
The simulation results of the improved DL-DBO algorithm are analyzed in detail and compared with traditional DBO, Particle Swarm Optimization (PSO), and a standard Genetic Algorithm (GA). Evaluation covers convergence performance, planning quality, and computational efficiency.
Convergence Performance: The convergence curves of the algorithms were plotted over 200 iterations. The improved DL-DBO converged significantly faster, reaching a stable near-optimal value in about 85 iterations, compared with 126 for DBO and 145 for PSO. This is attributed to the efficient guidance provided by the decision learning mechanism and adaptive weights.
Planning Quality: The key metrics for the planned paths are summarized in the table below. The DL-DBO algorithm consistently produced superior paths for the UAV formation.
| Evaluation Metric | Improved DL-DBO | Traditional DBO | PSO | GA |
|---|---|---|---|---|
| Average Path Length (km) | 142.6 | 156.8 | 165.3 | 170.1 |
| Threat Exposure (Index) | 0.184 | 0.256 | 0.312 | 0.345 |
| Formation Deviation (m) | 86.5 | 125.3 | 156.8 | 142.7 |
| Energy Consumption (kWh) | 425.6 | 486.2 | 512.8 | 531.4 |
| Average Computation Time (s) | 12.8 | 15.6 | 18.2 | 20.5 |
| Convergence Iteration | 85 | 126 | 145 | >180 |
The quantitative improvements are significant. Compared to traditional DBO, DL-DBO reduces the average path length by approximately 9.1%, threat exposure by 28.1%, and computation time by 17.9%. Compared to PSO, the gains are larger still: a 13.7% reduction in path length, a 41.0% reduction in threat exposure, and a 29.7% reduction in computation time. These results underscore the effectiveness of the proposed enhancements in addressing the complex, multi-objective nature of path planning for UAV swarms.
Robustness Analysis: Additional tests were conducted with varying numbers of threats and drones. The DL-DBO algorithm maintained its performance advantage, showing less degradation in solution quality and computation time as problem complexity increased, confirming its robustness for scalable multi-UAV operations.
4. Conclusion and Future Work
This research proposed an improved Decision Learning Dung Beetle Optimizer (DL-DBO) algorithm and successfully applied it to the multi-UAV cooperative path planning problem. The introduced strategies (adaptive weight factor, decision learning-based position update, and hybrid crossover/mutation) collectively enhanced the algorithm’s global search capability and local exploitation precision, effectively mitigating the local-optimum trap common in basic swarm intelligence algorithms. A comprehensive multi-objective optimization model was established, simultaneously considering path length, threat avoidance, formation keeping, and energy consumption, which aligns well with the practical requirements of multi-UAV missions. Furthermore, a tailored solution strategy involving specialized encoding, adaptive penalty functions, and post-processing modules was designed, ensuring the algorithm’s practicality.
Simulation results demonstrated that the improved DL-DBO algorithm possesses clear advantages over traditional DBO, PSO, and GA in terms of convergence speed, planning quality, and computational efficiency. It provides an efficient and feasible solution for complex multi-UAV cooperative path planning.
Future work will focus on the following aspects to further advance this research:
- Further optimization of the algorithm’s internal structure to reduce computational complexity and enhance efficiency for real-time or large-scale UAV fleet planning.
- Incorporation of more realistic constraints, such as communication link maintenance, dynamic weather effects, and varying drone performance models, to increase the practicality of the planning model.
- Research into the algorithm’s adaptive capabilities in dynamic environments where threats or targets are moving, a critical scenario for UAV applications.
- Exploration of the algorithm’s applicability to other complex optimization problems within the realm of unmanned systems and beyond.
