Optimizing Police UAV Drones Patrol with Multimodal ALNS Algorithm

In contemporary public security operations, the complexity of urban environments demands advanced solutions for maintaining law and order. As a researcher in intelligent optimization algorithms, I recognize the critical role of UAV drones in establishing a multi-dimensional治安 patrol system. UAV drones, with their superior mobility and multi-source information acquisition capabilities, have become indispensable tools. However, coordinating multiple UAV drones from a single nest for police patrols presents significant challenges: balancing endurance and coverage, ensuring inspection quality in key areas, and accommodating priority differences among patrol points. These spatiotemporal constraints intertwine, forming a typical NP-hard combinatorial optimization problem that requires robust and dynamically adaptive path planning solutions.

Early research often employed meta-heuristic methods like genetic algorithms, which offer global search capabilities but tend to fall into local optima when handling complex aerial constraints. To address this, I propose a novel chaotic-reinforcement-learning-driven multimodal adaptive large neighborhood search (CRL-ALNS) algorithm. This approach integrates a deep reinforcement learning framework for intelligent operator selection and chaotic sequences to enhance global exploration. The algorithm is designed to tackle the dual-objective optimization model for UAV drones patrols, considering time window constraints and priority-based coverage. In this article, I will detail the problem formulation, algorithmic framework, and experimental validation, demonstrating its effectiveness in improving patrol efficiency for UAV drones.

The deployment of UAV drones in police patrols involves multiple UAV drones operating from a central nest to service various patrol points. These points are categorized based on strategic importance: primary points (e.g., government buildings) require full coverage, while secondary points (e.g., general public areas) demand a baseline coverage threshold. Each patrol point has strict time windows for service, and UAV drones must adhere to endurance limits in terms of flight time and distance. The goal is to minimize total flight distance and the number of UAV drones used, ensuring efficient resource utilization. This scenario is representative of real-world applications where UAV drones must navigate urban landscapes with precision.

To formalize the problem, let us define the following model. Consider a set of nodes $V = \{0, 1, \dots, n\}$, where node $0$ represents the central nest for UAV drones. Let $V_p$ denote the set of primary patrol points and $V_s$ denote the set of secondary patrol points. The UAV drones fleet is indexed by $k \in K = \{1, 2, \dots, m\}$. The Euclidean distance between patrol points $i$ and $j$ is $d_{ij}$, and the flight time is $t_{ij}$. Each patrol point $i$ has a service time $s_i$, a time window $[e_i, l_i]$ for arrival, and UAV drones have maximum flight time $T_{\text{max}}$ and maximum range $D_{\text{max}}$. Decision variables include $x_{ijk} \in \{0,1\}$, indicating if UAV drone $k$ travels from $i$ to $j$, and $y_{ik} \in \{0,1\}$, indicating if patrol point $i$ is served by UAV drone $k$. The arrival time at point $i$ is $t_i$, and $u_{ik}$ is an auxiliary variable for sub-tour elimination.

The dual-objective optimization model is formulated as follows. The primary objective is to minimize total flight distance for all UAV drones:

$$ \min Z_1 = \sum_{k \in K} \sum_{i \in V} \sum_{j \in V} d_{ij} x_{ijk} $$

The secondary objective is to minimize the number of UAV drones deployed:

$$ \min Z_2 = \sum_{k \in K} y_{0k} $$

Subject to constraints that ensure complete coverage of primary points and at least 90% coverage of secondary points for UAV drones:

$$ \sum_{k \in K} y_{ik} = 1, \quad \forall i \in V_p $$

$$ \frac{1}{|V_s|} \sum_{i \in V_s} \sum_{k \in K} y_{ik} \geq 0.9 $$

Other constraints include flow conservation, time window adherence, and endurance limits for UAV drones. For flow conservation:

$$ \sum_{j \in V} x_{ijk} = y_{ik}, \quad \forall i \in V, \forall k \in K $$

$$ \sum_{i \in V} x_{ijk} = y_{jk}, \quad \forall j \in V, \forall k \in K $$

For time windows, with $M$ as a large constant to relax constraints when no travel occurs:

$$ t_j \geq t_i + s_i + t_{ij} – M(1 – x_{ijk}), \quad \forall i,j \in V, i \neq j, \forall k \in K $$

$$ t_i + s_i + t_{ij} \leq t_j + M(1 – x_{ijk}), \quad \forall i,j \in V, i \neq j, \forall k \in K $$

$$ e_i \leq t_i \leq l_i, \quad \forall i \in V $$

For endurance limits of UAV drones:

$$ \sum_{i \in V} \sum_{j \in V} t_{ij} x_{ijk} + \sum_{i \in V} s_i y_{ik} \leq T_{\text{max}}, \quad \forall k \in K $$

$$ \sum_{i \in V} \sum_{j \in V} d_{ij} x_{ijk} \leq D_{\text{max}}, \quad \forall k \in K $$

And sub-tour elimination using Miller-Tucker-Zemlin formulation:

$$ u_{ik} – u_{jk} + |V| \cdot x_{ijk} \leq |V| – 1, \quad \forall i,j \in V, i \neq j, \forall k \in K $$

This model captures the intricacies of coordinating UAV drones for police patrols, balancing multiple objectives and constraints. Solving it efficiently requires advanced algorithmic techniques, which leads to the proposed CRL-ALNS approach.

The CRL-ALNS algorithm builds upon the adaptive large neighborhood search (ALNS) framework, which dynamically selects destruction and repair operators based on historical performance. However, traditional ALNS methods often rely on单一模态 strategies, limiting their ability to balance exploration and exploitation. To overcome this, I integrate a multimodal operator selection mechanism driven by reinforcement learning and chaotic optimization. This enhances the adaptability of UAV drones path planning in dynamic environments.

The algorithm begins with an initial solution generated using a greedy heuristic for UAV drones assignment. Then, in each iteration, it employs a destruction operator to remove部分 patrol points from the current solution and a repair operator to reinsert them, guided by the multimodal strategy. The core innovation lies in three components: an adaptive hybrid chaotic mechanism, a reinforcement learning-based operator selection strategy, and a multimodal operator selection policy. These components work synergistically to optimize paths for UAV drones.

First, the adaptive hybrid chaotic mechanism replaces pseudo-random number generators with chaotic sequences to inject exploratory perturbations. This involves a composite chaotic system combining Logistic and Tent maps, defined as follows. For a chaotic variable $c_n$, the Logistic map is:

$$ c_{n+1} = \mu c_n (1 – c_n) $$

where $\mu$ is a control parameter. The Tent map is:

$$ c_{n+1} = \begin{cases}
\alpha c_n & \text{if } c_n < 0.5 \\
\alpha (1 – c_n) & \text{otherwise}
\end{cases} $$

By switching between these maps based on adaptive probabilities, the system achieves better ergodicity and avoids periodic windows, enhancing global search for UAV drones paths. Additionally, an adaptive chaotic weight $\omega$ is adjusted using a non-linear decay strategy:

$$ \omega = \omega_{\text{max}} – (\omega_{\text{max}} – \omega_{\text{min}}) \times \left( \frac{\text{iter}}{\text{max\_iter}} \right)^\gamma $$

where $\text{iter}$ is the current iteration, $\text{max\_iter}$ is the maximum iterations, and $\gamma$ controls decay rate. This weight modulates perturbation intensity during neighborhood search for UAV drones.

Second, the reinforcement learning-based operator selection strategy uses a deep Q-network (DQN) to choose destruction-repair operator pairs intelligently. The state space $s_t$ encodes current solution features, search progress, and operator history. It includes normalized iteration count, objective function ratio relative to best solution, diversity metric based on Hamming distance, and operator performance history via sliding window averages. The action space $a_t$ selects a pair from Cartesian product $D \times R$ of destruction and repair operators. The reward function $r_t$ balances immediate improvement, milestone achievements, and feasibility for UAV drones paths:

$$ r_t = \lambda_1 \cdot \frac{L(S_{\text{current}}) – L(S_{\text{candidate}})}{|L(S_{\text{current}})|} + \lambda_2 \cdot I(L(S_{\text{candidate}}) < L(S^*) ) + \lambda_3 \cdot \Phi(S_{\text{candidate}}) $$

where $L(\cdot)$ is the cost function (e.g., total distance), $S^*$ is the historical best solution, $I(\cdot)$ is an indicator function, $\Phi(\cdot)$ is a feasibility reward, and $\lambda_1, \lambda_2, \lambda_3$ are weighting coefficients summing to 1. The DQN architecture employs experience replay and target networks, trained via stochastic gradient descent to approximate Q-values for operator selection in UAV drones planning.

Third, the multimodal operator selection strategy combines reinforcement learning decisions with probabilistic selection and emergency response modules. The probabilistic selection uses a Boltzmann distribution based on operator performance ratings $\eta_i$:

$$ P_i = \frac{\exp(\eta_i / \tau)}{\sum_{j=1}^K \exp(\eta_j / \tau)} $$

where $\tau$ is a temperature parameter controlling randomness. This complements the RL strategy by providing stability. Additionally, when search stagnation is detected, a定向突破 strategy increases destruction operator intensity and prioritizes underutilized operators to escape local optima for UAV drones paths.

The CRL-ALNS algorithm iterates until convergence, continuously refining solutions for UAV drones patrol routes. Its multimodal approach ensures robust performance across diverse scenarios. To validate effectiveness, I conducted extensive experiments on randomly generated test instances simulating urban patrol environments for UAV drones.

The experimental setup involves a square patrol area of 20 km × 20 km, with a central nest for UAV drones. Patrol points are uniformly distributed: 50 primary points and 150 secondary points, generated using random seeds to ensure reproducibility. UAV drones parameters include a speed of 20 km/h, maximum flight time of 6 hours, and no fixed range上限 in tests to evaluate algorithm极限性能. The algorithm is compared against an improved genetic algorithm (GA) with population size 100, crossover rate 0.85, and mutation rate 0.05. Both algorithms are run 20 times per instance, and average results are reported.

Table 1 summarizes the CRL-ALNS performance on ten instances, showing significant optimization in total flight distance and number of UAV drones used. The initial solutions are generated via a greedy heuristic, and CRL-ALNS consistently improves them.

Random Seed	Initial Distance (km)	CRL-ALNS Distance (km)	Optimization Rate	Initial UAV Drones Count	CRL-ALNS UAV Drones Count	UAV Drones Saved
42	2303.9	355.4	84.6%	21	14	7
44	2329.7	282.8	87.9%	24	9	15
46	2308.1	303.2	86.9%	24	10	14
48	2302.1	311.5	86.5%	23	10	13
50	2187.2	326.2	85.1%	24	12	12
52	2373.3	280.6	88.2%	25	9	16
54	2179.7	307.5	85.9%	23	11	12
56	2216.1	301.1	86.4%	26	9	17
58	2128.3	317.5	85.1%	25	11	14
60	2101.1	280.2	86.7%	24	9	15
Mean	2243.0	306.6	86.3%	23.9	10.4	13.5

The data indicates that CRL-ALNS reduces total flight distance by an average of 86.3% and saves 13.5 UAV drones on average, demonstrating efficient resource allocation for UAV drones patrols. The optimization is stable across different spatial distributions, highlighting the algorithm’s adaptability.

Table 2 compares CRL-ALNS with the improved genetic algorithm (GA) on the same instances. CRL-ALNS achieves superior performance in both distance and UAV drones count reduction.

Random Seed	GA Distance (km)	CRL-ALNS Distance (km)	Optimization Rate vs GA	GA UAV Drones Count	CRL-ALNS UAV Drones Count	UAV Drones Saved vs GA
42	974.7	355.4	63.5%	16	14	2
44	1540.3	282.8	81.6%	17	9	8
46	1108.1	303.2	72.6%	19	10	9
48	1562.4	311.5	80.1%	20	10	10
50	1018.5	326.2	68.0%	15	12	3
52	1566.1	280.6	82.1%	20	9	11
54	1524.9	307.5	79.8%	20	11	9
56	761.0	301.1	60.5%	10	9	1
58	914.9	317.5	65.3%	13	11	2
60	1058.6	280.2	73.5%	19	9	10
Mean	1203.0	306.6	72.7%	16.9	10.4	6.5

CRL-ALNS reduces total flight distance by 72.7% on average compared to GA, and decreases UAV drones usage by 38.5% (from 16.9 to 10.4). Moreover, CRL-ALNS exhibits lower standard deviation in results (20.9 km vs. 332.8 km for GA), indicating greater stability for UAV drones patrol planning. The convergence curves show rapid improvement within 250 iterations and stabilization after 500 iterations, thanks to the reinforcement learning guidance and chaotic perturbations.

The effectiveness of CRL-ALNS stems from its multimodal design. The chaotic mechanism ensures thorough exploration of the solution space for UAV drones paths, while reinforcement learning optimizes operator selection based on real-time state. The probabilistic and emergency strategies add robustness, making the algorithm resilient to varying patrol scenarios. These features are crucial for real-world applications where UAV drones must adapt to dynamic conditions.

In conclusion, the proposed CRL-ALNS algorithm addresses the NP-hard challenge of police UAV drones协同 patrol path planning with a novel integration of chaos and reinforcement learning. By formulating a dual-objective model and implementing a multimodal operator selection mechanism, it achieves significant improvements in flight distance and resource efficiency for UAV drones. Experimental results validate its superiority over traditional methods, showcasing稳定 performance across diverse test instances. This approach has practical implications for enhancing public security operations, offering a scalable solution for coordinating UAV drones in urban environments. Future work may extend the algorithm to multi-nest scenarios or incorporate real-time data from UAV drones for adaptive replanning.