In recent years, the application of drone formations has expanded significantly across various fields, including geological surveying, emergency rescue, intelligence reconnaissance, maritime scenario detection, and artistic performances. The ability of a drone formation to perform complex tasks that are challenging for a single drone has driven extensive research into cooperative control strategies. However, operating in uncertain environments, characterized by dynamic factors such as weather conditions, geographic spatial information, and communication system states, poses substantial challenges for drone formation control. Traditional approaches often model the entire system from a holistic perspective, aiming to reach unified control decisions through coordinated parallel solving. While these methods can ensure system stability, they tend to run into continuous spatial optimization problems as the number of drones increases, leading to computational inefficiency and difficulty in describing and predicting flight trajectories and member positions. To address these issues, multi-agent control models have been proposed that leverage competition and cooperation among agents to manage coordinated behaviors within the drone formation. However, under uncertain influences, information-sharing mechanisms can be hindered, exacerbating the heterogeneity among system members and complicating control decisions.
From an information processing perspective, I propose a novel drone formation cooperative control method that integrates semantics to handle uncertainties. The approach constructs a framework comprising uncertain situation detection, uncertain behavior recognition, and a semantic strategy ontology model. By leveraging Bayesian networks for situation detection and a reinforcement learning method based on individual activating expectation values, learned knowledge can be transferred to similar new tasks and the semantic ontology model updated in real time. This method not only enhances the adaptability of a drone formation in uncertain environments but also improves decision-making efficiency through semantic fusion. In this article, I detail the framework, methodologies, and simulation results of this drone formation control approach.
Cooperative Control Model
The core of my proposed method lies in a structured cooperative control model designed to manage the flow of uncertain data in drone formation operations. This model addresses background constraints and information perception challenges by dividing the control process into three interconnected modules: uncertain situation detection, uncertain behavior recognition, and a semantic strategy ontology model. Each module plays a critical role in ensuring robust drone formation control under dynamic conditions.
Cooperative Control Framework
The uncertain situation detection module computes the probabilities of various events, triggering situation detection based on Bayesian network inference in conjunction with the semantic strategy ontology model. This module consists of three sub-modules: environmental perception, task execution detection, and system status detection. The environmental perception module detects surrounding environmental information that influences the current drone formation’s behavior control, such as weather patterns and terrain features. The task execution module monitors the progress and performance of the drone formation’s mission planning, ensuring alignment with objectives. The system status module assesses the internal states of the drone system, including engine control, position control, and attitude control, which are vital for maintaining stability and responsiveness.
The uncertain behavior recognition module analyzes information detected by the uncertain situation detection module. Using a reinforcement learning method based on individual activating expectation values, this module transfers learned knowledge to similar new tasks and updates the uncertain task ontology model. This dynamic adaptation allows the drone formation to continuously refine its control strategies based on experiential learning.
The semantic strategy ontology model serves as a knowledge base built on OWL (Web Ontology Language), storing semantically enriched data related to maps, environments, tasks, and states. Its functions include rule reasoning, behavior task updates, and real-time maintenance. This model acts as the underlying framework for the entire cooperative control system, providing a unified and standardized data representation that facilitates information processing. As illustrated in the framework, the drone formation obtains external uncertain information through sensors, generates probabilities of behavior occurrences and their impacts on control, and integrates this information with strategic planning knowledge into the semantic ontology model. The integrated data support the behavior recognition module, while perceptual behaviors, system states, and task strategies jointly trigger cooperative control conditions. A visualization interface enables real-time interaction between operators and drones, aiding the judgment of system status and parameter information. Ontology, as a tool for formal knowledge representation, offers effective descriptions of concepts and instances of background knowledge, enabling semantic-fused cooperative control services.
Implementation of Semantic Strategy Ontology Model
The semantic strategy ontology model is constructed by acquiring raw information from sensors to generate ontologies for environmental perception, task execution, and system status. This involves defining related concepts, attributes, and instances. The model encompasses three primary ontologies: behavior state ontology, environmental ontology, and task strategy ontology. The behavior state ontology describes the drone system’s status, including engine state, position state, attitude adjustment, and speed adjustment. Through instance relationships, it extracts parameters from dynamic models, such as horizontal tail deflection angle, vector rudder deflection angle, pitch angle, and center angle. The environmental ontology derives data from sensors and visual geographic information systems, focusing on geographic environment concepts, instances, and relationships. This includes meteorological environments and map concepts, with instances ranging from points, lines, and surfaces to complex environmental phenomena data. The task strategy ontology intelligently determines flight strategies based on current user instructions, incorporating instances like GPS navigation, map search, flight obstacles, flight key points, and flight target points.
To illustrate the relationships within the ontology, consider the following table summarizing key concepts and their attributes:
| Ontology Component | Concepts | Instances | Attributes |
|---|---|---|---|
| Behavior State | Engine Control, Attitude Control, Position Control | MotorControl_1, AttitudeControl_1, PositionControl_2 | Propulsion value, Filtering value, Collocation value |
| Environmental | Weather, Terrain, Obstacles | RainyCondition, MountainArea, Obstacle_1 | Intensity, Elevation, Coordinates |
| Task Strategy | Navigation, Search, Avoidance | GPS_Nav_1, TargetSearch_2, AvoidObstacle_3 | Waypoints, Priority, RiskLevel |
The integration of these ontologies enables a comprehensive representation of the drone formation’s operational context, facilitating semantic reasoning and control decisions.
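To make the ontology construction concrete, the following is a minimal sketch, assuming the rdflib library and a hypothetical namespace and property names, of how a few of the instances in the table above could be encoded as RDF triples; it is illustrative only and not the article's implementation.

```python
# Illustrative sketch (not the article's implementation): encoding a few
# ontology instances from the table above with rdflib. The namespace URI and
# property names are assumptions for demonstration.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

UAV = Namespace("http://example.org/uav-formation#")  # hypothetical namespace

g = Graph()
g.bind("uav", UAV)

# Behavior state ontology: an engine-control instance with a propulsion value.
g.add((UAV.MotorControl_1, RDF.type, UAV.MotorControl))
g.add((UAV.propulsion_1, RDF.type, UAV.Propulsion))
g.add((UAV.propulsion_1, UAV.hasValue, Literal(0.855, datatype=XSD.float)))
g.add((UAV.MotorControl_1, UAV.hasParameter, UAV.propulsion_1))

# Environmental ontology: a weather instance with an intensity attribute.
g.add((UAV.RainyCondition, RDF.type, UAV.Weather))
g.add((UAV.RainyCondition, UAV.intensity, Literal(0.6, datatype=XSD.float)))

# Task strategy ontology: a navigation instance with a priority attribute.
g.add((UAV.GPS_Nav_1, RDF.type, UAV.Navigation))
g.add((UAV.GPS_Nav_1, UAV.priority, Literal(1, datatype=XSD.integer)))

print(g.serialize(format="xml"))  # RDF/XML serialization of the graph
```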
Situation Detection Based on Ontology Reasoning
To effectively detect situations in uncertain environments, I employ an ontology reasoning mechanism that converts the semantic strategy ontology model into a Bayesian network structure. This allows for probabilistic inference, synthesizing detected information into recognizable data exchange formats. The process begins by using Apache Jena API to extract relevant concepts and instances from the semantic strategy ontology model. The OWL-based model is then transformed into a Bayesian network graph, where Bayesian inference analyzes the information comprehensively.
Ontology Reasoning Mechanism
Through the Apache Jena API, all related concepts and instances in the semantic strategy ontology are retrieved. For example, consider a snippet representing drone state instances:
```xml
<MotorControl rdf:ID="MotorControl_1"/>
<propulsion rdf:ID="propulsion_1">
  <hasValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.855</hasValue>
</propulsion>
<AttitudeControl rdf:ID="AttitudeControl_1"/>
<filtering rdf:ID="filtering_1">
  <hasValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.519</hasValue>
</filtering>
<PositionControl rdf:ID="PositionControl_2"/>
<Collocation rdf:ID="Collocation_1">
  <hasValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.365</hasValue>
</Collocation>
<Situation rdf:ID="Situation_1">
  <hasValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.736</hasValue>
</Situation>
```
In this snippet, MotorControl_1 represents an instance of a single drone’s engine state; propulsion_1 indicates the normal coefficient of the current propulsion state, with a probability of 0.855; AttitudeControl_1 denotes attitude reachability; filtering_1 represents Kalman filter observation data, with a noise influence factor of 0.519; PositionControl_2 signifies the system’s position; Collocation_1 indicates the co-location parameter, with a value of 0.365; and Situation_1 represents the current moment’s situational information, with an influence degree of 0.736 of the three controls on perception.
For such data formats, custom rules such as connect_fact can link associated drones, for example linking drone A1’s propulsion state coefficient propulsion_1 with drone A2, generating an extended fact base con_link. This is achieved through Jena’s inference engine, which traverses all nodes and applies rule-based reasoning for simple events, ensuring the completeness and validity of the semantic strategy ontology. For complex issues involving multiple drones, Bayesian network inference with conditional probability distributions is used.
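As an illustration of the rule idea, the sketch below mimics a connect_fact-style rule in plain Python rather than Jena's rule syntax; the fact tuples, drone identifiers, and neighbour relation are assumptions for demonstration.

```python
# Minimal sketch of the idea behind a connect_fact-style rule, written in
# plain Python rather than Jena's rule language. Facts and the neighbour
# relation below are illustrative assumptions.
facts = [
    ("A1", "propulsion", "propulsion_1", 0.855),
    ("A1", "filtering", "filtering_1", 0.519),
    ("A2", "collocation", "Collocation_1", 0.365),
]
neighbours = {("A1", "A2"), ("A2", "A1")}  # drones flying in the same formation

def connect_fact(facts, neighbours):
    """Link each drone's state facts to its formation neighbours (con_link facts)."""
    con_link = []
    for drone, prop, instance, value in facts:
        for a, b in neighbours:
            if a == drone:
                # neighbour b gains access to drone a's state instance
                con_link.append((b, "con_link", instance, value))
    return con_link

extended_facts = facts + connect_fact(facts, neighbours)
```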
Bayesian Network Inference Method
The Bayesian network system parses concepts and instances from the semantic strategy ontology, constructing rules to generate nodes, edges, and conditional probability tables (CPTs). The steps are as follows:
- Preprocess the detected current event data using Laplace smoothing, and select concept node \(x\) from the task strategy ontology that matches the preprocessed event via Jena API.
- Update and expand this node to \(x = (x_1, x_2, \ldots, x_n)\) with parent node \(C\), where the joint probability is:
$$P(C, x_1, x_2, \ldots, x_n) = P(C) \prod_{i=1}^{n} P(x_i | C)$$
- Generate causal edges through OWL instance relationships and form a conditional probability distribution file. For example, a CPT might represent the influence of filtering, collocation, and propulsion on control levels, as shown below:
| Control level | Filtering (f0, f1, f2) | Collocation (c0, c1, c2) | Propulsion (p0, p1) |
|---|---|---|---|
| L0 | 0.89, 0.56, 0.37 | 0.22, 0.47, 0.28 | 0.59, 0.36 |
| L1 | 0.61, 0.46, 0.51 | 0.67, 0.62, 0.71 | 0.49, 0.55 |
| L2 | 0.35, 0.29, 0.18 | 0.91, 0.71, 0.48 | 0.78, 0.45 |
In this table, Filtering has three values (f0, f1, f2) representing Kalman filter observation values for system noise under complete, suitable, and unsuitable conditions, affecting drone attitude reachability outcomes. Collocation has three values (c0, c1, c2) indicating co-location parameters based on a median of 0.500, influencing drone positions with high, moderate, and low effects. Similarly, Propulsion represents the outcome of drone propulsion force factors. By applying Bayesian networks to uncertain perceptual information, nodes represent the degree of influence of uncertain information on current tasks and states. For a set of current uncertain information, Bayesian networks detect the probability of events and update the semantic strategy ontology model accordingly.
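The inference step can be sketched as a simple naive-Bayes computation over the CPT above, where the control level plays the role of parent node \(C\); the prior over levels and the evidence values are assumptions for illustration.

```python
# Naive-Bayes style sketch of the inference described above: the control level
# C is the parent node, and filtering / collocation / propulsion are children
# with CPT entries taken from the table. Prior and evidence are assumptions.
cpt = {
    "L0": {"filtering": {"f0": 0.89, "f1": 0.56, "f2": 0.37},
           "collocation": {"c0": 0.22, "c1": 0.47, "c2": 0.28},
           "propulsion": {"p0": 0.59, "p1": 0.36}},
    "L1": {"filtering": {"f0": 0.61, "f1": 0.46, "f2": 0.51},
           "collocation": {"c0": 0.67, "c1": 0.62, "c2": 0.71},
           "propulsion": {"p0": 0.49, "p1": 0.55}},
    "L2": {"filtering": {"f0": 0.35, "f1": 0.29, "f2": 0.18},
           "collocation": {"c0": 0.91, "c1": 0.71, "c2": 0.48},
           "propulsion": {"p0": 0.78, "p1": 0.45}},
}
prior = {"L0": 1 / 3, "L1": 1 / 3, "L2": 1 / 3}  # assumed uniform prior P(C)

def posterior(evidence):
    """P(C = level | evidence) proportional to P(C) * product of P(x_i | C)."""
    scores = {}
    for level in cpt:
        p = prior[level]
        for variable, value in evidence.items():
            p *= cpt[level][variable][value]
        scores[level] = p
    total = sum(scores.values())
    return {level: p / total for level, p in scores.items()}

print(posterior({"filtering": "f1", "collocation": "c0", "propulsion": "p0"}))
```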
Reinforcement Learning Based on Individual Activating Expectation
To recognize the uncertain data detected by the situation detection module, enabling the drone formation to continuously transmit formation changes and surrounding information to its members while updating current behavior states, I propose a reinforcement learning method based on individual activating expectation values. The core idea is to treat the current drone formation system as a learning network \(G_{inti}\), the semantic strategy ontology model as a guidance network \(S\), and each drone member as a node in \(G_{inti}\). By calculating the expectation values of nodes and edges, a maximized expectation network \(G_{max}\) is generated from \(G_{inti}\), forcing it to learn the behavior of the guidance network \(S\) in each state. Finally, the learned knowledge is transferred to similar new tasks, updating the guidance network \(S\) with probability extensions in real time.
Calculation of Individual Activating Expectation Values
Let the threshold vector for the expected position of any drone be \(\delta_v\), with each drone as a node in the learning network \(G_{inti}\). The maximized expectation degree is estimated based on the activating expectation of nodes and edges, weighted by the joint distribution probability from Bayesian networks.
Definition 1 (Edge Activating Expectation Calculation): Let \(v\) be any inactive drone node in the learning network \(G_{inti}\). \(IN(v)\) is the set of active in-neighbor nodes of \(v\). Let \(u\) be any inactive in-neighbor of \(v\). The expectation value of node \(u\) activating \(v\) via edge \(e(u, v)\) is denoted as \(GH(u, v)\):
$$GH(u, v) = \min\left\{ \frac{w_{u,v}}{\delta_v - \sum_{x \in IN(v)} w_{x,v}}, 1 \right\}$$
where \(GH(u, v) \in [0, 1]\); \(w_{u,v}\) is the weight between two nodes, provided by Bayesian probability distributions. When \(v\) is inactive, \(\delta_v > \sum_{x \in IN(v)} w_{x,v}\). Otherwise, \(v\) is active. If \(GH(u, v) = 1\), \(v\) can be directly activated by \(e(u, v)\). Further, node expectation values can be calculated based on edge activating expectations.
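A minimal sketch of Definition 1, assuming the weights \(w_{u,v}\) come from the Bayesian probability tables and the threshold \(\delta_v\) is given per node, is:

```python
# Sketch of Definition 1. w is a nested dict of edge weights w[u][v] taken
# from the Bayesian probability tables, delta maps each node to its threshold,
# and active is the set of currently active nodes (all assumptions).
def edge_expectation(u, v, w, delta, active):
    """GH(u, v) = min(w[u][v] / (delta[v] - sum of active in-neighbour weights), 1)."""
    incoming_active = sum(w[x][v] for x in active if v in w.get(x, {}))
    remaining = delta[v] - incoming_active
    if remaining <= 0:
        return 1.0  # v is already activated by its active in-neighbours
    return min(w[u][v] / remaining, 1.0)
```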
Definition 2 (Node Activating Expectation Calculation): Let \(OUT(v)\) be the set of out-edge neighbors of node \(v\). The \(l\)-step expectation contribution value of \(v\) is \(GH_l(v)\) (for \(l \geq 2\)), calculated as:
$$GH_l(v) = \sum_{u \in OUT(v)} \left( GH(v, u) + GH(v, u) \times GH_{l-1}(u) \right)$$
For \(l = 1\), the expectation value of node \(v\) with its connected \(l\)-step neighbor nodes is:
$$GH_1(v) = \sum_{u \in OUT(v)} GH(v, u)$$
Here, \(GH_l(v)\) represents the value of node \(v\) transitioning from an unactivated state to an activated state, reflecting \(v\)’s influence on \(l\)-step neighbor nodes. Based on Definition 2, the \(GH\) values of nodes can be computed iteratively, forming a network \(G_{max}\) with maximum expectation.
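Definition 2 can likewise be sketched as a short recursion over out-neighbors, reusing the edge expectation from Definition 1; the graph structure is an assumption for illustration.

```python
# Sketch of Definition 2. out_edges maps each node to its out-neighbours and
# gh(v, u) is the edge expectation of Definition 1 (both assumptions).
def node_expectation(v, l, out_edges, gh):
    """GH_l(v): l-step expectation contribution of node v."""
    if l == 1:
        return sum(gh(v, u) for u in out_edges.get(v, []))
    # GH_l(v) = sum over u of GH(v, u) * (1 + GH_{l-1}(u))
    return sum(gh(v, u) * (1.0 + node_expectation(u, l - 1, out_edges, gh))
               for u in out_edges.get(v, []))
```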
Reinforcement Learning Method
When nodes and edges in the network are activated, resulting in the maximum expectation network \(G_{max} = \{G_1, G_2, \ldots, G_N\}\), the guidance network \(S = \{S_1, S_2, \ldots, S_N\}\) (i.e., the ontology network) can be converted into a policy network by outputting a Boltzmann distribution over Q-values:
$$\pi_{S_i}(a | G_{max}) = \frac{e^{\tau^{-1} Q_{S_i}(G_{max}, a)}}{\sum_{a' \in A_{S_i}} e^{\tau^{-1} Q_{S_i}(G_{max}, a')}}$$
where \(\tau\) is an influence factor, and \(A_{S_i}\) is the action space of guidance network \(S_i\). For each state in \(G_{max}\), a policy regression objective function is defined based on the cross-entropy between the learning network policy and the guidance network policy:
$$L_i^p(\theta) = -\sum_{a \in A_{S_i}} \pi_{S_i}(a | G_{max}) \log \pi_{AMN}(a | G_{max}; \theta)$$
Here, \(\pi_{AMN}(a | G_{max}; \theta)\) is the policy that governs the behavior of the current learning network, with the guidance network’s output policy serving as a stable supervised training signal that continuously aligns the learning network’s behavior with that of the guidance network.
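A compact numerical sketch of these two steps, with hypothetical Q-values and learner outputs rather than values from the article, might look as follows.

```python
import numpy as np

# Sketch of the policy distillation step: the guidance network's Q-values are
# softened by the Boltzmann distribution with temperature tau, and the learning
# network is trained to match it via the cross-entropy objective. All numbers
# below are placeholders, not values from the article.
def boltzmann_policy(q_values, tau):
    """pi(a | G_max) = exp(Q(a)/tau) / sum over a' of exp(Q(a')/tau)."""
    z = np.exp(q_values / tau - np.max(q_values / tau))  # stabilised softmax
    return z / z.sum()

def policy_regression_loss(guidance_policy, learner_policy, eps=1e-12):
    """Cross-entropy: L = -sum over a of pi_S(a) * log pi_AMN(a)."""
    return -np.sum(guidance_policy * np.log(learner_policy + eps))

q_guidance = np.array([1.2, 0.4, -0.3])   # hypothetical Q-values of S_i
pi_guidance = boltzmann_policy(q_guidance, tau=0.5)
pi_learner = np.array([0.5, 0.3, 0.2])    # hypothetical learning-network policy
loss = policy_regression_loss(pi_guidance, pi_learner)
```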
Algorithm Implementation
The algorithm for implementing this reinforcement learning method is outlined below:
- Input: Network \(G_{inti}(V, E)\), association initialization set length \(k\).
- Output: Updated semantic strategy ontology \(S\) and \(G_{max}\).
- Determine network instances \(g(v, e(u, v)) \in G\) by random expectation threshold vector \(\delta\), initialize \(G_{max} \leftarrow \emptyset\).
- For each \(g \in G\):
- Take \(g\) as an initial set to be activated. For each node \(v \in V\) in \(g\), obtain its in-edge weight \(w_{u,v}\) from the Bayesian probability distribution tables.
- While \(|g_i| < k\):
- Compute edge activating expectation value \(GH(u, v)\) using Definition 1.
- Compute node activating expectation value \(GH_l(v)\) using Definition 2.
- For \(v \in V\):
- Calculate activating expectation node: \(EGH(v) \leftarrow \sum_{g \in G} GH(v) / |G|\).
- Select the node with maximum expectation: \(v \leftarrow \arg\max_{v \in V \setminus S_i} EGH(v)\).
- Update maximum expectation network: \(G_{max} \leftarrow G_{max} \cup \{v\}\).
- Convert the guidance network \(S\) into a policy network using the Boltzmann distribution formula.
- Solve the policy regression objective function.
- Update the current task strategy ontology model \(S\).
- Return \(S\) and \(G_{max}\).
This algorithm begins by randomly selecting an unactivated node \(v_1\). For step length \(l = 2\), using Definitions 1 and 2, we might obtain values such as \(GH_2(v_1) = 0.347\), \(GH_2(v_2) = 1.252\), \(GH_2(v_3) = 0.548\), \(GH_2(v_4) = 0.378\), and \(GH_2(v_5) = 0.195\). The node with the highest activating expectation, here \(v_2\), is chosen as the activating expectation node. Since \(v_2\) influences \(v_3\) and \(v_4\), after \(v_2\) is activated, the activating expectation values of the directed edges pointing to \(v_3\) and \(v_4\) change. These edges’ and the remaining inactive nodes’ expectation values are therefore recalculated, and the node with the maximum expectation is selected as the next node to activate, gradually forming the maximum expectation network \(G_{max}\). Finally, by solving the policy regression objective function, the current task ontology \(S\) is updated.
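A compact sketch of the selection loop walked through above is given below; the graph structure, weights, and thresholds are illustrative assumptions, not the article's implementation.

```python
# Greedy construction of the maximum expectation network G_max: score each
# inactive node by its l-step activating expectation, activate the best node,
# and recompute expectations on the next pass. Inputs are assumptions.
def build_g_max(nodes, out_edges, w, delta, k, l=2):
    active, g_max = set(), []
    while len(g_max) < k:
        def gh_edge(u, v):
            incoming = sum(w[x][v] for x in active if v in w.get(x, {}))
            remaining = delta[v] - incoming
            return 1.0 if remaining <= 0 else min(w[u][v] / remaining, 1.0)

        def gh_node(v, steps):
            if steps == 1:
                return sum(gh_edge(v, u) for u in out_edges.get(v, []))
            return sum(gh_edge(v, u) * (1.0 + gh_node(u, steps - 1))
                       for u in out_edges.get(v, []))

        candidates = [v for v in nodes if v not in active]
        if not candidates:
            break
        best = max(candidates, key=lambda v: gh_node(v, l))
        active.add(best)
        g_max.append(best)  # edge expectations are recomputed on the next pass
    return g_max
```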
Simulation Results and Analysis
To validate the effectiveness of my proposed drone formation cooperative control method, I conducted simulation experiments on the NetLogo platform. The simulations involved four drones operating at an altitude of 200 meters in an uncertain marine environment, with added wind and rain conditions to simulate dynamic challenges. The simulation time was set to 60 seconds, with a sampling period of 5 seconds. Data were sourced from an Automatic Identification System (AIS), comprising shore-based and shipborne device data, including real recorded data for wind and rain scenarios. Compared to real-world trials, these experiments allowed scenes to be laid out randomly, reducing the simulation time needed to reach equilibrium states. To address data imbalances in the network and feature interference beyond prediction ranges, such as instances not present in the strategy ontology base, Laplace smoothing was applied to preprocess the detected current event data; without this preprocessing, random effects could severely impact control. Additionally, to enhance practicality and rationality, the experiments were repeated 10 times, with average values taken as final results to mitigate random errors.
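For reference, the Laplace-smoothing preprocessing mentioned above can be sketched as follows; the event counts and vocabulary are illustrative assumptions.

```python
# Sketch of the Laplace-smoothing preprocessing: event types that never appear
# in the strategy ontology base still receive a non-zero probability.
def laplace_smooth(counts, vocabulary, alpha=1.0):
    total = sum(counts.get(event, 0) for event in vocabulary)
    return {event: (counts.get(event, 0) + alpha) / (total + alpha * len(vocabulary))
            for event in vocabulary}

# Illustrative event counts; "key_point" is unseen but still gets probability mass.
probs = laplace_smooth({"random_obstacle": 3, "route_deviation": 1},
                       ["random_obstacle", "route_deviation", "key_point"])
```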

Key Point Reachability Control Analysis
The initial positions of the four drones were randomly set. Upon receiving flight information, each drone aimed to fly to expected positions based on designated target key points. As shown in the figure, the drones started from arbitrary locations, guided by the semantic strategy ontology to fly toward target key points. Sensors acquired uncertain information for perception, generating behavior occurrence probabilities to provide uniform data formats for behavior recognition. Arrows in the simulation indicate the expected directions generated by individual activating expectation values, ultimately forming a diamond-shaped drone formation, achieving the effect of key target point reachability.
The relative distance errors between drones gradually decreased under the semantic-fused drone formation cooperative control method. Before 25 seconds, drone 1 maintained a higher flight speed than drones 2, 3, and 4. After 25 seconds, the four-drone system tended toward stability, flying progressively with identical attitudes and consistent relative positions. Further analysis shows that the distances between any two drone members converged quickly and stabilized at constant values, forming a stable diamond formation. This is because, when the drone formation system generates the OWL semantic strategy ontology, it is converted to Bayesian networks in real time to analyze and judge the formation’s operational state, triggering corrections of the entire system’s cooperative state. In practical engineering applications, if a drone’s flight state deviates due to external or internal factors, the affected member is activated through node and edge activating expectation calculations, forced to learn the expected goals of the current guidance network, and the instance base of the semantic strategy ontology is updated in real time.
To quantify this, consider the inter-drone distance data summarized in the following table:
| Time (s) | Drone 1-2 Distance (m) | Drone 1-3 Distance (m) | Drone 1-4 Distance (m) | Drone 2-3 Distance (m) |
|---|---|---|---|---|
| 0 | 150.2 | 200.5 | 180.3 | 120.8 |
| 10 | 100.7 | 150.9 | 130.4 | 90.5 |
| 20 | 60.3 | 110.2 | 85.6 | 65.2 |
| 30 | 40.1 | 80.4 | 55.8 | 45.7 |
| 40 | 30.5 | 60.2 | 40.3 | 35.4 |
| 50 | 25.0 | 50.1 | 30.2 | 30.1 |
| 60 | 20.0 | 45.0 | 25.0 | 25.0 |
The data demonstrate rapid convergence and stabilization, highlighting the effectiveness of the drone formation control under semantic fusion.
Obstacle Avoidance Control
The drone formation effectively avoided obstacle threats, with flight trajectories exhibiting minimal curvature, smooth transitions, and flexible maneuvering among members. After clearing the obstacles, the formation quickly restored its original shape. This capability stems from processing uncertain perceptual information through Bayesian network inference and applying the individual expectation contribution reinforcement learning algorithm, which updates the semantic strategy ontology with probability extensions. This allows rapid detection of uncertain data, enabling obstacle avoidance while maintaining stable drone formation flight control. The output results for drone formation cooperative control in uncertain environments are summarized below:
| Sampling Time (s) | Uncertain Event (Probability) | Event Node | GH Value |
|---|---|---|---|
| 10 | Key Point (0.46) | v3, v1 | 0.677 |
| 15 | Random Obstacle (0.68) | v9 | 0.712 |
| 20 | Propulsion (0.33) | v3 | 0.593 |
| 25 | Random Obstacle (0.38) | v5 | 0.408 |
| 30 | Route Deviation (0.59) | v5 | 0.376 |
| 35 | Collocation (0.28) | v1, v8 | 0.582 |
| 40 | Kalman Filter (0.41) | v8 | 0.730 |
| 45 | Propulsion (0.17) | v8 | 0.832 |
The table shows that, for example, a key-point reachability event is detected at the 10-second sampling point, with the drones at nodes v3 and v1 having a probability of 0.46 of failing to reach the key point. Using my method, GH values are computed to form a maximum expectation network and update the ontology model. From an engineering practicality perspective, the algorithm outputs results every 5 seconds; for similar tasks, the task strategy ontology can directly trigger actions, reducing computational load and ensuring the rationality and scientific validity of the data.
Comparative Analysis
To further evaluate the performance of drone formation cooperative control, I used a cost function as a key metric, measuring control performance through evolutionary curves. My method was compared with the pigeon-inspired optimization (PIO) algorithm and a multi-agent algorithm. The cost function is defined as:
$$C(t) = \sum_{i=1}^{N} \left( \| \mathbf{p}_i(t) - \mathbf{p}_i^{\text{des}}(t) \|^2 + \lambda \| \mathbf{v}_i(t) - \mathbf{v}_i^{\text{des}}(t) \|^2 \right)$$
where \(N\) is the number of drones, \(\mathbf{p}_i(t)\) and \(\mathbf{v}_i(t)\) are the position and velocity of drone \(i\) at time \(t\), \(\mathbf{p}_i^{\text{des}}(t)\) and \(\mathbf{v}_i^{\text{des}}(t)\) are the desired position and velocity, and \(\lambda\) is a weighting factor.
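A direct sketch of this cost function, with illustrative positions, velocities, and weighting factor, is:

```python
import numpy as np

# Sketch of the cost function above; positions, velocities, and lambda are
# illustrative values, not the simulation data.
def formation_cost(p, p_des, v, v_des, lam=0.5):
    """C(t) = sum_i ||p_i - p_i_des||^2 + lambda * ||v_i - v_i_des||^2."""
    return float(np.sum(np.linalg.norm(p - p_des, axis=1) ** 2
                        + lam * np.linalg.norm(v - v_des, axis=1) ** 2))

p = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])  # 4 drones
p_des = p + np.array([1.0, -0.5])          # hypothetical desired positions
v = np.zeros_like(p)
v_des = np.full_like(p, 0.2)               # hypothetical desired velocities
cost = formation_cost(p, p_des, v, v_des)
```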
The comparison results are summarized in the following table and discussed below:
| Method | Convergence Generation | Final Cost Value | Stability |
|---|---|---|---|
| Proposed Method | 6 | 15.2 | High |
| Pigeon Algorithm | 30 | 25.8 | Moderate |
| Multi-Agent Algorithm | N/A (fluctuating) | 35.4 | Low |
My method achieved convergence at the 6th generation, obtaining the optimal solution quickly. This is attributed to the policy regression objective function, defined by the cross-entropy between the learning network policy and the guidance network policy during the individual activating expectation calculations, which outputs a stable supervised training signal and allows individuals to settle into stable, consistent states in practical applications. In contrast, the pigeon-inspired optimization algorithm converged at the 30th generation, showing lower efficiency. The multi-agent algorithm addressed local optima systematically but fell into unstable states as iterations increased, highlighting the superiority of my semantic-fused approach for drone formation control.
Conclusion
In this article, I addressed the uncertainties in drone formation control related to perceptual behaviors, system states, and task planning by proposing a cooperative control framework from an information processing perspective. This framework integrates uncertain situation detection, uncertain behavior recognition, and a semantic strategy ontology model. I implemented ontology-based situation detection using Bayesian networks and a reinforcement learning method based on individual activating expectation values. The simulation results demonstrated that my method ensures rapid convergence and stability of the relative distances between any two drones, effectively judges key point reachability and obstacle avoidance, and outperforms the pigeon-inspired optimization and multi-agent algorithms in convergence speed and stability. This approach holds significant potential for applications in complex marine environments, such as target search and emergency rescue, where drone formation operations are critical. Future work will involve scaling simulations to larger drone formations, optimizing the learning algorithms for drone networks, and integrating cloud computing and big data platforms to further enhance cooperative control efficiency and robustness in dynamic scenarios.
