Text Mining and Social Network Analysis of Civilian Drone Accident Causes

In recent years, the rapid expansion of the low-altitude economy has propelled civilian drones into widespread use across various sectors, including logistics, emergency response, and agriculture. As the number of registered civilian drones continues to grow globally, safety concerns have become increasingly prominent. Accident reports involving civilian drones often highlight complex interdependencies among causal factors, necessitating a systematic approach to identify and analyze these causes. In this study, we leverage text mining and social network analysis (SNA) to dissect accident causes for civilian drones, aiming to uncover hidden patterns and relationships that traditional subjective analyses might overlook. Our objective is to provide a data-driven foundation for enhancing the safety and reliability of civilian drone operations, thereby supporting the healthy development of the low-altitude economy industry.

The integration of civilian drones into daily operations brings both opportunities and challenges. While civilian drones offer efficiency and versatility, their accident rates raise alarms about potential risks to public safety. Previous research has often focused on subjective risk assessments or isolated incident types, leaving a gap in comprehensive, objective analyses of accident causes and their interconnections. To address this, we employ text mining techniques to process unstructured accident reports and social network analysis to model the relationships among identified causes. This combined approach allows us to transform qualitative accident narratives into quantitative insights, facilitating a deeper understanding of the causal landscape for civilian drones.

Our methodology begins with the construction of a corpus from 122 civilian drone accident reports sourced from authoritative aviation safety databases in countries including the United Kingdom, the United States, and Australia. These reports, spanning 2007 to 2023, were selected for relevance and clarity; incidents with unknown causes or involving non-drone aerial vehicles were excluded. The text from sections such as “accident summary,” “event description,” and “cause analysis” was extracted and translated into Chinese so that all reports could be processed with the same segmentation tools. This corpus serves as the foundation for our text mining pipeline, which is implemented in Python because of its flexibility and robust libraries for natural language processing.

The text preprocessing phase is critical for accurate analysis. We use the Jieba library for Chinese text segmentation, augmented with a custom lexicon of civilian drone terminology to improve segmentation accuracy. Stop words, including common but uninformative terms like “flight” and “accident,” are removed using an expanded stop word list based on the Harbin Institute of Technology list. Additionally, a merging lexicon is applied to normalize synonymous expressions; for instance, terms such as “low voltage,” “insufficient battery,” and “battery depletion” are unified under “low battery power.” This preprocessing ensures that the text data is clean and structured for further analysis. The process is iterative: the lexicons are adjusted and the pipeline is rerun until the segmentation results are satisfactory. The table below illustrates each step, with examples rendered in English.

Table 1: Example of Text Preprocessing Steps for Civilian Drone Accident Reports
Step	Description	Example Input	Output
Segmentation	Split text into tokens using Jieba with custom lexicon.	“The operator’s incorrect input to the ground station controller led to loss of control.”	[“operator”, “incorrect input”, “ground station controller”, “loss of control”]
Stop Word Removal	Eliminate non-informative words.	[“operator”, “incorrect input”, “ground station controller”, “loss of control”]	[“incorrect input”, “ground station controller”, “loss of control”]
Term Merging	Normalize synonyms to consistent terms.	[“loss of control”]	[“loss of flight control”]
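
The stop word removal and term merging steps in the table can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: it starts from already segmented tokens (segmentation itself would be done with Jieba and a custom lexicon), and the names `STOP_WORDS`, `MERGE_LEXICON`, and `preprocess` are hypothetical. The lexicon entries are limited to the examples mentioned in the text.

```python
# Hypothetical lexicons for illustration; the study's full lists are not given.
STOP_WORDS = {"operator", "flight", "accident"}
MERGE_LEXICON = {
    "low voltage": "low battery power",
    "insufficient battery": "low battery power",
    "battery depletion": "low battery power",
}


def preprocess(tokens):
    """Drop stop words, then map each synonym onto its canonical term."""
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [MERGE_LEXICON.get(t, t) for t in kept]


tokens = ["operator", "low voltage", "ground station controller", "battery depletion"]
cleaned = preprocess(tokens)
print(cleaned)
```

Keeping the merging lexicon as a plain dictionary makes the iterative adjustment described above straightforward: a new synonym is one additional entry, and the pipeline can simply be rerun.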

Following preprocessing, we perform feature extraction using the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm. This method evaluates the importance of terms within individual documents relative to the entire corpus, helping to identify keywords that are significant yet not overly common. The TF-IDF value for a term $i$ in document $j$ is calculated as:

$$ \text{TF-IDF}_{ij} = \text{TF}_{ij} \times \text{IDF}_i $$

where term frequency (TF) is:

$$ \text{TF}_{ij} = \frac{n_{ij}}{N_j} $$

with $n_{ij}$ being the number of occurrences of term $i$ in document $j$, and $N_j$ the total number of terms in document $j$. The inverse document frequency (IDF) is:

$$ \text{IDF}_i = \log \left( \frac{D}{d_i + 1} \right) $$

where $D$ is the total number of documents in the corpus, and $d_i$ is the number of documents containing term $i$. We compute TF-IDF scores for all terms across the corpus and select those with values greater than 1 as keywords representing accident causes for civilian drones. This yields 23 key terms, as shown in the table below, which are visualized in a word cloud to highlight their relative significance.
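
The formulas above can be sketched directly in Python. The following is an illustrative implementation under stated assumptions: the natural logarithm is used (the base is not specified in the text), the function name `tf_idf` and the toy corpus are hypothetical, and per-document scores are summed to give one value per term, which is only one plausible way to arrive at corpus-level values greater than 1.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {term: [score per document]} with TF = n_ij / N_j and
    IDF = log(D / (d_i + 1)), following the definitions above."""
    D = len(docs)
    # d_i: number of documents that contain term i
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = {}
    for j, doc in enumerate(docs):
        counts = Counter(doc)
        for term, n in counts.items():
            tf = n / len(doc)
            idf = math.log(D / (doc_freq[term] + 1))  # natural log is an assumption
            scores.setdefault(term, [0.0] * D)[j] = tf * idf
    return scores


# Toy corpus of already preprocessed reports (hypothetical data).
docs = [
    ["low battery power", "crash"],
    ["bird strike", "crash"],
    ["crash", "software error"],
    ["low battery power", "software error"],
]
scores = tf_idf(docs)
# Assumed aggregation: sum each term's scores over all documents.
totals = {term: sum(values) for term, values in scores.items()}
print(totals)
```

Note that with this IDF variant a term appearing in every document receives a negative weight, because $D / (d_i + 1) < 1$; this matters only for very common terms, most of which are removed as stop words beforehand.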

Table 2: Keywords Extracted from Civilian Drone Accident Reports Using TF-IDF
Keyword ID	Keyword (Cause)	TF-IDF Value
T1	Loss of Flight Control	28.887
T2	Mechanical Structural Failure	22.207
T3	Propulsion System Failure	16.996
T4	Crash	16.216
T5	Mid-Air Collision	15.516
T6	Low Battery Power	13.284
T7	Communication Interruption	11.386
T8	Unexpected Weather Factors	10.736
T9	Regulatory Violation in Flight	10.645
T10	Insufficient Inspection	6.983
T11	Software Error	6.530
T12	Battery System Failure	6.318
T13	Operational Error	6.055
T14	Insufficient Experience and Skills	4.947
T15	Motor and Electronic Speed Controller Failure	3.803
T16	Unreasonable Program Design	3.649
T17	Navigation System Failure	3.461
T18	Substandard Design and Manufacturing Quality	2.928
T19	Incomplete Operational Guidelines	2.552
T20	Obstructed Line of Sight	2.393
T21	Bird Strike	2.198
T22	Magnetic Field Interference	2.044
T23	Inadequate Supervision and Management	1.342

The word cloud visualization emphasizes that intrinsic issues related to civilian drones, such as mechanical structural failure and propulsion system failure, are predominant causes of accidents. This aligns with the operational nature of civilian drones, where high automation often reduces direct human control, and technological immaturity compared to traditional aviation poses ongoing challenges. The frequent appearance of these terms underscores the need for enhanced reliability in civilian drone systems.

To explore the interrelationships among these causes, we construct a co-occurrence network using social network analysis. A co-occurrence matrix is generated based on the simultaneous appearance of keyword pairs within the same accident reports, with a rule that multiple co-occurrences in one report count only once. This matrix is then imported into Gephi and Ucinet software to visualize and analyze the network. The network density is calculated as 0.889 with a standard deviation of 0.356, indicating strong connectivity among nodes. Each node represents a cause keyword, and edges denote co-occurrence relationships, with edge thickness reflecting the strength of association.
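
The construction of the co-occurrence counts, including the once-per-report rule, can be sketched as follows. This is an illustrative example with hypothetical data and function names (`cooccurrence`, `density`); it assumes the reported density is the binary density of an undirected network, that is, the share of possible keyword pairs that co-occur at least once.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(reports):
    """Count, for each unordered keyword pair, how many reports contain both."""
    counts = Counter()
    for keywords in reports:
        # set() enforces the rule that a pair counts at most once per report
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


def density(counts, n_nodes):
    """Binary density: observed edges divided by possible undirected edges."""
    return len(counts) / (n_nodes * (n_nodes - 1) / 2)


# Each report is the list of cause keywords found in it (hypothetical data).
reports = [
    ["T1", "T3", "T1"],   # the repeated T1 is counted once
    ["T1", "T3", "T10"],
    ["T2", "T10"],
]
counts = cooccurrence(reports)
print(counts)
print(density(counts, n_nodes=4))
```

The resulting counts can be written out as a square matrix or an edge list for import into tools such as Gephi or Ucinet.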

We conduct centrality analysis to assess the importance of each node in the network. Three metrics are used: degree centrality, closeness centrality, and betweenness centrality. Degree centrality measures how strongly a node is directly connected to others; the values in the table exceed the number of nodes, so they appear to be weighted by co-occurrence count. Closeness centrality evaluates a node's distance to all other nodes; it appears to be reported here as a total distance, so lower values indicate a more central node, and a value of 22 means the node is directly connected to all 22 other nodes. Betweenness centrality quantifies the node’s role as a bridge in the network. The results, summarized in the table below, reveal that nodes like loss of flight control (T1), insufficient inspection (T10), and propulsion system failure (T3) exhibit high centrality, signifying their pivotal roles in civilian drone accident causation.

Table 3: Centrality Analysis of Civilian Drone Accident Cause Network
Cause Node (Keyword ID)	Degree Centrality	Closeness Centrality	Betweenness Centrality
T1 (Loss of Flight Control)	434	22	2.52
T10 (Insufficient Inspection)	360	22	2.52
T3 (Propulsion System Failure)	271	22	2.52
T2 (Mechanical Structural Failure)	258	23	1.84
T5 (Mid-Air Collision)	240	22	2.52
T4 (Crash)	234	22	2.52
T8 (Unexpected Weather Factors)	230	23	1.84
T16 (Unreasonable Program Design)	225	23	1.03
T9 (Regulatory Violation in Flight)	183	22	2.52
T6 (Low Battery Power)	168	24	0.49
T7 (Communication Interruption)	162	23	1.03
T15 (Motor and ESC Failure)	153	25	0.38
T11 (Software Error)	150	24	0.67
T20 (Obstructed Line of Sight)	122	23	1.03
T12 (Battery System Failure)	110	25	0.30
T14 (Insufficient Experience and Skills)	91	25	0.62
T13 (Operational Error)	90	26	0.24
T19 (Incomplete Operational Guidelines)	76	23	2.03
T17 (Navigation System Failure)	50	26	0.34
T23 (Inadequate Supervision and Management)	43	24	0.91
T18 (Substandard Design and Manufacturing Quality)	30	28	0.06
T22 (Magnetic Field Interference)	26	30	0.05
T21 (Bird Strike)	18	35	0.00
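
The three measures in the table can be computed on a small network as sketched below. This is an illustration under assumptions, not a reproduction of the Ucinet output: degree is taken as the sum of co-occurrence weights, closeness is reported as farness (the sum of shortest-path lengths on the unweighted graph), and betweenness is the unnormalized count from Brandes' algorithm. The function name `centralities` and the toy edge weights are hypothetical.

```python
from collections import deque


def centralities(weights):
    """weights: {(u, v): w} for an undirected, connected graph.
    Returns (weighted degree, farness, unnormalized betweenness)."""
    adj, degree = {}, {}
    for (u, v), w in weights.items():
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        degree[u] = degree.get(u, 0) + w
        degree[v] = degree.get(v, 0) + w

    farness = {}
    betweenness = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (Brandes)
        dist = {s: 0}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for nb in adj[v]:
                if nb not in dist:
                    dist[nb] = dist[v] + 1
                    queue.append(nb)
                if dist[nb] == dist[v] + 1:
                    sigma[nb] += sigma[v]
                    preds[nb].append(v)
        farness[s] = sum(dist.values())
        # Accumulate pair dependencies from the farthest nodes backwards
        delta = dict.fromkeys(adj, 0.0)
        for x in reversed(order):
            for v in preds[x]:
                delta[v] += sigma[v] / sigma[x] * (1 + delta[x])
            if x != s:
                betweenness[x] += delta[x]
    # Every undirected pair was visited from both endpoints
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    return degree, farness, betweenness


# Toy weighted co-occurrence network (hypothetical data).
weights = {("A", "B"): 3, ("A", "C"): 2, ("A", "D"): 1, ("B", "C"): 1}
degree, farness, betweenness = centralities(weights)
print(degree, farness, betweenness)
```

In this toy network, node A is adjacent to every other node, so its farness equals the number of other nodes, which mirrors the value of 22 for the most central causes in the 23-node network above.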

Building on centrality analysis, we perform core-periphery structure analysis to partition the network into core and peripheral regions. This helps identify which causes are central to the network and which are more marginal. Using Ucinet, we classify the 23 causes into core and peripheral sets. The core causes, which exhibit strong mutual interactions, include eight nodes: loss of flight control (T1), mechanical structural failure (T2), propulsion system failure (T3), crash (T4), mid-air collision (T5), unexpected weather factors (T8), insufficient inspection (T10), and unreasonable program design (T16). The remaining 15 causes are peripheral. The average network density within the core region is 23.857, significantly higher than the peripheral region’s density of 2.648, underscoring the intense interconnectivity among core causes in civilian drone accidents.
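
Given a core-periphery partition, the density comparison above can be illustrated with a short calculation. The sketch assumes that the reported densities are valued densities, meaning the average co-occurrence weight over all pairs within a region; the function name `mean_tie_strength` and the weights are hypothetical, and the partition itself is taken as given rather than fitted.

```python
from itertools import combinations


def mean_tie_strength(weights, group):
    """Average co-occurrence weight over all unordered pairs in `group`.
    Pairs that never co-occur contribute a weight of 0."""
    pairs = list(combinations(sorted(group), 2))
    total = sum(weights.get(frozenset(pair), 0) for pair in pairs)
    return total / len(pairs)


# Hypothetical weights; frozenset keys make the pair order irrelevant.
weights = {
    frozenset(("T1", "T2")): 20,
    frozenset(("T1", "T3")): 30,
    frozenset(("T2", "T3")): 10,
    frozenset(("T21", "T22")): 2,
    frozenset(("T1", "T21")): 1,
}
core = {"T1", "T2", "T3"}
periphery = {"T21", "T22"}
print(mean_tie_strength(weights, core))
print(mean_tie_strength(weights, periphery))
```

A large gap between the two averages, as in the reported 23.857 versus 2.648, is what distinguishes a pronounced core-periphery structure from a uniformly connected network.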

To further elucidate the relationships between core and peripheral causes, we construct core-periphery cause sets. For each core cause $ T_i $ in the core set $ S_c = \{T1, T2, T3, T4, T5, T8, T10, T16\} $, we identify the peripheral causes $ T_{pi} $ in the peripheral set $ S_p = \{T6, T7, T9, T11, T12, T13, T14, T15, T17, T18, T19, T20, T21, T22, T23\} $ that are strongly associated with it. This is represented as $ S_{T_i} = \{T_{pi}\} $, where $ T_i \in S_c $ and $ T_{pi} \in S_p $. For example, for the core cause mid-air collision (T5), the associated peripheral causes include regulatory violation in flight (T9), obstructed line of sight (T20), low battery power (T6), communication interruption (T7), software error (T11), operational error (T13), motor and ESC failure (T15), incomplete operational guidelines (T19), and insufficient experience and skills (T14). This set, denoted $ S_{T5} = \{T9, T20, T6, T7, T11, T13, T15, T19, T14\} $, indicates that when these peripheral factors co-occur, they significantly elevate the risk of mid-air collisions involving civilian drones.
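
The construction of the sets $ S_{T_i} $ can be sketched as a simple filter over the co-occurrence weights. The text does not define what counts as "strongly associated", so the cutoff `threshold` below is an assumption, as are the function name `associated_peripherals` and the example weights.

```python
def associated_peripherals(weights, core, periphery, threshold):
    """Map each core cause to the peripheral causes whose co-occurrence
    weight with it is at least `threshold` (an assumed cutoff)."""
    return {
        c: sorted(
            p for p in periphery
            if weights.get(frozenset((c, p)), 0) >= threshold
        )
        for c in sorted(core)
    }


# Hypothetical co-occurrence weights between core and peripheral causes.
weights = {
    frozenset(("T5", "T9")): 6,
    frozenset(("T5", "T20")): 4,
    frozenset(("T5", "T21")): 1,
    frozenset(("T1", "T9")): 5,
}
sets = associated_peripherals(
    weights, core={"T1", "T5"}, periphery={"T9", "T20", "T21"}, threshold=3
)
print(sets)
```

Lowering the threshold enlarges every set, so peripheral causes that still appear across many core sets at a strict threshold are the ones most worth prioritizing.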

Similarly, we derive core-periphery sets for the other core causes. The frequent appearance of common peripheral causes across multiple core sets, such as low battery power (T6) and software error (T11), suggests that these factors act as catalysts that amplify the impact of core causes. This interconnectedness highlights the complexity of accident causation for civilian drones, where isolated issues can cascade into broader failures. For instance, the core cause loss of flight control (T1) is linked to peripheral causes like motor and ESC failure (T15), communication interruption (T7), and regulatory violation in flight (T9), implying that technical malfunctions combined with human factors often lead to loss of control in civilian drone operations.

Our analysis reveals that intrinsic failures within civilian drone systems, such as mechanical and propulsion issues, are primary drivers of accidents. This finding stresses the importance of enhancing the design, manufacturing, and maintenance standards for civilian drones. Moreover, the core causes, particularly loss of flight control, insufficient inspection, and unexpected weather factors, demand prioritized attention in safety protocols. For instance, improving pre-flight inspection procedures and integrating real-time weather monitoring systems could mitigate these risks. The peripheral causes, while less central, should not be ignored; their associations with core causes suggest that addressing them can prevent cascading failures. For example, strengthening training programs for civilian drone operators to reduce operational errors and ensuring robust battery management systems can indirectly reduce the likelihood that core causes are triggered.

From a practical standpoint, our study offers actionable insights for stakeholders involved in the civilian drone ecosystem. Manufacturers should focus on reliability engineering, incorporating fail-safe mechanisms and rigorous testing for critical components like propulsion and battery systems. Regulatory bodies can use our findings to refine safety guidelines, emphasizing mandatory inspections and weather assessments for civilian drone flights. Operators and organizations employing civilian drones should invest in comprehensive training and establish clear operational guidelines to minimize human-related errors. By targeting both core and peripheral causes, these measures can collectively enhance the safety landscape for civilian drones, fostering greater public trust and sustainable growth in the low-altitude economy.

In conclusion, through text mining and social network analysis, we have systematically identified and analyzed the causes of accidents involving civilian drones. Our approach transforms unstructured accident reports into a structured network model, revealing 23 key causes with 8 core and 15 peripheral elements. The strong interconnectivity among core causes underscores the need for integrated safety strategies, while the core-periphery relationships highlight the potential for preventive actions targeting peripheral factors. This research not only advances the academic understanding of civilian drone safety but also provides a foundation for evidence-based risk management. As civilian drones continue to proliferate, such data-driven analyses will be crucial for ensuring their safe integration into airspace and promoting the orderly development of the low-altitude economy industry. Future work could expand the corpus to include more diverse data sources or employ dynamic network analysis to track evolving risk patterns for civilian drones over time.