Intelligent Recognition of UAV Remote Sensing Images Based on Convolutional Neural Networks

In recent years, the rapid advancement of unmanned aerial vehicle (UAV) technology has revolutionized remote sensing applications, particularly in agricultural monitoring. In China, UAVs have become indispensable tools for large-scale crop surveillance thanks to their flexibility, cost-effectiveness, and high-resolution imaging. However, efficiently processing and accurately recognizing UAV remote sensing images remains challenging, especially in the presence of complex feature spaces and class imbalance. Traditional methods, such as manual inspection or basic machine learning algorithms, often fall short when handling the vast amounts of data generated by UAV systems. To address this, we propose an intelligent recognition framework that integrates a Convolutional Neural Network (CNN) with the Random Forest algorithm, aiming to improve both the accuracy and the efficiency of UAV remote sensing image analysis. This study focuses on recognizing specific patterns on tobacco leaves, but the methodology applies broadly to agricultural and environmental monitoring tasks.

The core of our approach lies in leveraging the CNN's ability to extract spatial and spectral features from UAV remote sensing images, combined with the robust classification power of Random Forest. We collected a dataset of 600 samples from UAV surveys in China, covering five typical feature classes: normal, mottled pattern, granular pattern, frosty pattern, and etch pattern. The images were preprocessed with geometric correction and radiometric normalization to ensure consistency. The CNN model, enhanced with an attention mechanism, extracts hierarchical features, which are then fused and fed into a Random Forest classifier for multi-class recognition. We compared our method with a Support Vector Machine (SVM), a BP neural network, and Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) to validate its performance. Experimental results show that our framework achieves high F1 scores and AUC values, outperforming the baseline methods and demonstrating its potential for real-world UAV-based remote sensing applications.

The proliferation of UAV systems in Chinese agriculture has led to an exponential increase in remote sensing data, necessitating automated and intelligent processing techniques. Remote sensing images captured by UAVs provide detailed insight into crop health, pest infestation, and environmental stress, but manual analysis is time-consuming and error-prone. Previous studies have explored various machine learning and deep learning approaches for image recognition. For instance, SVMs and BP neural networks have been applied to classification, but they often struggle with high-dimensional, nonlinear data. Hyperspectral imaging, while accurate, is costly and computationally intensive, which limits its scalability across large UAV fleets. Our work builds on these foundations by proposing a hybrid model that combines the feature extraction prowess of a CNN with the ensemble learning strengths of Random Forest. This integration handles complex feature representations and mitigates class imbalance, two common issues in UAV remote sensing datasets collected across diverse regions of China.

In this paper, we detail the methodology, experimental setup, and results of our intelligent recognition system. We begin by describing data acquisition and preprocessing for UAV imagery. We then formulate the CNN architecture and the Random Forest algorithm mathematically, using equations to elucidate the key steps. Tables summarize the dataset characteristics, parameter settings, and performance metrics. Finally, we discuss the implications of our findings and suggest future research directions to further optimize UAV-based recognition systems in China and beyond.

Methodology

Our intelligent recognition framework consists of three main stages: data preprocessing, feature extraction with a CNN, and classification with Random Forest. The pipeline is designed for imagery from UAV surveys and is robust to variations in lighting, scale, and orientation.

Data Acquisition and Preprocessing

We used a DJI Phantom 4 Multispectral UAV for data collection; it is widely adopted in China for agricultural remote sensing thanks to its multispectral capabilities and stability. The drone was flown at an altitude of 120 meters, with an 80% image overlap rate and a spatial resolution of 0.03 meters. The dataset includes 600 samples of tobacco leaf images in five classes: normal (200 samples), mottled pattern (100 samples), granular pattern (120 samples), frosty pattern (100 samples), and etch pattern (80 samples). The samples were gathered from several regions of China to ensure diversity. To prepare the images for analysis, we applied geometric correction to account for distortions and radiometric normalization to standardize pixel values. This preprocessing improves image quality and facilitates accurate feature extraction, which is critical for the downstream tasks.

For image segmentation, we employed an entropy-based thresholding method. Let \( p_o \) denote the probability of gray level \( o \) in the image. A threshold \( s \) divides the image into foreground \( c_1 \) and background \( c_2 \). The cumulative probability of the foreground is:

$$
p_{c_1} = \sum_{o=0}^{s} p_o = p_s
$$

The entropy for the foreground is given by:

$$
q_{c_1} = -\sum_{o=0}^{s} \frac{p_o}{p_{c_1}} \lg \left( \frac{p_o}{p_{c_1}} \right)
$$

Similarly, for the background:

$$
p_{c_2} = \sum_{o=s+1}^{h-1} p_o = 1 - p_s
$$

And its entropy:

$$
q_{c_2} = -\sum_{o=s+1}^{h-1} \frac{p_o}{p_{c_2}} \lg \left( \frac{p_o}{p_{c_2}} \right)
$$

where \( h \) is the number of gray levels. This segmentation isolates the regions of interest in the UAV images. Additionally, we performed geometric normalization, mapping pixel coordinates \( (x_1, y_1) \) to \( (x'_1, y'_1) \) via a correction function \( G \):

$$
\begin{aligned}
x'_1 &= G_{x_1}(x_1, y_1) \\
y'_1 &= G_{y_1}(x_1, y_1)
\end{aligned}
$$

This step ensures that images from different UAV flights are aligned and comparable.
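As an illustration, the entropy-based threshold selection above can be sketched in NumPy. This is a minimal version of the criterion (maximize the summed foreground and background entropies); the function name, the 256-level default, and the small epsilon are our assumptions, not the paper's implementation:

```python
import numpy as np

def entropy_threshold(image, h=256):
    """Choose the threshold s that maximizes q_{c1} + q_{c2},
    the summed entropies of foreground and background."""
    hist, _ = np.histogram(image, bins=h, range=(0, h))
    p = hist / hist.sum()                            # p_o: probability of gray level o
    best_s, best_q = 0, -np.inf
    for s in range(h - 1):
        p1, p2 = p[: s + 1].sum(), p[s + 1:].sum()   # p_{c1} and p_{c2} = 1 - p_{c1}
        if p1 == 0.0 or p2 == 0.0:
            continue                                 # one class empty: not a valid split
        # zero-probability bins contribute 0; epsilon guards log(0)
        q1 = -np.sum(p[: s + 1] / p1 * np.log(p[: s + 1] / p1 + 1e-12))
        q2 = -np.sum(p[s + 1:] / p2 * np.log(p[s + 1:] / p2 + 1e-12))
        if q1 + q2 > best_q:
            best_q, best_s = q1 + q2, s
    return best_s
```

On a bimodal leaf image, the returned threshold falls between the two dominant gray levels, separating leaf tissue from background.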

Feature Extraction with Convolutional Neural Network

We designed a CNN to extract spatial and spectral features from the preprocessed images. The network comprises multiple convolutional layers, each followed by an activation function and a pooling operation. For an input image \( C \) with corrected pixel coordinates, the output of convolutional layer \( a \) is computed as:

$$
C^a_{x'_1 y'_1} = \sigma \left( \sum_{b=0}^{k-1} \sum_{d=0}^{k-1} C(x'_1 + b, y'_1 + d) \cdot f_a(b, d) + b_a \right)
$$

where \( \sigma \) is the sigmoid activation function, \( f_a \) is the filter kernel of layer \( a \), \( b_a \) is its bias term, and \( k \) is the filter size. To sharpen the focus on salient features, we incorporated an attention mechanism. Let \( C_1 \) be an initial feature map; the weighted feature map \( C_z \) is obtained through element-wise multiplication with the weights \( w_z \):

$$
C_z = C_1 \odot w_z
$$

The weights are derived using a softmax function over learned parameters:

$$
w_z = \text{softmax}(\alpha \cdot \tanh(\beta \cdot C_1))
$$

where \( \alpha \) and \( \beta \) are learnable factors. The fused feature map \( C_Z \) is a weighted sum across \( N \) layers:

$$
C_Z = \sum_{i=1}^{N} w_{z_i} \cdot C_{z_i}
$$

This fusion captures the multi-scale information essential for recognizing diverse patterns in UAV imagery. The CNN architecture is kept compact for efficiency, given the computational constraints of processing large UAV datasets.
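To make the attention weighting and fusion concrete, here is a minimal NumPy sketch of the \( C_z \) and \( C_Z \) computations. The function names, the scalar defaults for \( \alpha \) and \( \beta \), and the uniform layer weights are illustrative assumptions, not the trained values:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over all elements of a feature map."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(feature_maps, alpha=1.0, beta=1.0):
    """C_z = C_1 * softmax(alpha * tanh(beta * C_1)), element-wise,
    then C_Z = sum_i w_{z_i} * C_{z_i} over the N layers."""
    weighted = [C1 * softmax(alpha * np.tanh(beta * C1)) for C1 in feature_maps]
    w = np.full(len(weighted), 1.0 / len(weighted))  # illustrative uniform layer weights
    return sum(wi * Cz for wi, Cz in zip(w, weighted))
```

In a real network the layer weights \( w_{z_i} \) would themselves be learned; uniform weights simply make the fusion step easy to trace by hand.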

Classification with Random Forest Algorithm

The fused features \( C_Z \) are input into a Random Forest classifier, an ensemble learning method that constructs multiple decision trees. Each tree is built using a bootstrap sample of the data, and splits are determined by maximizing information gain. For the \( j \)-th tree at node \( e \), the attribute feature \( \zeta_{je} \) guides the recursive function \( u(C_Z, \zeta_{je}) \in \{0, 1\} \). The information gain \( I_{je} \) is calculated as:

$$
I_{je} = H(S_{je}) - \left( \frac{|S^r_{je}|}{|S_{je}|} H(S^r_{je}) + \frac{|S^l_{je}|}{|S_{je}|} H(S^l_{je}) \right)
$$

where \( S_{je} \) is the sample set at the node, \( S^r_{je} \) and \( S^l_{je} \) are the sample sets for right and left children, respectively, and \( H \) is the entropy function:

$$
H(S_{je}) = -\sum_{x=1}^{X} p_x \log_2 p_x
$$

Here, \( p_x \) is the proportion of class \( x \) in \( S_{je} \), and \( X \) is the number of classes. The Random Forest aggregates the predictions of all trees, and the final class is decided by majority vote. This ensemble approach mitigates overfitting and improves generalization, which is crucial for handling the variability of UAV data.
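The split criterion can be verified with a few lines of Python; this is a toy re-implementation of \( H \) and \( I_{je} \) for checking the formulas by hand, not the production Random Forest code:

```python
import numpy as np

def entropy(labels):
    """H(S) = -sum_x p_x log2 p_x, over the classes present in S."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right):
    """I = H(S) - |S_l|/|S| * H(S_l) - |S_r|/|S| * H(S_r)."""
    n = len(parent)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))
```

A perfect split of a balanced two-class node yields a gain of exactly 1 bit, the maximum for a binary decision.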

To summarize the methodology, Table 1 lists the key parameters of the UAV used for data collection, highlighting its suitability for agricultural remote sensing.

Table 1: Specifications of the UAV Used for Data Collection

| Parameter | Specification |
| --- | --- |
| Frame size | 289.5 mm × 289.5 mm × 196 mm |
| Weight | 1.48 kg |
| Diagonal wheelbase | 350 mm |
| Maximum power consumption | 360 W |
| Hovering power consumption | 330 W |
| Imaging capabilities | Multispectral (visible and non-visible light) |
| Typical flight altitude | 120 m |
| Spatial resolution | 0.03 m |

Experimental Setup

We conducted experiments to evaluate the proposed framework. The dataset was split 80%/10%/10% into training, validation, and test sets. We compared our method with three baselines: SVM, BP neural network, and OPLS-DA. All methods were implemented in Python with TensorFlow and scikit-learn, and experiments ran on a high-performance computing cluster to approximate real-world processing conditions for UAV data.

For SVM, we used the radial basis function (RBF) kernel with a penalty factor of 1.0 and a tolerance of \( 1 \times 10^{-3} \). The BP neural network used a learning rate of 0.01, ReLU activations in the hidden layers, softmax in the output layer, a batch size of 64, and the Adam optimizer. OPLS-DA was configured with 2 components selected by cross-validation, followed by linear discriminant analysis. In our CNN-Random Forest model, the CNN used a learning rate of 0.001, the Adam optimizer, 100 training epochs, and a batch size of 64, while the Random Forest used 60 trees grown to full depth. All parameters were tuned via grid search.
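For reproducibility, the stated SVM and Random Forest settings map directly onto scikit-learn estimators. This is a configuration sketch only; the CNN feature extractor and the full training pipeline are omitted:

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# SVM baseline: RBF kernel, penalty factor C = 1.0, tolerance 1e-3
svm = SVC(kernel="rbf", C=1.0, tol=1e-3)

# Random Forest head of the CNN-RF model: 60 trees,
# grown to full depth (max_depth=None) until the leaves are pure
rf = RandomForestClassifier(n_estimators=60, max_depth=None)
```

Both estimators would then be fitted with `fit(features, labels)`, where `features` are the fused CNN feature vectors for the RF and raw descriptors for the SVM baseline.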

Evaluation metrics were the F1 score and the area under the ROC curve (AUC). The F1 score balances precision and recall, making it suitable for the class-imbalanced datasets common in UAV applications. AUC measures the model's ability to distinguish between classes, with higher values indicating better performance. We report results averaged over 10 runs for statistical reliability.
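The per-class F1 score reported later in Table 3 can be computed as follows. This is a self-contained NumPy version for clarity; in practice `sklearn.metrics.f1_score` with `average=None` gives the same per-class numbers:

```python
import numpy as np

def f1_per_class(y_true, y_pred, n_classes):
    """Per-class F1 = 2PR / (P + R), one score per pattern class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))   # false positives
        fn = np.sum((y_pred != c) & (y_true == c))   # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return scores
```

Averaging these per-class scores over the 10 experimental runs yields the figures reported below.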

Table 2 summarizes the dataset distribution, reflecting the variety of patterns captured in the UAV surveys.

Table 2: Dataset Composition for UAV Remote Sensing Image Recognition

| Class | Number of Samples | Description |
| --- | --- | --- |
| Normal | 200 | Healthy leaves with smooth texture and green color |
| Mottled pattern | 100 | Irregular reddish-brown patches on the leaf surface |
| Granular pattern | 120 | Dense, frog-eye-like granules on the edges and surface |
| Frosty pattern | 100 | Fine white crystalline substance covering the leaves |
| Etch pattern | 80 | Etched or corroded patterns indicating damage |

Results and Analysis

The experimental results demonstrate the effectiveness of our CNN-Random Forest framework for intelligent recognition of UAV remote sensing images. We first present the per-class F1 scores, comparing our method with the baselines. As shown in Table 3, our method achieved superior performance in every category, highlighting its robustness to the complex features of UAV imagery.

Table 3: F1 Scores (%) for Different Recognition Methods on UAV Image Classes

| Method | Mottled Pattern | Granular Pattern | Frosty Pattern | Etch Pattern |
| --- | --- | --- | --- | --- |
| SVM | 91.14 | 90.01 | 91.00 | 90.76 |
| BP Neural Network | 88.07 | 88.57 | 90.00 | 88.47 |
| OPLS-DA | 92.00 | 92.05 | 91.75 | 91.00 |
| Our Method (CNN-RF) | 99.00 | 98.00 | 98.83 | 99.00 |

Our method attained F1 scores of 99% for mottled pattern, 98% for granular pattern, 98.83% for frosty pattern, and 99% for etch pattern. These results exceed the strongest baseline (OPLS-DA) by roughly 6 to 8 percentage points per class, and the weakest baseline (BP neural network) by up to 10.93 points, underscoring the advantage of combining deep feature extraction with ensemble classification. The high scores indicate that the framework captures discriminative features even for minority classes, which is vital in practice, where early detection of anomalies is crucial.

Next, we evaluated the overall discriminative power with AUC values. The results, summarized in Table 4, further support our approach: it achieved an AUC of 0.995, compared with SVM (0.986), the BP neural network (0.951), and OPLS-DA (0.965). This corresponds to an absolute AUC improvement of 0.009 to 0.044, showing that the model excels at distinguishing healthy from affected leaves in UAV images.

Table 4: AUC Values for Different Recognition Methods on UAV Images

| Method | AUC Value |
| --- | --- |
| SVM | 0.986 |
| BP Neural Network | 0.951 |
| OPLS-DA | 0.965 |
| Our Method (CNN-RF) | 0.995 |

The high AUC of 0.995 suggests that the framework keeps both false positives and false negatives low, making it reliable for large-scale deployment on UAV systems. We attribute this performance to the synergy between the CNN's feature extraction and the Random Forest's classification robustness. The attention mechanism directs the CNN toward critical regions, such as disease patterns, while the Random Forest handles feature interactions and reduces variance through ensemble learning.

To provide a mathematical perspective, we can express the overall accuracy gain. Let \( \text{F1}_{\text{our}} \) and \( \text{F1}_{\text{base}} \) denote the F1 scores of our method and a baseline, respectively. The percentage improvement \( \Delta \text{F1} \) is calculated as:

$$
\Delta \text{F1} = \frac{\text{F1}_{\text{our}} - \text{F1}_{\text{base}}}{\text{F1}_{\text{base}}} \times 100\%
$$

For instance, compared to OPLS-DA on the mottled pattern, \( \Delta \text{F1} = (99 - 92)/92 \times 100\% \approx 7.61\% \). Similarly, for AUC, the improvement \( \Delta \text{AUC} \) is:

$$
\Delta \text{AUC} = \text{AUC}_{\text{our}} - \text{AUC}_{\text{base}}
$$

With \( \text{AUC}_{\text{our}} = 0.995 \) and \( \text{AUC}_{\text{base}} = 0.965 \) for OPLS-DA, \( \Delta \text{AUC} = 0.030 \), i.e., 3.0 percentage points. These quantitative gains highlight the efficacy of the framework for UAV image recognition.
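The two improvement measures reduce to a few lines of arithmetic; plugging in the Table 3 and Table 4 values for the OPLS-DA comparison reproduces the figures quoted above (the function names are ours):

```python
def f1_improvement(f1_our, f1_base):
    """Relative F1 gain in percent: (F1_our - F1_base) / F1_base * 100."""
    return (f1_our - f1_base) / f1_base * 100.0

def auc_improvement(auc_our, auc_base):
    """Absolute AUC gain: AUC_our - AUC_base."""
    return auc_our - auc_base

gain_f1 = f1_improvement(99.0, 92.0)      # mottled pattern vs OPLS-DA, about 7.61 %
gain_auc = auc_improvement(0.995, 0.965)  # about 0.030 absolute
```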

Discussion

The results validate the proposed CNN-Random Forest hybrid as a powerful tool for intelligent recognition of UAV remote sensing images. Its success stems from addressing the key challenges of UAV applications in China: complex feature spaces caused by varied environmental conditions, class imbalance from uneven disease distribution, and the need for near-real-time processing. By combining the CNN's hierarchical feature learning with the Random Forest's ensemble decisions, the model balances accuracy and computational efficiency.

Compared to traditional methods, our framework offers several advantages. SVM, while effective for small datasets, struggles with the nonlinear patterns in high-dimensional UAV imagery. BP neural networks may overfit or require extensive tuning, limiting scalability. OPLS-DA, though useful for dimensionality reduction, lacks the depth to capture intricate spatial features. In contrast, our approach automates feature extraction and classification, reducing manual intervention and improving reliability for large-scale UAV operations.

The attention mechanism plays a crucial role by dynamically weighting features, letting the model focus on regions of interest such as disease spots or texture anomalies. This is particularly beneficial for UAV images, where the relevant features may be sparse. Moreover, the Random Forest's ability to handle mixed data types and its inherent robustness to noise align well with the variability of UAV data collected across different regions of China.

However, there are limitations to consider. Performance depends on the quality and diversity of the training data; although our dataset covered multiple locations in China, adding more crop types and environmental conditions would improve generalizability. The computational cost of CNN training may also be high in resource-constrained settings, though advances in edge computing could mitigate this for UAV deployments.

Future Work

Building on this study, future research will explore several directions for UAV-based intelligent recognition. First, we plan to integrate multi-source data fusion, combining UAV imagery with satellite data or ground sensors to improve robustness and accuracy. This could involve hybrid models that leverage complementary information sources, improving detection rates in challenging scenarios.

Second, we aim to investigate automated data annotation to reduce the manual labeling burden; active learning or semi-supervised methods could enable efficient model updates as new data is collected. Third, we will study the cross-regional adaptability of the framework, testing it on diverse agricultural landscapes to ensure it remains effective under varying climatic and soil conditions.

Finally, we will optimize the model for lightweight deployment. Techniques such as pruning, quantization, and knowledge distillation could shrink the computational footprint enough for real-time recognition on embedded systems on board UAVs. On-board processing would minimize data transmission and speed up decision-making for precision agriculture.

Conclusion

In this paper, we presented an intelligent recognition framework for UAV remote sensing images based on Convolutional Neural Networks and the Random Forest algorithm. Designed for UAV applications in China, the method effectively addresses the challenges of feature extraction and classification, achieving high F1 scores and AUC values. Experimental results on a dataset of tobacco leaf images demonstrate its superiority over SVM, the BP neural network, and OPLS-DA, with improvements of up to 10.93 percentage points in F1 score and 0.044 in AUC. The integration of the CNN's deep learning capabilities with the Random Forest's ensemble strength provides a scalable solution for processing large volumes of UAV imagery.

This work contributes to the growing field of UAV-based remote sensing, offering a practical approach to automated crop monitoring and disease detection. As China continues to expand its agricultural UAV infrastructure, the framework can support sustainable farming by enabling timely, accurate interventions. Further refinements, such as multi-source data fusion and model lightweighting, should unlock even greater potential for intelligent recognition systems worldwide.
