Intelligent Recognition of UAV Tobacco Leaf Remote Sensing Images Based on Convolutional Neural Network

In recent years, the application of Unmanned Aerial Vehicle (UAV) technology in agricultural monitoring has gained significant traction due to its flexibility and efficiency. In particular, UAV remote sensing imagery for crop health assessment, such as tobacco leaf disease detection, presents a promising alternative to traditional manual inspection. However, existing approaches, including those based on Support Vector Machines (SVM), BP neural networks, and Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA), often struggle to handle the complex feature spaces and class imbalance found in large-scale image datasets. These limitations reduce accuracy and efficiency in real-time recognition tasks. To address these challenges, I propose an intelligent recognition framework that integrates Convolutional Neural Networks (CNN) with the Random Forest algorithm. This hybrid model leverages the feature extraction capabilities of CNN and the robust classification power of Random Forest to improve the recognition of tobacco leaf patterns in UAV remote sensing images. The primary goal is to achieve high precision and recall in identifying typical leaf features, such as mottled, granular, frosty, and etch patterns, while ensuring scalability for practical agricultural applications.

The core of my approach lies in the synergistic combination of deep learning and ensemble methods. Initially, UAV remote sensing images are collected using advanced multispectral drones, such as the JUYE UAV, which offers high spatial resolution and spectral sensitivity. These images undergo preprocessing steps, including geometric correction and radiometric normalization, to minimize distortions and enhance data quality. The preprocessed images are then fed into a CNN architecture designed to extract spatial and spectral features through multiple convolutional layers. To further improve feature relevance, an attention mechanism is incorporated to assign higher weights to critical regions within the images. The extracted features are subsequently used to train a Random Forest classifier, which performs multi-class recognition by aggregating predictions from multiple decision trees. This integrated framework not only improves recognition accuracy but also addresses class imbalance by leveraging the ensemble nature of Random Forest. In the following sections, I will detail the methodology, experimental setup, and results, demonstrating the effectiveness of this approach compared to conventional methods.

The overall intelligent recognition framework begins with data acquisition using Unmanned Aerial Vehicle systems, specifically the JUYE UAV, which is equipped with multispectral sensors capable of capturing high-resolution imagery. The collected images often contain geometric and radiometric distortions due to factors like sensor orientation and environmental conditions. To mitigate these issues, I apply geometric correction and radiometric normalization. Geometric correction involves transforming image coordinates to align with a reference system, reducing spatial inaccuracies. This can be represented mathematically as:

$$ x' = G_x(x, y) $$
$$ y' = G_y(x, y) $$

where (x, y) denotes the original pixel coordinates, (x', y') are the corrected coordinates, and G_x and G_y are the geometric transformation functions. Radiometric normalization adjusts pixel intensities to account for variations in illumination, ensuring consistent image quality across different captures. For image segmentation, I use entropy-based thresholding to distinguish foreground regions (e.g., tobacco leaves) from the background. The cumulative probability of pixels belonging to the foreground class c1 at threshold s is given by:

$$ p_{c1} = \sum_{o=0}^{s} p_o = p_s $$

where o is the pixel gray level, p_o is the probability of gray level o, and p_s denotes the cumulative probability up to the threshold s. The entropy of the foreground is computed as:

$$ q_{c1} = -\sum_{o=0}^{s} \frac{p_o}{p_{c1}} \log_2 \left( \frac{p_o}{p_{c1}} \right) $$

Similarly, for the background class c2, the probability and entropy are:

$$ p_{c2} = \sum_{o=s+1}^{h-1} p_o = 1 - p_s $$
$$ q_{c2} = -\sum_{o=s+1}^{h-1} \frac{p_o}{p_{c2}} \log_2 \left( \frac{p_o}{p_{c2}} \right) $$

where h is the number of gray levels, so h - 1 is the maximum gray value. The segmentation threshold s is chosen to maximize the total entropy q_{c1} + q_{c2}, which isolates the regions of interest and facilitates subsequent feature extraction.
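As a concrete illustration, the following is a minimal NumPy sketch of this entropy-maximizing (Kapur-style) threshold selection, assuming a single 8-bit grayscale band; the function name and the 256-level histogram are my assumptions, not details from the original pipeline.

```python
import numpy as np

def entropy_threshold(gray, levels=256):
    """Pick the threshold s that maximizes foreground + background entropy."""
    hist, _ = np.histogram(gray, bins=levels, range=(0, levels))
    p = hist / hist.sum()            # per-level probabilities p_o
    cum = np.cumsum(p)               # cumulative probability p_s
    best_s, best_q = 0, -np.inf
    for s in range(1, levels - 1):
        p_c1, p_c2 = cum[s], 1.0 - cum[s]
        if p_c1 <= 0.0 or p_c2 <= 0.0:
            continue
        fg = p[: s + 1] / p_c1       # normalized foreground distribution
        bg = p[s + 1 :] / p_c2       # normalized background distribution
        q_c1 = -np.sum(fg[fg > 0] * np.log2(fg[fg > 0]))  # foreground entropy
        q_c2 = -np.sum(bg[bg > 0] * np.log2(bg[bg > 0]))  # background entropy
        if q_c1 + q_c2 > best_q:
            best_q, best_s = q_c1 + q_c2, s
    return best_s

# Foreground mask: pixels brighter than the selected threshold
# mask = gray_band > entropy_threshold(gray_band)
```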

For feature extraction, I employ a CNN architecture comprising convolutional layers, pooling layers, and fully connected layers. The convolutional layers apply filters to the input images to detect spatial patterns, such as edges and textures. The output of a convolutional layer for feature map a at position (x', y') is calculated as:

$$ C_{a}(x', y') = \sigma \left( \sum_{b=0}^{k-1} \sum_{d=0}^{k-1} C(x' + b, y' + d) \cdot f_a(b, d) + b_a \right) $$

where σ is the Sigmoid activation function, C is the input image, f_a is a filter kernel of size k × k, and b_a is the bias term. To enhance the model's focus on salient features, I integrate an attention mechanism that assigns weights to the feature maps. The weighted feature map C_z is obtained by:

$$ C_z = C_1 \odot w_z $$

where ⊙ denotes element-wise multiplication, and the weight w_z is computed using a softmax function over a transformed version of the input feature map:

$$ w_z = \text{softmax}(\alpha \cdot \tanh(\beta \cdot C_1)) $$

Here, α and β are learnable parameters that adjust the emphasis on different features. The fused feature representation C_Z is then derived by combining multiple weighted feature maps:

$$ C_Z = \sum_{i=1}^{N} w_{zi} \cdot C_{zi} $$

This fusion captures both low-level and high-level features, improving the richness of the input for classification.
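To make the extraction and attention steps concrete, here is a minimal PyTorch-style sketch of a convolutional extractor with the tanh/softmax weighting above; the channel counts, the four-band multispectral input, and the module name are illustrative assumptions rather than the architecture used in the experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionWeightedCNN(nn.Module):
    """Convolutional feature extractor with scalar tanh/softmax attention."""

    def __init__(self, in_channels=4, feat_dim=128):  # 4 spectral bands (assumed)
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.Sigmoid(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.Sigmoid(),
            nn.MaxPool2d(2),
        )
        # Learnable scalars from w_z = softmax(alpha * tanh(beta * C_1))
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):
        c1 = self.conv(x)                          # C_1: raw feature maps
        scores = self.alpha * torch.tanh(self.beta * c1)
        w = F.softmax(scores.flatten(2), dim=-1).view_as(c1)  # spatial w_z
        cz = c1 * w                                # C_z = C_1 ⊙ w_z
        fused = cz.sum(dim=(2, 3))                 # fusion C_Z = Σ w_zi · C_zi
        return self.fc(fused)                      # per-image feature vector
```

The pooled feature vectors produced by this extractor are what the Random Forest stage consumes in place of raw pixels.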

The classification stage utilizes a Random Forest algorithm, which constructs an ensemble of decision trees to reduce overfitting and enhance generalization. Each tree in the forest is built using a bootstrap sample of the training data. For a given feature vector C_Z, the j-th tree routes the sample at each internal node e according to a binary test function:

$$ u(C_Z, \zeta_{je}) \in \{0, 1\} $$

where ζ_{je} represents the attribute feature at node e. If u(C_Z, ζ_{je}) = 0, the sample traverses to the left subtree; otherwise, it goes to the right. The attribute selection is based on information gain, which measures the reduction in entropy after splitting. The information gain I_{je} for node e is defined as:

$$ I_{je} = H(S_{je}) - \left( \frac{|S_{je}^r|}{|S_{je}|} H(S_{je}^r) + \frac{|S_{je}^l|}{|S_{je}|} H(S_{je}^l) \right) $$

where S_{je} is the set of samples at node e, S_{je}^r and S_{je}^l are the subsets after splitting, and H(S) is the entropy of set S:

$$ H(S) = -\sum_{x=1}^{X} p_x \log_2 p_x $$

Here, p_x is the proportion of samples belonging to class x, and X is the number of classes. The final classification is obtained by majority voting across all trees, ensuring robust and accurate recognition of tobacco leaf patterns.
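To make the split criterion concrete, the following NumPy sketch computes H(S) and the information gain of one binary split; the function names and the boolean-mask interface are illustrative choices.

```python
import numpy as np

def entropy(labels):
    """H(S) = -sum_x p_x log2 p_x over the class proportions in S."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, left_mask):
    """I_je = H(S) - (|S_l|/|S|) H(S_l) - (|S_r|/|S|) H(S_r)."""
    n = len(labels)
    left, right = labels[left_mask], labels[~left_mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # degenerate split: no reduction in entropy
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# Example: gain of sending samples with feature 0 below threshold t to the left
# gain = information_gain(y, X[:, 0] < t)
```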

To validate the proposed method, I conducted experiments on a dataset comprising 500 samples of UAV remote sensing images, categorized into five classes: normal, mottled pattern, granular pattern, frosty pattern, and etch pattern. The dataset was split into 80% for training, 10% for validation, and 10% for testing. The JUYE Unmanned Aerial Vehicle was used for image acquisition, with a flight height of 120 meters and a spatial resolution of 0.03 meters. The preprocessing steps included geometric correction and radiometric normalization to ensure data consistency. I compared the proposed CNN-Random Forest hybrid against three baseline methods: SVM, BP neural network, and OPLS-DA. For SVM, I used the Radial Basis Function (RBF) kernel with a penalty factor of 1.0 and a tolerance of 1×10^{-3}. The BP neural network was configured with a learning rate of 0.01, ReLU activation in hidden layers, softmax in the output layer, and Adam optimizer. OPLS-DA was set with two components and linear discriminant analysis for classification. In the proposed method, the CNN had a learning rate of 0.001, Adam optimizer, 100 training epochs, and a batch size of 64, while the Random Forest consisted of 60 trees with automatic depth expansion.
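As a sketch of how these settings map onto off-the-shelf estimators, the scikit-learn configuration below uses only the hyperparameter values quoted above; the variable names are placeholders, and the CNN side (learning rate 0.001, Adam, 100 epochs, batch size 64) would be trained separately, e.g., with the module sketched earlier.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Proposed classifier head: 60 trees; max_depth=None lets each tree
# expand automatically until its leaves are pure
rf = RandomForestClassifier(n_estimators=60, max_depth=None, random_state=0)

# SVM baseline: RBF kernel, penalty factor C = 1.0, tolerance 1e-3
svm = SVC(kernel="rbf", C=1.0, tol=1e-3, probability=True)

# cnn_features_*: (n_samples, feat_dim) arrays from the CNN extractor
# rf.fit(cnn_features_train, y_train)
# y_pred = rf.predict(cnn_features_test)
```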

The evaluation metrics included F1 score and Area Under the Curve (AUC) value, which are suitable for assessing performance in imbalanced datasets. The F1 score is the harmonic mean of precision and recall, computed as:

$$ F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} $$

where Precision = TP / (TP + FP) and Recall = TP / (TP + FN), with TP, FP, and FN representing true positives, false positives, and false negatives, respectively. The AUC value measures the model’s ability to distinguish between classes, with higher values indicating better performance. The results demonstrated that the proposed method achieved superior F1 scores across all pattern classes compared to the baseline methods. For instance, the F1 scores for mottled, granular, frosty, and etch patterns were 99%, 98%, 98.83%, and 99%, respectively, representing improvements of 7.03% to 10.93% over the other methods. Similarly, the AUC value of the proposed method was 0.995, which is 3.0% to 4.4% higher than the benchmarks. These findings highlight the effectiveness of combining CNN and Random Forest for handling complex feature spaces and class imbalance in UAV-based remote sensing image recognition.
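Both metrics can be computed directly with scikit-learn, as in the short snippet below; y_test, y_pred, rf, and cnn_features_test are the placeholder names carried over from the configuration sketch above.

```python
from sklearn.metrics import f1_score, roc_auc_score

# Per-class F1 for the five leaf-pattern classes
f1_per_class = f1_score(y_test, y_pred, average=None)

# Multi-class AUC, one-vs-rest, from predicted class probabilities
auc = roc_auc_score(y_test, rf.predict_proba(cnn_features_test),
                    multi_class="ovr")
```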

Comparison of F1 Scores for Different Recognition Methods

| Method | Mottled Pattern | Granular Pattern | Frosty Pattern | Etch Pattern |
| --- | --- | --- | --- | --- |
| Proposed CNN-RF | 99.00% | 98.00% | 98.83% | 99.00% |
| SVM | 91.14% | 90.01% | 91.00% | 90.76% |
| BP Neural Network | 88.07% | 88.57% | 90.00% | 88.47% |
| OPLS-DA | 92.00% | 92.05% | 91.75% | 91.00% |

The AUC values further confirm the robustness of the proposed approach. The BP neural network achieved an AUC of 0.951, while SVM and OPLS-DA reached 0.986 and 0.965, respectively. In contrast, the CNN-Random Forest hybrid attained an AUC of 0.995, indicating excellent discrimination between healthy and diseased tobacco leaves. This performance can be attributed to the CNN’s ability to extract hierarchical features and the Random Forest’s capacity to handle nonlinear relationships and reduce variance through ensemble learning. Additionally, the attention mechanism in the CNN enhances focus on relevant image regions, improving feature quality. The use of Unmanned Aerial Vehicle imagery, particularly from the JUYE UAV, ensures high-quality input data, which is crucial for accurate recognition. These results underscore the potential of the proposed method for real-world applications in precision agriculture, where timely and accurate disease detection can lead to improved crop management and yield.

In conclusion, the integration of Convolutional Neural Networks and Random Forest algorithms presents a powerful solution for intelligent recognition of tobacco leaf patterns from UAV remote sensing images. The proposed framework addresses key challenges such as feature complexity and class imbalance, achieving high F1 scores and AUC values compared to traditional methods. The success of this approach relies on the synergistic combination of deep learning for feature extraction and ensemble methods for classification, supported by robust preprocessing of Unmanned Aerial Vehicle data. Future work will focus on incorporating multi-source data fusion to enhance model robustness, developing automated data annotation techniques to reduce manual effort, and optimizing the model for cross-regional adaptability and lightweight deployment. By advancing these aspects, the system can be scaled for broader agricultural monitoring applications, ultimately contributing to sustainable farming practices and improved crop health management using technologies like the JUYE UAV.
