A Deep Learning Approach with Unmanned Drone Multispectral Imaging for Wheat Stress and Management Evaluation

In modern agriculture, the precise and timely assessment of abiotic stresses, and of the efficacy of subsequent mitigation strategies, is paramount for ensuring crop productivity and food security. Our research presents a novel, non-destructive framework for identifying waterlogging stress in wheat and evaluating the effectiveness of different foliar regulatory treatments. This framework synergistically integrates high-resolution multispectral imagery captured by unmanned drone platforms with a customized deep convolutional neural network, specifically an enhanced Visual Geometry Group (VGG) architecture.

The proliferation of unmanned drone technology has revolutionized agricultural remote sensing. These platforms offer unparalleled flexibility in data acquisition, enabling the capture of high-spatial-resolution imagery at critical crop growth stages. When equipped with multispectral sensors, unmanned drones provide rich datasets containing information beyond the visible spectrum, which is highly sensitive to subtle changes in plant physiology caused by environmental stress. Our methodology leverages this capability to construct a robust phenotyping pipeline.

The core of our analytical engine is a deep learning model. While standard architectures like VGG19 are proven, we hypothesized that further optimization for our specific data type—five-channel multispectral patches—could yield superior performance. We developed the VGG21 model by strategically modifying the network’s depth and structure. The adaptations included increasing the number of convolutional layers, reorganizing convolutional blocks, and replacing standard activation functions. The performance of this model was systematically compared against other established networks, including ResNet50 and Swin-Transformer, to validate its efficacy for this agricultural computer vision task.

The experimental data were generated under controlled conditions. Wheat plants were subjected to waterlogging stress during two sensitive growth phases: jointing-booting and flowering-grain filling. Alongside stressed and control groups, two regulatory treatments—silicon-based fertilizer and a compound amino acid solution—were applied to mitigate the stress effects. The DJI P4 Multispectral unmanned drone was flown repeatedly during the experiment to capture canopy-level reflectance data across five distinct bands: Blue (450±16 nm), Green (560±16 nm), Red (650±16 nm), Red Edge (730±16 nm), and Near-Infrared (840±26 nm). This unmanned drone-based data collection ensured consistent, high-quality spectral information for model development.

The raw imagery from the unmanned drone required substantial preprocessing. This pipeline included radiometric correction using calibration panels, geometric alignment, and the synthesis of individual bands into a cohesive multispectral image. A critical step was the removal of non-plant pixels, such as soil and shadow, to isolate the wheat canopy signal. This was achieved by calculating a vegetation index, such as the Enhanced Vegetation Index (EVI), and applying a mask. The EVI is computed as:

$$EVI = G \times \frac{NIR - Red}{NIR + C_1 \times Red - C_2 \times Blue + L}$$

where $G$ is a gain factor and $C_1$, $C_2$, and $L$ are the aerosol-resistance and canopy-background coefficients (commonly $G = 2.5$, $C_1 = 6$, $C_2 = 7.5$, $L = 1$).

For simplicity and efficiency in our initial masking, a linear variant was used. Following preprocessing, images were categorized into four classes: Control (CK), Waterlogging Stress (WS), Silicon Regulation (SR), and Amino Acid Regulation (AR). These images were then split into small, uniform patches to create a vast dataset suitable for deep learning, which was subsequently divided into training, validation, and test sets using a stratified sampling approach.
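The soil-and-shadow masking step can be illustrated with a minimal sketch, assuming the reflectance bands are already co-registered NumPy arrays. The coefficient defaults are the common MODIS-style values mentioned above; the threshold of 0.2 and the toy reflectance values are purely illustrative, not the study's parameters.

```python
import numpy as np

def evi_mask(nir, red, blue, threshold=0.2,
             g=2.5, c1=6.0, c2=7.5, l=1.0):
    """Compute EVI from reflectance bands and return a boolean
    vegetation mask (True = plant pixel). Coefficient defaults are
    the common MODIS-style values; the threshold is illustrative."""
    evi = g * (nir - red) / (nir + c1 * red - c2 * blue + l)
    return evi > threshold

# Toy 2x2 reflectance patches: left column vegetated, right column soil-like.
nir  = np.array([[0.45, 0.20], [0.50, 0.18]])
red  = np.array([[0.08, 0.15], [0.07, 0.16]])
blue = np.array([[0.04, 0.10], [0.03, 0.11]])

mask = evi_mask(nir, red, blue)
canopy_nir = np.where(mask, nir, np.nan)  # non-plant pixels removed
```

Healthy vegetation reflects strongly in the near-infrared while absorbing red light, so vegetated pixels score high EVI values and survive the mask, while soil and shadow pixels are suppressed.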

The architecture of our proposed VGG21 model is detailed below. It accepts an input tensor of dimensions 48×48×5, corresponding to the spatial size and the five spectral channels from the unmanned drone. The model consists of four convolutional blocks, each followed by a max-pooling layer, and culminates in three fully-connected layers for classification.

| Layer Block | Configuration | Output Shape |
|---|---|---|
| Input | | 48×48×5 |
| Conv Block 1 | 3×3 Conv (64), LeakyReLU ×3 | 48×48×64 |
| Pooling 1 | 2×2 MaxPool | 24×24×64 |
| Conv Block 2 | 3×3 Conv (128), LeakyReLU ×3 | 24×24×128 |
| Pooling 2 | 2×2 MaxPool | 12×12×128 |
| Conv Block 3 | 3×3 Conv (256), LeakyReLU ×6 | 12×12×256 |
| Pooling 3 | 2×2 MaxPool | 6×6×256 |
| Conv Block 4 | 3×3 Conv (512), LeakyReLU ×6 | 6×6×512 |
| Pooling 4 | 2×2 MaxPool | 3×3×512 |
| Classifier | Flatten, FC(1024), Dropout(0.5), FC(1024), Dropout(0.5), FC(4), Softmax | 4 |
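The spatial dimensions in the table follow from two rules: a 3×3 convolution with stride 1 and padding 1 preserves height and width, while a 2×2 max-pool halves them. A short, dependency-free sketch traces the tensor shape through the four blocks (the helper name is ours, not from the study):

```python
def trace_vgg21_shapes(h=48, w=48):
    """Trace (H, W, C) through the VGG21 blocks described above.
    3x3 convs (stride 1, padding 1) keep H and W; 2x2 max-pools halve them."""
    shapes = [(h, w, 5)]                  # five-band multispectral input
    for channels in (64, 128, 256, 512):  # conv blocks 1-4
        shapes.append((h, w, channels))   # conv block output
        h, w = h // 2, w // 2             # 2x2 max-pool
        shapes.append((h, w, channels))   # pooled output
    return shapes

for shape in trace_vgg21_shapes():
    print(shape)
```

The final 3×3×512 feature map flattens to 4,608 values, which feed the first FC(1024) layer of the classifier.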

The model was trained using the AdamW optimizer with a cosine annealing learning rate scheduler. The loss function was label-smoothed cross-entropy, and performance was evaluated using standard metrics: Accuracy, Precision, Recall, and F1-Score. Their formulas are given by:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$

$$F1\text{-}Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}$$

where TP, TN, FP, and FN represent True Positives, True Negatives, False Positives, and False Negatives, respectively.
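These four metrics can be computed directly from the confusion counts; a minimal, dependency-free sketch (the counts below are illustrative, not the study's data):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Accuracy, Precision, Recall and F1 from confusion counts,
    following the four formulas above."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts for a binary classifier on 200 samples.
acc, p, r, f1 = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

Note that F1 is the harmonic mean of precision and recall, so it rewards a classifier only when both are simultaneously high.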

Our first analysis benchmarked the VGG21 model against other architectures on a binary classification task (Silicon Regulation vs. Stress). The results, derived from the unmanned drone multispectral dataset, clearly demonstrate the superiority of the VGG-based models for this specific data type.

| Model | Accuracy (%) | Params (M) | Precision (SR/WS) (%) | Recall (SR/WS) (%) |
|---|---|---|---|---|
| ResNet50 | 74.42 | 22.43 | 83.77 / 60.29 | 76.13 / 71.08 |
| Swin-Transformer | 83.53 | 6.58 | 88.86 / 75.47 | 84.55 / 81.76 |
| VGG19 | 90.14 | 18.59 | 96.15 / 81.08 | 88.48 / 93.30 |
| VGG21 | 91.04 | 21.40 | 96.42 / 82.95 | 89.53 / 93.88 |
| VGG23 | 90.07 | 24.22 | 97.94 / 78.17 | 87.15 / 96.16 |

The VGG21 model achieved the highest overall accuracy, indicating its optimal balance between depth and representational power for features extracted from unmanned drone imagery. While VGG23 had slightly higher precision for the Silicon Regulation class, it suffered from lower precision for the Stress class and a drop in overall accuracy, suggesting potential overfitting or inefficiency.

The ultimate validation of any stress mitigation strategy is grain yield. Our field measurements confirmed the physiological impact of waterlogging and the positive effect of regulatory treatments. The data from the flowering-grain filling stage, a period critical for yield determination, showed a consistent trend.

| Treatment (Flowering-Grain Filling Stage) | Average Yield (kg/ha) | Yield Change vs. Stress (%) |
|---|---|---|
| Control (No Stress) | 5210 | +37.8 |
| Waterlogging Stress (15 days) | 3780 | 0.0 (Baseline) |
| Silicon Regulation | 4520 | +19.6 |
| Amino Acid Regulation | 4180 | +10.6 |

This yield analysis confirmed that both foliar applications had a positive mitigatory effect, with the silicon-based treatment outperforming the amino acid treatment. This agronomic result provides crucial context for interpreting the performance of our unmanned drone-based recognition models.
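The relative yield changes in the table above follow directly from each treatment's yield against the waterlogging baseline:

```python
# Average yields (kg/ha) at the flowering-grain filling stage, from the table above.
yields = {
    "Control (No Stress)": 5210,
    "Waterlogging Stress": 3780,   # baseline
    "Silicon Regulation": 4520,
    "Amino Acid Regulation": 4180,
}

baseline = yields["Waterlogging Stress"]
# Percent change relative to the stressed baseline, rounded to one decimal.
changes = {name: round((y - baseline) / baseline * 100, 1)
           for name, y in yields.items()}
```

Running this reproduces the table's +37.8%, +19.6%, and +10.6% figures.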

We then trained separate VGG21 binary classifiers to distinguish each regulatory treatment from waterlogging stress. The performance disparity between the two models was striking and aligned with the yield data.

| Binary Classification Model | Overall Accuracy (%) | Class | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|---|
| Stress vs. Silicon Regulation | 91.05 | Silicon Regulation | 96.42 | 89.53 | 92.85 |
| | | Stress | 82.95 | 93.88 | 88.08 |
| Stress vs. Amino Acid Regulation | 69.62 | Amino Acid Regulation | 85.35 | 70.59 | 77.27 |
| | | Stress | 45.53 | 66.97 | 54.21 |
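As a consistency check, each F1-score in the table is the harmonic mean of its precision and recall:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, rounded to two decimals."""
    return round(2 * precision * recall / (precision + recall), 2)

# (precision %, recall %, reported F1 %) triples from the table above.
rows = [(96.42, 89.53, 92.85), (82.95, 93.88, 88.08),
        (85.35, 70.59, 77.27), (45.53, 66.97, 54.21)]
assert all(f1(p, r) == reported for p, r, reported in rows)
```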

The model for silicon regulation achieved excellent metrics, with F1-scores above 88% for both classes. In contrast, the model for amino acid regulation struggled, particularly on the Stress class, where a precision of only 45.53% indicates that many amino-acid-treated patches were misclassified as stressed. This suggests that the spectral signature of amino-acid-treated plants under stress was less distinct from that of untreated stressed plants, making classification more challenging. Interestingly, the superior “Silicon vs. Stress” model demonstrated robust generalization when applied to the amino acid test data, achieving performance comparable to the specifically trained amino acid model.

Finally, we tackled the more complex multiclass problem of distinguishing between Control, Stress, and Silicon Regulation. The VGG21 model’s performance is summarized below.

| Class | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|
| Control (CK) | 65.88 | 67.91 | 66.88 |
| Waterlogging Stress (WS) | 62.15 | 67.68 | 64.80 |
| Silicon Regulation (SR) | 95.77 | 88.71 | 91.80 |

The model maintained exceptional performance for the Silicon Regulation class, with an F1-Score over 91%. However, it faced significant difficulty discriminating between Control and Stress samples, as reflected in their lower F1-scores. This confusion likely stems from the high phenotypic variability within the control group and the potentially moderate or variable stress intensity applied, leading to overlapping spectral features in the data captured by the unmanned drone. The high precision for the regulation class confirms that its spectral phenotype is unique and readily identifiable.

The findings from this integrated study are significant. First, they validate that unmanned drone-based multispectral imaging is a powerful tool for non-invasively detecting not just abiotic stress, but also the physiological recovery induced by mitigation strategies. The spectral and spatial data provided by the unmanned drone was rich enough for a deep learning model to learn highly diagnostic patterns. Second, the custom VGG21 model proved to be particularly well-suited for this task, outperforming other contemporary architectures on our specific dataset. Its hierarchical feature extraction mechanism effectively captured the salient patterns from the five-channel unmanned drone imagery.

Most importantly, a strong correlation exists between the agronomic efficacy of a treatment and its “recognizability” by the model. The silicon-based treatment, which yielded the most substantial recovery in grain yield, also produced the most distinct spectral phenotype, resulting in near-perfect classification metrics. This alignment suggests that the deep learning model is not merely recognizing arbitrary patterns but is sensitive to the underlying physiological health of the crop, which is ultimately reflected in yield. The confusion between control and stressed plants indicates a limitation and an opportunity: future work should involve more severe or longer-duration stress gradients to create more distinct spectral separability, or incorporate temporal sequences of unmanned drone flights to capture the dynamics of stress progression.

In conclusion, our research establishes a viable and effective pipeline for precision agriculture applications. By combining the agility and high-resolution sensing of an unmanned drone with the powerful pattern recognition capabilities of a tailored VGG21 deep learning model, we can rapidly and accurately identify waterlogging stress in wheat and evaluate the success of different management interventions. This approach moves beyond simple stress detection towards assessing crop health and treatment efficacy, providing a data-driven tool for making informed decisions. This technology holds great promise for enabling more resilient and efficient crop management systems, allowing for targeted application of resources and timely intervention to safeguard yield. Future work will focus on expanding the model’s robustness across more varieties, environments, and stress types, further solidifying the role of unmanned drone and AI synergy in sustainable agriculture.
