Lodging in maize poses a significant threat to crop yield and quality, particularly during critical growth stages. As a researcher focused on precision agriculture, I have explored combining UAV remote sensing with deep learning to address this challenge. This article presents a study on monitoring maize lodging across multiple growth stages, using an improved semantic segmentation model together with height information from digital surface models (DSM). The core innovation lies in fusing DSM data with visible imagery and enhancing the PIDNet model with an efficient multi-scale attention (EMA) module, which together improve detection accuracy and generalization. In experiments across three growth stages, this approach outperforms the baseline models I compared against (U-Net, PSPNet, DeepLabv3+, UHRNet, and PIDNet without EMA modules), suggesting a practical route to large-scale lodging assessment. The findings highlight the potential of UAV technology for agricultural management.
UAV systems have transformed remote sensing in agriculture by enabling high-resolution data collection over large areas. Maize lodging, often triggered by environmental stressors such as wind and rain, leads to reduced photosynthesis, impaired nutrient transport, and even plant death. Traditional monitoring methods are labor-intensive and scale poorly, whereas UAV platforms provide rapid, cost-effective, and detailed coverage of fields. In this study, I use UAV imagery captured at three key growth stages (silking, milk, and dough) to construct the dataset. By combining RGB images with DSM-derived height information, I enrich the feature representation of lodged areas, addressing challenges such as occlusion and morphological variation across stages. The improved PIDNet model, augmented with EMA modules, aggregates contextual detail and boundary features more effectively, resulting in better segmentation performance. This work illustrates how UAV remote sensing and current computer vision methods complement each other in automated crop health monitoring.

To contextualize this research, I briefly review related efforts in UAV-based lodging detection. Previous studies have employed deep learning models such as U-Net and DeepLabv3+ to segment lodged crops from UAV imagery. However, these models often struggle in multi-growth-stage scenarios because they adapt poorly to changing plant features. For instance, some approaches rely solely on RGB data, ignoring height cues that are critical for distinguishing lodged plants from healthy ones. Others incorporate texture or spectral features but lack efficient attention mechanisms for complex field conditions. My work builds on these insights by explicitly integrating DSM height maps and introducing EMA modules into PIDNet, a model known for its three-branch architecture that handles details, context, and boundaries separately. This combination improves accuracy and makes the model more robust across growth stages.
The methodology begins with data acquisition using a DJI Phantom 4 RTK UAV equipped with a high-resolution visible light camera. Flights were conducted at an altitude of 30 meters under sunny conditions to minimize atmospheric interference. The collected images were processed with Pix4Dmapper to generate orthomosaics and DSM layers, which were then precisely co-registered. Each pixel in the DSM represents elevation, which is crucial for quantifying height reductions in lodged maize. Preprocessing included normalization to align the value ranges of the RGB (0–255) and DSM (0–500) channels, followed by fusion into four-channel input data. This fusion process is mathematically represented as:
$$ I_{fused} = [I_{RGB}, I_{DSM}] $$
where \( I_{RGB} \) denotes the three-channel RGB image and \( I_{DSM} \) the single-channel height map. The dataset was partitioned by field plots: two for training, one for validation, and one for testing, with data augmentation techniques like noise addition and rotation applied to expand the training set. Overall, thousands of 512×512 image patches were generated per growth stage, ensuring diversity for model training.
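The normalization, channel fusion, and patch extraction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the exact pipeline I used: the function names `fuse_rgb_dsm` and `tile_patches` are hypothetical, both modalities are assumed to be scaled linearly to [0, 1] using the ranges quoted above, and patches are assumed to be non-overlapping.

```python
import numpy as np


def fuse_rgb_dsm(rgb, dsm, rgb_max=255.0, dsm_max=500.0):
    """Stack a normalized RGB image and DSM height map into an (H, W, 4) array.

    Assumption: both inputs are already co-registered on the same pixel grid,
    and a simple linear scaling to [0, 1] is used for each modality.
    """
    if rgb.shape[:2] != dsm.shape[:2]:
        raise ValueError("RGB and DSM must share the same height and width")
    rgb_norm = rgb.astype(np.float32) / rgb_max
    # Clip the DSM to its nominal range before scaling so outliers stay in [0, 1].
    dsm_norm = np.clip(dsm.astype(np.float32), 0.0, dsm_max) / dsm_max
    # I_fused = [I_RGB, I_DSM]: append the height map as a fourth channel.
    return np.concatenate([rgb_norm, dsm_norm[..., None]], axis=-1)


def tile_patches(image, size=512):
    """Yield non-overlapping size x size patches, dropping incomplete edge tiles."""
    height, width = image.shape[:2]
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            yield image[top:top + size, left:left + size]


# Synthetic stand-ins for an orthomosaic and its DSM (not real survey data).
rgb = np.random.default_rng(0).integers(0, 256, size=(1024, 1536, 3), dtype=np.uint8)
dsm = np.random.default_rng(1).uniform(0.0, 500.0, size=(1024, 1536))

fused = fuse_rgb_dsm(rgb, dsm)
patches = list(tile_patches(fused))
print(fused.shape, len(patches), patches[0].shape)
```

Scaling both modalities to the same range keeps the height channel from dominating the RGB channels in the first convolution simply because of its larger raw values.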
The core of my approach is the enhanced PIDNet model, which I modified to accept four-channel inputs and incorporate EMA modules. PIDNet’s original design comprises three branches: a detail branch (P) for high-resolution features, an integral branch (I) for global context, and a differential branch (D) for boundary detection. To make full use of the UAV data, I integrated DSM information at the input layer, allowing the model to learn joint visual-height features. Additionally, I inserted EMA modules at selected points in the integral branch to strengthen multi-scale feature aggregation and cross-dimensional interactions. The EMA mechanism reshapes the channel dimension into groups of sub-features, which reduces computational overhead while preserving information. The modified architecture can be summarized as:
$$ F_{output} = \text{PIDNet}_{EMA}(I_{fused}) $$
where \( F_{output} \) is the segmentation map. The EMA module recalibrates features through parallel branches, improving the model’s ability to capture pixel-level relationships in UAV imagery. This is particularly beneficial for lodging detection, where subtle height variations and complex textures must be distinguished.
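To make the channel-reshaping step concrete, the sketch below shows how a feature tensor of shape (B, C, H, W) can be split into groups of sub-features by folding the groups into the batch dimension, and how the original layout is recovered afterwards. It illustrates only the grouping, not the attention branches themselves; the helper names and the group count of 4 are assumptions chosen for illustration.

```python
import numpy as np


def group_channels(features, groups):
    """Reshape (B, C, H, W) into (B * groups, C // groups, H, W).

    Each consecutive block of C // groups channels becomes its own sub-feature,
    so later operations act on fewer channels at a time.
    """
    batch, channels, height, width = features.shape
    if channels % groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    return features.reshape(batch * groups, channels // groups, height, width)


def ungroup_channels(grouped, groups):
    """Invert group_channels, restoring the (B, C, H, W) layout."""
    batch_groups, sub_channels, height, width = grouped.shape
    return grouped.reshape(batch_groups // groups, sub_channels * groups, height, width)


# Hypothetical feature map: batch of 2, 8 channels, 4 x 4 spatial size.
features = np.random.default_rng(0).normal(size=(2, 8, 4, 4))
grouped = group_channels(features, groups=4)
restored = ungroup_channels(grouped, groups=4)
print(grouped.shape, restored.shape)
```

Because a plain reshape does not move any data, the grouping itself adds no parameters; the saving comes from each attention branch operating on C / G channels instead of C.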
For training, I used an SGD optimizer with a learning rate of 0.001, weight decay of \(5 \times 10^{-6}\), and a batch size of 16. The model was trained for 100 epochs on a server with NVIDIA RTX 3090 GPUs. Evaluation metrics included mean pixel accuracy (mPA) and mean intersection over union (mIoU), defined as:
$$ \text{mPA} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FN_c} $$
$$ \text{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c} $$
where \( C \) is the number of classes (here two: lodged and non-lodged), \( TP_c \) denotes true positives, \( FP_c \) false positives, and \( FN_c \) false negatives for class \( c \). Together, these metrics assess per-class pixel accuracy and region overlap in UAV-based monitoring.
The experimental results show that the enhanced PIDNet model outperforms the alternatives. I compared it with several baseline models, including U-Net, PSPNet, DeepLabv3+, and UHRNet, using the UAV dataset across three growth stages. The performance is summarized in Table 1, which shows mPA and mIoU values for each model. The improved PIDNet achieved the highest scores on both metrics at every stage, indicating its effectiveness in multi-stage lodging detection.
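As a worked illustration of these metrics, the following sketch computes mPA and mIoU from a confusion matrix. It assumes the common convention in which per-class pixel accuracy is the share of ground-truth pixels of class c that are predicted correctly, i.e. TP_c / (TP_c + FN_c); the function names and the tiny label arrays are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=2):
    """Return a (C, C) matrix whose rows are ground truth and columns are predictions."""
    index = y_true.ravel().astype(np.int64) * num_classes + y_pred.ravel().astype(np.int64)
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mpa_miou(cm):
    """Compute mean pixel accuracy and mean IoU from a confusion matrix.

    Assumes every class occurs at least once in the ground truth; otherwise
    the per-class ratios would divide by zero.
    """
    tp = np.diag(cm).astype(np.float64)
    fn = cm.sum(axis=1) - tp  # ground-truth pixels of class c that were missed
    fp = cm.sum(axis=0) - tp  # pixels wrongly assigned to class c
    pixel_accuracy = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    return pixel_accuracy.mean(), iou.mean()


# Toy 2 x 4 label maps: 0 = non-lodged, 1 = lodged.
y_true = np.array([[0, 0, 1, 1], [0, 1, 1, 1]])
y_pred = np.array([[0, 1, 1, 1], [0, 0, 1, 0]])

cm = confusion_matrix(y_true, y_pred)
mpa, miou = mpa_miou(cm)
print(cm)
print(f"mPA = {mpa:.4f}, mIoU = {miou:.4f}")
```

For this toy example the per-class accuracies are 2/3 and 3/5 and the per-class IoUs are 0.4 and 0.5, so mPA is about 0.6333 and mIoU is 0.45.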
| Model | mPA, silking (%) | mPA, milk (%) | mPA, dough (%) | mIoU, silking (%) | mIoU, milk (%) | mIoU, dough (%) |
|---|---|---|---|---|---|---|
| U-Net | 75.58 | 73.99 | 74.03 | 72.15 | 70.56 | 70.26 |
| PSPNet | 79.51 | 78.80 | 78.01 | 74.64 | 73.81 | 73.92 |
| DeepLabv3+ | 79.89 | 77.97 | 77.66 | 72.74 | 72.18 | 71.56 |
| UHRNet | 84.64 | 80.08 | 81.96 | 75.96 | 72.56 | 73.14 |
| PIDNet (original) | 87.79 | 86.57 | 84.59 | 79.61 | 78.70 | 76.87 |
| Improved PIDNet (proposed) | 91.83 | 91.66 | 90.84 | 83.94 | 82.77 | 82.64 |
The improved PIDNet model consistently outperformed the others, with mPA exceeding 90% and mIoU above 82% at all three stages. This shows that it adapts well to the varying lodging patterns captured by the UAV camera. For instance, at the dough stage, where lodging severity increases and plants are often obscured, the model maintained high accuracy, thanks to the EMA modules’ ability to enhance context aggregation. Visualization of the segmentation results further confirms that the proposed method reduces false positives and preserves boundary details, whereas the baseline models often misclassify healthy areas or miss lodged patches.
To validate the contributions of individual components, I conducted ablation studies, as shown in Table 2. These experiments assessed the impact of DSM fusion and EMA module integration on PIDNet’s performance. The results clearly indicate that both elements are crucial for achieving optimal detection rates.
| DSM Fusion | EMA Modules | mPA, silking (%) | mPA, milk (%) | mPA, dough (%) | mIoU, silking (%) | mIoU, milk (%) | mIoU, dough (%) |
|---|---|---|---|---|---|---|---|
| No | No | 82.46 | 80.98 | 80.63 | 74.09 | 72.84 | 72.61 |
| Yes | No | 87.79 | 86.57 | 84.59 | 79.61 | 78.70 | 76.87 |
| Yes | Partial (1 EMA) | 87.73 | 86.42 | 84.43 | 78.30 | 78.55 | 75.68 |
| Yes | Partial (2 EMAs) | 87.69 | 86.04 | 84.92 | 79.58 | 79.08 | 76.59 |
| Yes | Full (3 EMAs) | 91.83 | 91.66 | 90.84 | 83.94 | 82.77 | 82.64 |
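The ablation results show that DSM fusion alone improves mPA by roughly 4 to 5.6 percentage points depending on the stage (for example, from 82.46 to 87.79 at silking), underscoring the value of height information from UAV data. This DSM-only configuration has the same scores as the "PIDNet (original)" row in Table 1. The effect of the EMA modules depends on how many are used: with one or two modules, the scores stay within about 1.3 points of the DSM-only configuration and do not improve consistently, whereas the full configuration (three EMAs) adds a further 4 to 6.3 points of mPA and delivers the best results at every stage. This combination enables the model to generalize better across growth stages, a key requirement for practical UAV applications in dynamic agricultural environments. The EMA modules facilitate cross-scale feature learning, which is mathematically expressed as: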
$$ \text{EMA}(F) = \text{Concat}\left(\text{Attention}_1(F), \text{Attention}_2(F), \dots, \text{Attention}_k(F)\right) $$
where \( F \) represents the input features and each attention branch \( \text{Attention}_i \) captures dependencies at a different scale. This mechanism enhances the model’s sensitivity to lodging-related cues in UAV imagery, such as subtle height drops or texture changes.
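The gains discussed in the ablation study can be recomputed directly from Table 2. The short script below does this arithmetic for the configurations without DSM or EMA, with DSM only, and with DSM plus three EMA modules; the dictionary keys and the helper `gains` are names I introduce here for illustration.

```python
# Scores copied from Table 2 (percent), ordered as silking, milk, dough.
STAGES = ("silking", "milk", "dough")
ABLATION = {
    "no_dsm_no_ema": {"mPA": (82.46, 80.98, 80.63), "mIoU": (74.09, 72.84, 72.61)},
    "dsm_only": {"mPA": (87.79, 86.57, 84.59), "mIoU": (79.61, 78.70, 76.87)},
    "dsm_3ema": {"mPA": (91.83, 91.66, 90.84), "mIoU": (83.94, 82.77, 82.64)},
}


def gains(before, after, metric):
    """Per-stage difference (after - before) in percentage points, rounded to 2 places."""
    pairs = zip(ABLATION[before][metric], ABLATION[after][metric])
    return [round(new - old, 2) for old, new in pairs]


dsm_gain_mpa = gains("no_dsm_no_ema", "dsm_only", "mPA")
ema_gain_mpa = gains("dsm_only", "dsm_3ema", "mPA")
dsm_gain_miou = gains("no_dsm_no_ema", "dsm_only", "mIoU")
ema_gain_miou = gains("dsm_only", "dsm_3ema", "mIoU")

for stage, d_gain, e_gain in zip(STAGES, dsm_gain_mpa, ema_gain_mpa):
    print(f"{stage}: DSM fusion +{d_gain} mPA, three EMAs +{e_gain} mPA")
```

For mPA, DSM fusion contributes 5.33, 5.59, and 3.96 points at the silking, milk, and dough stages, and the three EMA modules add a further 4.04, 5.09, and 6.25 points, so the attention modules matter most at the dough stage.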
In this discussion, I consider the implications of these findings for UAV-based agricultural monitoring. The improved PIDNet model’s high accuracy stems from its dual focus on visual and elevation features, both of which can be derived from UAV surveys. For example, DSM data directly quantifies plant height reductions during lodging, complementing RGB cues such as color shifts. The EMA modules then refine the feature maps, reducing noise and improving boundary precision. This approach is particularly effective for multi-growth-stage analysis, as shown by the consistent performance across the silking, milk, and dough stages. Compared to prior work, my method integrates height information with attention mechanisms, addressing limitations of existing UAV studies that rely on single-modality data or less adaptable models.
However, challenges remain, such as handling extreme weather conditions or dense crop canopies that may obscure UAV imagery. Future research could explore multi-sensor UAV systems that combine visible, thermal, and multispectral cameras to enrich the feature set. Additionally, real-time processing algorithms could be developed to enable on-the-fly lodging detection during UAV flights, further improving operational efficiency. The scalability of this approach also warrants investigation for large farms, where UAV fleets might be deployed for continuous monitoring.
In conclusion, this study demonstrates the efficacy of combining UAV remote sensing with an enhanced PIDNet model for maize lodging detection. By fusing DSM height maps and incorporating EMA modules, I achieved mPA above 90% and mIoU above 82% at all three growth stages. The proposed method outperforms the baseline models evaluated here, offering a reliable tool for agricultural management. UAV technology, coupled with deep learning, holds considerable potential for automating crop health assessment, reducing labor costs, and supporting timely interventions. As UAV platforms become more accessible, such approaches are likely to play a growing role in sustainable agriculture, enabling precise monitoring of lodging and other stressors. This work contributes to the growing body of research on UAV applications and underscores the importance of integrated data and adaptive models in addressing real-world agricultural challenges.
