YOLOv12n-RCL: Drone-Based Detection of Sunflower Disk Rot Severity

In modern precision agriculture, the efficient and accurate assessment of crop diseases is critical for early intervention, targeted pesticide application, and yield loss estimation. Sunflower, one of the major oil crops, is severely threatened by disk rot caused by Sclerotinia sclerotiorum. Traditional manual field surveys for evaluating disease severity are labor-intensive, subjective, and lack timeliness. To overcome these limitations, we propose a novel deep learning approach that leverages high-resolution unmanned aerial vehicle (UAV) imagery and an improved YOLOv12n model—named YOLOv12n-RCL—for grading sunflower disk rot severity at the mature stage.

Our study is centered on the integration of drone technology for data acquisition and a series of architectural enhancements to achieve both high detection accuracy and lightweight deployment. The entire workflow—from data collection to model inference and edge deployment—demonstrates the transformative potential of drone technology in enabling real-time, field-scale disease monitoring.

1. Introduction

Sunflower (Helianthus annuus L.) is valued for its tolerance to drought and poor soil conditions, making it a preferred crop for saline-alkali land remediation. However, disk rot disease caused by Sclerotinia sclerotiorum leads to empty seeds, reduced yield, and diminished kernel quality. Timely and accurate grading of disease severity is essential for precision management. Conventional manual inspection is not only inefficient but also subjective, especially when dealing with tall sunflower plants where flower heads are often hidden under leaves. Drone technology offers a transformative solution: UAVs equipped with high-resolution cameras can rapidly cover large fields and acquire images with detailed spatial information, enabling objective and repeatable disease assessment.

Recent advances in deep learning, particularly the YOLO (You Only Look Once) family, have shown great promise for object detection in agricultural applications. However, detecting sunflower disk rot from UAV imagery presents unique challenges: varied lesion morphologies, subtle differences between severity grades, leaf occlusion, and complex background interference. To address these, we improved YOLOv12n by: (1) replacing the standard C3K2 modules with C3K2-RC modules that integrate receptive-field attention convolution (RFAConv) and coordinate attention (CA); (2) incorporating the lightweight content-aware upsampling operator CARAFE into the neck network; and (3) introducing a lightweight shared convolutional detection head with separated batch normalization (LSCSBD). The resulting YOLOv12n-RCL model is more accurate than the baseline while using fewer parameters and less computation, which makes it well suited to large-scale, UAV-based disease monitoring.

2. Materials and Methods

2.1 Data Acquisition via Drone Technology

All field data were collected using a multi-rotor UAV (DJI M300) equipped with a Zenmuse P1 visible-light camera. The flight altitude was 30 m, speed 2.5 m/s, and front/side overlap 80%. The camera had a 35 mm equivalent focal length and captured images at 8192×5460 pixels in JPEG format. A total of 700 original images were obtained from a sunflower field at the mature stage. The use of drone technology allowed us to cover a large area efficiently while maintaining high spatial resolution, which is critical for distinguishing fine disease features.
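To make these flight settings concrete, the sketch below estimates the ground sampling distance (GSD), the image footprint, and the exposure interval they imply. It is an illustration under stated assumptions, not the authors' flight plan: it assumes a nadir-pointing pinhole camera, a full-frame sensor 35.9 mm wide (the sensor size is not given in the text), and that the short image side lies along the flight direction. The helper names are hypothetical.

```python
# Flight-geometry sketch for the Section 2.1 settings (hypothetical helper names).
# From the text: 30 m altitude, 35 mm focal length, 8192 x 5460 px,
# 80 % front/side overlap, 2.5 m/s flight speed.
# ASSUMPTIONS: full-frame sensor 35.9 mm wide; nadir view; the short image
# side (5460 px) is aligned with the flight direction.


def ground_sampling_distance_m(altitude_m, sensor_width_mm, focal_mm, image_width_px):
    """Ground size of one pixel in metres for a nadir-looking pinhole camera."""
    return altitude_m * sensor_width_mm / (focal_mm * image_width_px)


def flight_plan(altitude_m=30.0, focal_mm=35.0, sensor_width_mm=35.9,
                image_px=(8192, 5460), overlap=0.80, speed_m_s=2.5):
    gsd = ground_sampling_distance_m(altitude_m, sensor_width_mm, focal_mm, image_px[0])
    across_m = gsd * image_px[0]   # ground width covered across the flight line
    along_m = gsd * image_px[1]    # ground length covered along the flight line
    trigger_spacing_m = (1.0 - overlap) * along_m  # distance between exposures
    return {
        "gsd_cm": gsd * 100.0,
        "footprint_m": (across_m, along_m),
        "trigger_spacing_m": trigger_spacing_m,
        "trigger_interval_s": trigger_spacing_m / speed_m_s,
        "line_spacing_m": (1.0 - overlap) * across_m,  # distance between flight lines
    }


if __name__ == "__main__":
    for name, value in flight_plan().items():
        print(name, value)
```

Under these assumptions the GSD is roughly 0.38 cm per pixel, with one exposure about every 1.6 s; the true values depend on the actual sensor geometry and image orientation.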

2.2 Disease Severity Grading

We defined five severity grades based on the ratio of lesion area to total flower head area (computed via pixel counting): Grade 0 (0%), Grade 1 (1–25%), Grade 2 (26–50%), Grade 3 (51–75%), and Grade 4 (76–100%). The ratio is calculated as:

$$P_b = \frac{P_t}{P_h}$$

where \(P_b\) is the lesion proportion, \(P_t\) the number of lesion pixels, and \(P_h\) the total flower head pixels.
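The grading rule can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes binary masks given as nested lists, expresses \(P_b\) as a percentage, and treats the grade bounds as half-open intervals (for example Grade 1 as \(0\% < P_b \le 25\%\)) so that non-integer percentages falling between the listed ranges still receive a grade. The function names are hypothetical.

```python
# Minimal sketch of the Section 2.2 pixel-ratio grading (hypothetical helpers).
# ASSUMPTIONS: masks are 2-D lists of 0/1 values, the lesion mask lies inside the
# flower-head mask, and grades use half-open bounds (0,25], (25,50], (50,75], (75,100].


def count_pixels(mask):
    """Number of foreground (non-zero) pixels in a 2-D binary mask."""
    return sum(1 for row in mask for value in row if value)


def lesion_proportion(lesion_mask, head_mask):
    """P_b = P_t / P_h, returned as a percentage."""
    p_t = count_pixels(lesion_mask)
    p_h = count_pixels(head_mask)
    if p_h == 0:
        raise ValueError("flower-head mask is empty")
    return 100.0 * p_t / p_h


def severity_grade(p_b_percent):
    """Map a lesion percentage to grades 0 to 4."""
    if p_b_percent <= 0.0:
        return 0
    for grade, upper in enumerate((25.0, 50.0, 75.0, 100.0), start=1):
        if p_b_percent <= upper:
            return grade
    return 4  # guard against values rounded slightly above 100 %


if __name__ == "__main__":
    head = [[1, 1, 1, 1], [1, 1, 1, 1]]    # 8 flower-head pixels (toy example)
    lesion = [[1, 1, 1, 0], [0, 0, 0, 0]]  # 3 lesion pixels
    p_b = lesion_proportion(lesion, head)
    print(p_b, severity_grade(p_b))        # 37.5 2
```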

2.3 Dataset Construction

After cropping and filtering the original UAV images, we obtained 1,300 valid sub-images. They were split into training (1,040 images), validation (130), and test sets (130) following an “8:1:1” ratio. To enhance generalization, training images were augmented online using translation, rotation, color jitter, and brightness adjustment, expanding to 4,160 images. The label counts per grade before and after augmentation are summarized in Table 1.

Table 1: Sample distribution of sunflower disk rot dataset
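
| Grade | Training | Validation | Test | Training (after augmentation) |
|---|---|---|---|---|
| 0 | 7,108 | 1,012 | 900 | 28,432 |
| 1 | 6,517 | 855 | 872 | 26,068 |
| 2 | 6,633 | 881 | 767 | 26,532 |
| 3 | 6,400 | 860 | 847 | 25,600 |
| 4 | 6,817 | 810 | 873 | 27,364 |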
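A reproducible 8:1:1 split of the 1,300 sub-images can be sketched as follows. The seed, the file names, and the helper `split_811` are hypothetical; the text does not say how images were assigned to subsets.

```python
import random

# Hypothetical 8:1:1 split of the 1,300 sub-images (file names are illustrative).


def split_811(items, seed=0):
    """Shuffle reproducibly and split into train/val/test at an 8:1:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    images = [f"sub_{i:04d}.jpg" for i in range(1300)]
    train, val, test = split_811(images)
    print(len(train), len(val), len(test))  # 1040 130 130
    # A fourfold expansion of the training split matches the 4,160 images in the text.
    print(len(train) * 4)                   # 4160
```

Because several sub-images can be cropped from the same original frame, grouping the split by original image rather than by sub-image is one way to keep near-duplicate content out of the test set; the text does not state which grouping was used.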

3. YOLOv12n-RCL Architecture

3.1 Baseline YOLOv12n

YOLOv12n is a state-of-the-art single-stage detector built around area attention (A2) modules and residual efficient layer aggregation networks (R-ELAN). It also removes positional encoding, adjusts the MLP ratio, and introduces a large separable convolution. While efficient, it is limited in capturing fine-grained disease features in complex UAV scenes.
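The area-attention idea can be illustrated with a simplified, single-head NumPy sketch: the feature map is cut into `num_areas` horizontal strips and self-attention is computed only within each strip, so for \(n\) tokens and \(l\) areas the number of attention scores drops from \(n^2\) to \(n^2/l\). This is not the YOLOv12 implementation: random matrices stand in for learned projections, and multi-head attention, the MLP, and the large separable convolution of the real A2 block are omitted. The function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def area_attention(feat, num_areas=4, rng=None):
    """Single-head self-attention restricted to horizontal strips ("areas").

    feat: (H, W, C) array whose height H is divisible by num_areas.
    Random matrices stand in for the learned Q/K/V projections.
    """
    h, w, c = feat.shape
    if h % num_areas:
        raise ValueError("H must be divisible by num_areas")
    rng = np.random.default_rng(0) if rng is None else rng
    wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    # Consecutive rows form one area: (l, tokens_per_area, C)
    areas = feat.reshape(num_areas, (h // num_areas) * w, c)
    q, k, v = areas @ wq, areas @ wk, areas @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)  # (l, n/l, n/l): no cross-area scores
    return (softmax(scores) @ v).reshape(h, w, c)
```

For a 32×32 feature map (1,024 tokens) and four areas, this computes 262,144 attention scores instead of 1,048,576.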

3.2 Improvements for Drone-Based Disease Detection

Three key modifications were made to tailor YOLOv12n to the challenges of detecting sunflower disk rot in drone imagery:

3.2.1 C3K2-RC Module (RFAConv + CA)

The standard C3K2 modules in both the backbone and the neck were replaced with C3K2-RC modules. Within each C3K2-RC, the bottleneck uses RFAConv, which extracts receptive-field features with grouped convolutions and weights them with learned attention to expand the effective receptive field, followed by a coordinate attention (CA) layer. CA embeds positional information into channel attention, enabling the model to focus on lesion regions while suppressing background clutter. A comparison of effective receptive fields (obtained by gradient-based visualization) shows that C3K2-RC provides a much broader and more structured receptive field than the original convolution.
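The coordinate-attention half of the C3K2-RC bottleneck can be sketched in NumPy following the published CA design: average-pool along each spatial direction, encode both pooled descriptors with a shared 1×1 convolution, then produce separate row and column gates. This is a simplified sketch, not the authors' implementation: random matrices stand in for learned 1×1 convolutions, batch normalization is omitted, and the RFAConv part of the bottleneck is not shown. The function names are hypothetical.

```python
import numpy as np


def h_swish(x):
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, reduction=32, rng=None):
    """Reweight x of shape (C, H, W) with direction-aware attention gates."""
    c, h, w = x.shape
    mid = max(8, c // reduction)
    rng = np.random.default_rng(0) if rng is None else rng
    w1 = rng.standard_normal((mid, c)) / np.sqrt(c)     # shared 1x1 conv, C -> mid
    w_h = rng.standard_normal((c, mid)) / np.sqrt(mid)  # 1x1 conv for the height branch
    w_w = rng.standard_normal((c, mid)) / np.sqrt(mid)  # 1x1 conv for the width branch

    z_h = x.mean(axis=2)  # (C, H): average over width, keeps row position
    z_w = x.mean(axis=1)  # (C, W): average over height, keeps column position
    f = h_swish(w1 @ np.concatenate([z_h, z_w], axis=1))  # (mid, H + W), BN omitted
    g_h = sigmoid(w_h @ f[:, :h])  # (C, H) row gates in (0, 1)
    g_w = sigmoid(w_w @ f[:, h:])  # (C, W) column gates in (0, 1)
    return x * g_h[:, :, None] * g_w[:, None, :]
```

Because each gate depends on a row or column index, the product `g_h * g_w` can emphasize a lesion at a specific location rather than scaling a whole channel uniformly.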

3.2.2 CARFAE Upsampling

The nearest-neighbor interpolation in the neck network was replaced with CARAFE (Content-Aware ReAssembly of FEatures). This operator dynamically predicts upsampling kernels from the input feature map, so that each upsampled location is reconstructed according to its local content. It improves edge preservation and detail recovery for small lesions, which is crucial for distinguishing Grades 2 and 3, whose lesion boundaries often differ only subtly. Feature map visualizations confirm that CARAFE reduces background noise and highlights disease regions more sharply.
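The four CARAFE steps (channel compression, kernel prediction, pixel shuffle with softmax normalization, and content-aware reassembly) can be sketched in NumPy. This is a simplified illustration, not the authors' module: random matrices stand in for learned weights, the content encoder is reduced to a 1×1 convolution, and `k_up` and `c_mid` are illustrative values. The function names are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def carafe_upsample(x, scale=2, k_up=5, c_mid=16, rng=None):
    """Content-aware upsampling of x with shape (C, H, W) to (C, scale*H, scale*W)."""
    c, h, w = x.shape
    s, r, kk = scale, k_up // 2, k_up * k_up
    rng = np.random.default_rng(0) if rng is None else rng
    # 1) Channel compressor and 2) content encoder; random weights stand in for
    #    learned ones, and the encoder is 1x1 here (CARAFE uses a k_enc x k_enc conv).
    w_comp = rng.standard_normal((c_mid, c)) / np.sqrt(c)
    w_enc = rng.standard_normal((s * s * kk, c_mid)) / np.sqrt(c_mid)
    enc = np.einsum("oc,chw->ohw", w_enc, np.einsum("mc,chw->mhw", w_comp, x))
    # 3) Pixel shuffle (s*s*kk, H, W) -> (kk, s*H, s*W), then softmax so that
    #    each output location gets a normalized k_up x k_up kernel.
    enc = enc.reshape(s, s, kk, h, w).transpose(2, 3, 0, 4, 1).reshape(kk, s * h, s * w)
    kernels = softmax(enc, axis=0)
    # 4) Reassembly: every output pixel (i', j') takes a weighted sum over the
    #    k_up x k_up neighbourhood of source pixel (i' // s, j' // s).
    xp = np.pad(x, ((0, 0), (r, r), (r, r)))  # zero padding at the borders
    patches = np.stack([xp[:, dy:dy + h, dx:dx + w]
                        for dy in range(k_up) for dx in range(k_up)], axis=1)  # (C, kk, H, W)
    patches = patches.repeat(s, axis=2).repeat(s, axis=3)                      # (C, kk, sH, sW)
    return np.einsum("ckyx,kyx->cyx", patches, kernels)
```

Unlike nearest-neighbor interpolation, which copies one source value into each 2×2 output block, every output pixel here receives its own normalized kernel predicted from the features, which is the mechanism behind the sharper lesion edges described above.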

3.2.3 LSCSBD Detection Head

The original decoupled head was replaced with a lightweight shared convolutional detection head with separated batch normalization (LSCSBD). This head shares group-normalized convolutions across scales, which reduces parameters while maintaining accuracy, and its BnAct module (batch normalization followed by an activation) normalizes intermediate features, which improves small-object detection. The head structure is defined as:

Input → Conv_BN 1×1 → Conv_GN 3×3 → BnAct → [Conv-Reg, Conv-Cls] → Scale
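Why sharing convolutions across scales saves parameters can be shown with a quick count. The configuration below is hypothetical (64 channels, three detection levels, five severity classes, and a 4×16-channel box-regression output in the style of recent YOLO heads), not the paper's: an unshared head repeats every layer per level, whereas a shared head keeps one copy of each convolution and duplicates only the per-level normalization layers and learnable Scale factors.

```python
# Hypothetical parameter count: per-level detection heads vs. one shared head
# with separate (per-level) normalization layers. All sizes are assumptions.


def conv_params(k, c_in, c_out, bias=False):
    """Weights (plus optional bias) of a k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def head_params(c=64, num_classes=5, reg_max=16, levels=3, shared=True):
    stem = conv_params(1, c, c) + conv_params(3, c, c)  # Conv 1x1 -> Conv 3x3
    norms = 2 * (2 * c)                                 # two norm layers: scale + shift each
    outputs = (conv_params(1, c, 4 * reg_max, bias=True)     # box-regression branch
               + conv_params(1, c, num_classes, bias=True))  # classification branch
    if not shared:
        return levels * (stem + norms + outputs)        # independent head per level
    # Shared convs, but normalization layers and one scalar Scale per level.
    return stem + outputs + levels * norms + levels


if __name__ == "__main__":
    print(head_params(shared=False))  # 137103
    print(head_params(shared=True))   # 46216
```

Running statistics of batch normalization are buffers rather than learnable parameters, so they are not counted; keeping them per level is what lets a shared head adapt to the different feature statistics of each scale.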

3.3 Final Model

The complete YOLOv12n-RCL integrates all three improvements. The network processes 1024×1024 input images through the backbone (with C3K2-RC and A2C2f blocks), the neck (with CARAFE upsampling and C3K2-RC), and the LSCSBD detection head. Training used the SGD optimizer with an initial learning rate of 0.01, momentum of 0.935, weight decay of 0.0005, and a batch size of 16 for 200 epochs.
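For reference, these optimizer settings parameterize the standard SGD-with-momentum update, sketched here in the form used by PyTorch's `torch.optim.SGD` (no dampening, no Nesterov momentum). Whether the authors' training loop used Nesterov momentum, learning-rate warm-up or decay, or per-group weight decay is not stated, so this only illustrates the core update. The iteration count assumes each epoch passes once over the 4,160 augmented training images; the function name is hypothetical.

```python
import math

# Core SGD update with momentum and L2 weight decay (PyTorch-style, non-Nesterov).
# Hyperparameter defaults are the values given in Section 3.3.


def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.935, weight_decay=0.0005):
    """Return updated params; `velocity` (momentum buffers) is updated in place."""
    new_params = []
    for i, (w, g) in enumerate(zip(params, grads)):
        d = g + weight_decay * w                  # gradient plus L2 penalty term
        velocity[i] = momentum * velocity[i] + d  # momentum buffer
        new_params.append(w - lr * velocity[i])
    return new_params


if __name__ == "__main__":
    # ASSUMPTION: one pass over 4,160 augmented images per epoch, batch size 16.
    iters_per_epoch = math.ceil(4160 / 16)
    print(iters_per_epoch, iters_per_epoch * 200)  # 260 52000

    # Toy check: minimise f(w) = w**2 (gradient 2w) starting from w = 1.
    w, v = [1.0], [0.0]
    for _ in range(2000):
        w = sgd_momentum_step(w, [2.0 * w[0]], v)
    print(w[0])  # close to 0
```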

4. Experiments and Results

4.1 Training Curves

The training curves show that YOLOv12n-RCL consistently outperforms the baseline YOLOv12n in precision, recall, and mAP throughout the 200 epochs. The improved model also converges faster and fluctuates less, indicating more stable training.

4.2 Ablation Studies

We conducted a series of ablation experiments to evaluate each component’s contribution. Table 2 summarizes the results.

Table 2: Ablation study results
C3K2-RC CARAFE LSCSBD P (%) R (%) mAP0.5 (%) mAP0.5-0.95 (%) Params (M) FLOPs (G) Model Size (MB)
79.6 78.0 81.4 46.2 2.57 6.5 5.6
81.1 77.8 82.2 48.3 2.59 6.6 5.7
79.6 78.4 81.6 47.0 2.23 5.8 4.9
80.2 78.7 82.7 48.4 2.44 6.4 5.3
82.7 80.1 83.6 49.7 2.33 6.3 5.1
83.7 81.2 84.5 50.1 2.46 6.5 5.4
79.9 78.0 82.1 48.4 2.09 5.6 4.5
83.4 81.2 84.8 50.2 2.11 5.7 4.5

Key observations:

  • Each individual module improves detection accuracy, with C3K2-RC giving the largest boost in mAP0.5 (+1.2 pp).
  • CARAFE and LSCSBD reduce model complexity (parameters, FLOPs) while improving recall and mAP.
  • The full YOLOv12n-RCL (all three modules) achieves the best balance: mAP0.5 = 84.8%, mAP0.5-0.95 = 50.2%, with only 2.11M parameters and 5.7 GFLOPs.
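The headline gains of the full model can be recomputed from the baseline and full-model values, which match the YOLOv12n and YOLOv12n-RCL rows of Table 3. In this hypothetical helper, accuracy changes are reported in percentage points and complexity changes as relative percentages.

```python
# Baseline and full-model values as reported in Tables 2 and 3.
BASELINE = {"mAP50": 81.4, "mAP50_95": 46.2, "params_M": 2.57, "GFLOPs": 6.5, "size_MB": 5.6}
FULL = {"mAP50": 84.8, "mAP50_95": 50.2, "params_M": 2.11, "GFLOPs": 5.7, "size_MB": 4.5}


def summarize(before, after):
    """Accuracy: absolute change in percentage points; complexity: relative change in %."""
    out = {}
    for key in before:
        if key.startswith("mAP"):
            out[key] = round(after[key] - before[key], 1)
        else:
            out[key] = round(100.0 * (after[key] - before[key]) / before[key], 1)
    return out


if __name__ == "__main__":
    print(summarize(BASELINE, FULL))
    # {'mAP50': 3.4, 'mAP50_95': 4.0, 'params_M': -17.9, 'GFLOPs': -12.3, 'size_MB': -19.6}
```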

4.3 Comparison with State-of-the-Art Methods

We compared our YOLOv12n-RCL with mainstream detectors: Faster R-CNN, SSD, CenterNet, RT-DETR, and various YOLO versions (YOLOv5s, v7-tiny, v8n, v9t, v10n, v11n, v12n). Table 3 presents the comprehensive comparison.

Table 3: Performance comparison of different detection networks

| Model | Type | P (%) | R (%) | mAP0.5 (%) | mAP0.5-0.95 (%) | Params (M) | FLOPs (G) | Model Size (MB) |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN | Two-Stage | 70.1 | 65.6 | 68.8 | 40.2 | 137.1 | 370.2 | 114.2 |
| SSD | One-Stage | 73.1 | 66.4 | 72.1 | 42.3 | 24.0 | 61.1 | 105.2 |
| CenterNet | One-Stage | 72.1 | 67.1 | 75.6 | 43.7 | 32.7 | 70.2 | 90.6 |
| RT-DETR (resnet50) | DETR | 71.9 | 68.2 | 78.7 | 44.1 | 42.8 | 130.5 | 96.7 |
| YOLOv5s | One-Stage | 78.9 | 77.3 | 81.2 | 46.4 | 7.26 | 16.4 | 13.7 |
| YOLOv7-tiny | One-Stage | 80.1 | 81.0 | 80.5 | 45.5 | 6.02 | 13.2 | 11.7 |
| YOLOv8n | One-Stage | 79.1 | 78.0 | 81.4 | 46.7 | 3.01 | 8.2 | 6.3 |
| YOLOv9t | One-Stage | 76.5 | 74.2 | 79.5 | 44.7 | 1.73 | 6.4 | 4.3 |
| YOLOv10n | One-Stage | 79.2 | 78.7 | 80.4 | 45.1 | 2.69 | 8.2 | 5.8 |
| YOLOv11n | One-Stage | 78.3 | 78.8 | 80.6 | 46.0 | 2.61 | 6.6 | 5.7 |
| YOLOv12n | One-Stage | 79.6 | 78.0 | 81.4 | 46.2 | 2.57 | 6.5 | 5.6 |
| YOLOv12n-RCL | One-Stage | 83.4 | 81.2 | 84.8 | 50.2 | 2.11 | 5.7 | 4.5 |

Our model achieves the highest precision, recall, mAP0.5, and mAP0.5-0.95 of all compared detectors, as well as the lowest FLOPs (5.7 G). Among the YOLO variants, only YOLOv9t has fewer parameters (1.73 M vs. 2.11 M) and a smaller model file (4.3 MB vs. 4.5 MB), but it requires more FLOPs (6.4 G) and is 5.3 pp lower in mAP0.5. These results show that the proposed improvements deliver high-precision yet lightweight disease detection from UAV imagery.
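One way to make the accuracy-versus-cost comparison explicit is a Pareto check over the Table 3 values: a model is on the front if no other model is at least as accurate and at least as cheap while being strictly better on one of the two. This hypothetical helper uses mAP0.5 as the accuracy measure; the choice of cost measure (parameters or FLOPs) changes the result.

```python
# (mAP0.5 %, params M, GFLOPs) from Table 3.
TABLE3 = {
    "Faster R-CNN": (68.8, 137.1, 370.2),
    "SSD": (72.1, 24.0, 61.1),
    "CenterNet": (75.6, 32.7, 70.2),
    "RT-DETR (resnet50)": (78.7, 42.8, 130.5),
    "YOLOv5s": (81.2, 7.26, 16.4),
    "YOLOv7-tiny": (80.5, 6.02, 13.2),
    "YOLOv8n": (81.4, 3.01, 8.2),
    "YOLOv9t": (79.5, 1.73, 6.4),
    "YOLOv10n": (80.4, 2.69, 8.2),
    "YOLOv11n": (80.6, 2.61, 6.6),
    "YOLOv12n": (81.4, 2.57, 6.5),
    "YOLOv12n-RCL": (84.8, 2.11, 5.7),
}


def dominates(a, b, cost):
    """True if a is at least as accurate and as cheap as b, and strictly better in one."""
    no_worse = a[0] >= b[0] and a[cost] <= b[cost]
    better = a[0] > b[0] or a[cost] < b[cost]
    return no_worse and better


def pareto_front(rows, cost):
    """cost=1 compares parameters, cost=2 compares FLOPs."""
    return [name for name, row in rows.items()
            if not any(dominates(other, row, cost) for other in rows.values())]


if __name__ == "__main__":
    print(pareto_front(TABLE3, cost=1))  # ['YOLOv9t', 'YOLOv12n-RCL']
    print(pareto_front(TABLE3, cost=2))  # ['YOLOv12n-RCL']
```

By parameter count, YOLOv9t stays on the front alongside YOLOv12n-RCL; by FLOPs, YOLOv12n-RCL dominates every other model in the table.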

4.4 Confusion Matrix Analysis

To analyze misclassification patterns, we computed the normalized confusion matrix for YOLOv12n-RCL on the test set. The diagonal values (recall per grade) are: Grade 0: 83.0%, Grade 1: 81.2%, Grade 2: 79.1%, Grade 3: 80.7%, Grade 4: 82.0%. The most frequent confusion occurs between Grade 2 and Grade 3: 8.2% of true Grade 2 are predicted as Grade 3, and 9.3% of true Grade 3 are predicted as Grade 2. All other inter-grade misclassifications are below 10%. This indicates that the model handles the subtle boundaries between adjacent moderate-severity classes reasonably well, and the overall grading capability is robust.
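Reading per-grade recall and the dominant confusion off a normalized matrix can be sketched as follows. The counts are a hypothetical three-grade toy example, not the paper's data; rows are taken as true grades, and the background row and column that detection confusion matrices usually carry are omitted.

```python
# Row-normalized confusion matrix: rows = true grade, columns = predicted grade.
# The counts below are a hypothetical toy example, not the paper's data.


def row_normalize(counts):
    """Divide each row by its total so that entry (i, j) = P(pred j | true i)."""
    out = []
    for row in counts:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def per_grade_recall(counts):
    """Diagonal of the row-normalized matrix."""
    norm = row_normalize(counts)
    return [norm[i][i] for i in range(len(norm))]


def largest_confusion(counts):
    """(rate, true_grade, predicted_grade) for the biggest off-diagonal entry."""
    norm = row_normalize(counts)
    return max((norm[i][j], i, j)
               for i in range(len(norm)) for j in range(len(norm)) if i != j)


if __name__ == "__main__":
    toy = [[8, 2, 0],
           [1, 7, 2],
           [0, 3, 7]]
    print(per_grade_recall(toy))   # [0.8, 0.7, 0.7]
    print(largest_confusion(toy))  # (0.3, 2, 1)
```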

Figure: An example UAV image from our dataset, showing a field-scale sunflower scene captured in rich detail by the drone.

4.5 Edge Deployment on Jetson Orin Nano

We deployed both YOLOv12n and YOLOv12n-RCL on an NVIDIA Jetson Orin Nano embedded board to evaluate real-world applicability. The models were exported to ONNX, optimized, and then converted to TensorRT engine files with INT8 quantization. Table 4 summarizes the deployment results on 70 test images containing 1,298 annotated disk rot instances.
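The core of INT8 quantization is mapping floating-point values onto 8-bit integers through a scale factor. The sketch below uses symmetric per-tensor quantization with a simple max-abs range; TensorRT derives its ranges from calibration data using more elaborate criteria, and the text does not describe the calibration set, so treat this only as an illustration of the idea. The function names are hypothetical.

```python
# Symmetric per-tensor INT8 quantization with max-abs calibration (a simplified
# stand-in for TensorRT's calibration, which chooses ranges from calibration data).


def int8_scale(values):
    """Scale that maps the largest magnitude onto 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest integer step and clamp to the symmetric range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q, scale):
    return [x * scale for x in q]


if __name__ == "__main__":
    acts = [-1.0, -0.5, 0.0, 0.25, 1.0]  # hypothetical activations
    s = int8_scale(acts)
    q = quantize(acts, s)
    err = max(abs(a - b) for a, b in zip(acts, dequantize(q, s)))
    print(q, err)
```

The round-trip error is bounded by half a quantization step, which is why a well-chosen range (neither clipping many values nor wasting resolution on rare outliers) matters for INT8 accuracy.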

Table 4: Edge deployment comparison on Jetson Orin Nano

| Model | FPS (frames/s) | Missed detections | False detections |
|---|---|---|---|
| YOLOv12n | 18.7 | 35 | 30 |
| YOLOv12n-RCL | 27.5 | 10 | 8 |

YOLOv12n-RCL runs at 27.5 FPS (47.1% faster than the baseline) while reducing missed detections by 71% and false detections by 73%. These gains in both speed and accuracy make it well suited to real-time, drone-based monitoring: its lightweight design fits within the computational constraints of edge devices, enabling practical deployment in UAV systems for in-field disease assessment.
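The percentages above can be recomputed from Table 4. The sketch also derives per-frame latency and per-instance error rates; both are assumptions beyond the text (latency treats FPS as the reciprocal of per-image processing time, and the rates use the 1,298 annotated instances as the denominator). The names are hypothetical.

```python
# Recompute the Table 4 comparisons (hypothetical helper names).
INSTANCES = 1298
RESULTS = {"YOLOv12n": (18.7, 35, 30), "YOLOv12n-RCL": (27.5, 10, 8)}  # FPS, missed, false


def pct_change(before, after):
    return 100.0 * (after - before) / before


def summarize(results=RESULTS, instances=INSTANCES):
    (fps0, miss0, fd0), (fps1, miss1, fd1) = results.values()
    return {
        "speedup_pct": pct_change(fps0, fps1),
        "missed_change_pct": pct_change(miss0, miss1),
        "false_change_pct": pct_change(fd0, fd1),
        # ASSUMPTION: latency = 1 / FPS; rates are relative to annotated instances.
        "latency_ms": {n: 1000.0 / r[0] for n, r in results.items()},
        "miss_rate_pct": {n: 100.0 * r[1] / instances for n, r in results.items()},
    }


if __name__ == "__main__":
    s = summarize()
    print(f"speed-up {s['speedup_pct']:+.1f}%")                   # speed-up +47.1%
    print(f"missed detections {s['missed_change_pct']:+.1f}%")    # missed detections -71.4%
    print(f"false detections {s['false_change_pct']:+.1f}%")      # false detections -73.3%
    for name in RESULTS:
        print(name, f"{s['latency_ms'][name]:.1f} ms/frame",
              f"miss rate {s['miss_rate_pct'][name]:.2f}%")
```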

4.6 Field-Scale Visualization

We further validated the model on large UAV orthomosaic images. The orthomosaics were tiled, each tile was processed by the model, and the detections were merged into spatial distribution maps of disease severity. The merged output shows that the model maintains consistent detection even under dense target distribution and leaf occlusion, with no obvious missed or false detections. This demonstrates that the model generalizes when scaling from patch level to field level, a critical requirement for UAV applications.
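A tile-and-merge workflow of this kind can be sketched as follows: compute overlapping tile offsets, shift each tile's boxes into mosaic coordinates, and remove duplicates from the overlaps with non-maximum suppression (NMS). The text does not describe the merging procedure, so the 1024-pixel tile size (chosen to match the model input), the 128-pixel overlap, the IoU threshold, and the helper names are all assumptions.

```python
# Sketch of tiled inference on a large orthomosaic (hypothetical helpers):
# overlapping tile offsets, tile-to-mosaic box shifting, and class-agnostic NMS.


def tile_origins(length, tile=1024, overlap=128):
    """Start offsets of tiles that cover [0, length) with the given overlap."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, tile - overlap))
    starts.append(length - tile)  # last tile sits flush with the far border
    return starts


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_tiles(tile_dets, iou_thr=0.5):
    """tile_dets: [((ox, oy), [(x1, y1, x2, y2, score, grade), ...]), ...]."""
    boxes = [(x1 + ox, y1 + oy, x2 + ox, y2 + oy, s, g)
             for (ox, oy), dets in tile_dets for (x1, y1, x2, y2, s, g) in dets]
    boxes.sort(key=lambda b: b[4], reverse=True)  # highest score first
    kept = []
    for b in boxes:
        if all(iou(b, k) < iou_thr for k in kept):
            kept.append(b)
    return kept


if __name__ == "__main__":
    print(tile_origins(2500))  # [0, 896, 1476]
```

Suppression is class-agnostic here because the same flower head seen in two overlapping tiles may receive different grades; keeping only the higher-scoring prediction avoids counting it twice.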

5. Discussion

The proposed YOLOv12n-RCL model successfully addresses the challenges of sunflower disk rot grading from UAV imagery. By integrating RFAConv and CA into the backbone and neck, the model expands its effective receptive field and incorporates spatial coordinate information, leading to better localization of irregular lesions. The CARAFE upsampling operator enhances detail reconstruction, which is particularly beneficial for distinguishing grades with subtle texture differences. The LSCSBD detection head reduces parameters while maintaining detection capability for small lesions.

The use of drone technology as the data acquisition platform provides several advantages: large-area coverage, high spatial resolution, and flexible flight planning. Combined with our lightweight model, it enables a practical workflow for field-scale disease monitoring. The deployment results on Jetson Orin Nano confirm that the model can run in near real-time (27.5 FPS) with minimal detection errors, making it feasible for integration into autonomous UAV spraying systems.

However, there are limitations. Our current dataset was collected at a fixed flight altitude (30 m) during the mature stage when flower heads face downward. Future work should include multi-altitude data to improve robustness, and extend to other growth stages (e.g., flowering stage) where flower heads are upright. Additionally, incorporating multi-view imaging could provide richer structural information. Model compression techniques such as knowledge distillation could further boost inference speed without sacrificing accuracy.

6. Conclusion

We developed YOLOv12n-RCL, an improved deep learning model designed specifically for grading sunflower disk rot severity from UAV imagery. The model integrates three key innovations: C3K2-RC modules (RFAConv + CA), CARAFE upsampling, and the LSCSBD detection head. On our self-constructed dataset, YOLOv12n-RCL achieves 83.4% precision, 81.2% recall, 84.8% mAP0.5, and 50.2% mAP0.5-0.95, outperforming all compared detectors in accuracy. It also reduces parameters by 17.9%, FLOPs by 12.3%, and model size by 19.6% relative to the baseline YOLOv12n. On an embedded Jetson Orin Nano, it runs at 27.5 FPS with only 10 missed detections and 8 false detections across 70 test images. The confusion matrix shows balanced recognition across the five severity grades, with per-grade recall between 79.1% and 83.0% and the largest inter-grade confusion (between Grades 2 and 3) below 10%. Overall, our work demonstrates that combining targeted model improvements with UAV imaging can deliver an effective, lightweight, and deployable solution for sunflower disease severity grading, paving the way for intelligent precision agriculture applications.
