In modern agriculture, optimizing nitrogen (N) management is critical for sustainable crop production. Nitrogen use efficiency (NUE) serves as a key indicator for assessing how effectively crops utilize applied N. Among its components, nitrogen utilization efficiency (NUtE), which reflects the plant’s ability to convert absorbed N into biomass, is vital for guiding precision fertilization. Traditional methods for determining NUtE rely on destructive sampling and laboratory analysis, which are time-consuming, labor-intensive, and impractical for large-scale field monitoring. The advent of unmanned aerial vehicles (UAV drones) offers a revolutionary solution. These UAV drones can be equipped with various sensors, such as RGB and multispectral cameras, to rapidly acquire high-resolution canopy data over large areas. This study explores the potential of fusing multi-source features derived from UAV drone imagery to predict winter wheat NUtE across multiple growth stages using machine learning algorithms.
We conducted field experiments during the 2022-2023 growing season, involving three winter wheat cultivars under four nitrogen application rates (N0, N1, N2, N3). The experimental design included 36 plots with three replications. Ground truth data for NUtE were obtained through destructive sampling at five key phenological stages: jointing, booting, heading, early grain filling, and late grain filling. The NUtE was calculated using the following formula:
$$ NUtE = \frac{AGB}{PlantN} $$
where \( AGB \) is the aboveground biomass (kg·ha⁻¹) and \( PlantN \) is the plant nitrogen uptake (kg·ha⁻¹). Concurrently, we deployed UAV drones equipped with both RGB and multispectral sensors to capture canopy images. The UAV drone platform ensured consistent data acquisition under clear sky conditions at a flight altitude of 30 meters. The collected images were processed to generate orthomosaics and digital surface models (DSMs).

From the multispectral imagery, we computed 20 vegetation indices (VIs) known to correlate with biomass and nitrogen status. These VIs, listed in Table 1, leverage reflectance from blue (B), green (G), red (R), red-edge (RE), and near-infrared (NIR) bands. From the RGB imagery, we extracted texture features (TFs) using the Gray-Level Co-occurrence Matrix (GLCM) method. A 3×3 moving window was applied to calculate eight common texture measures per band, resulting in 24 initial TFs. Additionally, structural features (SFs) were derived: canopy cover (CC) was estimated by segmenting green pixels from RGB images, and plant height (PH) was calculated by subtracting the pre-planting DSM from the DSMs at each growth stage.
To handle the high dimensionality and avoid multicollinearity, we employed the minimum Redundancy Maximum Relevance (mRMR) algorithm to select the most informative VIs and TFs for NUtE prediction. The mRMR algorithm selects features that have maximum relevance to the target variable while minimizing redundancy among themselves. This process yielded five sensitive VIs and five sensitive TFs. The selected features, along with the two SFs (CC and PH), were used to construct predictive models.
We evaluated four machine learning algorithms: Partial Least Squares Regression (PLSR), Random Forest Regression (RFR), Support Vector Regression (SVR), and Gaussian Process Regression (GPR). These models were trained using a nested 5-fold cross-validation framework on a training dataset (n=120) and independently validated on a separate test set (n=60). Model performance was assessed using the coefficient of determination (R²), root mean square error (RMSE), and the ratio of performance to deviation (RPD). Their formulas are given below:
$$ R^2 = 1 – \frac{\sum_{i=1}^{n} (y_i – \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i – \bar{y})^2} $$
$$ RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i – \hat{y}_i)^2}{n}} $$
$$ RPD = \frac{SD}{RMSE} $$
where \( y_i \) is the observed NUtE, \( \hat{y}_i \) is the predicted NUtE, \( \bar{y} \) is the mean of observed values, \( n \) is the sample size, and \( SD \) is the standard deviation of the observed values.
The dynamic change in NUtE across growth stages is summarized in Figure 2. NUtE increased progressively from jointing to late grain filling, with mean values of 36.91, 59.22, 77.32, 96.41, and 105.86 kg·kg⁻¹, respectively. A notable jump occurred from jointing to booting, likely due to top-dressing N application at jointing. Furthermore, NUtE generally decreased with increasing N application rate, indicating potential over-fertilization under high N inputs.
The mRMR algorithm selected the following sensitive features for model construction, as shown in Table 3.
| Vegetation Index (VI) | Formula | Reference |
|---|---|---|
| Normalized Difference Vegetation Index (NDVI) | $$ (NIR – R) / (NIR + R) $$ | [4] |
| Modified Vegetation Index (MVI) | $$ (RE – R) / (NIR + R) $$ | [15] |
| Enhanced Vegetation Index (EVI) | $$ 2.5(NIR – R) / (NIR + 6R – 7.5B + 1) $$ | [11] |
| Near-Infrared Reflectance of Vegetation (NIRv) | $$ NIR \times NDVI $$ | [4] |
| Structure Insensitive Pigment Index (SIPI) | $$ (NIR – G) / (NIR – R) $$ | [16] |
| Difference Vegetation Index (DVI) | $$ NIR – R $$ | [4] |
| Renormalized Difference Vegetation Index (RDVI) | $$ (NIR – R) / \sqrt{NIR + R} $$ | [11] |
| Chlorophyll Vegetation Index (CVI) | $$ (NIR \times R) / (G^2) $$ | [17] |
| Infrared Percentage Vegetation Index (IPVI) | $$ NIR / (NIR + R) $$ | [18] |
| Ratio Vegetation Index (RVI) | $$ NIR / R $$ | [4] |
| Normalized Difference Red Edge Index (NDREI) | $$ (RE – G) / (RE + G) $$ | [19] |
| Greenness Index (GM) | $$ RE / G $$ | [19] |
| Leaf Chlorophyll Index (LCI) | $$ (NIR – RE) / (NIR + R) $$ | [4] |
| Green Chlorophyll Vegetation Index (GCVI) | $$ NIR / G – 1 $$ | [20] |
| Normalized Difference Red Edge (NDRE) | $$ (NIR – RE) / (NIR + RE) $$ | [4] |
| Red Edge Chlorophyll Index (CIre) | $$ NIR / RE – 1 $$ | [4] |
| Modified Simple Ratio (mSR) | $$ (NIR – B) / (RE – B) $$ | [21] |
| Datt Index (DATT) | $$ (NIR – RE) / (NIR – R) $$ | [22] |
| MERIS Terrestrial Chlorophyll Index (MTCI) | $$ (NIR – RE) / (RE – R) $$ | [4] |
| Canopy Chlorophyll Content Index (SCCCI) | $$ NDRE / NDVI $$ | [23] |
Note: B, G, R, RE, NIR represent reflectance in the blue, green, red, red-edge, and near-infrared bands, respectively.
| Abbreviation | Texture Feature | Formula | Description |
|---|---|---|---|
| Mea | Mean | $$ \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} i \cdot p(i,j) $$ | Average gray level within the window. |
| Var | Variance | $$ \sum_{i} \sum_{j} (i – \mu)^2 p(i,j) $$ | Gray level variance within the window. |
| Hom | Homogeneity | $$ \sum_{i} \sum_{j} \frac{1}{1+(i-j)^2} p(i,j) $$ | Measures the closeness of the distribution of elements in the GLCM to the diagonal. |
| Con | Contrast | $$ \sum_{n=0}^{N_g-1} n^2 \left\{ \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j) \right\} $$ | Measures the local variations in the image. |
| Dis | Dissimilarity | $$ \sum_{n=1}^{N_g-1} n \left\{ \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j) \right\} $$ | Similar to contrast but with linear increase. |
| Ent | Entropy | $$ -\sum_{i} \sum_{j} p(i,j) \log(p(i,j)) $$ | Measures the randomness or complexity of texture. |
| SM | Second Moment | $$ \sum_{i} \sum_{j} \{p(i,j)\}^2 $$ | Measures the uniformity of the gray-level distribution. |
| Cor | Correlation | $$ \frac{\sum_{i} \sum_{j} (i j) p(i,j) – \mu_i \mu_j}{\sigma_i \sigma_j} $$ | Measures the linear dependency of gray levels on those of neighboring pixels. |
Note: \( p(i,j) \) is the probability of gray-level pair (i,j); \( N_g \) is the number of gray levels; \( \mu \) and \( \sigma \) are the mean and standard deviation of \( p(i,j) \).
| Feature Type | Sensitive Features |
|---|---|
| Vegetation Indices (VI) | SIPI, CVI, DATT, MVI, DVI |
| Texture Features (TF) | B.Mea, G.Var, G.Cor, B.SM, G.Mea |
| Structural Features (SF) | Canopy Cover (CC), Plant Height (PH) |
We first evaluated models built using single feature types. The performance metrics on the validation set are presented in Table 4. Among the single-type models, those using VIs achieved the highest accuracy, with the SVR model performing best (R² = 0.896, RMSE = 10.608 kg·kg⁻¹, RPD = 3.098). Models based on SFs also showed strong performance, while TF-based models were the least accurate. This underscores the primary role of spectral information from UAV drones in estimating physiological traits like NUtE.
| Feature Type | Number | Metrics | PLSR | RFR | SVR | GPR |
|---|---|---|---|---|---|---|
| VI | 5 | R² | 0.616 | 0.803 | 0.896 | 0.666 |
| RMSE (kg·kg⁻¹) | 20.299 | 14.961 | 10.608 | 19.713 | ||
| RPD | 1.619 | 2.196 | 3.098 | 1.667 | ||
| TF | 5 | R² | 0.334 | 0.667 | 0.687 | 0.373 |
| RMSE (kg·kg⁻¹) | 27.007 | 19.627 | 19.495 | 26.024 | ||
| RPD | 1.217 | 1.674 | 1.685 | 1.263 | ||
| SF | 2 | R² | 0.823 | 0.811 | 0.823 | 0.823 |
| RMSE (kg·kg⁻¹) | 14.087 | 14.536 | 13.834 | 14.140 | ||
| RPD | 2.333 | 2.261 | 2.375 | 2.324 |
Next, we investigated the synergy of multi-source feature fusion. The results, detailed in Table 5, demonstrate that combining features significantly enhanced prediction accuracy. The fusion of all three feature types (VI+TF+SF) yielded the best overall performance. Among the algorithms, SVR consistently outperformed others, achieving the highest accuracy with the full feature set (R² = 0.920, RMSE = 9.391 kg·kg⁻¹, RPD = 3.499). This represents a substantial improvement over the best single-type model (VI-based SVR), with a 2.68% increase in R², an 11.47% decrease in RMSE, and a 12.94% increase in RPD. These results highlight the complementary nature of spectral, textural, and structural information captured by UAV drones.
| Feature Combination | Number | Metrics | PLSR | RFR | SVR | GPR |
|---|---|---|---|---|---|---|
| TF + SF | 7 | R² | 0.842 | 0.845 | 0.896 | 0.885 |
| RMSE (kg·kg⁻¹) | 13.367 | 13.822 | 11.078 | 11.433 | ||
| RPD | 2.458 | 2.377 | 2.966 | 2.874 | ||
| VI + SF | 7 | R² | 0.857 | 0.884 | 0.916 | 0.855 |
| RMSE (kg·kg⁻¹) | 12.316 | 11.786 | 9.741 | 12.673 | ||
| RPD | 2.668 | 2.788 | 3.373 | 2.593 | ||
| VI + TF | 10 | R² | 0.710 | 0.858 | 0.897 | 0.734 |
| RMSE (kg·kg⁻¹) | 17.683 | 12.990 | 10.994 | 17.484 | ||
| RPD | 1.858 | 2.530 | 2.989 | 1.879 | ||
| VI + TF + SF | 12 | R² | 0.879 | 0.890 | 0.920 | 0.895 |
| RMSE (kg·kg⁻¹) | 11.479 | 11.668 | 9.391 | 10.663 | ||
| RPD | 2.862 | 2.816 | 3.499 | 3.082 |
To understand the contribution of individual features, we analyzed the feature importance within the optimal SVR model (VI+TF+SF). The results, illustrated in Figure 3a, show that the Difference Vegetation Index (DVI) was the most important predictor, followed by Canopy Cover (CC), the Second Moment of the Blue band (B.SM), the Structure Insensitive Pigment Index (SIPI), and the Datt Index (DATT). Pearson correlation analysis (Figure 3b) revealed that DVI, SIPI, DATT, and CC were significantly correlated with NUtE (p < 0.01), while B.SM showed no significant linear correlation. This indicates that B.SM, a texture feature from UAV drone RGB imagery, contributes to the model by capturing non-linear relationships with NUtE, which linear indices might miss. This underscores the value of integrating texture information from UAV drones.
We further assessed the robustness of the best model (SVR with VI+TF+SF) under different agronomic conditions. The model maintained high accuracy (R² > 0.80) for low and medium N treatments but showed some overestimation for the high N treatment (N3). Across different wheat cultivars, which had similar phenology and plant architecture, the model performed excellently (R² > 0.93). However, predictive accuracy varied across growth stages, following a pattern of initial increase and subsequent decrease, with R² ranging from 0.61 to 0.86. The lower accuracy at the booting stage might be attributed to varying cultivar responses to top-dressed nitrogen. Despite these variations, spatial inversion maps generated by the model successfully captured the overall increasing trend of NUtE from jointing to late grain filling and reflected significant differences between treatments, demonstrating the practical utility of UAV drone-based monitoring.
The success of the SVR algorithm can be attributed to its ability to handle high-dimensional, non-linear relationships through kernel functions, such as the Radial Basis Function (RBF) used here. The flexibility of SVR in finding an optimal hyperplane makes it particularly suitable for fusing diverse data streams from UAV drones. In contrast, while PLSR is effective for handling collinearity, it may not fully capture complex non-linearities. RFR and GPR also performed well but were slightly less accurate than SVR in this specific high-dimensional fusion scenario. The mRMR feature selection proved crucial for managing dimensionality and focusing on the most informative variables, preventing overfitting and improving model interpretability.
Our findings align with and extend previous research. For instance, studies have shown that spectral VIs are powerful for estimating nitrogen-related traits, but they can saturate under dense canopy conditions. The incorporation of texture features from UAV drone imagery has been shown to mitigate this issue by providing spatial context. Similarly, structural features like canopy cover and plant height offer direct measures of crop growth status. Our work demonstrates that a synergistic fusion of all three information types delivers superior performance. The selected key features—DVI, CC, B.SM, SIPI, and DATT—collectively represent plant nitrogen absorption capacity (via VIs), canopy architecture and resource use (via SFs), and spatial homogeneity (via TFs). This multi-faceted perspective is enabled by the versatile data acquisition capabilities of modern UAV drones.
This study has certain limitations. The model was developed and validated on data from a single growing season and location. Its generalizability, especially in high-nitrogen input environments or across more diverse cultivars and climatic conditions, requires further validation with multi-year, multi-location datasets. Future work could explore the integration of data from more advanced UAV drone sensors, such as hyperspectral or LiDAR, to capture even finer physiological and structural details. Additionally, deep learning approaches like Convolutional Neural Networks (CNNs) could be tested to automatically learn optimal feature representations from raw UAV drone imagery.
In conclusion, this research establishes a robust framework for predicting winter wheat nitrogen utilization efficiency (NUtE) by leveraging multi-source remote sensing features from UAV drones. We systematically demonstrated that while individual feature types (vegetation indices, texture features, structural features) have predictive value, their fusion significantly enhances accuracy. The Support Vector Regression (SVR) model, trained on features selected by the mRMR algorithm, achieved the best performance (R² = 0.920, RMSE = 9.391 kg·kg⁻¹). Key predictors included spectral indices (DVI, SIPI, DATT), structural canopy cover, and textural information (B.SM). This approach effectively overcomes the limitations of single-feature models, such as saturation effects in vegetation indices, by providing a comprehensive description of crop status. The workflow highlights the transformative potential of UAV drone technology combined with machine learning for high-throughput phenotyping and precision nitrogen management. It provides a scalable and efficient method for in-season monitoring of nitrogen use efficiency, supporting decisions aimed at increasing yield, reducing fertilizer waste, and promoting environmental sustainability in wheat production systems.
