Predicting Coastal Wetland Soil Organic Carbon Density with Drone Technology

Coastal wetlands serve as critical transitional ecosystems between land and ocean, characterized by high primary productivity and exceptional carbon sequestration capacity. More than 90% of the carbon stored in these ecosystems resides within the soil layer. Soil organic carbon density (SOCD), defined as the mass of organic carbon per unit area in a given soil depth, is a core indicator for assessing carbon pool size and stability. Understanding SOCD spatial variability is essential for evaluating global carbon cycles and guiding wetland conservation. Traditional field sampling and laboratory analysis are accurate but labour-intensive and logistically challenging in remote or inaccessible areas such as tidal flats and marshes. Drone technology has revolutionised environmental monitoring by offering high mobility, sub-meter spatial resolution, and flexible revisit times. In this study, we harnessed drone technology to acquire multispectral imagery over a coastal wetland invaded by Spartina alterniflora, integrated with field soil samples, to develop robust SOCD predictive models. Our primary objectives were to (i) identify optimal predictive features using the Boruta algorithm, (ii) compare the performance of three machine learning (ML) algorithms—Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Boosted Regression Trees (BRT)—for SOCD estimation, (iii) enhance prediction accuracy by coupling ML models with Residual Kriging (RK), and (iv) map the spatial distribution of SOCD under the influence of S. alterniflora invasion. The results demonstrate that combining modern drone technology with advanced statistical methods can produce high-resolution SOCD maps, providing a scientific basis for carbon accounting and ecological management in coastal wetlands.

Study Area and Data Acquisition

Our study area is located in the coastal wetland of Dafeng District, Yancheng City, Jiangsu Province, China. This region experiences a temperate monsoon climate with an annual mean temperature of 13.7–14.6 °C and annual precipitation of 980–1,070 mm, mainly concentrated from June to September. The area hosts the largest coastal wetland nature reserve in China—the Yancheng Wetland Rare Birds National Nature Reserve—inscribed as a UNESCO World Natural Heritage site in 2019. Since the introduction of Spartina alterniflora in 1982, this exotic species has rapidly expanded, displacing native Phragmites australis and Suaeda glauca and forming monospecific stands. This invasion alters vegetation structure and organic matter input, thereby affecting SOCD dynamics.

Soil sampling was conducted in October 2023. A total of 161 topsoil samples (0–30 cm depth) were collected using a stratified random design that considered accessibility and spatial coverage. At each sampling point, three subsamples were taken and mixed to form a composite sample. The geographic coordinates were recorded with a handheld GPS unit. Samples were air-dried, sieved through a 2 mm mesh to remove roots and gravel, and then ground to pass a 100-mesh screen for laboratory analysis. Bulk density (BD) was determined by drying undisturbed cores at 105 °C to constant weight. Soil organic carbon (SOC) content was measured using the potassium dichromate oxidation method. SOCD (kg·m⁻²) was calculated as:

$$
SOCD = SOC \times BD \times H \times 10^{-1}
$$

where SOC is the organic carbon content (g·kg⁻¹), BD is bulk density (g·cm⁻³), and H is the soil layer thickness (0.3 m).

Drone-based multispectral imagery was acquired using a DJI Phantom 4 Multispectral UAV equipped with five narrow-band sensors: Blue (450 nm, 16 nm bandwidth), Green (560 nm, 16 nm), Red (650 nm, 16 nm), Red edge (730 nm, 16 nm), and Near-Infrared (NIR, 840 nm, 26 nm). Flights were conducted between 30 October and 1 November 2023, during 11:00–14:00 local time under clear sky conditions. The drone flew at an altitude of 100 m, yielding a ground sampling distance of 7 cm. Four flight missions were executed to cover the entire study area. Radiometric calibration panels were imaged before each flight to convert digital numbers to surface reflectance. Images were mosaicked and orthorectified using DJI Terra software, with output in WGS84 UTM Zone 51N coordinate system. The high spatial resolution of drone technology allowed us to capture fine-scale variations in vegetation and soil properties that would be missed by satellite sensors.

Feature Construction and Selection

From the five raw spectral bands, we calculated 36 remote sensing indices commonly used to represent vegetation vigour, soil brightness, and moisture status. The indices and their formulas are listed in the table below. All indices were computed from the drone-derived reflectance layers.

List of Remote Sensing Indices Derived from UAV Multispectral Data
Index Formula
NDVI $$\frac{NIR-Red}{NIR+Red}$$
EVI $$2.5 \times \frac{NIR-Red}{NIR+6\times Red-7.5\times Blue+1}$$
RVI $$\frac{NIR}{Red}$$
DVI $$NIR – Red$$
NDRE $$\frac{NIR-RE}{NIR+RE}$$
OSAVI $$\frac{NIR-Red}{NIR+Red+0.16}$$
NDI45 $$\frac{RE-Green}{RE+Green}$$
MCARI $$(RE-Red-0.2\times(RE-Green))\times\frac{RE}{Red}$$
BI $$\sqrt{Blue^2+Green^2+Red^2}$$
BI2 $$\frac{Blue^2+Green^2+Red^2}{3}$$
RI $$\frac{Red^2}{Blue+Green+Red}$$
CI $$\frac{Red-Blue}{Red+Blue}$$
GNDVI $$\frac{NIR-Green}{NIR+Green}$$
TVI $$0.5\times(120\times(NIR-Green)-200\times(Red-Green))$$
VARI $$\frac{Green-Red}{Green+Red-Blue}$$
EVI2 $$2.5\times\frac{NIR-Red}{NIR+2.4\times Red+1}$$
RVI2 $$\frac{NIR}{Green}$$
NDVI_gb $$\frac{Green-Blue}{Green+Blue}$$
MSR $$\frac{(\frac{NIR}{Red})-1}{\sqrt{(\frac{NIR}{Red})+1}}$$
NLI $$\frac{NIR^2-Red}{NIR^2+Red}$$
RDVI $$\frac{NIR-Red}{\sqrt{NIR+Red}}$$
EXG $$2\times Green – Red – Blue$$
GI $$\frac{Green}{Red}$$
NGRDI $$\frac{Green-Red}{Green+Red}$$
NRBDI $$\frac{NIR-Blue}{NIR+Blue}$$
WDVI $$NIR – 0.5\times Red$$
TNDVI $$\sqrt{\frac{NIR-Red}{NIR+Red}+0.5}$$
GCI $$\frac{NIR}{Green}-1$$
GDVI $$NIR – Green$$
GLI $$\frac{2\times Green – Red – Blue}{2\times Green + Red + Blue}$$
IPVI $$\frac{NIR}{NIR+Red}$$
NNIR $$\frac{NIR}{NIR+Blue}$$
GRVI $$\frac{Green}{Red}$$
BNDVI $$\frac{NIR-Blue}{NIR+Blue}$$
GBNDVI $$\frac{NIR-(Green+Blue)}{NIR+(Green+Blue)}$$
GRNDVI $$\frac{NIR-(Green+Red)}{NIR+(Green+Red)}$$

To avoid overfitting and reduce multicollinearity among the 41 candidate predictors (5 raw bands + 36 indices), we employed the Boruta algorithm for feature selection. Boruta is a random forest-based method that creates shadow features by shuffling original values and compares the importance of each real feature against the maximum importance among shadow features over multiple iterations (we used 50 iterations). Features with importance consistently higher than the shadow threshold are deemed “confirmed”; those lower are “rejected”. This process yielded a set of 30 confirmed features, listed in the table below. All five raw bands (Blue, Green, Red, Red edge, NIR) were retained, along with 25 indices. Notably, indices such as NRBDI, NDVI_gb, NDRE, BI, EVI, and others demonstrated strong predictive power.

Confirmed Features after Boruta Selection
Category Features
Raw Bands Blue, Green, Red, Red_edge, NIR
Indices NRBDI, NDVI_gb, NDRE, BI, EVI, EXG, BI2, NNIR, GRNDVI, DVI, GDVI, RDVI, GNDVI, EVI2, TNDVI, NDVI, NLI, TVI, MSR, RVI, Blue (twice? note: Blue band already listed; indices named similarly are unique). Actually, from original list: NRBDI, BI, NDVI_gb, NDRE, Blue, EVI, Green, Red, EXG, NIR, BI2, Red_edge, NNIR, GRNDVI, DVI, GDVI, RDVI, GNDVI, EVI2, TNDVI, NDVI, NLI, TVI, MSR, RVI

Modeling Framework

Three machine learning algorithms—RF, XGBoost, and BRT—were selected for SOCD prediction. All models were implemented in R using the caret package with 10-fold cross-validation repeated three times. The training set comprised 70% of the 161 samples (n=113) and the test set 30% (n=48). Hyperparameters were tuned via grid search. For RF, the number of trees (ntree) was set to 500 and the number of variables considered at each split (mtry) was optimized from 2 to 10. For XGBoost, we tuned the number of boosting rounds (nrounds: 100–500), maximum tree depth (max_depth: 3–10), learning rate (eta: 0.01–0.3), and subsample ratio (subsample: 0.6–0.9). For BRT, the number of trees (n.trees: 100–500) and interaction depth (interaction.depth: 1–5) were optimized. The optimal parameters were selected by minimizing the cross-validated root mean square error (RMSE).

To account for spatial autocorrelation in model residuals, we applied Residual Kriging (RK). The procedure consisted of: (i) fitting an ML model to predict SOCD at all 161 points; (ii) computing residuals (observed minus predicted); (iii) fitting a semivariogram to the residuals and performing ordinary kriging interpolation of residuals; and (iv) adding the kriged residual to the ML prediction to obtain the final SOCD estimate:

$$
\hat{Z}_{RK}(\mathbf{s}) = \hat{Z}_{ML}(\mathbf{s}) + \hat{\epsilon}_{OK}(\mathbf{s})
$$

where \(\hat{Z}_{ML}(\mathbf{s})\) is the ML-predicted SOCD at location \(\mathbf{s}\) and \(\hat{\epsilon}_{OK}(\mathbf{s})\) is the ordinary-kriging estimate of the residual. Three hybrid models—RF-RK, XGBoost-RK, and BRT-RK—were thus developed.

Model performance was evaluated using three metrics: coefficient of determination (R²), normalized root mean square error (nRMSE), and ratio of performance to interquartile distance (RPIQ). Formulas are:

$$
R^2 = 1 – \frac{\sum_{i=1}^{n}(y_i – \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i – \bar{y})^2}
$$
$$
nRMSE = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i – \hat{y}_i)^2}}{\bar{y}}
$$
$$
RPIQ = \frac{Q3 – Q1}{nRMSE}
$$

where \(y_i\) and \(\hat{y}_i\) are observed and predicted SOCD, \(\bar{y}\) is the mean observed value, and Q3 and Q1 are the 75th and 25th percentiles of the observed values.

Results and Discussion

Model Performance Comparison

The predictive accuracy of the three ML models and their RK-enhanced counterparts is summarised in the table below. Among the standalone ML models, RF achieved the highest R² of 0.540 in the validation set, followed by XGBoost (0.530) and BRT (0.520). The nRMSE values were 0.196, 0.197, and 0.203, respectively, and RPIQ values were 1.659, 1.657, and 1.610. These results indicate that all three tree-based models were capable of explaining about 52–54% of the SOCD variability, with RF showing a slight edge. The improvement in R² from RF over XGBoost and BRT was approximately 1.9% and 3.8%, respectively.

Validation Performance of ML and Hybrid RK Models
Model Dataset nRMSE RPIQ
RF Training 0.575 0.137 1.895
RF Validation 0.540 0.196 1.659
XGBoost Training 0.541 0.139 1.866
XGBoost Validation 0.530 0.197 1.657
BRT Training 0.537 0.193 1.843
BRT Validation 0.520 0.203 1.610
RF-RK Training 0.867 0.121 2.146
RF-RK Validation 0.814 0.164 1.987
XGBoost-RK Training 0.793 0.130 1.992
XGBoost-RK Validation 0.757 0.176 1.857
BRT-RK Training 0.813 0.135 1.930
BRT-RK Validation 0.720 0.199 1.636

Incorporating RK dramatically improved prediction accuracy. RF-RK achieved a validation R² of 0.814—a 50.7% increase over standalone RF. XGBoost-RK and BRT-RK reached R² values of 0.757 and 0.720, corresponding to improvements of 42.8% and 38.4%, respectively. The nRMSE decreased and RPIQ increased for all hybrid models, confirming that RK effectively captured the spatial structure of the residuals that the ML algorithms could not model. The best overall model was RF-RK, which explained 81.4% of SOCD variance in validation.

Variable Importance

We examined the relative importance of confirmed features in the three ML models. Although the order differed, several features were consistently ranked among the top ten across all algorithms: BI, Blue, NIR, Red, and EVI. This consistency underscores their fundamental role in representing vegetation canopy structure and soil background reflectance that correlate with SOCD. For RF, the most important indices were NRBDI, NDRE, and BI, which capture near-infrared and red-edge responses to plant chlorophyll and biomass. XGBoost relied more heavily on raw bands—Blue, Green, NIR—possibly because its boosting mechanism can exploit subtle spectral differences. BRT showed a balanced pattern, giving high importance to both indices (e.g., NRBDI, RDVI, DVI) and raw bands (NIR, Red, Red_edge). The brightness index BI, which integrates all three visible bands, appeared in the top five of every model, likely because it reflects soil moisture and organic matter content indirectly through surface albedo. The fact that drone technology provides these high-resolution spectral signatures at centimetre scale is what makes such detailed feature analysis feasible.

Spatial Distribution of SOCD

Using the best-performing RF-RK model, we generated a continuous SOCD map (0–30 cm) for the entire study area after masking water bodies with the NDWI index. The predicted SOCD ranged from 0.250 to 7.715 kg·m⁻², with a distinct gradient from inland to seaward. Near the landward boundary, SOCD values were generally higher (>4 kg·m⁻²), while values decreased to below 2 kg·m⁻² in the intertidal zone closest to the ocean. This pattern aligns with the decreasing influence of freshwater and increasing tidal inundation frequency. Notably, low SOCD hotspots were observed along tidal creeks and near road embankments. At these locations, strong hydrodynamic forces and prolonged seawater submergence inhibit the establishment and growth of S. alterniflora. The reduced vegetation cover limits organic matter input from litterfall and root turnover, resulting in lower carbon accumulation. In addition, higher salinity in these depressions further stresses plant growth. Our drone technology-derived map thus highlights the tight coupling between micro-topography, vegetation dynamics, and soil carbon storage.

The high spatial resolution (7 cm) of the drone imagery allowed us to capture these fine-scale features that are generally smoothed out in satellite-based products. Furthermore, the integration of drone technology with machine learning and geostatistics provides a reliable framework for monitoring carbon stocks in heterogeneous coastal landscapes. The strong performance of RF-RK demonstrates that capturing spatial autocorrelation in residuals is critical when using nonparametric models for soil mapping. Our findings also suggest that vegetation indices related to biomass (e.g., NDRE, EVI) and soil brightness (BI) are excellent proxies for SOCD in invaded wetlands.

Conclusions

In this study, we demonstrated that drone technology combined with machine learning and residual kriging can accurately predict and map soil organic carbon density in a coastal wetland invaded by Spartina alterniflora. The following conclusions are drawn:

(1) Among the three machine learning algorithms tested, Random Forest showed the highest predictive ability for SOCD (R² = 0.540), followed by XGBoost and BRT. However, coupling any of these models with Residual Kriging substantially improved performance, with RF-RK achieving an R² of 0.814—an increase of 50.8% over the standalone model. This highlights the importance of accounting for spatial autocorrelation in residuals when using remote sensing-driven predictions.

(2) The Boruta algorithm identified 30 features (5 raw bands and 25 vegetation/soil indices) as important for SOCD prediction. Despite differing importance rankings among algorithms, features such as BI, Blue, NIR, Red, and EVI were consistently found among the top predictors, indicating their universal sensitivity to factors controlling soil organic carbon in these ecosystems.

(3) The derived SOCD map revealed strong spatial heterogeneity, with higher values in landward zones and lower values in tidal creeks and areas subjected to prolonged inundation. This pattern is driven by the interplay of tidal dynamics, hydrodynamic energy, and vegetation growth—factors that are captured effectively through the high spatiotemporal resolution afforded by drone technology.

Our work provides a replicable methodology for wetland carbon assessment, emphasising the synergy between drone-based remote sensing and advanced statistical modeling. Future studies could extend this approach to other coastal wetlands and explore the use of hyperspectral or LiDAR-equipped drones to further refine predictions of soil carbon pools. The integration of drone technology into routine monitoring programs will enable timely and spatially explicit estimates of carbon stocks, supporting climate mitigation strategies and ecological restoration efforts in vulnerable coastal zones.

Scroll to Top