Estimation of Forest Structural Parameters Using UAV Drones and Deep Learning: Model Development and Interpretability Analysis

Forest ecosystems are invaluable natural resources, and their sustainable management relies on accurate and efficient monitoring. Key structural parameters such as tree height, diameter at breast height (DBH), and above-ground biomass (AGB) serve as critical indicators for assessing forest health, functionality, and growth status. Traditionally, acquiring these parameters has been labor-intensive, time-consuming, and often destructive. Remote sensing technologies offer a promising alternative. While passive optical satellites provide valuable planar information about the forest canopy, they struggle to capture detailed vertical structural information. Active Light Detection and Ranging (LiDAR) systems excel in vertical feature extraction but are often constrained by cost (for airborne systems) or low point density (for spaceborne systems).

In this context, UAV drones equipped with high-resolution RGB cameras have emerged as a powerful and flexible tool. Through digital aerial photogrammetry (DAP), UAV drones can capture overlapping images. Using Structure from Motion (SfM) and Multi-View Stereo (MVS) algorithms, these images can be processed to generate three-dimensional (3D) dense point clouds and digital orthophoto mosaics (DOM). DAP point clouds provide structural information analogous to LiDAR, while DOM imagery offers rich spectral data. This combination presents a cost-effective method for forest inventory at various scales.

The challenge then shifts to developing robust models to estimate forest parameters from these rich data sources. Traditional machine learning methods such as Random Forest and Support Vector Machines are commonly used, but deep learning, particularly Deep Neural Networks (DNNs), can model complex, non-linear relationships and learn feature interactions automatically from high-dimensional data. However, DNNs are often considered “black-box” models, making it difficult to understand how specific input features influence the predictions. To address this, interpretability frameworks such as SHAP (SHapley Additive exPlanations) can be used to expose how each feature contributes to the model’s predictions.

In our study, we explore the integration of UAV drones and deep learning for estimating key forest structural parameters: mean stand diameter at breast height (AD), basal area (BA), Lorey’s mean height (HL), and above-ground biomass (AGB). We develop DNN regression models using features derived from DOM imagery, DAP point clouds, and their combination. Furthermore, we apply the SHAP framework to interpret the models, identifying the most influential features and understanding their contribution to the predictions. This approach aims to provide not only accurate estimations but also actionable insights into the relationship between remotely sensed features and forest structure.

1. Study Area and Data Acquisition

The research was conducted within a Chinese fir (Cunninghamia lanceolata) plantation located in a subtropical region of China. The area features complex topography with elevations ranging between 140 m and 1,203 m. The forest consists primarily of even-aged stands with a high canopy closure.

Data acquisition was performed using a commercial quadcopter UAV drone (DJI Inspire 2) equipped with a high-resolution RGB camera (ZENMUSE X5S, 20 MP). Flights were conducted under clear sky conditions around solar noon to minimize shadows. The flight plan used 80% forward and side overlap at an altitude of 200 m above ground level, resulting in imagery with a ground sampling distance (GSD) of approximately 4 cm. Several Ground Control Points (GCPs) were surveyed using Real-Time Kinematic (RTK) GPS for precise georeferencing.

Using photogrammetric software (Pix4Dmapper), the collected images were processed to generate two primary data products:

Digital Orthophoto Mosaic (DOM): A seamless, geometrically corrected image representing the forest canopy’s top-down view.
Dense 3D Point Cloud: A three-dimensional reconstruction of the forest structure generated via SfM-MVS algorithms.

A total of 151 square sample plots (10 m × 10 m) of pure Chinese fir were established within the flight area for ground-truth data collection. Within each plot, the DBH and tree height of every individual tree were measured. These measurements were used to calculate the four stand-level response variables:

Mean Diameter at Breast Height (AD, cm): Calculated as the quadratic mean diameter $D_g$.
$$ D_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N} D_i^2} $$
where $N$ is the number of trees in the plot and $D_i$ is the DBH of the i-th tree; AD is reported as $D_g$.
Basal Area (BA, m²/ha): The total cross-sectional area of all tree stems at breast height per hectare.
Lorey’s Mean Height (HL, m): The mean height weighted by the basal area of each tree.
$$ H_L = \frac{\sum_{i=1}^{N} H_i \cdot g_i}{\sum_{i=1}^{N} g_i} $$
where $H_i$ is the height and $g_i$ is the basal area of the i-th tree.
Above-Ground Biomass (AGB, kg/plot): Calculated by summing the individual tree AGB, which was estimated using a species-specific allometric model:
$$ M_i = 0.032718 \cdot D_i^{2.11093} \cdot H_i^{0.60212} $$
The plot AGB is $AGB = \sum_{i=1}^{N} M_i$.
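
The four definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' processing code: the function name `stand_variables` and the example tree list are hypothetical, and the per-tree basal area $g_i = \pi D_i^2 / 4$ (with DBH converted from cm to m) is the standard definition, assumed here because the text does not give a formula for it.

```python
import numpy as np

PLOT_AREA_M2 = 10.0 * 10.0  # plot size stated in the text (10 m x 10 m)

def stand_variables(dbh_cm, height_m, plot_area_m2=PLOT_AREA_M2):
    """Return AD (cm), BA (m^2/ha), HL (m) and AGB (kg/plot) for one plot."""
    d = np.asarray(dbh_cm, dtype=float)
    h = np.asarray(height_m, dtype=float)
    # Quadratic mean diameter D_g, reported as AD.
    ad = np.sqrt(np.mean(d ** 2))
    # Per-tree basal area in m^2 (assumed standard formula, DBH in metres).
    g = np.pi / 4.0 * (d / 100.0) ** 2
    # Basal area scaled from the plot to one hectare (10,000 m^2).
    ba = g.sum() * 10_000.0 / plot_area_m2
    # Lorey's mean height: basal-area-weighted mean height.
    hl = np.sum(h * g) / np.sum(g)
    # Per-tree biomass from the allometric model, summed over the plot.
    agb = np.sum(0.032718 * d ** 2.11093 * h ** 0.60212)
    return {"AD": ad, "BA": ba, "HL": hl, "AGB": agb}

# Hypothetical tree list, for illustration only.
example = stand_variables([20.0, 24.0, 28.0], [15.0, 17.0, 19.0])
```

Because larger trees carry more basal area, HL in this example comes out above the arithmetic mean height of 17 m.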

2. Feature Extraction from UAV Drone Data

Predictor variables were extracted from both the DOM and the point cloud data for each of the 151 plots.

2.1 Spectral Features from DOM

From the RGB bands of the DOM, we calculated 13 commonly used visible-band vegetation indices (VIs) in addition to the raw digital number (DN) values of the R, G, and B channels. These indices are designed to enhance vegetation signal and are sensitive to chlorophyll content and canopy greenness. The calculated indices include Visible Difference Vegetation Index (VDVI), Green-Red Ratio Index (GRRI), Excess Green Index (ExG), Normalized Green-Red Difference Index (NGRDI), among others. In total, 16 spectral features were extracted per plot.
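
As a rough illustration of how such indices are derived, the sketch below computes plot-level means of the four indices named above from the R, G, and B values of the pixels in a plot. The formulas are the commonly published definitions and are an assumption here, since the text does not state the exact variants used; the function name `visible_band_indices` and the small `eps` guard against division by zero are likewise hypothetical.

```python
import numpy as np

def visible_band_indices(r, g, b, eps=1e-9):
    """Plot-level mean of four visible-band indices from per-pixel DN values."""
    r, g, b = (np.asarray(x, dtype=float) for x in (r, g, b))
    # Normalized chromatic coordinates, used by ExG.
    total = r + g + b + eps
    rn, gn, bn = r / total, g / total, b / total
    per_pixel = {
        "VDVI": (2.0 * g - r - b) / (2.0 * g + r + b + eps),
        "NGRDI": (g - r) / (g + r + eps),
        "GRRI": g / (r + eps),
        "ExG": 2.0 * gn - rn - bn,
    }
    # Average each index over all pixels that fall inside the plot.
    return {name: float(np.mean(values)) for name, values in per_pixel.items()}

# Hypothetical uniform green patch, for illustration only.
indices = visible_band_indices([50, 50], [100, 100], [50, 50])
```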

2.2 Structural Features from DAP Point Cloud

The height-normalized point cloud (ground points classified, point heights expressed relative to the ground, and ground points then removed) was used to calculate a suite of height and density metrics. These metrics are standard proxies for forest vertical structure and canopy density.

Height Metrics: These were calculated from the distribution of point heights above ground within each plot.

Height percentiles (e.g., $H_{25}$, $H_{50}$, $H_{75}$, $H_{95}$, $H_{99}$).
Basic statistics: mean ($H_{mean}$), median ($H_{median}$), standard deviation ($H_{std}$), variance ($H_{var}$), skewness ($H_{skew}$), kurtosis ($H_{kurt}$).
Canopy relief ratio: $H_{CRR} = (H_{mean} - H_{min}) / (H_{max} - H_{min})$.
Coefficient of variation of height: $H_{CV} = H_{std} / H_{mean}$.
Interquartile range: $H_{IQ} = H_{75} - H_{25}$.

Density Metrics: The point cloud was stratified into 10 equal-height intervals (slices). The proportion of points in each slice relative to the total number of points in the plot was calculated, resulting in 10 density variables ($D_1$ to $D_{10}$).
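
The height and density metrics listed above can be sketched as follows for the point heights of a single plot. Several details are assumptions, because the text does not specify them: the slices span the minimum to the maximum height in the plot, the standard deviation is the sample version, and kurtosis is the non-excess form. The function name `height_density_metrics` is hypothetical.

```python
import numpy as np

def height_density_metrics(z, n_slices=10):
    """Height and density metrics from normalized point heights (m) of one plot."""
    z = np.asarray(z, dtype=float)
    m = {f"H{p}": float(np.percentile(z, p)) for p in (25, 50, 75, 95, 99)}
    mean, std = z.mean(), z.std(ddof=1)  # sample standard deviation (assumed)
    centered, pop_std = z - mean, z.std()
    m.update({
        "Hmean": mean, "Hmedian": np.median(z), "Hstd": std, "Hvar": std ** 2,
        "Hskew": np.mean(centered ** 3) / pop_std ** 3,
        "Hkurt": np.mean(centered ** 4) / pop_std ** 4,  # non-excess kurtosis
        "Hcrr": (mean - z.min()) / (z.max() - z.min()),
        "Hcv": std / mean,
        "Hiq": m["H75"] - m["H25"],
    })
    # Proportion of points in each of n_slices equal-height intervals.
    counts, _ = np.histogram(z, bins=n_slices, range=(z.min(), z.max()))
    for k, count in enumerate(counts, start=1):
        m[f"D{k}"] = count / z.size
    return m

# Hypothetical evenly spread heights, for illustration only.
metrics = height_density_metrics(np.linspace(1.0, 20.0, 200))
```

By construction the ten density variables sum to one, so at most nine of them carry independent information.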

After removing highly correlated features (absolute Pearson correlation > 0.95), a final set of predictor variables was retained for modeling. The count of retained features for each data source combination is summarized below.

Table 1: Summary of Retained Predictor Variables for Different Data Sources
Response Variable	DOM Features	DAP Point Cloud Features	Total Combined Features
AD	16	22	38
BA	16	28	44
HL	16	25	41
AGB	16	29	45
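
One simple way to apply the 0.95 correlation threshold is a greedy pass that keeps a feature only if it is not too strongly correlated with any feature already kept. The text does not say which member of a correlated pair was dropped, so the keep-first rule below is an assumption, and `drop_correlated` is a hypothetical helper rather than the authors' procedure.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.95):
    """Greedy filter: keep a feature if |r| <= threshold with all kept ones."""
    X = np.asarray(X, dtype=float)
    # Absolute Pearson correlation between columns (features).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]

# Hypothetical data: "b" is a linear function of "a" and should be dropped.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
c = rng.normal(size=100)
kept = drop_correlated(np.column_stack([a, 2.0 * a + 1.0, c]), ["a", "b", "c"])
```

The result depends on column order, and constant columns would need to be removed beforehand because their correlation is undefined.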

3. Methodology: Deep Learning and Interpretability

3.1 Deep Neural Network (DNN) Regression Model

We designed a fully connected DNN to perform non-linear regression from the extracted features to the target forest parameters. The model architecture was built to progressively learn hierarchical representations of the input data.

Architecture: The network consisted of an input layer, six hidden layers of non-decreasing width (64, 128, 256, 512, 1024, and 1024 neurons), and a final linear output layer. The LeakyReLU activation function was used in all hidden layers to introduce non-linearity while mitigating the “dying ReLU” problem. Its form is given by:
$$ \text{LeakyReLU}(x) = \max(0, x) + \alpha \cdot \min(0, x) $$
where $\alpha$ is a small positive constant (e.g., 0.01).
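
To make the layer layout concrete, the sketch below builds the stated stack of layer widths and runs a forward pass in plain NumPy. It only illustrates the architecture: the He-style random initialization, the names `HIDDEN`, `init_params`, and `forward`, and the choice of 45 input features (the combined AGB count in Table 1) are assumptions, and no training is performed.

```python
import numpy as np

HIDDEN = (64, 128, 256, 512, 1024, 1024)  # hidden-layer widths from the text

def leaky_relu(x, alpha=0.01):
    # max(0, x) + alpha * min(0, x), as in the formula above.
    return np.maximum(0.0, x) + alpha * np.minimum(0.0, x)

def init_params(n_features, hidden=HIDDEN, seed=0):
    """Random weights and zero biases for each layer (initialization assumed)."""
    rng = np.random.default_rng(seed)
    sizes = (n_features, *hidden, 1)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)),
         np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, X):
    """Forward pass: LeakyReLU on hidden layers, linear output layer."""
    a = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        a = leaky_relu(a @ W + b)
    W, b = params[-1]
    return (a @ W + b).ravel()

params = init_params(n_features=45)
predictions = forward(params, np.zeros((3, 45)))
```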

Regularization: To prevent overfitting, we incorporated Dropout and L2 weight decay (ridge regularization). Dropout rates were set between 0.2 and 0.5 across layers. The Dropout operation during training can be expressed as:
$$ y^{[l]} = a^{[l]} \odot m^{[l]}, \quad m^{[l]} \sim \text{Bernoulli}(1 - p^{[l]}) $$
where $a^{[l]}$ is the activation vector at layer $l$, $p^{[l]}$ is the dropout probability for that layer, $m^{[l]}$ is a binary mask whose entries are drawn independently, and $\odot$ denotes element-wise multiplication.
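
A minimal NumPy sketch of this operation is given below. The default follows the formula as written, with each activation kept with probability $1 - p$. Common deep learning frameworks additionally rescale the kept activations by $1 / (1 - p)$ during training so that no scaling is needed at inference; that variant is exposed through the hypothetical `rescale` flag, and the text does not say which form was used.

```python
import numpy as np

def dropout(a, p, rng, training=True, rescale=False):
    """Apply a Bernoulli(1 - p) keep-mask to activations during training."""
    a = np.asarray(a, dtype=float)
    if not training or p == 0.0:
        return a  # dropout is disabled at inference time
    # 1 = keep the unit, 0 = drop it; each entry drawn independently.
    mask = rng.binomial(1, 1.0 - p, size=a.shape)
    y = a * mask
    # Optional "inverted dropout" scaling (an assumption, see lead-in).
    return y / (1.0 - p) if rescale else y

rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, p=0.5, rng=rng)
```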

Training: Models were trained separately for each response variable (AD, BA, HL, AGB) and for three different input configurations: DOM features only, DAP point cloud features only, and Combined (DOM+DAP) features. We employed a 10-fold cross-validation strategy for robust evaluation. Hyperparameters including optimizer (Adam or RMSprop), learning rate, and learning rate decay were tuned via grid search. The best configuration was selected based on the average coefficient of determination ($R^2$) on the validation folds. The mean absolute error (MAE) was primarily used as the loss function $\mathcal{L}$:
$$ \mathcal{L}_{MAE} = \frac{1}{N} \sum_{i=1}^{N} | y_i - \hat{y}_i | $$
where $y_i$ is the observed value and $\hat{y}_i$ is the predicted value.
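
The evaluation quantities used in this study can be sketched as below, together with one way to generate 10-fold splits. The rRMSE is taken here as RMSE divided by the mean observed value, which is the usual definition but is not stated in the text; the shuffling, the seed, and the names `regression_metrics` and `kfold_indices` are also assumptions.

```python
import numpy as np

def regression_metrics(y, y_hat):
    """MAE, RMSE, rRMSE (%) and R^2 for observed y and predicted y_hat."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    err = y - y_hat
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": rmse,
        "rRMSE": 100.0 * rmse / y.mean(),  # assumed definition
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
    }

def kfold_indices(n, k=10, seed=0):
    """Return k (train_idx, val_idx) pairs covering n shuffled samples."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    return [
        (np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
        for i in range(k)
    ]

scores = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
splits = kfold_indices(151)  # 151 plots, as in the study
```

With 151 plots, one fold holds 16 plots and the other nine hold 15 each.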

3.2 SHAP (SHapley Additive exPlanations) for Model Interpretation

To interpret the “black-box” DNN models, we employed the Kernel SHAP method, a model-agnostic approach based on cooperative game theory. SHAP assigns each feature an importance value (SHAP value) for a specific prediction, representing the feature’s marginal contribution to the prediction compared to the average prediction.

The explanation model $g$ for an instance $x$ is defined as:
$$ g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i $$
where $z' \in \{0, 1\}^M$ is a simplified binary vector indicating the presence ($1$) or absence ($0$) of each of the $M$ features, $\phi_0$ is the base value (the average model output over the training dataset), and $\phi_i$ is the SHAP value for feature $i$.

The SHAP value $\phi_i$ is calculated as:
$$ \phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! (M - |S| - 1)!}{M!} [f_x(S \cup \{i\}) - f_x(S)] $$
where $F = \{1, \dots, M\}$ is the set of all features, $S$ is a subset of features that excludes $i$, $M$ is the total number of features, and $f_x(S)$ is the model prediction for the instance $x$ using only the feature subset $S$. This weighting averages the marginal contribution of feature $i$ over all possible orderings of the features, so the difference between the prediction and the base value is distributed fairly among the features.
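
For a handful of features the Shapley formula can be evaluated exactly by enumerating every subset, which makes the weighting easy to inspect. The sketch below does this under a simplifying assumption: features outside $S$ are replaced by a single background vector, whereas Kernel SHAP averages over a background dataset and approximates the sum by weighted sampling. The function `exact_shapley` and the toy model are hypothetical.

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, background):
    """Exact Shapley values by enumerating all feature subsets (small M only)."""
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    M = x.size

    def f_x(S):
        # Features in S take their value from x; the rest stay at background.
        z = background.copy()
        idx = np.array(S, dtype=int)
        z[idx] = x[idx]
        return f(z)

    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            weight = (math.factorial(size) * math.factorial(M - size - 1)
                      / math.factorial(M))
            for S in itertools.combinations(others, size):
                phi[i] += weight * (f_x(S + (i,)) - f_x(S))
    return phi

# Toy model with one additive term and one interaction term.
toy_model = lambda z: 2.0 * z[0] + z[1] * z[2]
phi = exact_shapley(toy_model, x=[1.0, 2.0, 3.0], background=[0.0, 0.0, 0.0])
```

In this toy case the additive feature receives its full effect (2), the interaction effect (6) is split equally between the two interacting features (3 each), and the values sum to the difference between the prediction and the background output. The enumeration grows as $2^M$, so it is impractical for the 38 to 45 features used here.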

We applied Kernel SHAP to the best-performing DNN model (trained on the combined features) for each response variable. This allowed us to generate:

Global Interpretability: Ranking features by their mean absolute SHAP value across all samples to understand overall feature importance.
Local Interpretability: Examining the SHAP force plot for individual samples to see how each feature pushed the prediction away from the base value.
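
Given a matrix of SHAP values with one row per sample and one column per feature, the global ranking described above reduces to a column-wise mean of absolute values. The sketch below assumes such a matrix is already available; the function `rank_features` and the small example matrix are hypothetical.

```python
import numpy as np

def rank_features(shap_values, names, top=5):
    """Rank features by mean absolute SHAP value across samples."""
    importance = np.mean(np.abs(np.asarray(shap_values, dtype=float)), axis=0)
    order = np.argsort(importance)[::-1][:top]  # largest first
    return [(names[j], float(importance[j])) for j in order]

# Hypothetical SHAP matrix: 2 samples x 3 features.
ranking = rank_features([[1.0, -3.0, 0.5], [-1.0, 3.0, 0.5]], ["a", "b", "c"])
```

Taking absolute values before averaging matters: feature "a" in this example has positive and negative contributions that would cancel to zero in a plain mean.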

4. Results

4.1 Model Estimation Performance

The performance of the DNN models for the four forest parameters across the three data sources is summarized in Table 2. The evaluation metrics are averages over the 10 cross-validation folds. For every parameter, the model using the combined data source (DOM + DAP point cloud) achieved the highest $R^2$ and the lowest RMSE and rRMSE.

Table 2: Performance of DNN Models for Estimating Forest Structural Parameters
Parameter	Data Source	R²	RMSE	rRMSE (%)
AD	DOM	0.337	1.12 cm	4.81
	DAP Point Cloud	0.391	1.07 cm	4.62
	Combined	0.613	0.95 cm	4.07
BA	DOM	0.482	5.85 m²/ha	10.93
	DAP Point Cloud	0.471	5.90 m²/ha	11.03
	Combined	0.744	4.37 m²/ha	8.16
HL	DOM	0.338	1.24 m	5.51
	DAP Point Cloud	0.582	0.99 m	4.38
	Combined	0.728	0.75 m	3.33
AGB	DOM	0.616	195.99 kg/plot	9.88
	DAP Point Cloud	0.502	223.70 kg/plot	11.28
	Combined	0.776	151.22 kg/plot	7.62

The order of data source performance varied by parameter. For BA and AGB, the ranking was Combined > DOM > DAP Point Cloud, although for BA the two single sources were nearly tied ($R^2$ of 0.482 versus 0.471). For AD and HL, the ranking was Combined > DAP Point Cloud > DOM. This indicates that spectral features from the DOM alone were at least as informative as point cloud features alone for estimating BA and AGB, whereas for AD and HL the structural features from the point cloud were more informative. Combining both data sources yielded the best result for every parameter, with the AGB model achieving the highest $R^2$ of 0.776.
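
The size of the benefit from combining sources can be read directly from Table 2. The short sketch below computes, for each parameter, the better single source and the $R^2$ gain of the combined model over it; the values are copied from Table 2, and the helper `combined_gain` is hypothetical.

```python
# R^2 values copied from Table 2.
R2 = {
    "AD": {"DOM": 0.337, "DAP": 0.391, "Combined": 0.613},
    "BA": {"DOM": 0.482, "DAP": 0.471, "Combined": 0.744},
    "HL": {"DOM": 0.338, "DAP": 0.582, "Combined": 0.728},
    "AGB": {"DOM": 0.616, "DAP": 0.502, "Combined": 0.776},
}

def combined_gain(r2_by_source):
    """Return the better single source and the R^2 gain of the combined model."""
    singles = {k: v for k, v in r2_by_source.items() if k != "Combined"}
    best = max(singles, key=singles.get)
    return best, round(r2_by_source["Combined"] - singles[best], 3)

gains = {param: combined_gain(values) for param, values in R2.items()}
```

The gain over the better single source ranges from 0.146 for HL to 0.262 for BA.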

4.2 SHAP Interpretability Analysis

We applied SHAP analysis to the best-performing combined-feature DNN models. The global feature importance, ranked by the mean absolute SHAP value, is visualized for each parameter in Figure 1 (described narratively below). A key finding was the consistent dominance of point cloud-derived features in the top ranks.

Global Importance: For all four response variables, the point cloud height coefficient of variation ($H_{CV}$) was the most important feature. Height percentiles (e.g., $H_{95}$, $H_{99}$, $H_{75}$) and metrics such as mean height ($H_{mean}$) and canopy relief ratio ($H_{CRR}$) also ranked highly. Spectral features from the DOM generally ranked lower but still contributed to the models, particularly for AGB and BA estimation; within the top five (Table 3), a DOM index appears for AD (VEG) and BA (GRRI).

Table 3: Top 5 Most Important Features from SHAP Analysis for Combined Models
Rank	AD	BA	HL	AGB
1	H_CV (Point Cloud)	H_CV (Point Cloud)	H_CV (Point Cloud)	H_CV (Point Cloud)
2	H_CRR (Point Cloud)	H_95 (Point Cloud)	H_95 (Point Cloud)	H_95 (Point Cloud)
3	VEG Index (DOM)	H_99 (Point Cloud)	H_99 (Point Cloud)	H_99 (Point Cloud)
4	H_95 (Point Cloud)	H_MEAN (Point Cloud)	H_MEAN (Point Cloud)	H_MEAN (Point Cloud)
5	H_99 (Point Cloud)	GRRI Index (DOM)	H_CRR (Point Cloud)	H_CRR (Point Cloud)

Feature Effect Direction: The SHAP summary plots (bee swarm plots) revealed the direction of feature effects. For example, high values of $H_{CV}$ (indicating greater heterogeneity in canopy height) generally had a positive SHAP value for HL and AGB, meaning they increased the model’s prediction of these parameters. Conversely, for AD, the relationship was more complex and sometimes negative, suggesting that in these stands, more variable heights might be associated with smaller average diameters due to competition dynamics.

Local Interpretation: Force plots for individual samples illustrated how features combined to yield a specific prediction. For instance, in a plot with very high observed AGB, features like high $H_{95}$, high $H_{CV}$, and a high value of a specific vegetation index all contributed positive “pushes” to raise the prediction from the base value to near the observed value.

5. Discussion

Our study demonstrates the potential of integrating UAV drones with deep learning for forest structural parameter estimation. The DNN models learned the complex, non-linear relationships between multisource remote sensing features and ground-measured parameters. The superior performance of models using combined DOM and DAP point cloud features underscores the complementary nature of spectral and structural information. Spectral indices capture canopy health and density, while point cloud metrics directly quantify height distribution and canopy architecture. This synergy helps overcome the saturation effect often encountered when optical data alone are used to estimate biomass in dense, high-biomass forests.

The DNN’s architecture, with its multiple hidden layers and non-linear activation functions, proved capable of integrating these diverse feature sets. Even with a moderately sized dataset (151 plots), the combined-feature models reached cross-validated $R^2$ values of 0.613 to 0.776 with the help of regularization techniques such as Dropout and L2 weight decay. This suggests that deep learning can be effectively applied in forestry even when large labeled datasets are not available, provided the model complexity is appropriately regularized.

The application of SHAP provided insights that go beyond a simple performance metric. The analysis highlighted the importance of structural features: point cloud height metrics, particularly $H_{CV}$, were the most influential predictors for all four response variables. This aligns with ecological understanding, as the variation in tree height within a stand is a fundamental descriptor of its structure, competition, and successional stage, and is intrinsically linked to volume and biomass. The interpretability offered by SHAP turns the DNN from a black box into a tool for generating ecological hypotheses. For example, the strong role of $H_{CV}$ invites further investigation into how stand heterogeneity, possibly driven by management history or micro-topography, influences aggregate parameters like mean diameter.

Some limitations must be acknowledged. The study focused on a single species (Chinese fir) in a plantation setting. The generalizability of the specific models to other species or natural forests requires further testing. The accuracy for AD estimation, while improved by the combined model, was lower than for the other parameters. This is likely because DBH is a stem-level attribute less directly sensed by canopy-top measurements from UAV drones. Its estimation relies more on correlative relationships with canopy size and structure, which can be noisier. Future work could explore more advanced network architectures (e.g., Convolutional Neural Networks) to directly process image patches or point cloud segments, or the integration of multi-temporal UAV drone data to capture growth dynamics.

6. Conclusion

In this research, we developed and evaluated a framework for estimating key forest structural parameters using data from UAV drones and deep neural networks. The main conclusions are:

UAV drones equipped with standard RGB cameras can generate both high-resolution spectral imagery (DOM) and detailed 3D point clouds (DAP) that are rich sources of information for forest inventory.
Deep Neural Networks (DNNs) are highly effective at modeling the non-linear relationships between features extracted from UAV drone data and forest parameters such as mean diameter, basal area, height, and above-ground biomass.
A combined data source approach, utilizing both spectral and structural features, consistently yields the highest estimation accuracy, highlighting the complementary value of these data types.
The SHAP interpretability framework helps open up the “black-box” DNN models. It quantifies feature importance and reveals that point cloud-derived metrics, especially the coefficient of variation of height ($H_{CV}$), are the primary drivers of model predictions, providing ecologically meaningful insights.

This integrated approach of UAV drones, deep learning, and interpretable AI offers a powerful, accurate, and insightful methodology for modern forest resource monitoring and management. It provides the necessary technical support for operationalizing efficient, low-cost, and repeatable forest inventories.