Tree Species Identification Using Drone-Based Hyperspectral and LiDAR Data

Accurately identifying tree species and their spatial distribution is fundamental for monitoring forest diversity, which is critical for forest conservation, management, and sustainable development. Traditional field surveys, while accurate, are often costly, time-consuming, and limited in scale. The advent of remote sensing technologies, particularly those deployed on Unmanned Aerial Vehicles (UAVs), offers a promising alternative. UAVs, or drones, provide high-resolution data collection capabilities with significant flexibility and lower operational costs compared to manned aircraft or satellites. In this study, I explore the integration of drone-acquired hyperspectral imagery and LiDAR (Light Detection and Ranging) point cloud data for precise tree species identification in mixed coniferous-broadleaf forests. By developing an enhanced convolutional neural network (CNN) model, I aim to improve classification accuracy and subsequently assess tree species diversity within the study area.

The utilization of drone technology for forestry applications has gained momentum due to its ability to capture detailed structural and spectral information. Hyperspectral sensors onboard UAVs capture data across numerous narrow bands, enabling the detection of subtle spectral differences among tree species. LiDAR systems, similarly deployed on drones, provide precise three-dimensional information about forest structure, such as tree height and canopy morphology. When combined, these datasets offer a comprehensive view of the forest, facilitating more accurate species discrimination. This research leverages these advantages by fusing hyperspectral and LiDAR data at the individual tree level, extracted using a Canopy Height Model (CHM). The core of my methodology involves a modified CNN architecture, termed CNN-EGNet, which incorporates an Efficient Channel Attention (ECA) mechanism and a Global Average Pooling (GAP) layer to enhance feature learning and classification performance.

My investigation focuses on a temperate mixed forest region, where I collected UAV-based hyperspectral and LiDAR data. After preprocessing and individual tree segmentation, I extracted features from both data sources. The fused feature set was used to train and evaluate the proposed CNN-EGNet model against traditional CNN models like VGG16, VGG19, and GoogLeNet. The results demonstrate the superior performance of my approach, achieving high accuracy in tree species identification. Based on the classification results, I computed various diversity indices to characterize the forest’s species composition. This work underscores the potential of integrating multi-source drone data with advanced deep learning techniques for efficient and large-scale forest monitoring.

Data Acquisition and Preprocessing

The data for this study were acquired using drone technology in a mixed forest area. A UAV platform equipped with a LiDAR sensor was flown at an altitude of 80 meters, capturing high-density point clouds. Simultaneously, a hyperspectral camera mounted on another drone was operated at 200 meters altitude, collecting imagery across 112 spectral bands ranging from 400 to 1000 nm. The use of Unmanned Aerial Vehicles ensured high spatial resolution and precise georeferencing through integrated GPS and IMU systems. Field surveys were conducted to collect ground truth data, including tree species and locations, which were used for labeling the dataset.

Preprocessing of the LiDAR data involved noise removal, ground point classification, and generation of Digital Elevation Models (DEM) and Digital Surface Models (DSM). The Canopy Height Model (CHM) was derived by subtracting DEM from DSM, represented as:

$$ \text{CHM} = \text{DSM} - \text{DEM} $$

Individual tree crowns were segmented from the CHM using a seed-based approach. The segmentation accuracy was evaluated based on the number of correctly identified trees compared to field measurements. For the hyperspectral data, geometric correction was performed by co-registering the images with UAV-captured RGB imagery using spline functions in GIS software. The individual tree polygons obtained from LiDAR segmentation were then used to extract corresponding hyperspectral image patches for each tree, ensuring alignment between the structural and spectral data.
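The CHM derivation above is a simple raster subtraction; the sketch below shows a minimal NumPy version (array values and the clamping step are illustrative, not the exact processing chain used in this study):

```python
import numpy as np

def canopy_height_model(dsm: np.ndarray, dem: np.ndarray) -> np.ndarray:
    """Derive a CHM by subtracting ground elevation (DEM) from surface elevation (DSM)."""
    chm = dsm - dem
    # Interpolation noise can produce small negative heights; clamp them to zero.
    return np.clip(chm, 0.0, None)

# Toy 3x3 example: one emergent crown over flat ground at 100 m elevation.
dsm = np.array([[100.0, 101.0, 100.0],
                [101.0, 118.5, 101.0],
                [100.0, 101.0, 100.0]])
dem = np.full((3, 3), 100.0)
chm = canopy_height_model(dsm, dem)
print(chm.max())  # tallest canopy point: 18.5 m
```

In practice the DSM and DEM rasters would be co-registered grids interpolated from the classified point cloud; seed-based crown segmentation then operates on this CHM.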

Feature Extraction and Dataset Construction

From the LiDAR point clouds, I extracted 95 features for each individual tree, encompassing height and intensity metrics. Height variables included statistics such as mean, standard deviation, percentiles, and canopy roughness. Intensity variables described the return signal strength, including mean intensity, intensity percentiles, and cumulative distribution measures. These features capture the three-dimensional structural characteristics of trees, which are crucial for species discrimination.

For the hyperspectral data, I extracted the spectral reflectance values across all 112 bands for the central pixel of each tree crown to minimize mixed pixel effects. The spectral profiles of different tree species show distinct patterns, particularly in the visible (500-600 nm) and near-infrared (700-1000 nm) regions. The fusion of LiDAR and hyperspectral features resulted in a comprehensive dataset where each sample (tree) is represented by a combined feature vector. The dataset was labeled with species information from field surveys and split into training (80%) and validation (20%) sets for model development.

Summary of Extracted Features from LiDAR and Hyperspectral Data

| Data Type | Feature Category | Number of Features | Examples |
|---|---|---|---|
| LiDAR | Height variables | 50 | Mean height; height percentiles (10th, 20th, ..., 90th); canopy relief ratio |
| LiDAR | Intensity variables | 45 | Mean intensity; intensity percentiles; standard deviation of intensity |
| Hyperspectral | Spectral bands | 112 | Reflectance in each of the 112 bands between 400 and 1000 nm |

Model Development: CNN-EGNet

Convolutional Neural Networks are particularly effective for automated feature learning from complex data. My proposed CNN-EGNet model builds upon the VGG16 architecture but incorporates several enhancements to improve performance and efficiency for tree species classification. The key modifications include the integration of an ECA attention module and replacement of fully connected layers with a Global Average Pooling layer.

The ECA module enhances feature representation by performing efficient channel-wise attention without dimensionality reduction. It applies a 1D convolution with adaptive kernel size to capture cross-channel interactions, which helps the model focus on informative spectral and structural features. The attention weights for each channel are computed as:

$$ \omega = \sigma(\text{Conv1D}(g)) $$

where \( g \) is the global averaged feature, \( \text{Conv1D} \) is a one-dimensional convolution, and \( \sigma \) is the sigmoid activation function. These weights are then used to recalibrate the input features.
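The computation above can be sketched with NumPy. One caveat: in a trained ECA module the 1D convolution kernel is learned and its size adapts to the channel count, whereas this sketch uses a fixed averaging kernel of size 3 purely for illustration:

```python
import numpy as np

def eca_weights(feature_map: np.ndarray, k: int = 3) -> np.ndarray:
    """Efficient Channel Attention: global average pool per channel, a 1D
    convolution across channels (no dimensionality reduction), then a
    sigmoid.  feature_map has shape (H, W, C)."""
    g = feature_map.mean(axis=(0, 1))            # global average pool -> (C,)
    kernel = np.full(k, 1.0 / k)                 # illustrative fixed kernel; learned in practice
    conv = np.convolve(g, kernel, mode="same")   # local cross-channel interaction
    return 1.0 / (1.0 + np.exp(-conv))           # sigmoid -> per-channel weights

def eca_recalibrate(feature_map: np.ndarray, k: int = 3) -> np.ndarray:
    """Scale each channel of the input by its attention weight."""
    return feature_map * eca_weights(feature_map, k)

x = np.random.default_rng(1).normal(size=(8, 8, 64))
w = eca_weights(x)
print(w.shape, eca_recalibrate(x).shape == x.shape)  # (64,) True
```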

The Global Average Pooling layer replaces the first two fully connected layers of VGG16, reducing the number of parameters and mitigating overfitting. For a feature map \( F \) of size \( H \times W \times C \), GAP computes the spatial average for each channel:

$$ G_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_{i,j,c} $$
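The GAP formula is a per-channel spatial mean, which reduces an \( H \times W \times C \) map to a length-\( C \) vector with no trainable parameters. A minimal sketch:

```python
import numpy as np

def global_average_pool(F: np.ndarray) -> np.ndarray:
    """GAP: average each channel over its H x W spatial grid, (H, W, C) -> (C,)."""
    return F.mean(axis=(0, 1))

# E.g. a constant 25 x 1 x 256 map (the size before GAP in CNN-EGNet).
F = np.ones((25, 1, 256))
g = global_average_pool(F)
print(g.shape)  # (256,)
```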

The CNN-EGNet architecture consists of multiple convolutional layers with ECA modules, followed by max-pooling layers and dropout for regularization. Batch normalization is applied to accelerate convergence. The final layer uses a softmax activation for multi-class classification. The model was trained using the Adam optimizer with a learning rate of 0.001 and a categorical cross-entropy loss function over 150 epochs.

CNN-EGNet Architecture Details

| Layer Type | Configuration | Output Size |
|---|---|---|
| Input | Fused feature vector | 207 × 1 |
| Convolution + ECA | 64 filters, 3×3, ReLU | 207 × 64 |
| Max pooling | 2×2 | 103 × 64 |
| Convolution + ECA | 128 filters, 3×3, ReLU | 103 × 128 |
| Max pooling | 2×2 | 51 × 128 |
| Convolution + ECA | 256 filters, 3×3, ReLU | 51 × 256 |
| Max pooling | 2×2 | 25 × 256 |
| Global average pooling | | 256 |
| Dropout | rate 0.5 | 256 |
| Softmax | 6 classes | 6 |

Experimental Results and Analysis

The performance of CNN-EGNet was compared against three established CNN models: VGG16, VGG19, and GoogLeNet. All models were trained and evaluated on the same dataset using weighted precision, recall, F1-score, overall accuracy (OA), and Kappa coefficient as metrics.
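Overall accuracy and the Kappa coefficient both follow directly from the confusion matrix: OA is the observed agreement, while Kappa corrects it for chance agreement. A minimal sketch (the 2-class matrix is illustrative, not the study's results):

```python
import numpy as np

def overall_accuracy_and_kappa(cm: np.ndarray):
    """Compute overall accuracy and Cohen's kappa from a confusion matrix
    (rows: true class, columns: predicted class)."""
    n = cm.sum()
    po = np.trace(cm) / n                                 # observed agreement (OA)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    kappa = (po - pe) / (1.0 - pe)
    return po, kappa

# Illustrative 2-class matrix.
cm = np.array([[40, 10],
               [ 5, 45]])
oa, kappa = overall_accuracy_and_kappa(cm)
print(round(oa, 2), round(kappa, 2))  # 0.85 0.7
```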

The training loss curves indicated that CNN-EGNet converged smoothly after approximately 120 epochs, whereas other models exhibited fluctuations or stagnation. This demonstrates the stability and efficiency of the proposed architecture. In terms of computational time, CNN-EGNet required 240.10 seconds for training, which is significantly less than VGG16 (518.10 s) and VGG19 (548.44 s), though GoogLeNet was faster at 93.75 seconds. However, GoogLeNet’s classification performance was inferior.

Performance Comparison of Different CNN Models

| Model | Weighted Precision | Weighted Recall | Weighted F1-score | Overall Accuracy (%) | Kappa Coefficient | Training Time (s) |
|---|---|---|---|---|---|---|
| CNN-EGNet | 0.90 | 0.89 | 0.89 | 89.58 | 0.8661 | 240.10 |
| VGG16 | 0.81 | 0.80 | 0.79 | 80.21 | 0.7486 | 518.10 |
| VGG19 | 0.85 | 0.84 | 0.84 | 84.38 | 0.8009 | 548.44 |
| GoogLeNet | 0.77 | 0.75 | 0.75 | 75.00 | 0.6765 | 93.75 |

CNN-EGNet achieved an overall accuracy of 89.58% and a Kappa coefficient of 0.8661, outperforming the other models. Specifically, it improved OA by 9.37%, 5.20%, and 14.58% over VGG16, VGG19, and GoogLeNet, respectively. The confusion matrix analysis revealed that CNN-EGNet excelled in distinguishing species like Elm, Walnut, and Birch, which were often misclassified by other models. This highlights the effectiveness of the ECA mechanism in capturing discriminative features.

To validate the contributions of individual components, I conducted an ablation study. Training a model with only the GAP modification yielded a weighted F1-score of 0.86, while a model with only ECA achieved 0.77. The combined approach in CNN-EGNet produced the best results, confirming the synergistic effect of both modifications. Furthermore, the parameter reduction from using GAP was substantial, decreasing the number of parameters in the final fully connected layer by 21,504, which enhances model efficiency.

The robustness of CNN-EGNet was assessed through 5-fold cross-validation. The average overall accuracy across folds was 84.78%, with a mean Kappa of 0.8089, indicating consistent performance despite variations in the data splits.
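The k-fold splitting behind this validation can be sketched as follows; the tree count of 480 is a placeholder, and model fitting per fold is omitted:

```python
import numpy as np

def kfold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation:
    shuffle once, cut into k folds, hold out each fold in turn."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# With 480 trees (a placeholder count), each fold holds out 96 for validation.
splits = list(kfold_indices(480, k=5))
print([len(val) for _, val in splits])  # [96, 96, 96, 96, 96]
```

The reported average OA and Kappa would then be the mean of the per-fold scores.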

Tree Species Identification and Diversity Assessment

Using the trained CNN-EGNet model, I predicted tree species for the entire study area. The classification map shows distinct spatial patterns, with dominant species like Birch, Larch, Walnut, and Elm forming clustered distributions. Birch was the most widespread, while Larch was concentrated in the northwestern section. Walnut and Elm were primarily found in the southeastern part. Other species, grouped as “Others,” were sporadically distributed.

Based on the species identification results, I calculated tree species diversity indices for 40 m × 40 m grid cells covering the study area. The indices included the Shannon-Wiener diversity index (\( H' \)), Simpson diversity index (\( D \)), Pielou's evenness index (\( J' \)), and species richness (\( S \)). These were computed as follows:

Shannon-Wiener Index:

$$ H' = -\sum_{i=1}^{S} p_i \ln p_i $$

Simpson Index:

$$ D = 1 - \sum_{i=1}^{S} p_i^2 $$

Pielou’s Evenness Index:

$$ J' = \frac{H'}{\ln S} $$

Species Richness:

$$ S = \text{Number of species in the plot} $$

where \( p_i \) is the proportion of individuals belonging to species \( i \).
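All four indices follow directly from the per-plot species counts. The sketch below implements them for one plot; the species names and counts are invented for illustration (a perfectly even 3-species plot gives \( H' = \ln 3 \), \( D = 2/3 \), \( J' = 1 \)):

```python
import math
from collections import Counter

def diversity_indices(species: list) -> dict:
    """Shannon-Wiener H', Simpson D, Pielou J', and richness S for one plot."""
    counts = Counter(species)
    n = len(species)
    p = [c / n for c in counts.values()]          # proportion of each species
    S = len(counts)                               # species richness
    H = -sum(pi * math.log(pi) for pi in p)       # Shannon-Wiener
    D = 1.0 - sum(pi ** 2 for pi in p)            # Simpson
    J = H / math.log(S) if S > 1 else 0.0         # Pielou evenness
    return {"H": H, "D": D, "J": J, "S": S}

# Illustrative plot: three species, ten trees each.
plot = ["Birch"] * 10 + ["Larch"] * 10 + ["Elm"] * 10
idx = diversity_indices(plot)
print(round(idx["H"], 3), round(idx["D"], 3), round(idx["J"], 3), idx["S"])
```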

The diversity analysis revealed that the Shannon-Wiener index mostly ranged between 0.8 and 1.4, indicating moderate diversity. Simpson index values were concentrated between 0.5 and 0.7, suggesting that a few dominant species are present. Pielou’s index was generally high (0.7-0.95), reflecting relatively even species distribution in most plots. Species richness varied from 3 to 5, confirming that the forest is composed of several co-occurring species but with some dominance. The spatial distribution of these indices can be visualized through interpolation maps, providing insights into biodiversity patterns across the landscape.

Summary of Tree Species Diversity Indices in the Study Area

| Diversity Index | Range | Interpretation |
|---|---|---|
| Shannon-Wiener (\( H' \)) | 0.8–1.4 | Moderate species diversity |
| Simpson (\( D \)) | 0.5–0.7 | Presence of dominant species |
| Pielou (\( J' \)) | 0.7–0.95 | High species evenness |
| Species richness (\( S \)) | 3–5 | Moderate number of species per plot |

Discussion

The integration of drone-based hyperspectral and LiDAR data has proven highly effective for tree species identification. The complementary nature of spectral and structural information allows for better discrimination among species with similar appearances. My proposed CNN-EGNet model successfully leverages this fused data through its enhanced architecture. The ECA attention mechanism enables the model to prioritize informative features adaptively, while the GAP layer reduces overfitting and computational burden. The significant improvement in accuracy over traditional CNNs underscores the importance of model optimization for specific remote sensing tasks.

The application of Unmanned Aerial Vehicle technology in forestry is rapidly evolving, and this study contributes to its advancement by demonstrating a robust workflow for species-level mapping. The ability to accurately identify trees from UAV data opens up possibilities for large-scale biodiversity assessments, forest health monitoring, and precision forestry. The diversity indices derived from the classification results provide ecologically meaningful metrics that can inform conservation strategies and management practices.

Challenges remain, such as improving individual tree segmentation in dense canopies and extending the approach to larger areas. Future work could explore the use of multi-temporal drone data to capture phenological variations, which may further enhance species discrimination. Additionally, transferring the model to other forest types would test its generalizability.

Conclusion

In this research, I developed a novel CNN-based framework, CNN-EGNet, for tree species identification using fused hyperspectral and LiDAR data acquired by drones. The model incorporates an ECA attention module and Global Average Pooling to improve feature learning and classification performance. Experimental results confirm that CNN-EGNet outperforms conventional CNN models, achieving high accuracy and reliability. The subsequent diversity assessment reveals patterns of species distribution and dominance in the mixed forest. This work highlights the value of integrating multi-source remote sensing data from Unmanned Aerial Vehicles with advanced deep learning techniques for efficient forest monitoring and biodiversity conservation. The methodologies presented here can be adapted for similar applications in other ecosystems, contributing to sustainable forest management efforts.
