Amid rapid urbanization, accurate classification of urban land cover is central to understanding spatial patterns, monitoring environmental change, and supporting sustainable city management. Traditional methods often rely on satellite imagery or manual surveys, which may lack the temporal resolution or detail required for fine-scale analysis. Unmanned aerial vehicles (UAVs) now provide high-resolution imagery that captures fine details of urban landscapes, which benefits applications such as urban planning, infrastructure development, ecological assessment, and climate change mitigation. In this article, I describe a methodology for urban land cover classification from high-resolution UAV imagery, covering dataset construction, model training, and performance evaluation. Integrating UAVs into geospatial workflows lets researchers and practitioners derive actionable insights from aerial data with greater precision and efficiency.
Urban land cover classification assigns surface features to categories such as buildings, roads, vegetation, and water bodies. It is essential for quantifying urban expansion, assessing land use change, and informing policy decisions. High-resolution UAV imagery is a rich data source, with spatial resolutions often at the centimeter level. Unlike satellites, UAVs can be deployed flexibly, so data can be captured under chosen conditions, below cloud cover and with less atmospheric interference. However, the volume and complexity of UAV-derived imagery make automated classification difficult. Deep learning models, particularly convolutional neural networks (CNNs), handle such data well, but their success depends on high-quality annotated datasets, which remain scarce for UAV imagery. In this work, I address this gap by constructing a dedicated dataset from UAV imagery and evaluating its effectiveness through comparative experiments.

The study area is a typical urban center with a mix of residential, commercial, and industrial zones. The region has developed significantly in recent years, with new transportation networks and public facilities. It also faces ecological challenges such as drought and land desertification, which makes land cover monitoring important for sustainable management. I captured high-resolution UAV imagery over this area to create a dataset that reflects diverse urban features. The area was selected so that the dataset covers common urban land cover types while also showing local variation. The imagery was acquired in summer, when vegetation is lush and land features are distinct, which gives good conditions for classification. Flying a UAV allowed precise control over flight parameters and consistent data quality across the survey.
Data acquisition used a UAV platform equipped with advanced sensors. The aircraft was an electric, heavy-payload hybrid that combines vertical take-off and landing (VTOL) with fixed-wing cruise flight. This hybrid design improves operational flexibility by reducing constraints on launch sites and airspace conditions. The UAV carried a high-resolution camera with a Carl Zeiss lens, capturing RGB images of 7,952 × 5,304 pixels. Flight planning was done in ground station software, with parameters chosen to achieve high overlap and minimal shadows. Key mission parameters are summarized in Table 1.
| Parameter | Value |
|---|---|
| Camera Model | N7-RⅡ |
| Flight Altitude | 900 m |
| Forward Overlap | 75% |
| Side Overlap | 75% |
| Spatial Resolution (GSD) | 7.8 cm/pixel |
| Spectral Bands | Red, Green, Blue (RGB) |
| Platform Type | Hybrid VTOL fixed-wing UAV |
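To make the mission parameters more concrete, the following sketch derives the ground footprint of one image and the spacing implied by 75% overlap from the image size and spatial resolution given above. It is an illustration only: the orientation of the sensor relative to the flight direction is an assumption, since it is not stated, and the function names are hypothetical.

```python
# Values taken from Table 1 and the camera description above.
IMAGE_WIDTH_PX = 7952
IMAGE_HEIGHT_PX = 5304
GSD_M = 0.078          # 7.8 cm per pixel
FORWARD_OVERLAP = 0.75
SIDE_OVERLAP = 0.75


def footprint_m(pixels: int, gsd_m: float) -> float:
    """Ground distance covered by `pixels` image pixels."""
    return pixels * gsd_m


def spacing_m(footprint: float, overlap: float) -> float:
    """Distance between neighbouring exposures (or flight lines) for a given overlap."""
    return footprint * (1.0 - overlap)


# Assumption: the long sensor side lies across track, the short side along track.
across_track = footprint_m(IMAGE_WIDTH_PX, GSD_M)
along_track = footprint_m(IMAGE_HEIGHT_PX, GSD_M)
exposure_spacing = spacing_m(along_track, FORWARD_OVERLAP)
line_spacing = spacing_m(across_track, SIDE_OVERLAP)

print(f"footprint: {across_track:.0f} m x {along_track:.0f} m")
print(f"exposure spacing: {exposure_spacing:.0f} m, line spacing: {line_spacing:.0f} m")
```

Under these assumptions each image covers roughly 620 m × 414 m on the ground, and consecutive exposures are about 103 m apart.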
The mission yielded 1,876 raw images with high mutual overlap to support subsequent processing. After the flight, data from the positioning and orientation system (POS) were combined with differential GNSS signals to achieve centimeter-level positioning accuracy. This precision is critical for generating orthorectified mosaics that remove distortions and align images seamlessly. The raw images were processed in photogrammetric software to produce an orthophoto map, which was then cut into smaller tiles for annotation. The preprocessing pipeline, illustrated in Figure 1, consists of image alignment, dense point cloud generation, digital surface model creation, orthophoto generation, and tile segmentation. Each step uses algorithms suited to UAV imagery to preserve geometric fidelity and radiometric consistency.
Preprocessing is essential because raw UAV images overlap heavily and are not directly usable as input to deep learning models. The orthophoto map was divided into tiles of 1,024 × 1,024 pixels, a size commonly used for neural network inputs. Tiles from the urban edges with homogeneous features were excluded, leaving 480 high-quality tiles that represent diverse urban land cover. This curated set forms the basis of the TMSK dataset, named for its focus on urban features. The dataset includes six land cover classes: buildings, impervious surfaces, vegetation, bare land, water, and cars. These classes were chosen for their relevance to urban management and environmental monitoring. Annotation was performed at the pixel level using specialized software, following strict labeling guidelines to maintain consistency. Annotating UAV imagery is labor-intensive but crucial for training accurate models.
The classification principles rely on the distinct visual characteristics of each land cover type. Buildings appear as regular geometric shapes, often rectangles or composites of rectangles, with sharp boundaries and cast shadows due to their height. Impervious surfaces, such as roads and pavements, show uniform textures and high reflectance. Vegetation covers lawns, parks, and trees, with green hues and heterogeneous patterns. Bare land is exposed soil with yellowish tones, while water bodies appear dark blue or black with smooth textures. Cars are small, elongated objects typically found on roads or in parking lots. These visual cues guide both manual annotation and automated feature extraction. The high resolution of UAV imagery captures fine details, but it also introduces challenges such as intra-class variability and occlusion.
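The tiling step can be sketched as follows. This is a minimal illustration, assuming non-overlapping tiles and dropping partial tiles at the right and bottom edges; the text does not say how edge remainders were handled, and the mosaic size used here is hypothetical.

```python
from typing import Iterator, Tuple

TILE = 1024  # tile edge length in pixels, as stated in the text


def tile_windows(width: int, height: int, tile: int = TILE) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (left, top, right, bottom) boxes for full, non-overlapping tiles."""
    # Assumption: partial tiles at the right and bottom edges are dropped.
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield (left, top, left + tile, top + tile)


# Hypothetical mosaic size, chosen only to make the example concrete.
windows = list(tile_windows(width=5000, height=3000))
print(len(windows), windows[0], windows[-1])
```

Each box can be passed to an image library's crop routine for both the orthophoto and its label raster, so that image and label tiles stay aligned.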
Dataset construction followed a structured approach. The 480 annotated tiles were split into training and validation sets at a 5:1 ratio, yielding 400 tiles for training and 80 for validation. The directory structure is organized to facilitate easy access for deep learning frameworks, as shown in Table 2.
| Category | Image Path | Label Path | Number of Tiles |
|---|---|---|---|
| Training Set | TMSK/train/images | TMSK/train/labels | 400 |
| Validation Set | TMSK/validation/images | TMSK/validation/labels | 80 |
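A 5:1 split into the directory layout of Table 2 could be produced along these lines. The sketch assumes a seeded random split and uses hypothetical tile file names; the text does not state how tiles were assigned to each set.

```python
import random
from typing import List, Tuple


def split_tiles(tile_ids: List[str], val_fraction: float = 1 / 6,
                seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle tile ids reproducibly and split them into (train, validation)."""
    ids = sorted(tile_ids)
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


# Hypothetical tile names; the real file naming is not given in the text.
tiles = [f"tile_{i:03d}.png" for i in range(480)]
train, val = split_tiles(tiles)

# Directory names follow Table 2; each entry pairs an image with its label.
train_paths = [(f"TMSK/train/images/{t}", f"TMSK/train/labels/{t}") for t in train]
val_paths = [(f"TMSK/validation/images/{t}", f"TMSK/validation/labels/{t}") for t in val]
print(len(train_paths), len(val_paths))  # 400 80
```

Fixing the seed keeps the split reproducible, which matters when results on the validation set are compared across training runs.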
A statistical analysis of the TMSK dataset shows an imbalanced class distribution, which is common in urban scenes captured by UAV. Vegetation occupies the largest share of pixels at 39.10%, reflecting the green spaces in the study area. Impervious surfaces follow at 30.96%, representing built infrastructure. Buildings and bare land account for 16.63% and 11.29%, respectively, while water and cars are minority classes at 1.39% and 0.63%. This imbalance is a challenge for classification models, which tend to underperform on rare classes. Techniques such as weighted loss functions or data augmentation can mitigate the problem. The diversity of urban features in the dataset makes it a useful resource for training robust models.
To put the TMSK dataset in context, I compare it with existing public datasets for urban land cover classification, namely the ISPRS Potsdam and Vaihingen datasets. Both are widely used in remote sensing research and provide aerial imagery with ground sampling distances of 0.05 m and 0.09 m, respectively. They include annotations for multiple land cover classes and cover varied landscapes. However, they may not capture the characteristics of a specific urban environment or of current UAV acquisition platforms. The TMSK dataset, at 7.8 cm resolution, sits between the two in spatial detail and is annotated specifically for urban applications. In a side-by-side visual comparison, TMSK images appear brighter and show clearer detail, which should help fine-grained classification.
For experimental validation, I used the Deep-UNet model, an encoder-decoder architecture for semantic segmentation. Deep-UNet combines deep residual networks with U-Net’s skip connections, which supports precise localization and multi-scale feature learning. The model was trained for 300 epochs with a batch size of 2, using the OHEM (Online Hard Example Mining) loss function to focus on difficult samples. A warmup learning rate strategy was applied, with a power value of 0.9, 10 warmup epochs, and a warmup ratio of 0.1. These hyperparameters were chosen through empirical tuning on the UAV imagery. Training was run on a GPU to handle the computational demands of high-resolution data.
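Class shares like those above can be computed by counting label pixels across all masks. The sketch below uses toy masks and an assumed mapping of class ids to names; the actual id order in the TMSK labels is not given in the text.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Assumed id order (0..5); hypothetical, for illustration only.
CLASSES = ["buildings", "impervious surfaces", "vegetation", "bare land", "water", "cars"]


def class_shares(masks: Iterable[List[List[int]]],
                 n_classes: int = len(CLASSES)) -> Dict[int, float]:
    """Return the percentage of pixels per class id over all masks."""
    counts = Counter()
    for mask in masks:
        for row in mask:
            counts.update(row)
    total = sum(counts.values())
    return {c: 100.0 * counts.get(c, 0) / total for c in range(n_classes)}


# Two tiny toy masks standing in for 1,024 x 1,024 label tiles.
toy_masks = [
    [[2, 2, 1], [2, 0, 1]],
    [[2, 3, 3], [1, 1, 5]],
]
shares = class_shares(toy_masks)
for class_id, pct in shares.items():
    print(f"{CLASSES[class_id]:<20} {pct:5.2f}%")
```

Classes that never occur, such as water in the toy masks, are reported as 0% rather than omitted, which makes gaps in coverage easy to spot.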
The OHEM loss addresses class imbalance by concentrating training on hard examples. It builds on the standard cross-entropy loss. For a single pixel, let $p_i$ be the predicted probability for class $i$ and $y_i$ the one-hot ground-truth indicator (1 for the true class, 0 otherwise). The cross-entropy loss $L_{CE}$ is:
$$L_{CE} = -\sum_{i} y_i \log(p_i)$$
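Online hard example mining applies this cross-entropy per pixel and then averages only the highest-loss pixels. The following is a minimal pure-Python sketch of that selection, not the training code used in the experiments; the `keep_fraction` parameter and its value are assumptions, since the fraction of pixels kept is not stated.

```python
import math
from typing import List, Sequence


def pixel_cross_entropy(probs: Sequence[float], label: int) -> float:
    """Cross-entropy of one pixel: -log of the probability given to the true class."""
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)


def ohem_loss(pixel_probs: List[Sequence[float]], labels: List[int],
              keep_fraction: float = 0.25) -> float:
    """Mean cross-entropy over the hardest `keep_fraction` of pixels."""
    losses = [pixel_cross_entropy(p, y) for p, y in zip(pixel_probs, labels)]
    k = max(1, int(len(losses) * keep_fraction))  # number of pixels kept
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / k


# Four pixels, three classes; the last pixel is badly misclassified.
probs = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]
labels = [0, 0, 1, 2]
print(ohem_loss(probs, labels, keep_fraction=0.25))  # only the hardest pixel counts
print(ohem_loss(probs, labels, keep_fraction=1.0))   # plain mean cross-entropy
```

With `keep_fraction=1.0` the function reduces to ordinary mean cross-entropy, so the parameter controls how strongly training concentrates on difficult pixels.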
In OHEM, only the $K$ hardest examples (those with the highest loss) contribute to backpropagation, where $K$ is set as a fraction of the pixels in the batch. This makes the model prioritize challenging pixels, which helps with the small or ambiguous objects common in UAV imagery. The warmup strategy gradually raises the learning rate from a small value to the initial rate over the warmup epochs, which improves training stability. During warmup, the learning rate $\eta_t$ at epoch $t$ is given by:
$$\eta_t = \eta_{init} \cdot \left( \frac{t}{T_{warmup}} \right)^{\alpha} \quad \text{for } t \leq T_{warmup}$$
where $\eta_{init}$ is the initial learning rate, reached at the end of warmup, $T_{warmup}$ is the number of warmup epochs, and $\alpha$ is the power parameter, set to 0.9. After warmup, a polynomial decay schedule is applied. Together, these strategies improve model convergence on complex UAV datasets.
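The schedule can be sketched as a single function of the epoch. The warmup branch follows the formula above; the decay branch is an assumption, since the text only says that polynomial decay is applied, and here it reuses the same power and decays to zero at the final epoch. The warmup ratio of 0.1 does not appear in the formula, so it is left out, and the initial learning rate value is hypothetical.

```python
def learning_rate(epoch: int, lr_init: float, warmup_epochs: int = 10,
                  total_epochs: int = 300, power: float = 0.9) -> float:
    """Learning rate for a 1-based epoch: power-law warmup, then polynomial decay."""
    if epoch <= warmup_epochs:
        return lr_init * (epoch / warmup_epochs) ** power
    # Assumption: decay runs from lr_init at the end of warmup to 0 at total_epochs,
    # with the same power as the warmup phase.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr_init * (1.0 - progress) ** power


LR_INIT = 0.01  # hypothetical; the initial learning rate is not stated in the text
for epoch in (1, 5, 10, 155, 300):
    print(epoch, round(learning_rate(epoch, LR_INIT), 6))
```

The rate rises monotonically during the first 10 epochs, peaks at `LR_INIT` at epoch 10, and then falls for the remaining 290 epochs.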
I evaluated the Deep-UNet model on the TMSK dataset and compared its performance with results on the Potsdam dataset. Evaluation metrics include accuracy (ACC), F1-score, and mean Intersection over Union (mIoU). These metrics are standard in semantic segmentation and provide a comprehensive view of model performance. The formulas are as follows. Let TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
Accuracy measures the overall correctness:
$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision and recall are defined as:
$$\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}$$
The F1-score is the harmonic mean of precision and recall:
$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
Intersection over Union (IoU) for a class is:
$$IoU = \frac{TP}{TP + FP + FN}$$
and mIoU is the average IoU across all classes. These metrics were computed per class and then aggregated. The results for the TMSK and Potsdam datasets are summarized in Table 3.
| Metric | TMSK (UAV imagery) | Potsdam (aerial imagery) |
|---|---|---|
| Accuracy (ACC, %) | 96.73 | 96.64 |
| F1-score (%) | 95.36 | 95.27 |
| Mean IoU (mIoU, %) | 90.36 | 90.25 |
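For reference, the metric definitions above can be computed from a multi-class confusion matrix as sketched below. The matrix is a toy example, not experimental data, and treating ACC as overall pixel accuracy (correct pixels over all pixels) is an assumption about how the per-class quantities were aggregated.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Precision, recall, F1 and IoU per class from a matrix cm[true][predicted]."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1, "iou": iou})
    return results


def overall_accuracy(cm: List[List[int]]) -> float:
    """Share of correctly labelled pixels (diagonal over total)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Toy 3-class confusion matrix, for illustration only.
cm = [[50, 2, 3],
      [4, 30, 1],
      [1, 2, 7]]
metrics = per_class_metrics(cm)
miou = sum(m["iou"] for m in metrics) / len(metrics)
print(f"ACC={overall_accuracy(cm):.4f}  mIoU={miou:.4f}")
```

Because mIoU is an unweighted mean over classes, a weak rare class lowers it far more than it lowers overall accuracy, which is dominated by the frequent classes.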
The model scores slightly higher on TMSK across all three metrics, by about 0.1 percentage points, which supports the dataset's suitability for urban land cover classification from UAV imagery. The improvement is modest and should be read cautiously, but it is consistent with image clarity and annotation quality helping model discrimination. The confusion matrices show more precise predictions on TMSK for minority classes such as cars and water. Spatial resolution alone is unlikely to explain this, since Potsdam's 0.05 m ground sampling distance is finer than TMSK's 7.8 cm; differences in scene content and labeling are more plausible factors. The Vaihingen dataset was also tested but gave less stable results, which I attribute to its smaller size and which highlights the importance of dataset scale for deep learning.
Per-class performance gives a more detailed picture. For the TMSK dataset, the IoU values are: buildings (92.1%), impervious surfaces (89.5%), vegetation (94.2%), bare land (87.3%), water (78.9%), and cars (75.4%). The unweighted mean of these six values is about 86.2%, below the mIoU in Table 3, so the two figures appear to come from different aggregation or evaluation settings. The lower IoU for water and cars reflects the difficulty of segmenting small or sparse objects, and these are also the two rarest classes in the dataset. The values are nevertheless competitive for such classes. To improve minority-class performance, I experimented with class-weighted loss functions, where the weight $w_c$ for class $c$ decreases as its pixel frequency $f_c$ increases:
$$w_c = \frac{1}{\log(1.02 + f_c)}$$
This weighting scheme reduces the dominance of majority classes during training. In addition, data augmentation (rotation, flipping, and color jittering) was applied to increase variability and reduce overfitting. These steps matter for UAV datasets, which often contain few samples of rare classes.
Several factors explain why UAVs work well for urban land cover classification. First, the high spatial resolution supports detailed feature extraction, so models can distinguish similar-looking surfaces (e.g., asphalt roads vs. concrete pavements). Second, UAVs offer temporal flexibility: data can be acquired at times that minimize shadows or adverse weather effects. Third, UAVs are cost-effective compared to manned aircraft or high-resolution satellites, which makes repeated surveys feasible. Challenges remain, including the need for robust preprocessing pipelines, the handling of large data volumes, and ethical and regulatory considerations around privacy and airspace. Future work could add multi-spectral or LiDAR sensors to the UAV to enrich the data and improve classification accuracy.
In conclusion, this study presents a methodology for urban land cover classification using high-resolution UAV imagery. The TMSK dataset, comprising 480 annotated tiles across six classes, is a useful resource for the research community. With the Deep-UNet model, results on TMSK are slightly higher than on the Potsdam benchmark in accuracy, F1-score, and mIoU. Training strategies such as OHEM loss and learning rate warmup support model performance. UAVs have proven to be a powerful tool for capturing detailed urban landscapes, and combining them with deep learning opens new avenues for automated land cover mapping. As UAV technology evolves, higher resolutions and more sophisticated sensors can be expected. This work contributes a practical framework for using UAV imagery in geospatial analysis.
The implications of this research extend beyond academia. Urban planners can use the classification outputs to monitor green space distribution, assess impervious surface growth, and plan infrastructure projects. Environmental agencies can track water bodies and bare land to combat desertification or manage resources. Emergency responders could use UAV-derived maps for disaster assessment. Because UAV operations scale well, similar methodologies can be applied to other cities, supporting comparison and knowledge sharing. It remains crucial to address data privacy and security, especially when flying UAVs over populated areas. Clear protocols and engagement with local communities help ensure responsible use of the technology.
From a technical perspective, future directions include transformer-based models for UAV imagery, which have shown promise in capturing long-range dependencies. Semi-supervised or self-supervised learning could reduce the annotation burden for UAV datasets. Multi-temporal analysis of repeated UAV surveys could reveal land cover change and support near-real-time monitoring. Fusing UAV data with other sources, such as satellite imagery or social media feeds, could contribute to comprehensive urban digital twins. As computational power increases, real-time onboard processing on the UAV becomes feasible, giving immediate insights during flights. These advances would strengthen the role of UAVs in smart city initiatives and sustainable development.
In summary, UAVs are changing urban land cover classification by providing high-resolution, flexible, and cost-effective data. The TMSK dataset and the associated deep learning pipeline show one way to use this potential. Turning raw UAV imagery into actionable knowledge involves several steps (acquisition, preprocessing, annotation, modeling, and evaluation), each of which requires careful attention. Continued refinement of datasets, models, and applications, along with collaboration across disciplines, should make UAVs a core data source for future urban studies.
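Applying the weighting formula to the class shares reported for the TMSK dataset gives a sense of its effect. The sketch assumes a natural logarithm and frequencies expressed as fractions rather than percentages; neither is stated explicitly in the text.

```python
import math

# Pixel shares from the dataset statistics above, converted to fractions.
FREQ = {
    "vegetation": 0.3910,
    "impervious surfaces": 0.3096,
    "buildings": 0.1663,
    "bare land": 0.1129,
    "water": 0.0139,
    "cars": 0.0063,
}


def class_weight(freq: float, offset: float = 1.02) -> float:
    """w_c = 1 / ln(offset + f_c); assumes natural log and f_c in [0, 1]."""
    return 1.0 / math.log(offset + freq)


weights = {name: class_weight(f) for name, f in FREQ.items()}
for name, w in weights.items():
    print(f"{name:<20} f={FREQ[name]:.4f}  w={w:.2f}")
```

Under these assumptions the weights range from roughly 2.9 for vegetation to roughly 38.5 for cars. The 1.02 offset keeps the weight bounded for very rare classes, since the logarithm never drops below ln(1.02).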
