Landslide hazards are characterized by their sudden onset and devastating impact, making accurate deformation monitoring crucial for early warning and emergency decision-making. Unmanned Aerial Vehicle (UAV) remote sensing technology has emerged as a pivotal tool in this domain, leveraging its high mobility, high-resolution imaging capabilities, and cost-effectiveness. By capturing multi-temporal image datasets, three-dimensional point clouds and Digital Elevation Models (DEMs) can be generated. These products, when analyzed using algorithms like the Multiscale Model-to-Model Cloud Comparison (M3C2), enable centimeter-level displacement monitoring of landslide surfaces. However, UAV imaging is often compromised by environmental and sensor-related factors such as haze, fluctuating illumination, and system noise. These interferences manifest as blurry images, pronounced noise, low contrast, and indistinct edges, which can introduce significant errors into deformation calculations and reduce the overall reliability of monitoring systems. Therefore, developing an efficient image preprocessing pipeline is a fundamental prerequisite for high-precision landslide monitoring and research.

Current preprocessing techniques for UAV imagery can be broadly categorized into single-modal improvements and multimodal fusion. Single-modal methods address one specific type of degradation, such as haze, noise, or low contrast, within a single functional goal or feature space; dehazing algorithms built on atmospheric scattering models and advanced denoising filters fall into this category. While effective for their targeted issue, these methods often struggle to balance competing objectives such as noise suppression, detail preservation, and spectral fidelity. Conversely, multimodal fusion techniques, which integrate multiple functional targets and algorithms across different feature spaces, have become a research hotspot for holistic image quality enhancement; examples include methods that combine depth information for low-light enhancement or employ generative adversarial networks for adaptive restoration. Despite this progress, existing multimodal approaches often fail to achieve optimal synergy among modality features and to maintain the spectral authenticity of the original scene, which is vital for subsequent geotechnical interpretation.
To overcome these limitations, this paper introduces a novel preprocessing methodology based on multimodal collaboration, specifically designed for the complex imaging conditions encountered in UAV-based landslide monitoring. The core of the method is a progressive, synergistic pipeline structured around denoising, dehazing, enhancement, and fusion. This workflow systematically addresses compound degradations to provide higher-quality data for downstream deformation analysis.
1. The Multimodal Collaborative Preprocessing Framework
The proposed framework is built upon a logic of progressive optimization, corresponding to three core stages: interference suppression, feature enhancement, and information fidelity. Each stage employs carefully selected or improved algorithms that work in concert to refine image quality.
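To make the stage ordering concrete, the following minimal Python sketch chains the four stages in the order described above. The stage functions it calls (`edge_aware_nlm`, `adaptive_dcp_dehaze`, `entropy_driven_enhance`, `saturation_guided_fusion`) are the hypothetical sketches given in Sections 1.1 to 1.4, and applying the contrast enhancement to the luminance channel only is an assumption, not a detail stated in the text.

```python
import cv2
import numpy as np

def preprocess(bgr: np.ndarray) -> np.ndarray:
    """Chain the four stages: denoise -> dehaze -> enhance -> fuse."""
    # Stage 1: edge-aware NLM denoising, applied per colour channel
    den = cv2.merge([edge_aware_nlm(c) for c in cv2.split(bgr)])
    # Stage 2: fog-adaptive dark-channel dehazing on the colour image
    dehazed = adaptive_dcp_dehaze(den)
    # Stage 3: entropy-driven CLAHE-GC plus unsharp masking on luminance only
    # (assumption: chrominance is left untouched to limit colour shifts)
    ycc = cv2.cvtColor(dehazed, cv2.COLOR_BGR2YCrCb)
    ycc[:, :, 0] = entropy_driven_enhance(ycc[:, :, 0])
    enhanced = cv2.cvtColor(ycc, cv2.COLOR_YCrCb2BGR)
    # Stage 4: saturation-guided fusion with the original image
    return saturation_guided_fusion(enhanced, bgr)
```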
1.1 Denoising Stage: Improved Non-Local Means (NLM) Algorithm
UAV imagery is inherently susceptible to noise from sensor electronics and transmission, which obscures critical landslide features like cracks and texture boundaries. Effective noise suppression is the first critical step. The standard Non-Local Means (NLM) algorithm denoises an image by averaging pixels across the entire image, weighted by the similarity of their surrounding patches. While effective for flat regions, it tends to over-smooth edges. Our improved NLM algorithm introduces a Sobel-edge-weighted adjustment to the filtering parameter \(h\).
First, a Sobel operator extracts edge information, and an edge weight map \(e(x)\) is constructed based on edge intensity. The denoising parameter \(h\) is then dynamically adjusted: smaller \(h\) values are used in edge regions to reduce smoothing strength and preserve details, while larger \(h\) values are applied in smooth regions for stronger noise suppression. This edge-aware modification prevents the “indiscriminate smoothing” of traditional NLM. The denoised pixel value \(NLM(I(x))\) is calculated as:
$$ NLM(I(x)) = \sum_{y \in \Omega(x)} \omega(x, y) \cdot I(y) $$
where \(I(y)\) is the intensity of a candidate pixel \(y\), \(\Omega(x)\) is the search window around \(x\), and \(\omega(x, y)\) is the patch-similarity weight. The modified weight \(\omega'(x, y)\), renormalized with the edge weights \(e(\cdot)\), is:
$$ \omega'(x, y) = \frac{\omega(x, y) \cdot e(y)}{\sum_{z \in \Omega(x)} \omega(x, z) \cdot e(z)} $$
This step provides a high signal-to-noise-ratio image base for subsequent processing, preventing noise amplification in later stages. The efficacy of denoising is evaluated using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and a noise metric.
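As a concrete illustration, the sketch below approximates this edge-aware behaviour with OpenCV. Because `cv2.fastNlMeansDenoising` does not accept a per-pixel \(h\), the sketch blends a strongly denoised pass (large \(h\)) with a gently denoised pass (small \(h\)) using a normalized Sobel edge-weight map; the two \(h\) values and the window sizes are assumptions, not values from the text.

```python
import cv2
import numpy as np

def edge_aware_nlm(gray: np.ndarray, h_smooth: float = 15.0, h_edge: float = 5.0) -> np.ndarray:
    """Approximate edge-aware NLM by blending two fixed-h NLM passes
    according to a Sobel edge-weight map e(x) in [0, 1]."""
    # Sobel gradient magnitude as edge intensity
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    e = mag / (mag.max() + 1e-8)  # 1 = strong edge, 0 = flat region

    # Large h smooths flat regions aggressively; small h preserves edges
    strong = cv2.fastNlMeansDenoising(gray, None, h=h_smooth,
                                      templateWindowSize=7, searchWindowSize=21)
    gentle = cv2.fastNlMeansDenoising(gray, None, h=h_edge,
                                      templateWindowSize=7, searchWindowSize=21)

    # Per-pixel blend: edge pixels keep the gently denoised value
    out = e * gentle.astype(np.float64) + (1.0 - e) * strong.astype(np.float64)
    return np.clip(out, 0, 255).astype(np.uint8)
```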
1.2 Dehazing Stage: Adaptive Dark Channel Prior (DCP)
Landslide-prone areas, especially in regions like Southwest China, are frequently shrouded in high-humidity haze, which severely reduces image clarity and scene discernibility. The Dark Channel Prior (DCP) is a powerful physics-based dehazing method, relying on the observation that in most non-sky patches of a haze-free image, at least one color channel has very low intensity. The standard DCP uses a constant parameter \(\omega\) to retain a slight amount of haze for naturalness, but this fixed setting lacks adaptability across varying fog conditions.
Our adaptive DCP algorithm introduces a fog-concentration-based dynamic adjustment for \(\omega\). The fog concentration is estimated from the mean value of the dark channel. A 7×7 median filter is applied to optimize the dark channel estimation, and the atmospheric light \(A\) is estimated from the top 0.1% brightest pixels in the dark channel. The transmission map \(t(x)\) is then estimated as:
$$ t(x) = 1 - \omega \cdot \min_{c} \left( \frac{I^{c}(x)}{A^{c}} \right) $$
where \(c\) is the color channel (R, G, B). The parameter \(\omega\) is dynamically set:
- \(\omega = 0.15\) if fog concentration > 0.6 (dense fog).
- \(\omega = 0.30\) if fog concentration < 0.3 (light fog).
- \(\omega = 0.20\) (default) otherwise.
The transmission \(t(x)\) is clamped to a minimum of 0.5 to prevent over-dehazing artifacts. Finally, the haze-free image \(J(x)\) is recovered using the atmospheric scattering model:
$$ J(x) = \frac{I(x) - A}{\max(t(x), t_{0})} + A $$
where \(t_{0}\) is a lower bound (typically 0.1). This step is crucial for restoring the true radiometric and geometric features of landslide surfaces, providing clear input for the enhancement stage. Evaluation metrics include sharpness, contrast, and SSIM.
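A minimal sketch of this stage is given below, using the thresholds from the text; the 15×15 dark-channel patch size is an assumption, as is our interpretation that \(A\) is the mean colour of the input image at the top 0.1% brightest dark-channel pixels.

```python
import cv2
import numpy as np

def adaptive_dcp_dehaze(bgr: np.ndarray, patch: int = 15) -> np.ndarray:
    """Fog-adaptive DCP sketch: dark channel -> A -> omega -> t(x) -> J(x)."""
    I = bgr.astype(np.float64) / 255.0

    # Dark channel: per-pixel channel minimum, then a patch minimum filter,
    # refined with the 7x7 median filter described in the text
    dark = cv2.erode(I.min(axis=2), np.ones((patch, patch), np.uint8))
    dark = cv2.medianBlur((dark * 255).astype(np.uint8), 7).astype(np.float64) / 255.0

    # Atmospheric light A: mean colour at the top 0.1% brightest dark-channel pixels
    n = max(1, int(dark.size * 0.001))
    flat = np.argsort(dark, axis=None)[-n:]
    A = I.reshape(-1, 3)[flat].mean(axis=0)

    # Fog concentration (dark-channel mean) selects omega
    fog = dark.mean()
    omega = 0.15 if fog > 0.6 else 0.30 if fog < 0.3 else 0.20

    # Transmission estimate, clamped to 0.5 to prevent over-dehazing
    t = np.clip(1.0 - omega * (I / A).min(axis=2), 0.5, 1.0)

    # Scene recovery via the atmospheric scattering model with t0 = 0.1
    J = (I - A) / np.maximum(t, 0.1)[..., None] + A
    return (np.clip(J, 0.0, 1.0) * 255).astype(np.uint8)
```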
1.3 Enhancement Stage: CLAHE-GC Synergy and Unsharp Masking
After denoising and dehazing, images are clearer but may still lack sufficient local contrast and sharpness for detecting subtle deformation features. We employ a two-pronged enhancement strategy.
First, a synergistic Contrast Limited Adaptive Histogram Equalization and Gamma Correction (CLAHE-GC) algorithm is used to optimize local contrast. Standard CLAHE with a fixed clip limit can amplify noise in homogeneous regions or fail to sufficiently enhance textured areas. Our innovation is an "entropy-driven" dynamic parameter adjustment: the global entropy \(E_{global}\) of the grayscale image quantifies overall texture complexity, and the clip limit for CLAHE and the gamma value for the subsequent GC are adaptively determined:
$$ \text{entropy\_norm} = \min\left(\max\left(\frac{E_{global}}{E_{max}}, 0\right), 1\right) $$
$$ \text{clip\_limit} = 1.0 + 2.0 \times \text{entropy\_norm} $$
$$ \text{gamma} = 1.2 - 0.6 \times \text{entropy\_norm} $$
where \(E_{max}\) is the maximum theoretical entropy of an 8-bit image (8 bits). This ensures that low-texture (low-entropy) regions receive gentle enhancement to avoid amplifying noise, while high-texture (high-entropy) regions receive a stronger contrast boost.
Second, Unsharp Masking (USM) is applied to sharpen edges and accentuate micro-textures essential for landslide monitoring, such as fissure boundaries and soil granularity. The sharpened image \(I_{sharpened}\) is obtained by:
$$ I_{sharpened} = I_{original} + \alpha \cdot (I_{original} - I_{blurred}) $$
where \(I_{original}\) is the image from the previous step, \(I_{blurred}\) is its Gaussian-blurred version, and \(\alpha\) is a weighting coefficient controlling the sharpening strength. The performance of the enhancement stage is assessed using contrast, angular second moment (energy), and homogeneity metrics.
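The sketch below implements the entropy-driven CLAHE-GC step followed by USM; the 8×8 tile grid, the Gaussian sigma, and the default \(\alpha = 1.5\) are assumptions not specified in the text.

```python
import cv2
import numpy as np

def entropy_driven_enhance(gray: np.ndarray, alpha: float = 1.5) -> np.ndarray:
    """Entropy-driven CLAHE-GC followed by unsharp masking."""
    # Global Shannon entropy from the 8-bit histogram (E_max = 8 bits)
    hist = np.bincount(gray.ravel(), minlength=256) / gray.size
    p = hist[hist > 0]
    entropy = -np.sum(p * np.log2(p))
    norm = np.clip(entropy / 8.0, 0.0, 1.0)

    # Entropy-driven parameters from the text
    clip_limit = 1.0 + 2.0 * norm
    gamma = 1.2 - 0.6 * norm

    # CLAHE, then gamma correction
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=(8, 8))
    eq = clahe.apply(gray)
    gc = (255.0 * (eq / 255.0) ** gamma).astype(np.uint8)

    # Unsharp masking: I + alpha * (I - blurred)
    blurred = cv2.GaussianBlur(gc, (0, 0), sigmaX=2.0)
    return cv2.addWeighted(gc, 1.0 + alpha, blurred, -alpha, 0)
```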
1.4 Fusion Stage: Dynamic Image Fusion (IF) Based on Saturation
The preceding steps significantly improve detail and clarity but may alter the original spectral signatures or over-enhance local areas. To preserve the fidelity of spectral information—critical for vegetation identification and material classification in landslide analysis—a dynamic fusion with the original image is performed.
Considering the typical high vegetation cover in landslide areas, the fusion is guided by the saturation channel in the HSV color space. Regions are classified as vegetation-dominant or non-vegetation based on saturation. A dynamic weight \(\alpha\) is applied in the fusion equation:
$$ F(x,y) = \alpha \times P(x,y) + (1-\alpha) \times O(x,y) $$
where \(F(x,y)\) is the final fused image, \(P(x,y)\) is the processed image (after denoising, dehazing, enhancement), and \(O(x,y)\) is the original UAV image.
- For vegetation-dominant blocks (where pixels above the saturation threshold cover more than 50% of the block), \(\alpha = 0.3\). This prioritizes the original spectral information (70% weight) to minimize distortion of vegetation signatures.
- For non-vegetation areas (bare soil, rock, roads), \(\alpha = 0.8\). This prioritizes the enhanced structural and textural details (80% weight) from the processed image.
This stage ensures the final output retains the optimal balance between enhanced landslide features and authentic spectral properties. The Spectral Angle Mapper (SAM) metric is used to evaluate spectral fidelity preservation.
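A block-wise sketch of this fusion is shown below; the saturation threshold of 60 (on OpenCV's 0 to 255 scale) and the 64-pixel block size are assumptions, while the \(\alpha\) values and the >50% vegetation-dominance rule follow the text.

```python
import cv2
import numpy as np

def saturation_guided_fusion(processed: np.ndarray, original: np.ndarray,
                             sat_thresh: int = 60, block: int = 64) -> np.ndarray:
    """Blend processed and original images with block-wise dynamic alpha."""
    sat = cv2.cvtColor(original, cv2.COLOR_BGR2HSV)[:, :, 1]
    fused = np.empty_like(original, dtype=np.float64)
    h, w = sat.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            s = sat[y:y+block, x:x+block]
            # Vegetation-dominant block if >50% of pixels exceed the threshold
            alpha = 0.3 if (s > sat_thresh).mean() > 0.5 else 0.8
            fused[y:y+block, x:x+block] = (
                alpha * processed[y:y+block, x:x+block]
                + (1.0 - alpha) * original[y:y+block, x:x+block]
            )
    return fused.astype(np.uint8)
```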
2. Case Study: Application to the Shuigoutou Landslide
The proposed methodology was validated using data from the Shuigoutou landslide in Hanyuan County, Sichuan Province, China. The site presents a challenging environment with high humidity (often >80%), significant illumination variation between sunny and shaded slopes (>35% irradiance difference), and dense vegetation cover (NDVI > 0.7 over 62% of the area), making it an ideal testbed for the multimodal preprocessing framework.
Data acquisition was performed using a DJI Matrice 300 RTK UAV equipped with a Zenmuse P1 35 mm full-frame camera. Two flight campaigns were conducted on July 30 and August 1, 2024, following a terrain-following nadir photography mission plan. Key flight parameters are summarized below:
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Terrain-following Height | 150 m | Maximum Aperture | f/2.97 |
| Image Resolution | 8192 × 5460 px | Exposure Time | 1/1000 s |
| Ground Sampling Distance (GSD) | 1.88 cm/px | Forward Overlap | 80% |
| ISO Speed | ISO-570 | Side Overlap | 70% |
| Focal Length | 35 mm | Total Flight Line Length | 12711 m |
For each date, 681 images were processed. The raw imagery exhibited visible haze, uneven illumination, and sensor noise, confirming the necessity for robust preprocessing.
2.1 Preprocessing Results and Quantitative Evaluation
The proposed method was compared against three representative combination algorithms to demonstrate its superiority:
- Combination A: Median Filter + Adaptive DCP + USM (representing traditional engineering preprocessing).
- Combination B: Histogram Equalization (HE) + Adaptive DCP + Dynamic IF (representing a common multimodal approach).
- Combination C: DnCNN (a deep learning denoiser) + Adaptive DCP + CLAHE-GC (representing a deep learning-enhanced pipeline).
Visually, our method produced results with vibrant colors, clearly enhanced shadow details, sharp edges, and excellent noise suppression. In contrast, Combination A resulted in blurred edges; Combination B suffered from over-enhancement and color distortion; and Combination C yielded darker images with residual noise.
A quantitative evaluation was conducted on a representative sample image, measuring eight key quality metrics. The results conclusively demonstrate the advantages of our multimodal collaborative approach.
| Method | Sharpness | Noise | Contrast | Angular Second Moment | Homogeneity | PSNR (dB) | SSIM | SAM |
|---|---|---|---|---|---|---|---|---|
| Original Image | 10633.9 | 6.736 | 444.45 | 0.0189 | 0.1491 | – | – | – |
| Combination A | 1957.5 | 6.700 | 113.41 | 0.0346 | 0.3215 | 15.55 | 0.4634 | 0.1238 |
| Combination B | 17855.3 | 8.418 | 787.77 | 0.0170 | 0.1230 | 17.86 | 0.9101 | 0.0768 |
| Combination C | 10243.0 | 6.536 | 429.07 | 0.0201 | 0.1713 | 20.02 | 0.8215 | 0.0851 |
| Proposed Method | 21247.0 | 5.879 | 853.06 | 0.0359 | 0.1742 | 22.48 | 0.9233 | 0.0529 |
Key improvements of our method include:
- Compared to the original image: Sharpness increased by ~99.8%, contrast by ~92.1%, noise reduced by ~12.7%, and angular second moment increased by ~89.9%.
- Compared to the next-best method (Combination B): PSNR improved by ~25.8%, SSIM by ~1.5%, and SAM reduced by ~31.1%, indicating superior detail recovery, structural fidelity, and spectral preservation.
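For reference, three of the table's fidelity metrics can be computed as in the sketch below, using scikit-image for PSNR and SSIM and a direct implementation of SAM as the mean per-pixel spectral angle; the remaining metrics (sharpness, contrast, angular second moment, homogeneity) follow standard gradient- and GLCM-based definitions.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def quality_metrics(reference: np.ndarray, processed: np.ndarray) -> dict:
    """PSNR, SSIM, and SAM between a reference and a processed colour image."""
    psnr = peak_signal_noise_ratio(reference, processed)
    # channel_axis requires scikit-image >= 0.19
    ssim = structural_similarity(reference, processed, channel_axis=-1)

    # Spectral Angle Mapper: mean angle between per-pixel colour vectors
    a = reference.reshape(-1, 3).astype(np.float64) + 1e-8
    b = processed.reshape(-1, 3).astype(np.float64) + 1e-8
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    sam = float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))
    return {"PSNR": psnr, "SSIM": ssim, "SAM": sam}
```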
The high-quality processed imagery was then used in Agisoft Metashape software to generate superior Digital Orthophoto Maps (DOMs) and Digital Elevation Models (DEMs), providing an excellent data foundation for deformation analysis.
2.2 Landslide Deformation Monitoring via M3C2
To assess the impact of preprocessing on final monitoring accuracy, dense point clouds were generated from the original and preprocessed image sets for both dates. After co-registration using the Iterative Closest Point (ICP) algorithm and filtering of non-ground points (vegetation, buildings) based on a Cloud-to-Mesh distance criterion, the M3C2 algorithm was applied to quantify surface change. The M3C2 parameters were: projection distance = 0.584 m, normal scale = 1.165 m.
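M3C2 itself was run in standard point-cloud software, but its core distance computation can be sketched as follows. The sketch assumes unit normals precomputed at the 1.165 m normal scale, interprets the 0.584 m projection distance as the cylinder diameter (radius 0.292 m), uses an assumed cylinder half-length, and omits M3C2's confidence-interval estimation.

```python
import numpy as np
from scipy.spatial import cKDTree

def m3c2_distance(core_pts, normals, cloud1, cloud2,
                  proj_radius=0.292, cyl_half_len=1.5):
    """Simplified M3C2: difference of the mean along-normal offsets of the
    two epochs inside a cylinder around each core point."""
    trees = (cKDTree(cloud1), cKDTree(cloud2))
    clouds = (cloud1, cloud2)
    search_r = np.hypot(proj_radius, cyl_half_len)   # bounding sphere of the cylinder
    dist = np.full(len(core_pts), np.nan)
    for i, (p, n) in enumerate(zip(core_pts, normals)):
        means = []
        for tree, cloud in zip(trees, clouds):
            idx = tree.query_ball_point(p, search_r)
            d = cloud[idx] - p
            axial = d @ n                            # offset along the normal
            radial = np.linalg.norm(d - np.outer(axial, n), axis=1)
            sel = (radial <= proj_radius) & (np.abs(axial) <= cyl_half_len)
            means.append(axial[sel].mean() if sel.any() else np.nan)
        dist[i] = means[1] - means[0]                # signed change, epoch 1 -> 2
    return dist
```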
The M3C2 results for the main landslide body revealed a clear deformation pattern: soil depletion in the source area (volume decrease ~0.0067 km³, elevation drop 1.01–2.67 m) and accumulation in the toe area (volume increase ~0.0116 km³, elevation rise up to 1.62 m), indicating active sliding. The visual and quantitative differences between using original and preprocessed data were striking:
- Original Data Result: Exhibited a “salt-and-pepper” effect with numerous scattered red/yellow false-positive points (mismatches), severely undermining reliability.
- Proposed Method Result: Showed a much cleaner deformation map. While more black areas appeared (indicating no significant change or successfully filtered noise), the true extreme deformation signals were not masked. Statistical analysis showed a 51.2% reduction in false deformation points compared to the original data result.
To rigorously evaluate precision, a topographically stable area was selected for M3C2 comparison. The distribution of M3C2 distances in this area should ideally be a normal distribution centered near zero. Gaussian fitting and Kolmogorov-Smirnov (K-S) normality tests were performed on 50,000 randomly sampled points:
| Data Source | Mean Deformation (m) | Standard Deviation (m) | K-S Statistic | p-value (α=0.05) |
|---|---|---|---|---|
| Original Imagery | 0.0157 | 0.4992 | 0.156 | 0.028 (Not Normal) |
| Combination B | 0.0112 | 0.4952 | 0.123 | 0.087 (Near Normal) |
| Proposed Method | 0.0062 | 0.4907 | 0.087 | 0.142 (Normal) |
The results are conclusive: the mean deformation from our processed data (0.0062 m) is 60.9% closer to zero than the original result, and the error distribution passes the normality test (p > 0.05). This indicates that residual errors are random rather than systematic, confirming the high reliability of monitoring results derived from the preprocessed UAV imagery. Furthermore, false matches in the stable area were reduced by 42.1%.
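The stable-area check in the table can be reproduced with a short script such as the sketch below, which draws up to 50,000 samples, fits a Gaussian, and runs a K-S test against it; the fixed random seed is an assumption added for reproducibility.

```python
import numpy as np
from scipy import stats

def stable_area_check(m3c2_dist: np.ndarray, n_samples: int = 50_000, alpha: float = 0.05):
    """Gaussian fit and K-S normality test on stable-area M3C2 distances."""
    rng = np.random.default_rng(42)                  # fixed seed (assumption)
    vals = m3c2_dist[~np.isnan(m3c2_dist)]
    sample = rng.choice(vals, size=min(n_samples, vals.size), replace=False)

    # Fit: the mean should sit near zero over genuinely stable terrain
    mu, sigma = sample.mean(), sample.std(ddof=1)

    # K-S test of the sample against the fitted normal distribution
    ks_stat, p_value = stats.kstest(sample, "norm", args=(mu, sigma))
    return {"mean": mu, "std": sigma, "ks": ks_stat,
            "p": p_value, "normal": p_value > alpha}
```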
3. Conclusion and Future Work
This paper presents a comprehensive multimodal collaborative preprocessing framework specifically designed to address the compound degradation challenges in UAV-based landslide monitoring. The sequential denoising-dehazing-enhancement-fusion pipeline, incorporating algorithm improvements like edge-aware NLM, fog-adaptive DCP, and entropy-driven CLAHE-GC, demonstrably enhances image quality beyond conventional methods.
The case study on the Shuigoutou landslide demonstrates the method's practical value. Quantitative metrics show significant improvements in sharpness, contrast, noise reduction, and spectral fidelity. Most importantly, the downstream M3C2 analysis reveals that preprocessing reduces the mean deformation error in stable areas by 60.9% and cuts false matches by over 40%, translating directly into higher monitoring accuracy and reliability. This workflow enables UAV remote sensing to deliver precise, actionable data for landslide hazard assessment.
While highly effective in humid, vegetated, complex-terrain environments like Southwest China, the method’s adaptability to other settings (arid northwest, rainy southeast) warrants further investigation. Future work will focus on: 1) integrating deep learning to improve the adaptive mechanisms within the pipeline; 2) exploring joint preprocessing with multi-source data (e.g., LiDAR); 3) conducting regionalized algorithm optimization for specific landslide geomorphologies; and 4) rigorous validation against extensive ground-based measurement data to solidify the framework’s engineering application credentials.
