Abstract
In recent years, the use of lighting UAVs (lighting drones) for aerial photography has expanded into fields such as military surveillance, forest fire monitoring, power line inspection, and disaster response. However, images captured under low-light conditions often suffer from insufficient brightness, blurred details, and noise interference, which degrade visual quality and hinder subsequent computer vision tasks such as object detection and recognition. Enhancing low-light images is therefore a critical preprocessing step. Traditional methods, including histogram-based and Retinex-based approaches, rely on handcrafted constraints and struggle with complex scenes. Deep learning techniques have shown promise, but many depend on paired low/normal-light data, which is difficult to acquire in real-world scenarios. To address these issues, we propose a novel two-stage enhancement algorithm that trains on pairs of low-light images without requiring paired low/normal-light data.
Our approach is grounded in Retinex theory, which posits that an image can be decomposed into illumination and reflection components. For two low-light images with identical content, the reflection components should be consistent. We exploit this by using pairs of low-light images to learn adaptive priors. The first stage involves a decomposition module that estimates illumination and reflection components, followed by a Detail Enhancement Module (DEM) to sharpen details and edges. The second stage includes an Illumination Enhancement Module (IEM) guided by an illumination attention map to prevent overexposure in high-contrast regions, and a Denoise Module (DM) to suppress noise in the reflection component. Experimental results demonstrate that our method outperforms state-of-the-art algorithms in both visual quality and objective metrics on multiple datasets, and it effectively aids target detection in low-light conditions for lighting UAV applications.

1. Introduction
Lighting UAVs (lighting drones) are increasingly deployed for aerial photography because they offer unique viewpoints and capture rich scene information. However, images taken in poor lighting conditions exhibit low brightness, noise, and loss of detail, which impair visual perception and downstream tasks. Low-light image enhancement aims to improve brightness, contrast, and detail visibility while reducing noise. Existing methods can be categorized into traditional and deep learning-based approaches. Traditional methods, such as histogram equalization and Retinex-based algorithms, often lack generalization and require manual tuning. Deep learning methods, spanning supervised, unsupervised, and zero-shot learning strategies, have achieved significant progress. Supervised methods such as Retinex-Net and KinD rely on paired data, which is scarce. Unsupervised methods such as EnlightenGAN and Zero-DCE avoid paired data but may suffer from training instability or limited information from single images.
We introduce a two-stage algorithm that uses pairs of low-light images for training, eliminating the need for normal-light references. The first stage decomposes images and enhances details through a novel DEM, which captures multi-scale features and fuses global and local information. The second stage adjusts illumination using an IEM with an attention mechanism to avoid overexposure and incorporates a DM for noise reduction. This approach is particularly suited for lighting UAV scenarios, where images often contain point light sources and wide dynamic ranges. Our contributions include: (1) a DEM for improved detail perception, (2) a DM for noise suppression, and (3) an IEM with illumination attention for balanced brightness enhancement.
2. Related Work
Low-light image enhancement has been extensively studied. Traditional methods include histogram-based techniques like Adaptive Histogram Equalization (AHE) and Contrast Limited Adaptive Histogram Equalization (CLAHE), which adjust pixel distributions but may amplify noise. Retinex-based methods, such as Multi-Scale Retinex (MSR) and LIME, decompose images into illumination and reflection components, but they often rely on handcrafted priors and struggle with complex scenes.
Deep learning approaches have revolutionized this field. Supervised methods, like Retinex-Net and URetinex-Net, use paired data for training but face data acquisition challenges. Unsupervised methods, such as EnlightenGAN and Zero-DCE, employ generative adversarial networks or curve estimation without paired data, but they may exhibit instability or artifacts. Zero-shot methods, like SCI, optimize on single images but depend on handcrafted priors. Recently, PairLIE proposed using pairs of low-light images to learn consistent reflection components, but it does not explicitly handle details or overexposure in lighting UAV contexts. Our work builds on this idea by incorporating dedicated modules for detail enhancement, denoising, and illumination adjustment, tailored for lighting drone imagery.
3. Proposed Method
Our two-stage algorithm addresses the limitations of existing methods for lighting UAV applications. The overall architecture is illustrated in Figure 1, which shows the decomposition, detail enhancement, denoising, and illumination enhancement steps.
3.1 Decomposition Module
Based on Retinex theory, we decompose an input low-light image \( I \) into illumination component \( L \) and reflection component \( R \), such that \( I = R \cdot L \). For two low-light images \( I_1 \) and \( I_2 \) with identical content, the reflection components should be consistent: \( R_1 = R_2 \). We design an Illumination Estimation Network (IENet) and a Reflection Estimation Network (RENet), each consisting of four convolutional layers with ReLU activation and a final convolutional layer with Sigmoid. The loss function for decomposition is defined as:
$$ L_{dc} = \lambda_{re} \| R \cdot L - I \|_2^2 + \lambda_{ii} \| L - \max_{c \in \{R,G,B\}} I^c \|_2^2 + \lambda_{tv} \| \nabla L \|_1 + \lambda_{rc} \| R_1 - R_2 \|_2^2 $$
where \( \lambda_{re} = 1 \), \( \lambda_{ii} = 0.1 \), \( \lambda_{tv} = 1 \), and \( \lambda_{rc} = 1 \). The terms represent reconstruction loss, illumination initialization, total variation loss for smoothing, and reflection consistency loss, respectively.
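To make the structure concrete, the sketch below shows how the two estimation networks and the decomposition loss above might look in PyTorch. The 64-channel width, the symmetric application of the loss to both inputs, and the use of mean-squared error in place of the plain squared L2 norm are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def estimation_net(out_channels: int, width: int = 64) -> nn.Sequential:
    """Four conv+ReLU layers followed by a conv+Sigmoid, as described in
    Section 3.1. The 64-channel width is an assumption."""
    layers, cin = [], 3
    for _ in range(4):
        layers += [nn.Conv2d(cin, width, 3, padding=1), nn.ReLU(inplace=True)]
        cin = width
    layers += [nn.Conv2d(width, out_channels, 3, padding=1), nn.Sigmoid()]
    return nn.Sequential(*layers)

ienet = estimation_net(out_channels=1)  # illumination map L (single channel)
renet = estimation_net(out_channels=3)  # reflection map R (RGB)

def decomposition_loss(R1, L1, R2, L2, I1, I2,
                       lam_re=1.0, lam_ii=0.1, lam_tv=1.0, lam_rc=1.0):
    """L_dc: reconstruction + illumination initialization + TV smoothness
    + reflection consistency, using the weights reported in the text."""
    # Reconstruction: R * L should reproduce each low-light input.
    loss_re = F.mse_loss(R1 * L1, I1) + F.mse_loss(R2 * L2, I2)

    # Illumination initialization: L is pulled toward the per-pixel
    # maximum over the RGB channels of the input.
    loss_ii = F.mse_loss(L1, I1.max(dim=1, keepdim=True).values) + \
              F.mse_loss(L2, I2.max(dim=1, keepdim=True).values)

    # Total-variation smoothness on the illumination maps.
    def tv(x):
        return (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean() + \
               (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    loss_tv = tv(L1) + tv(L2)

    # Reflection consistency between the two low-light views of the same scene.
    loss_rc = F.mse_loss(R1, R2)

    return lam_re * loss_re + lam_ii * loss_ii + lam_tv * loss_tv + lam_rc * loss_rc
```

For a pair \( I_1, I_2 \) of shape (N, 3, H, W) in [0, 1], one would compute \( L = \text{ienet}(I) \) and \( R = \text{renet}(I) \) for each image before evaluating the loss.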
3.2 Detail Enhancement Module (DEM)
To enhance details in the reflection component, we propose a DEM comprising a Multi-scale Feature Extract Module (MFEM) and a Feature Fusion Module (FFM). The MFEM downsamples the reflection component to scales of 32×32, 64×64, and 128×128, capturing multi-scale features through max-pooling and convolution. The FFM integrates global and local features by using global average pooling and fully connected layers to weight important channels, followed by convolutional layers for further refinement. The loss function for DEM is:
$$ L_{detail} = \lambda_{re} \| \hat{R} \cdot L - I \|_2^2 + \lambda_{c} \| \hat{R}_1 - \hat{R}_2 \|_2^2 + \lambda_{ss} (1 - \text{SSIM}(\hat{R}, R)) $$
where \( \hat{R} \) is the enhanced reflection component, \( \lambda_{re} = 1 \), \( \lambda_{c} = 1 \), and \( \lambda_{ss} = 0.1 \). This includes reconstruction, consistency, and structural similarity losses.
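As a structural illustration of the MFEM and FFM described above, the sketch below pools features of the reflection component to the three named scales, reweights channels via global average pooling and fully connected layers, and refines the fused result with convolutions. The channel widths, reduction ratio, and exact layer counts are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelWeighting(nn.Module):
    """Global average pooling + two FC layers to weight important channels,
    in the spirit of the FFM description (layer sizes are assumptions)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))           # global average pooling
        return x * w.unsqueeze(-1).unsqueeze(-1)  # reweight channels

class DetailEnhanceSketch(nn.Module):
    """Multi-scale feature extraction (pool to 32/64/128), channel-weighted
    fusion, and convolutional refinement of the reflection component."""
    def __init__(self, ch=16):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 3, padding=1)
        self.branch = nn.Conv2d(ch, ch, 3, padding=1)
        self.fuse = ChannelWeighting(3 * ch)
        self.refine = nn.Sequential(
            nn.Conv2d(3 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, R):
        h, w = R.shape[-2:]
        feats = []
        for size in (32, 64, 128):   # the three scales named in the text
            x = F.adaptive_max_pool2d(self.stem(R), size)
            x = F.relu(self.branch(x))
            feats.append(F.interpolate(x, size=(h, w), mode='bilinear',
                                       align_corners=False))
        return self.refine(self.fuse(torch.cat(feats, dim=1)))
```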
3.3 Denoise Module (DM)
Low-light images often contain Gaussian noise, which can be amplified during enhancement. Our DM uses a symmetric skip-connected convolutional network with nine layers to suppress noise. The loss function is:
$$ L_{de} = \lambda_{c} \| \hat{R}' - R \|_2^2 + \lambda_{ss} (1 - \text{SSIM}(\hat{R}', R)) $$
where \( \hat{R}' \) is the denoised reflection component, \( \lambda_{c} = 5 \), and \( \lambda_{ss} = 1 \). This ensures consistency and structural preservation.
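A minimal sketch of a nine-layer, symmetric skip-connected convolutional denoiser in the spirit of the DM; the channel width, the residual output toward the input, and the placement of the skip connections are assumptions.

```python
import torch
import torch.nn as nn

class DenoiseSketch(nn.Module):
    """Nine convolutional layers with symmetric skip connections between
    mirrored encoder and decoder stages (widths are assumptions)."""
    def __init__(self, ch=32):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.ReLU(inplace=True))
        self.enc1, self.enc2, self.enc3, self.enc4 = (
            block(3, ch), block(ch, ch), block(ch, ch), block(ch, ch))
        self.dec4, self.dec3, self.dec2, self.dec1 = (
            block(ch, ch), block(ch, ch), block(ch, ch), block(ch, ch))
        self.out = nn.Conv2d(ch, 3, 3, padding=1)  # ninth layer

    def forward(self, R_hat):
        e1 = self.enc1(R_hat); e2 = self.enc2(e1)
        e3 = self.enc3(e2);    e4 = self.enc4(e3)
        d4 = self.dec4(e4) + e3          # symmetric skip connections
        d3 = self.dec3(d4) + e2
        d2 = self.dec2(d3) + e1
        d1 = self.dec1(d2)
        return torch.sigmoid(self.out(d1) + R_hat)  # residual toward the input
```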
3.4 Illumination Enhancement Module (IEM)
To enhance illumination without overexposure, we employ a multi-exposure fusion strategy. The low-light illumination component \( L_{\text{low}} \) is scaled by a set of exposure ratios \( x_i \) to generate multiple exposure versions \( L_i = L_{\text{low}} \cdot x_i \). An illumination attention map, defined as \( 1 - L \) (normalized), guides the enhancement to focus on dark regions. The IEM consists of convolutional layers with LeakyReLU and ReLU activations. The loss function is:
$$ L_{en} = \| \hat{L} – L_{\text{high}} \|_2^2 $$
where \( \hat{L} \) is the enhanced illumination and \( L_{\text{high}} \) is the fused normal-light illumination.
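A small sketch of the exposure-generation and attention-map steps described above; the ratio values, the clamping, and the simple mean fusion used to form \( L_{\text{high}} \) are assumptions, since the fusion rule is not specified here.

```python
import torch

def multi_exposure_targets(L_low, ratios=(2.0, 4.0, 8.0)):
    """Generate exposure versions L_i = L_low * x_i and fuse them into a
    pseudo normal-light target (mean fusion is an assumption)."""
    exposures = [torch.clamp(L_low * x, 0.0, 1.0) for x in ratios]
    return torch.stack(exposures, dim=0).mean(dim=0)

def illumination_attention(L_low):
    """Attention map 1 - L (normalized): large in dark regions, small near
    bright or point light sources, steering enhancement away from overexposure."""
    L_norm = (L_low - L_low.amin()) / (L_low.amax() - L_low.amin() + 1e-6)
    return 1.0 - L_norm
```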
The total loss for the first stage is \( L_{\text{stageI}} = L_{dc} + L_{detail} \), and for the second stage, \( L_{\text{stageII}} = L_{de} + L_{en} \).
4. Experiments and Analysis
We evaluate our method on both self-collected and public datasets, comparing it with 16 state-of-the-art algorithms. Our training uses 324 pairs of low-light images from LOL and SICE datasets, and testing includes 300 real-world lighting UAV images and public datasets like LIME, MEF, NPE, and DICM.
4.1 Experimental Setup
We use Adam optimizer with a learning rate of \( 1 \times 10^{-4} \). The decomposition and DEM are trained for 400 epochs, while DM and IEM are trained for 200 epochs. Hardware includes an Intel Core i9-13900HX processor and NVIDIA RTX 4060 GPU. Evaluation metrics include LOE, BRISQUE, NIQE, PSNR, SSIM, and processing time.
4.2 Objective Evaluation
Table 1 shows results on our self-collected lighting UAV dataset. Our method achieves the best NIQE, second-best LOE, and competitive BRISQUE. Processing time is 0.013 seconds per image, suitable for real-time applications. Parameter count is 0.773M, balancing quality and efficiency.
| Method | LOE | BRISQUE | NIQE | Time (s) | Parameters (M) |
|---|---|---|---|---|---|
| RetinexNet | 702.80 | 31.47 | 3.663 | 0.120 | 0.555 |
| MBLLEN | 361.15 | 32.92 | 3.849 | 13.991 | 0.450 |
| KinD | 723.10 | 25.22 | 3.904 | 0.148 | 8.160 |
| DRBN | 462.32 | 28.69 | 3.450 | 0.878 | 0.577 |
| TBEFN | 406.56 | 19.00 | 3.419 | 0.050 | 0.486 |
| Zero-DCE | 263.77 | 29.66 | 3.550 | 0.003 | 0.079 |
| DSLR | 200.29 | 18.41 | 3.438 | 0.074 | 14.931 |
| KinD++ | 609.83 | 23.05 | 3.813 | 1.068 | 8.275 |
| RUAS | 614.95 | 51.41 | 6.071 | 0.006 | 0.003 |
| Zero-DCE++ | 287.97 | 24.91 | 3.423 | 0.001 | 0.011 |
| EnlightenGAN | 408.01 | 17.64 | 3.297 | 0.008 | 8.637 |
| R2RNet | 227.57 | 28.55 | 3.896 | 0.667 | 35.74 |
| URetinex-Net | 153.29 | 19.83 | 3.434 | 0.037 | 0.917 |
| SCI | 222.67 | 20.14 | 3.568 | 0.002 | 0.011 |
| PairLIE | 322.69 | 20.59 | 3.892 | 0.003 | 0.342 |
| Self-DACE | 288.52 | 30.19 | 3.990 | 0.007 | 0.699 |
| Ours | 169.87 | 18.86 | 3.276 | 0.013 | 0.773 |
Table 2 presents NIQE results on public datasets. Our method achieves the best scores on LIME and NPE, the second-best on MEF, and competitive results on DICM, demonstrating strong generalization.
| Method | LIME | MEF | NPE | DICM |
|---|---|---|---|---|
| RetinexNet | 4.887 | 4.923 | 4.447 | 4.695 |
| MBLLEN | 4.458 | 4.949 | 4.385 | 4.868 |
| KinD | 4.323 | 3.716 | 4.379 | 3.824 |
| DRBN | 4.143 | 3.775 | 4.096 | 3.787 |
| TBEFN | 3.954 | 3.560 | 4.028 | 3.670 |
| Zero-DCE | 3.769 | 3.754 | 3.976 | 3.446 |
| DSLR | 4.231 | 4.025 | 4.313 | 4.519 |
| KinD++ | 4.197 | 3.567 | 4.223 | 3.687 |
| RUAS | 4.358 | 4.746 | 4.483 | 4.471 |
| Zero-DCE++ | 3.967 | 3.629 | 4.022 | 3.337 |
| EnlightenGAN | 3.719 | 3.317 | 4.113 | 3.567 |
| R2RNet | 4.272 | 4.460 | 4.213 | 4.358 |
| URetinex-Net | 4.241 | 3.987 | 4.041 | 3.707 |
| SCI | 4.180 | 3.723 | 4.370 | 4.005 |
| PairLIE | 4.294 | 3.975 | 4.107 | 3.796 |
| Self-DACE | 4.143 | 4.562 | 4.367 | 3.949 |
| Ours | 3.617 | 3.491 | 3.964 | 3.512 |
4.3 Subjective Evaluation
Visual comparisons on lighting UAV datasets show that our method enhances brightness and details while reducing noise and overexposure. For instance, in images with point light sources, our IEM prevents overexposure, and the DEM preserves edges in structures like buildings and trees. On public datasets, our results exhibit natural colors and clear details, outperforming methods like URetinex-Net and SCI, which suffer from overexposure or excessive brightness.
4.4 Ablation Studies
We conduct ablation experiments to validate each module’s contribution. Table 3 shows PSNR and SSIM results on the LOL dataset. DM alone yields the largest PSNR and SSIM gains over the baseline, whereas DEM alone slightly lowers PSNR because the sharpened details also amplify noise; pairing DEM with DM recovers most of this loss, confirming that DM effectively suppresses the noise introduced by DEM.
| DEM | DM | PSNR (dB) | SSIM |
|---|---|---|---|
| × | × | 19.53 | 0.729 |
| × | √ | 20.23 | 0.779 |
| √ | × | 19.16 | 0.735 |
| √ | √ | 19.92 | 0.761 |
Visual ablations confirm that DEM enhances details, IEM avoids overexposure, and DM reduces noise, as shown in sample images.
4.5 Application to Target Detection
We test enhanced images on target detection using YOLOv7 and YOLOv8 on the ExDark dataset. Enhanced images improve detection precision and recall for objects like bicycles, people, and boats, reducing false negatives and misdetections. Tables 4 and 5 summarize the results, demonstrating that our enhancement aids downstream tasks for lighting UAV imagery.
| Method | Bicycle | People | Boat | Car |
|---|---|---|---|---|
| YOLOv7 (Original) | 0.630 | 0.603 | 0.571 | 0.426 |
| YOLOv7 (Enhanced) | 0.715 | 0.684 | 0.685 | 0.509 |
| YOLOv8 (Original) | 0.794 | 0.793 | 0.724 | 0.807 |
| YOLOv8 (Enhanced) | 0.810 | 0.819 | 0.764 | 0.812 |

| Method | Bicycle | People | Boat | Car |
|---|---|---|---|---|
| YOLOv7 (Original) | 0.605 | 0.616 | 0.501 | 0.661 |
| YOLOv7 (Enhanced) | 0.711 | 0.671 | 0.652 | 0.746 |
| YOLOv8 (Original) | 0.684 | 0.571 | 0.579 | 0.682 |
| YOLOv8 (Enhanced) | 0.708 | 0.618 | 0.690 | 0.718 |
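As a rough usage sketch of the detection step above, the snippet below runs an off-the-shelf YOLOv8 model over a folder of enhanced frames; the ultralytics package, the checkpoint choice, and the directory path are assumptions and not the exact setup used in our experiments.

```python
from pathlib import Path
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Hypothetical path: a folder of ExDark frames enhanced by our method.
enhanced_dir = Path("exdark_enhanced")

model = YOLO("yolov8n.pt")  # any YOLOv8 checkpoint; the weights used in the paper are not specified
for img in sorted(enhanced_dir.glob("*.jpg")):
    results = model.predict(source=str(img), conf=0.25, verbose=False)
    for r in results:
        # r.boxes holds class ids, confidences, and box coordinates
        print(img.name, r.boxes.cls.tolist(), r.boxes.conf.tolist())
```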
5. Conclusion
We present a two-stage low-light image enhancement algorithm tailored for lighting UAV and lighting drone applications. By leveraging pairs of low-light images, we avoid the need for paired data and learn adaptive priors. The DEM enhances details through multi-scale feature extraction, the DM suppresses noise, and the IEM with illumination attention prevents overexposure. Experiments on multiple datasets show superior performance in visual quality and objective metrics. Enhanced images improve target detection accuracy, demonstrating practical utility. Future work will focus on optimizing computational efficiency for real-time lighting UAV deployments and exploring adaptive modules for dynamic environments.
