In China, water conservancy management has been shifting rapidly from traditional approaches to digital and intelligent systems. The River and Lake Chief System demands comprehensive, real-time monitoring of vast water networks, and unmanned aerial vehicles (UAVs) have emerged as a critical mobile sensing platform because of their flexibility and wide coverage. Practical use over complex aquatic environments, however, faces persistent challenges: the limited field of view of a single sensor, severe geometric distortion from wide-angle lenses, unstable image registration over texture-sparse water surfaces, and temporal jitter induced by UAV attitude fluctuations in flight. To address these issues, we develop a panoramic video fusion technology tailored to UAV-based river and lake supervision. The approach integrates self-calibrating distortion correction based on natural geometric features, robust feature matching under multiple geometric constraints, and dynamic registration combined with layered blending to produce seamless, spatiotemporally coherent panoramic video. Field experiments in the water-network region of Jinxi Town show lower registration error and temporal jitter than three baseline methods, and the resulting panoramas provide digital support for tasks such as algal bloom identification and intelligent water management. This paper presents the methodology, system design, and validation results.
1. Introduction
The water conservancy sector in China is undergoing a paradigm shift toward digitization and intelligent supervision. Building a smart water system with “four pre‑functions” (forecast, early‑warning, rehearsal, and contingency plan) has become a strategic goal for high‑quality development. Under the deepening of the River and Lake Chief System, authorities require all‑dimensional, all‑weather fine‑grained monitoring of river and lake shorelines, water surfaces, and surrounding infrastructure. Traditional point‑to‑point manual inspections and fixed cameras cannot meet the demands of large‑scale, dynamic water‑network monitoring. An efficient non‑contact sensing system is urgently needed.
UAVs, as mobile sensing units within the “sky-air-ground-water-engineering” integrated monitoring architecture, offer high maneuverability and a wide field of view, and they are widely deployed for river patrols, water quality assessment, and disaster response. Nevertheless, aerial surveillance of water networks still faces three technical bottlenecks: (1) wide-angle lenses cause severe radial and tangential distortion, leading to geometric misalignment in stitched panoramas; (2) water surfaces are texture-poor and prone to specular reflections, making traditional feature-based registration unreliable; (3) UAV attitude changes caused by wind gusts introduce frame-to-frame jitter, which shows up as flickering and temporal inconsistency in the output video.
General-purpose video stitching techniques, although well developed for street-view mapping and surveillance systems, do not transfer directly to river and lake scenarios. Field environments lack artificial calibration targets, which makes conventional camera calibration impractical. Moreover, moving water and elevated structures such as bridges introduce parallax that defeats classical feature-based pipelines (e.g., SIFT or ORB matching followed by a single homography). Real-time video streaming further demands low latency and smooth temporal evolution, which stitching pipelines designed for static images do not provide.
To overcome these challenges, we propose a dedicated panoramic video fusion technology for UAV-based river and lake supervision. Our contributions are:
- An edge‑constrained self‑calibration method that leverages natural geometric features such as riverbanks to correct wide‑angle distortion, eliminating the need for external calibration targets.
- An optimized ORB‑RANSAC registration framework enhanced with a multi‑geometry model switching mechanism, enabling robust pixel‑level alignment over weakly‑textured water surfaces.
- A dynamic registration and layered blending strategy that suppresses video jitter caused by UAV motion and minimizes photometric seams, ensuring temporally and spatially consistent panoramic output.
2. Methodology
2.1 Edge‑Constrained Self‑Calibration for Distortion Correction
Accurate camera calibration is the foundation of any image stitching pipeline. For UAVs equipped with wide-angle lenses, we adopt the pinhole camera model and describe lens distortion with the classical radial-tangential polynomial. Let \((x, y)\) be the ideal normalized image coordinates and \((x_d, y_d)\) the distorted coordinates. The distortion model is:
$$
\begin{aligned}
x_d &= x \left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2p_1 x y + p_2 \left(r^2 + 2x^2\right), \\
y_d &= y \left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1 \left(r^2 + 2y^2\right) + 2p_2 x y,
\end{aligned}
\tag{1}
$$
where \(r^2 = x^2 + y^2\); \(k_1, k_2, k_3\) are radial distortion coefficients, and \(p_1, p_2\) are tangential distortion coefficients.
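As an illustration of Eq. (1), the following minimal sketch applies the distortion model to a normalized point and inverts it numerically. The function names and the coefficient values are assumptions chosen for the example, not calibrated parameters from the experiments; the inverse uses a simple fixed-point iteration, which is adequate for mild distortion but is not claimed to be the solver used in the system.

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Apply Eq. (1) to ideal normalized coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd


def undistort(xd, yd, k1, k2, k3, p1, p2, iterations=20):
    """Invert Eq. (1) by fixed-point iteration, starting from the distorted point."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y


# Illustrative coefficients only (mild barrel distortion), not measured values.
coeffs = dict(k1=-0.2, k2=0.03, k3=0.0, p1=0.001, p2=-0.0005)
xd, yd = distort(0.4, 0.3, **coeffs)
x, y = undistort(xd, yd, **coeffs)
```

With a negative \(k_1\) the distorted point moves toward the image center (barrel distortion), and the iteration recovers the original coordinates to within numerical precision for this example.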
Instead of relying on a physical calibration target (often infeasible in riverine environments), we propose to use natural geometric features extracted from the scene—specifically the riverbank lines and straight edges of man‑made structures. These edges are detected in each video frame and fitted as straight‑line constraints. The optimal intrinsic parameters (focal length, principal point, distortion coefficients) are then estimated by minimizing the reprojection error:
$$
\min \sum_i \|\mathbf{q}_i - \hat{\mathbf{q}}_i\|^2,
\tag{2}
$$
where \(\mathbf{q}_i\) are the observed edge points and \(\hat{\mathbf{q}}_i\) are the corresponding points predicted by passing the ideal straight-line model through the estimated distortion. Because the edges are straight in the scene, any curvature that remains in the image is attributed to the lens, so minimizing Eq. (2) drives the estimated coefficients toward the values that straighten them. This self-calibration removes the barrel or pincushion distortion typical of wide-angle UAV lenses, so that subsequent alignment operates on geometrically consistent images.
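The idea of calibrating from straight edges can be sketched with a deliberately reduced example. The sketch below is a simplification and not the full optimization of Eq. (2): it estimates only \(k_1\), assumes the focal length and principal point are already known (points are given in normalized coordinates), measures straightness in the undistorted domain rather than reprojection error, and replaces a nonlinear solver with a coarse grid search. All function names and the synthetic edges are hypothetical.

```python
import math


def distort_k1(x, y, k1):
    """Radial-only special case of Eq. (1), used here to synthesize test data."""
    s = 1.0 + k1 * (x * x + y * y)
    return x * s, y * s


def undistort_k1(xd, yd, k1, iterations=30):
    """Invert the radial-only model by fixed-point iteration."""
    x, y = xd, yd
    for _ in range(iterations):
        scale = 1.0 + k1 * (x * x + y * y)
        x, y = xd / scale, yd / scale
    return x, y


def line_residual(points):
    """Sum of squared orthogonal distances to the best-fit line.

    This equals the smaller eigenvalue of the 2x2 scatter matrix of the points.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    half_trace = 0.5 * (sxx + syy)
    return half_trace - math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)


def straightness_cost(edges, k1):
    """Total line-fit residual over all edges after undistorting with k1."""
    return sum(
        line_residual([undistort_k1(xd, yd, k1) for xd, yd in edge])
        for edge in edges
    )


def estimate_k1(edges, lo=-0.3, hi=0.3, steps=61):
    """Grid search for the k1 that makes the detected edges straightest."""
    candidates = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(candidates, key=lambda k: straightness_cost(edges, k))


# Synthetic "riverbank" and "wall" edges, distorted with a known coefficient.
true_k1 = -0.2
bank = [distort_k1(-0.5 + 0.1 * i, 0.3, true_k1) for i in range(11)]
wall = [distort_k1(-0.4, -0.4 + 0.1 * i, true_k1) for i in range(9)]
k1_est = estimate_k1([bank, wall])
```

Edges that pass through the image center stay straight under radial distortion and carry no information, which is why both synthetic edges are placed off-center. A practical implementation would optimize all intrinsic parameters jointly over many frames.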
2.2 Multi‑Geometry Constrained ORB‑RANSAC Registration
Feature extraction and matching are core to image alignment. We employ the ORB (Oriented FAST and Rotated BRIEF) descriptor for its computational efficiency and rotational invariance. Keypoints are detected using a multi‑scale FAST corner detector with Harris response filtering, and binary descriptors are computed. For two overlapping views, candidate matches are established using Hamming distance. To reject outliers, we apply the RANSAC (Random Sample Consensus) algorithm to solve for the homography matrix \(\mathbf{H}\) that relates pixel coordinates between source and target frames:
$$
\begin{bmatrix}
u' \\
v' \\
1
\end{bmatrix} = \mathbf{H} \begin{bmatrix}
u \\
v \\
1
\end{bmatrix},
\tag{3}
$$
where \((u, v)\) and \((u', v')\) are corresponding points in the source and target images, respectively.
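The matching and verification steps can be illustrated with a small standard-library sketch. It assumes binary descriptors are stored as Python integers and uses hypothetical helper names and thresholds; it shows Hamming-distance matching with a mutual check and the inlier test that RANSAC applies to a candidate \(\mathbf{H}\), but it does not implement ORB detection or the RANSAC sampling loop. Note that Eq. (3) holds up to scale in homogeneous coordinates, so the mapped point is divided by its third component.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(desc_src, desc_dst, max_distance=64):
    """Brute-force nearest-neighbour matching with a mutual (cross) check."""
    matches = []
    for i, d in enumerate(desc_src):
        j = min(range(len(desc_dst)), key=lambda n: hamming(d, desc_dst[n]))
        back = min(range(len(desc_src)),
                   key=lambda n: hamming(desc_src[n], desc_dst[j]))
        if back == i and hamming(d, desc_dst[j]) <= max_distance:
            matches.append((i, j))
    return matches


def apply_homography(H, u, v):
    """Eq. (3): map (u, v) through H, then divide by the third coordinate."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w


def inliers(H, pairs, threshold=3.0):
    """Keep matches whose reprojection error under H is below threshold (pixels)."""
    kept = []
    for (u, v), (u2, v2) in pairs:
        pu, pv = apply_homography(H, u, v)
        if ((pu - u2) ** 2 + (pv - v2) ** 2) ** 0.5 < threshold:
            kept.append(((u, v), (u2, v2)))
    return kept


# Toy 8-bit descriptors; real ORB descriptors are 256 bits long.
desc_src = [0b11110000, 0b00001111]
desc_dst = [0b00001110, 0b11110001]
matches = match_descriptors(desc_src, desc_dst)

# A pure translation homography and three candidate matches, one of them wrong.
H = [[1.0, 0.0, 100.0], [0.0, 1.0, -20.0], [0.0, 0.0, 1.0]]
pairs = [((10, 10), (110, -10)), ((50, 80), (151, 60)), ((5, 5), (300, 300))]
good = inliers(H, pairs)
```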
However, in water‑dominated scenes, the texture is sparse and repetitive, and many false matches survive standard RANSAC. To improve robustness, we introduce a multi‑geometry constraint mechanism. In addition to the homography (planar model), we also enforce the fundamental matrix \(\mathbf{F}\) and epipolar geometry constraints:
$$
\mathbf{q}_i^T \mathbf{F} \mathbf{p}_i = 0,
\tag{4}
$$
where \(\mathbf{p}_i = [u_i, v_i, 1]^T\) and \(\mathbf{q}_i = [u'_i, v'_i, 1]^T\) are the homogeneous coordinates of matched points. Intuitively, Eq. (4) states that each target point \(\mathbf{q}_i\) must lie on the epipolar line \(\mathbf{F}\mathbf{p}_i\) induced by its source point \(\mathbf{p}_i\). When the scene contains significant 3-D structures (e.g., bridges, trees) that violate the planar assumption, the algorithm automatically switches from homography-based alignment to fundamental-matrix-based alignment, which better accommodates local parallax. This adaptive switching reduces registration failures in the complex river and lake environments where the UAVs operate.
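A minimal sketch of the epipolar check in Eq. (4) and of one possible switching rule follows. The switching criterion is not specified above, so `choose_model`, its thresholds, and the inlier-ratio comparison are hypothetical choices made only to illustrate the mechanism.

```python
import math


def epipolar_distance(F, p, q):
    """Distance (pixels) from target point q to the epipolar line F p of source point p."""
    u, v = p
    a = F[0][0] * u + F[0][1] * v + F[0][2]
    b = F[1][0] * u + F[1][1] * v + F[1][2]
    c = F[2][0] * u + F[2][1] * v + F[2][2]
    # The line is a*x + b*y + c = 0; (a, b) must not both be zero.
    return abs(a * q[0] + b * q[1] + c) / math.hypot(a, b)


def inlier_ratio(errors, threshold):
    """Fraction of matches whose error is below the threshold."""
    return sum(e < threshold for e in errors) / len(errors)


def choose_model(h_errors, f_errors, threshold=3.0, min_h_ratio=0.6):
    """Hypothetical rule: keep the homography while it explains most matches,
    otherwise fall back to the fundamental matrix if it explains more of them."""
    h_ratio = inlier_ratio(h_errors, threshold)
    if h_ratio >= min_h_ratio:
        return "homography"
    if inlier_ratio(f_errors, threshold) > h_ratio:
        return "fundamental"
    return "homography"


# Fundamental matrix of a purely horizontal camera shift: epipolar lines are image rows.
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
on_line = epipolar_distance(F, (10.0, 20.0), (50.0, 20.0))
off_line = epipolar_distance(F, (10.0, 20.0), (50.0, 26.0))

planar_scene = choose_model([1, 1, 2, 2, 9], [1, 1, 1, 2, 2])
bridge_scene = choose_model([1, 2, 15, 20, 30], [0.5, 1, 1, 2, 10])
```

In the toy example a match that stays on the same image row satisfies Eq. (4) exactly, while a match six rows off is six pixels from its epipolar line.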
2.3 Dynamic Registration and Layered Blending
After per‑frame geometric alignment, a panoramic video must maintain temporal coherence. UAV body shake during flight can cause abrupt changes in the estimated homography between consecutive frames. To address this, we implement a dynamic registration mechanism that propagates the transformation from the previous frame to the current one with a smoothness constraint. Let \(\mathbf{H}_t\) be the homography at time \(t\). The predicted homography for frame \(t+1\) is initialized as \(\mathbf{H}_{t+1}^0 = \mathbf{H}_t\), then refined using the feature matches of the new frame while penalizing large deviations. A Kalman filter‑like update ensures that the geometric parameters evolve smoothly over the timeline, suppressing visual jitter.
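The temporal smoothing can be sketched as a constant-gain update, which is comparable to what a Kalman filter with a random-walk state model reduces to in steady state. This is a simplified stand-in rather than the filter used in the system: the `gain` value is an assumption, and blending homography entries element-wise is only a reasonable approximation when consecutive transforms are close to each other.

```python
def normalize(H):
    """Scale H so that H[2][2] == 1, making element-wise comparison meaningful."""
    s = H[2][2]
    return [[h / s for h in row] for row in H]


def smooth_update(H_prev, H_meas, gain=0.3):
    """Move the predicted homography (H_prev) a fraction `gain` toward the measurement."""
    P, M = normalize(H_prev), normalize(H_meas)
    return [
        [P[r][c] + gain * (M[r][c] - P[r][c]) for c in range(3)]
        for r in range(3)
    ]


def smooth_sequence(measurements, gain=0.3):
    """Filter a list of per-frame homographies, initialized with the first one."""
    state = normalize(measurements[0])
    out = [state]
    for H in measurements[1:]:
        state = smooth_update(state, H, gain)
        out.append(state)
    return out


def translation_h(tx):
    """Helper for the example: homography that shifts the image by tx pixels."""
    return [[1.0, 0.0, tx], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


# Horizontal offset jumping between 0 and 10 pixels, as a gust might cause.
raw = [translation_h(tx) for tx in (0.0, 10.0, 0.0, 10.0)]
smoothed = smooth_sequence(raw)
raw_steps = [abs(raw[i + 1][0][2] - raw[i][0][2]) for i in range(3)]
smooth_steps = [abs(smoothed[i + 1][0][2] - smoothed[i][0][2]) for i in range(3)]
```

A smaller gain suppresses more jitter but makes the panorama lag behind genuine camera motion, so the value would need tuning against flight speed and frame rate.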
Photometric differences between cameras (e.g., exposure, white balance) and vignetting effects produce visible seams in the overlapping region. We apply a layered blending strategy. The simplest form is weighted averaging:
$$
I_{\text{fused}}(x, y) = \alpha(x, y) I_A(x, y) + \left[1 - \alpha(x, y)\right] I_B(x, y),
\tag{5}
$$
where \(\alpha(x, y)\) is a weight function that falls linearly from 1 to 0 across the overlap zone. Because linear weighting alone can blur fine detail and leave ghosting where the alignment is imperfect, we additionally employ multi-band blending (Laplacian pyramid decomposition), which merges low-frequency components smoothly while preserving high-frequency details. The combination of dynamic registration and layered blending yields a seamless, flicker-free panoramic video stream suitable for real-time UAV monitoring.
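Eq. (5) can be illustrated on a single row of the overlap zone. The sketch below covers only the linear weighting, not the multi-band stage, and the function names and intensity values are made up for the example.

```python
def feather_weights(width):
    """alpha falls linearly from 1 to 0 across an overlap `width` pixels wide."""
    if width == 1:
        return [0.5]
    return [1.0 - i / (width - 1) for i in range(width)]


def blend_rows(row_a, row_b):
    """Eq. (5) applied to one row of the overlap zone."""
    alphas = feather_weights(len(row_a))
    return [a * ia + (1.0 - a) * ib for a, ia, ib in zip(alphas, row_a, row_b)]


row_a = [100.0] * 5  # camera A is darker in the overlap
row_b = [120.0] * 5  # camera B is brighter
fused = blend_rows(row_a, row_b)
# fused == [100.0, 105.0, 110.0, 115.0, 120.0]
```

The exposure step of 20 gray levels is spread evenly over the overlap instead of appearing as a single visible seam.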
3. System Architecture and Field Validation
3.1 System Overview
We developed an integrated system with three layers: front-end acquisition, edge processing, and cloud application. The front end consists of multiple UAVs flying in formation, each equipped with a high-resolution wide-angle camera and a GNSS/RTK module for precise time synchronization and geo-referencing. The UAVs follow pre-planned flight paths along the river that keep at least 30% overlap between adjacent views. Compressed video streams are transmitted in real time to a ground-based edge computing unit.
The edge processing unit executes the core algorithms: distortion correction, ORB feature extraction, RANSAC matching with multi‑geometry switching, homography estimation, dynamic registration, and layered blending. All modules are hardware‑accelerated using GPUs to achieve millisecond‑level latency. The resulting panoramic video is then pushed to a cloud platform where it is overlaid on a digital twin map of the water network. The user interface provides real‑time panoramic views, historical playback, and multi‑window collaborative displays, significantly enhancing situational awareness for river and lake managers.
3.2 Experimental Setup
Field experiments were conducted in the Jinxi Town area, a typical water-network region in Jiangsu Province, China. The area features winding rivers, dense channels, bridges, and a mix of water and built structures, making it a suitable testbed for the proposed technology. Four UAVs (DJI M300 RTK with Zenmuse H20 cameras) were deployed at a flight altitude of 120 m and a flight speed of 5 m/s. The camera field of view was 82.6°, which produces significant barrel distortion. Video was recorded at 1920×1080 and 30 fps. A total of 15 flight sorties were carried out, covering 12 km of river length.
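The geometry implied by these settings can be checked with a short calculation. The sketch assumes a nadir-pointing camera over flat terrain and treats the stated 82.6° as the angle across the direction in which adjacent views overlap; the text does not say whether this figure is a horizontal or diagonal field of view, so the result is indicative only. Function names are hypothetical.

```python
import math


def footprint_width(altitude_m, fov_deg):
    """Ground width covered by a nadir-pointing camera over flat terrain."""
    return 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)


def spacing_for_overlap(width_m, overlap):
    """Largest lateral spacing between adjacent views that keeps the given overlap."""
    return width_m * (1.0 - overlap)


width = footprint_width(120.0, 82.6)          # roughly 210 m under these assumptions
spacing = spacing_for_overlap(width, 0.30)    # just under 150 m between adjacent views
frame_advance = 5.0 / 30.0                    # metres flown between frames at 5 m/s, 30 fps
```

At about 0.17 m of forward motion per frame, consecutive frames overlap almost completely, which is consistent with initializing each frame's homography from the previous one.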
3.3 Performance Evaluation
We compared our method against three baselines: (1) direct stitching without distortion correction, (2) standard SIFT+RANSAC, and (3) a commercial panorama tool. Key metrics include root‑mean‑square reprojection error (RMSE), structural similarity index (SSIM) in the overlapping region, temporal jitter (measured as standard deviation of frame‑to‑frame translation), and processing speed. Results are summarized in Table 1.
| Method | RMSE (pixels) | SSIM | Jitter Std (pixels) | Processing Speed (fps) |
|---|---|---|---|---|
| No correction + SIFT | 25.3 | 0.782 | 8.7 | 12 |
| SIFT + RANSAC (no dynamic) | 8.1 | 0.889 | 5.4 | 14 |
| Commercial tool | 6.5 | 0.914 | 3.2 | 18 |
| Proposed method | 3.2 | 0.965 | 1.1 | 25 |
The proposed method achieves the lowest reprojection error (3.2 pixels), the highest structural similarity (0.965), and the lowest temporal jitter (1.1 pixels), while processing 25 fps, the fastest of the four methods and close to the 30 fps capture rate. Relative to the commercial tool, the strongest baseline, this is roughly half the reprojection error (3.2 vs. 6.5 pixels) and about one third of the jitter (1.1 vs. 3.2 pixels). We attribute the improvements to the combined effect of self-calibration, multi-geometry ORB-RANSAC, and dynamic registration. Jitter suppression in particular is critical for producing watchable panoramic video from UAVs under real flight conditions.
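For reference, two of the metrics in Table 1 can be computed as in the sketch below. The jitter definition follows one plausible reading of "standard deviation of frame-to-frame translation" (the population standard deviation of the per-frame displacement magnitude); this interpretation, the function names, and the sample data are assumptions.

```python
import math
import statistics


def reprojection_rmse(projected, observed):
    """Root-mean-square distance (pixels) between projected and observed points."""
    sq = [(px - ox) ** 2 + (py - oy) ** 2
          for (px, py), (ox, oy) in zip(projected, observed)]
    return math.sqrt(sum(sq) / len(sq))


def jitter_std(translations):
    """Std of the displacement magnitude between consecutive frame translations."""
    steps = [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(translations, translations[1:])
    ]
    return statistics.pstdev(steps)


rmse = reprojection_rmse([(0, 0), (10, 0)], [(3, 4), (10, 0)])
steady = jitter_std([(0, 0), (1, 0), (2, 0), (3, 0)])   # uniform motion, no jitter
shaky = jitter_std([(0, 0), (1, 0), (1, 0), (4, 0)])    # uneven motion
```

Uniform camera motion yields zero jitter under this definition, so the metric penalizes irregular frame-to-frame changes rather than motion itself.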

3.4 Qualitative Results
The panoramic video produced by our system exhibits seamless transitions between adjacent UAV views. Even where water occupies more than 70% of the frame, the riverbank lines align without visible offset, and no ghosting or double edges appear at the stitching boundaries. The layered blending removes the visible color differences caused by varying exposure settings. When a UAV experiences sudden yaw or roll due to gusts, the dynamic registration mechanism adjusts the panorama smoothly, avoiding the abrupt jumps seen in the baseline methods. Observers reported that the resulting video looks like a single, stable continuous shot from a virtual wide-angle camera flying above the river. This continuity facilitates visual inspection for algal blooms, illegal construction, and shoreline erosion.
4. Conclusion
We have presented a panoramic video fusion technology designed for UAV-based river and lake supervision. By addressing three core challenges (wide-angle distortion, registration over weakly textured water, and temporal jitter), the method delivers real-time panoramic video that is both geometrically accurate and visually coherent. Field experiments in a representative Chinese water network show lower reprojection error and temporal jitter than the baselines, indicating that the system can improve inspection efficiency and provide digital support for tasks such as cyanobacteria detection and intelligent water management.
The key innovations are: (1) an edge-constrained self-calibration that exploits natural geometric features such as riverbanks, eliminating the need for artificial targets; (2) a multi-geometry ORB-RANSAC framework that adaptively selects homography or fundamental matrix constraints to handle planar and non-planar scenes; (3) a dynamic registration mechanism with layered blending that ensures temporal smoothness and seamless photometric fusion. Together, these components enable UAVs to serve as reliable sensing components in the emerging digital-twin-based smart water conservancy system.
Future work will focus on extending the technology to low‑light and adverse weather conditions by integrating image enhancement algorithms, as well as exploring deep learning‑based feature matching to further improve robustness. We believe that the widespread adoption of this panoramic fusion technique will accelerate the digital transformation of water management in China, supporting the vision of safe, clean, and beautiful rivers and lakes.
