In modern infrastructure management, the early detection and quantification of surface cracks on bridge structures are critical for ensuring long‑term safety and serviceability. Traditional manual inspection methods rely on expensive bridge inspection vehicles or scaffolding, which are time‑consuming, labor‑intensive, and pose significant safety risks for workers operating at height. In recent years, the integration of drone technology with advanced computer vision has emerged as a promising alternative. This study presents a comprehensive framework that combines unmanned aerial vehicles (UAVs) with deep learning for automated crack identification and monitoring. By leveraging high‑resolution aerial imagery and a refined convolutional neural network, the proposed system achieves superior detection accuracy, measurement precision, and operational efficiency compared to conventional approaches.
The paper is organized as follows: Section 1 reviews the background and importance of bridge crack inspection, highlighting the potential of drone technology. Section 2 describes the proposed methodology, including UAV data acquisition, image preprocessing, and the segmentation network. Section 3 details the system architecture and algorithm optimization. Section 4 presents experimental results from a real‑world bridge test. Section 5 concludes the work and discusses future directions.
1. Research Background
Bridge structures are continuously subjected to vehicle dynamic loads, temperature gradients, shrinkage, creep, and environmental erosion. Among various deterioration forms, surface cracks are the most common and indicative of structural damage. Transverse cracks often develop in negative‑moment zones due to tensile failure of concrete; longitudinal cracks are frequently associated with corrosion expansion of prestressing tendons or alkali‑silica reaction; and map‑cracking directly reflects carbonation or freeze‑thaw damage of the cover concrete. Once a crack penetrates the protective layer, it accelerates steel corrosion, reduces the effective cross‑section, and can eventually lead to a sudden drop in load‑bearing capacity. Therefore, timely and accurate acquisition of crack geometric parameters—location, width, length, and orientation—is of great engineering significance for condition assessment and maintenance planning.
Traditional inspection methods include manual visual observation, crack‑width gauge measurement, and bridge‑inspection‑vehicle‑assisted close‑up observation. Manual visual inspection is highly subjective and cannot provide quantitative records. Crack‑width gauges can accurately measure width but require point‑by‑point measurement, making them inefficient for large areas. Bridge inspection vehicles require lane closure and professional operators, and a single bridge inspection can take several days. These methods are particularly limited for tall piers, long‑span bridges, or heavily trafficked routes. Moreover, most inspection data are stored as paper records, which hinders spatio‑temporal evolution analysis and digital management.
Drone technology offers a transformative solution. Multi‑rotor UAVs can perform vertical take‑off, hover at fixed points, and cruise at low speed, enabling close‑range inspection of bridge components that are difficult to access, such as soffits, web plates, and cable stays. Equipped with high‑resolution cameras, thermal imagers, or LiDAR, drones can cover thousands of square meters per flight. Real‑time video transmission and centimeter‑level differential GPS allow precise georeferencing of imagery, making multi‑epoch deformation monitoring feasible.
2. Methodology: UAV‑based Crack Identification
2.1 UAV Image Acquisition
Flight mission planning must consider the bridge structural type, required ground sample distance (GSD), and airspace regulations. For typical girder bridges, we adopt a “longitudinal back‑and‑forth” flight pattern covering the soffit and web areas, with flight height controlled between 8 m and 15 m to balance resolution and coverage. For cable‑stayed or suspension bridges, an additional spiral ascent around the tower and cables is programmed. Camera parameters are set to minimize motion blur: shutter speed ≥ 1/800 s, ISO adjusted adaptively according to ambient illumination. The overlap ratio is set to ≥ 70% along the flight direction and ≥ 60% laterally, satisfying the requirements for image stitching and stereo matching. The onboard real‑time kinematic (RTK) module simultaneously records the camera position and attitude at each exposure moment, providing high‑accuracy exterior orientation elements for image geocoding.

The figure above illustrates a typical multi‑rotor UAV equipped with an industrial camera and RTK antenna used in our field tests.
2.2 Image Preprocessing and Enhancement
Raw aerial images must be preprocessed to eliminate the effects of uneven illumination and lens distortion on crack recognition. We apply contrast‑limited adaptive histogram equalization (CLAHE) to enhance local contrast and highlight the difference between cracks and the background. Radial distortion correction is performed using a calibrated camera model to remove edge deformation introduced by wide‑angle lenses. The preprocessed images are then fed into a semantic segmentation network for pixel‑level crack extraction.
2.3 Deep Convolutional Neural Network for Crack Segmentation
The core of our identification system is an encoder‑decoder convolutional neural network (CNN). The encoder path consists of stacked residual modules that progressively extract multi‑scale feature maps. The decoder path uses transposed convolutions for upsampling, and skip connections fuse decoder features with corresponding encoder features to recover fine details around crack edges. The loss function is a weighted combination of binary cross‑entropy and Dice coefficient:
$$
L = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log \hat{y}_i + (1-y_i) \log(1-\hat{y}_i) \right] + \left(1 – \frac{2\sum_{i=1}^{N} y_i \hat{y}_i}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N} \hat{y}_i}\right)
$$
where \(N\) is the total number of pixels, \(y_i\) is the true label (1 for crack, 0 for background), and \(\hat{y}_i\) is the predicted probability in [0,1]. This composite loss balances pixel‑wise accuracy and region‑level overlap, making it robust to the severe class imbalance typical in crack segmentation tasks. After segmentation, morphological closing is applied to connect nearby crack fragments, and a skeletonization algorithm extracts the crack centerline.
2.4 Crack Width Measurement
Width measurement is performed by analyzing the gray‑level profile perpendicular to the skeleton. To reduce the influence of skeleton localization errors, we adopt a weighted gray‑centroid method. The average crack width is computed as:
$$
W = \frac{1}{M} \sum_{j=1}^{M} d_j \cdot \sqrt{1 + \left( \frac{dy}{dx} \right)_j^2}
$$
where \(W\) is the average width in millimeters, \(M\) is the total number of profiles sampled along the skeleton, \(d_j\) is the pixel span of the crack measured along the profile perpendicular to the skeleton at point \(j\) (in pixels), and \(\left( \frac{dy}{dx} \right)_j\) is the tangent slope of the skeleton at that point. The slope correction term compensates for the projection shortening effect when cracks are not horizontal. Multiplying \(d_j\) by the GSD converts the pixel span into physical width. This algorithm significantly improves measurement accuracy, especially for oblique cracks.
3. System Design and Implementation
3.1 Hierarchical Architecture
We design the bridge crack intelligent monitoring system with a three‑layer architecture: data acquisition layer, data processing layer, and application service layer.
| Layer | Main Components | Functions |
|---|---|---|
| Data acquisition | Hexacopter UAV, 42 MP industrial camera, RTK module | Collect high‑resolution imagery with precise geotags; transmit video via 5.8 GHz digital link. |
| Data processing | Ground workstation with GPU | Perform image distortion correction, CLAHE enhancement, CNN semantic segmentation, skeleton extraction, width measurement, multi‑epoch registration for change detection. |
| Application service | Web portal, mobile alert | Present crack spatial distribution and evolution trends; classify severity based on standards; push warnings to maintenance personnel. |
3.2 Sensor Selection and Configuration
The onboard imaging sensor is a full‑frame CMOS industrial camera with effective resolution of 42.4 megapixels and sensor size 35.9 mm × 24.0 mm. When paired with a 35 mm fixed‑focus lens, the field of view is approximately 54°. At a flight altitude of 10 m, the GSD is about 0.85 mm/pixel, enabling detection of cracks as narrow as 0.2 mm. The camera is mounted on a three‑axis brushless gimbal with stabilization accuracy better than ±0.01° on pitch, roll, and yaw axes, effectively canceling flight vibrations. The RTK module uses a dual‑frequency multi‑constellation receiver (GPS+BeiDou+Galileo), achieving planar accuracy of 1 cm + 1 ppm and vertical accuracy of 1.5 cm + 1 ppm.
3.3 Data Acquisition and Processing Workflow
The field campaign follows a “global first, local second” principle. A high‑altitude survey (e.g., 20 m) captures the entire bridge panorama. After identifying potential defect clusters via preliminary processing, the UAV descends to a lower altitude (8–10 m) for refined re‑inspection. Raw images are stored in RAW format to preserve full dynamic range for later exposure correction. The processing pipeline includes:
- Image quality screening and geocoding
- Radial distortion correction using calibration parameters
- Contrast enhancement (CLAHE)
- CNN semantic segmentation
- Morphological post‑processing and skeletonization
- Width measurement using equation (2)
- Vector output in GeoJSON format: fields include crack ID, center coordinates, length, average width, maximum width, orientation, and timestamp.
With GPU acceleration, a single 42‑megapixel image requires approximately 0.8 s for the entire segmentation step. The vectorized results can be directly imported into a GIS platform for overlay analysis.
3.4 Optimization for Real‑world Conditions
To handle varying lighting conditions, we introduce data augmentation during training: random brightness, contrast, and hue jittering. During inference, a sliding‑window strategy with edge overlapping fusion is employed for large‑format images to avoid segmentation fractures at tile boundaries. Furthermore, the weighted gray‑centroid method for width measurement significantly reduces errors caused by skeleton localization inaccuracy.
4. Experiments and Results
4.1 Field Experiment Design
A field test was conducted on a 28‑year‑old prestressed concrete continuous girder bridge with five spans, total length 186 m, and deck width 12.5 m. The experiment was carried out under clear skies and light wind (wind speed ≈ 3 m/s) with stable illumination. A hexacopter UAV equipped with a 42 MP camera flew at a constant altitude of 8 m along the bridge longitudinal direction at a speed of 2 m/s. The forward overlap was 75% and side overlap 65%. A total of 412 valid images were collected covering the soffit, webs, and deck pavement. Six checkerboard targets were placed on the ground as control points; their 3D coordinates were measured with a network RTK system (planar accuracy ±1.5 cm, vertical accuracy ±2.0 cm).
4.2 Crack Identification Results
The trained CNN model achieved a mean intersection‑over‑union (mIoU) of 0.82 on our validation dataset after 2,000 training epochs. After morphological post‑processing and skeletonization, the geometric parameters of each crack were computed. The test identified 47 cracks in total. Table 2 lists the quantified parameters for typical cracks.
| Crack ID | Component Location | Orientation Type | Length (mm) | Max Width (mm) | Avg Width (mm) |
|---|---|---|---|---|---|
| C‑01 | Span 2 soffit | Transverse | 1258 | 0.42 | 0.31 |
| C‑02 | Span 2 soffit | Transverse | 986 | 0.38 | 0.27 |
| C‑03 | Span 3 web | Diagonal | 1542 | 0.56 | 0.41 |
| C‑04 | Span 3 soffit | Transverse | 2134 | 0.68 | 0.49 |
| C‑05 | Span 4 web | Longitudinal | 876 | 0.35 | 0.24 |
| C‑06 | Span 4 soffit | Transverse | 1687 | 0.51 | 0.38 |
| C‑07 | Span 5 web | Diagonal | 1123 | 0.44 | 0.33 |
| C‑08 | Deck pavement | Map | 3256 | 0.72 | 0.53 |
The data show that transverse cracks dominate, especially in the soffit negative‑moment zones. The maximum width (0.72 mm) appears in the deck pavement map‑cracking region. Spans 3 and 4 exhibit more concentrated damage.
4.3 Performance Comparison with Manual Inspection
To evaluate the proposed system, we compared its results with those obtained by a team of three certified inspectors using a crack‑width gauge and a bridge inspection vehicle. Both methods examined the same set of 47 cracks. Table 3 summarizes the performance metrics.
| Metric | UAV Smart Inspection | Manual Inspection | Improvement (%) |
|---|---|---|---|
| Crack detection rate (%) | 94.6 | 87.2 | +8.5 |
| Width measurement error (mm) | ±0.05 | ±0.08 | −37.5 |
| Time per span (min) | 12 | 95 | −87.4 |
| Total bridge inspection time (h) | 3.5 | 28 | −87.5 |
| Data digitization rate (%) | 100 | 35 | +185.7 |
| Missed detection rate (%) | 5.4 | 12.8 | −57.8 |
| High‑altitude worker hours (h) | 0 | 22 | −100 |
The UAV‑based system achieved an 8.5% higher detection rate and reduced the missed detection rate from 12.8% to 5.4%. Width measurement error was ±0.05 mm, outperforming the manual ±0.08 mm. Inspection time per span dropped from 95 min to 12 min, and the total bridge inspection time was compressed from 28 h to 3.5 h—an eight‑fold efficiency gain. All data were stored digitally (100% digitization), eliminating the traceability issues of paper records. Most importantly, the system completely eliminated the safety risk of working at height, as indicated by zero high‑altitude worker hours.
4.4 Ablation Study on Network Components
To further demonstrate the effectiveness of our optimized segmentation network, we conducted an ablation experiment using the same field dataset. Table 4 shows the results.
| Configuration | mIoU | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Baseline (simple CNN) | 0.68 | 0.71 | 0.65 | 0.68 |
| + Skip connections | 0.75 | 0.78 | 0.73 | 0.75 |
| + Residual modules | 0.79 | 0.81 | 0.77 | 0.79 |
| + CLAHE preprocessing | 0.81 | 0.83 | 0.79 | 0.81 |
| + Data augmentation | 0.83 | 0.85 | 0.81 | 0.83 |
| Full model (all above) | 0.84 | 0.86 | 0.82 | 0.84 |
The full model achieved an mIoU of 0.84, a significant improvement over the baseline 0.68. Each component contributed positively, with skip connections and residual modules being crucial for recovering fine details, and CLAHE + data augmentation further boosting robustness under variable lighting.
5. Conclusion
In this work, we have developed an integrated system that harnesses drone technology and deep learning for efficient bridge crack identification and monitoring. The proposed framework covers the entire workflow from aerial image acquisition, preprocessing, semantic segmentation, to geometric quantification and multi‑epoch change detection. Real‑world validation on a 28‑year‑old concrete girder bridge demonstrates that the system outperforms traditional manual inspection in detection rate, measurement accuracy, operational efficiency, and data digitization. The use of drone technology not only reduces inspection time by approximately eight times but also completely eliminates the safety hazards associated with working at height, making it a highly promising tool for modern infrastructure asset management.
Nevertheless, some limitations remain. Because the system relies on visible‑light imaging, it can only detect surface‑exposed cracks; internal or covered defects (e.g., delamination, hidden voids) are not captured. Moreover, the training dataset was mainly collected from concrete girder bridges, so the generalization to steel bridges or masonry arch bridges requires further validation. Future research should explore the fusion of infrared thermography for subsurface defect detection, as well as the expansion of training datasets covering multiple bridge types to enhance model robustness. Additionally, integrating drone technology with real‑time edge computing could further streamline on‑site processing and enable instant decision‑making for emergency inspections.
In summary, the synergy between drone technology and artificial intelligence offers a paradigm shift in bridge condition assessment. The results reported in this paper confirm that such a combination can achieve high‑accuracy, high‑efficiency, and safe bridge crack monitoring, paving the way for intelligent maintenance of civil infrastructure worldwide.
