UAV-Based Intelligent Monitoring for Coastal Litter Management

We present an intelligent monitoring and mapping method for coastal litter based on unmanned aerial vehicle (UAV) imagery and deep learning. The approach integrates YOLOv5 object detection with SAM (Segment Anything Model) image segmentation to achieve automatic identification and coverage area calculation of litter targets. High-resolution images are captured by UAVs, processed through the detection-segmentation pipeline, and then imported into a Geographic Information System (GIS) for pixel-based statistics and area estimation. Experimental results demonstrate an overall accuracy of 0.95 and a Kappa coefficient of 0.88, with area estimation accuracy reaching 95.5% compared to manual measurements. This study provides an efficient and automated technical solution supporting coastal litter monitoring and drone regulation in environmental management.

Coastal zones are among the most dynamic and ecologically significant regions globally, hosting rich biodiversity and serving as hubs for diverse human economic activities. However, rapid industrialization and urbanization have intensified environmental pressures, with litter pollution emerging as a critical threat to coastal ecosystem health and socioeconomic stability. The sources of coastal litter are complex and varied, including marine floating debris, terrestrial discharges, and residues from tourism activities, primarily composed of plastics, metals, and glass. The accumulation of such litter not only degrades natural landscapes and impairs ecological functions but also poses direct risks to the livelihoods of coastal communities. Consequently, efficient monitoring and precise management of coastal litter have become urgent priorities in contemporary environmental governance.

Traditional monitoring methods, such as manual surveys and fixed-point sampling, have inherent limitations. They are labor-intensive, inefficient, and often fail to cover vast coastal areas due to complex geographical conditions. Moreover, manual assessments are subjective and sensitive to environmental fluctuations, compromising data objectivity and precision. For extensive and intricate coastal zones, conventional approaches are increasingly inadequate. The rapid development of UAV remote sensing and artificial intelligence offers new pathways. UAVs provide high maneuverability, broad coverage, and low operational costs, enabling the acquisition of high-resolution imagery over large coastal stretches. Meanwhile, deep learning has made breakthroughs in object detection and image segmentation, allowing automatic litter identification from image data. Deep learning algorithms extract high-level features from large datasets, enabling efficient detection and precise localization of litter even under complex backgrounds.

Numerous studies have explored UAV and deep learning techniques for coastal litter monitoring. Researchers have optimized flight parameters such as altitude, forward/side overlap to acquire near-shore high-resolution images, validating the feasibility of UAV surveys. The impact of ground sampling distance (GSD) on litter identification has been systematically investigated, providing guidelines for standardized data collection. In terms of deep learning models, Faster R-CNN has been applied to beach litter detection, but its two-stage architecture limits inference speed for large-area high-resolution imagery. To address efficiency, YOLOv5-based models with optimized feature extraction networks have improved detection speed. Image segmentation models like U-Net and Mask R-CNN have also been introduced for contour extraction, yet balancing accuracy and computational efficiency for small or sparsely distributed litter remains challenging.

Despite progress, existing methods often rely on single-task models for either detection or segmentation, making it difficult to achieve both high efficiency and fine-grained precision under complex backgrounds. Furthermore, a complete and stable workflow for quantifying litter coverage area has not been fully established, limiting the quantitative support of UAV monitoring for environmental management and policy decisions. To bridge these gaps, we propose a novel approach that combines YOLOv5l with SAM to realize automated detection, segmentation, and area calculation of coastal litter. This method leverages the rapid detection capability of YOLOv5l and the precise segmentation power of SAM, offering a synergistic solution that enhances both efficiency and accuracy. The following sections detail the methodology, experiments, results, and implications for drone regulation and coastal environmental protection.

1. Methodology

Our framework comprises three main stages: (1) object detection via YOLOv5l, (2) fine segmentation via SAM, and (3) post-processing for coverage area calculation. The overall pipeline is illustrated conceptually.

Stage 1: Object Detection with YOLOv5l. YOLOv5 is a single-stage object detection model that performs end-to-end training, fast inference, and flexible deployment. Its network architecture includes a backbone for multi-scale feature extraction, a neck for feature fusion, and a head for classification and bounding box regression. Among its variants, YOLOv5l (large) has deeper layers and stronger feature representation capabilities, enabling high-precision detection of dense or small-scale targets in complex backgrounds. Given the diversity of coastal litter in terms of shape, size, and distribution, YOLOv5l was chosen as the core detection model. The model is trained on a custom dataset of annotated UAV images, producing bounding boxes with confidence scores for litter objects.

Stage 2: Segmentation with SAM. SAM (Segment Anything Model) is a general-purpose segmentation model based on the Transformer architecture, pre-trained on massive datasets to achieve strong zero-shot segmentation ability. SAM can generate high-quality segmentation masks for novel objects and scenes using prompts such as points or boxes. By feeding the bounding boxes produced by YOLOv5l as prompts to SAM, we obtain precise pixel-level contours of each litter object. This combination allows us to handle complex backgrounds (e.g., sand, rocks, vegetation) while maintaining computational efficiency.

Stage 3: Coverage Area Calculation. The segmentation masks are binary images where pixel value 1 indicates litter and 0 denotes background. These masks are saved as GeoTIFF files and imported into GIS software (e.g., ArcMap 10.8). The number of litter pixels is counted, and the coverage area is computed using the formula:

$$A = N \times (GSD)^2$$

where $A$ is the total litter coverage area, $N$ is the total number of litter pixels, and $GSD$ is the ground sampling distance (m/pixel). The area estimation is then compared with manual measurements for validation.

2. Experiments

2.1 Data Acquisition

Data were collected at five coastal sites (beaches and seawalls) in June 2024 using a DJI Matrice 3TD UAV equipped with a high-resolution camera. The flight altitude was 30 m, yielding a GSD of 1.5 cm/px. Forward overlap was set to 60%, side overlap to 30%, ensuring seamless image stitching. The flight speed was 6 m/s, and images were captured every 5 seconds. A total of approximately 2,500 high-resolution images were acquired.

2.2 Dataset Preparation

To ensure training quality, images were cropped into 3×3 tiles and screened to remove blurry, overexposed, or indiscernible samples. This process yielded 513 valid images covering typical coastal scenes (beaches, seawalls) with varied litter distributions (clustered and scattered). Every litter instance was manually annotated using LabelImg, resulting in 2,534 litter objects. Data augmentation strategies (rotation, translation, scaling, flipping) expanded the dataset to 5,000 training samples.

2.3 Model Training

The training environment comprised Windows 10, Intel Core i9-12900K CPU, NVIDIA GeForce RTX 3080 GPU (10 GB VRAM), and 64 GB RAM. Python 3.8.16 with PyTorch 1.8.0 was used. The YOLOv5l model was trained for 100 epochs with a batch size of 18. The optimizer was SGD with initial learning rate lr0 = 0.01, learning rate decay factor lrf = 0.1, and a warm-up strategy for the first 3 epochs. Weight decay was set to 0.0005.

2.4 Evaluation Metrics

We employed the following metrics for model performance:

Precision: $$P = \frac{TP}{TP+FP}$$
Recall: $$R = \frac{TP}{TP+FN}$$
mAP@0.5: mean Average Precision at IoU threshold 0.5: $$mAP_{0.5} = \frac{1}{C}\sum_{c=1}^{C} AP_c$$
mAP@0.5:0.95: mean AP averaged over IoU thresholds from 0.5 to 0.95 (step 0.05): $$mAP_{0.5:0.95} = \frac{1}{n}\sum_{i=1}^{n} AP_{IoU=0.5+0.05(i-1)}$$
Overall Accuracy: $$Accuracy = \frac{TP+TN}{Total}$$
Kappa Coefficient: $$\kappa = \frac{P_o – P_e}{1 – P_e}$$ where $P_o$ is observed accuracy and $P_e$ is expected accuracy.
Area Estimation Error: $$Error\% = \frac{|A_{pred} – A_{gt}|}{A_{gt}} \times 100\%$$

3. Results

3.1 Detection Performance

Training loss curves (box loss, object loss, classification loss) on both training and validation sets decreased steadily, converging after 100 epochs. Box loss reached ~0.02, object loss ~0.01, and classification loss near zero. The validation set showed similar behavior, indicating no overfitting. Precision stabilized at 0.94, recall at 0.92. mAP@0.5 reached 0.98, and mAP@0.5:0.95 reached 0.88, demonstrating strong detection capability across various IoU thresholds. Table 1 summarizes key performance metrics.

Table 1: YOLOv5l Detection Performance
Metric	Value
Precision	0.94
Recall	0.92
mAP@0.5	0.98
mAP@0.5:0.95	0.88
Training Box Loss (final)	0.02
Validation Box Loss (final)	0.02

3.2 Real-world Validation

We applied the trained model to unseen high-resolution UAV images covering diverse scenes (seawalls and tidal flats). The model successfully detected litter targets (plastic foam, floating debris, wood fragments) under varying backgrounds. The segmentation masks showed high consistency with manual annotations. A confusion matrix for overall classification (litter vs. non-litter) is given in Table 2.

Table 2: Confusion Matrix for Litter Classification
Actual \ Predicted	Litter	Non-litter	Total
Litter	60	10	70
Non-litter	10	354	364
Total	70	364	434

From Table 2, overall accuracy = (60+354)/434 = 0.954, and Kappa coefficient = (0.954 – (70/434 * 70/434 + 364/434 * 364/434)) / (1 – …) = 0.88. Detailed scene-wise accuracy is shown in Table 3.

Table 3: Accuracy per Scene Type
Scene	Sample Count	Overall Accuracy	Kappa	Area Error (%)
Seawall	236	0.96	0.90	5.8
Tidal Flat	198	0.93	0.85	4.2
Overall	434	0.95	0.88	4.5

Processing efficiency: average detection time per image was 0.35 s, segmentation time 0.28 s, total ~0.63 s per image. Area estimation relative error averaged 4.5% across all scenes. These results indicate that our method is both accurate and efficient for operational use.

4. Discussion

Our study demonstrates that the integration of YOLOv5l and SAM provides a robust framework for automated coastal litter detection and coverage quantification. The high overall accuracy (0.95) and Kappa (0.88) confirm the model’s consistency and stability. The area estimation error of 4.5% is acceptable for environmental management purposes, especially given the complex backgrounds typical of coastal zones. Compared to traditional manual surveys, our approach significantly reduces labor and time costs while improving objectivity and scalability.

The synergy between YOLOv5l and SAM is a key innovation. YOLOv5l quickly generates bounding boxes, and SAM refines them into precise masks, handling edge cases where litter is partially occluded or has irregular shapes. This two-stage design balances speed and precision. In previous works, using a single model for both detection and segmentation often led to compromises. For example, single-stage segmentation methods may struggle with small objects, while detection-only models cannot provide area measurements. Our pipeline overcomes these limitations.

The performance on different scenes varied slightly: seawall areas (within drone regulation zones that often require rigorous monitoring) yielded an accuracy of 0.96, while tidal flats achieved 0.93. The lower accuracy on tidal flats can be attributed to stronger background interference (sand texture, tidal channels, sparse vegetation). However, the model still performed reliably, indicating good generalization. The area estimation error was lower on tidal flats (4.2%) than on seawalls (5.8%), possibly due to more consistent lighting and less shadowing. Future work should incorporate more diverse training samples from different seasons, tidal levels, and geographic regions to enhance robustness. Additionally, the current workflow requires a GIS step for area calculation, which prevents end-to-end automation. Developing a fully integrated pipeline that directly outputs area estimates would improve operational efficiency.

Our method has significant implications for drone regulation and coastal environmental policy. Accurate litter mapping enables targeted cleanup operations, resource allocation, and trend analysis. By providing quantitative evidence, authorities can enforce drone regulation more effectively, ensuring that monitoring flights comply with airspace rules while delivering actionable data. Furthermore, the low processing time (~0.63 s per image) allows near-real-time monitoring, which is crucial for rapid response to pollution events. The integration of such intelligent systems with existing drone regulation frameworks can streamline decision-making and promote sustainable coastal management.

Nevertheless, limitations remain. The training dataset was limited to five sites, and samples may not cover extreme conditions (e.g., heavy rainfall, low light, or severe occlusion). Cross-regional transferability of deep learning models is known to degrade without diverse training data. To address this, we plan to collaborate with other coastal monitoring programs to build a large-scale, multi-scene dataset. Moreover, our method currently classifies all litter as a single category. Expanding to multiple litter types (plastic, metal, glass, wood) would provide richer information for recycling and management. This would require additional annotated data and possibly a multi-class detection model.

From a practical perspective, the adoption of UAV-based monitoring must align with drone regulation policies, such as flight height restrictions, no-fly zones, and privacy concerns. Our methodology can be adapted to operate within these constraints by adjusting flight parameters and data processing workflows. For instance, lower flight altitudes may improve GSD but increase flight time and regulatory complexity. Balancing these factors will be essential for operational deployment. The results of this study provide a foundation for integrating AI-driven monitoring into existing drone regulation guidelines, ultimately supporting more effective coastal protection.

5. Conclusion

We have developed and validated an intelligent method for coastal litter monitoring that combines UAV imagery with YOLOv5l object detection and SAM image segmentation. The method achieves an overall accuracy of 0.95, a Kappa coefficient of 0.88, and an area estimation error of 4.5%. The approach significantly reduces manual effort, improves efficiency, and provides reliable quantitative data for litter management. Our work highlights the potential of integrating deep learning with UAV technology to support drone regulation and environmental governance in coastal zones. Future efforts will focus on expanding dataset diversity, enabling multi-class litter classification, and building an end-to-end automated system for real-time monitoring. By advancing these technologies, we can better safeguard coastal ecosystems and promote sustainable development.