Publications / CCC 2025 - Zadar, Croatia
Object detection and segmentation are crucial for managing construction sites, aiding in tasks such as progress tracking, material management, and safety assurance. However, conventional methods encounter persistent challenges, including occlusion, variable lighting conditions, and the labor-intensive nature of dataset creation, which limit their adaptability to dynamic construction environments. This study introduces a novel zero-shot object detection and segmentation framework designed specifically for construction-related objects, including machinery, workers, and materials. The proposed framework integrates three state-of-the-art models: Florence-2, Llama3.2-Vision, and the Segment Anything Model 2 (SAM2). Florence-2 generates region proposals for previously unseen objects using textual descriptions; Llama3.2-Vision predicts and refines labels for the detected regions based on textual queries; and SAM2 produces high-precision segmentation masks. The effectiveness of this approach was validated through both qualitative and quantitative experiments. While parts of this framework and the qualitative experiments were previously presented, this paper extends our previous work by providing a more detailed methodology and including additional quantitative experiments. Qualitative experiments using images from a specific tunnel excavation site demonstrated robust detection and segmentation performance under challenging conditions such as occlusion and variable lighting. Quantitative experiments using the Alberta Construction Image Dataset (ACID) showed that the proposed multi-model method significantly outperformed Florence-2 alone, particularly for large objects, although it did not reach the accuracy of the fine-tuned YOLOv11 model. The proposed framework eliminates the need for extensive retraining and manual dataset creation by leveraging the complementary strengths of these models.
This scalable and flexible solution offers practical applications in progress tracking, material management, and safety monitoring and thereby addresses the unique complexities of dynamic construction environments.
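The three-stage flow described above (region proposal, label refinement, mask generation) can be sketched as a simple composition. This is an illustrative outline only, not the authors' implementation: the names `Detection`, `run_pipeline`, `propose`, `relabel`, and `segment` are hypothetical, and the three callables stand in for Florence-2, Llama3.2-Vision, and SAM2 respectively without reproducing any of those models' real APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Bounding box as (x_min, y_min, x_max, y_max); this format is an assumption.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box
    label: str
    mask: Any  # segmentation mask in whatever form the segmenter returns


def run_pipeline(
    image: Any,
    prompt: str,
    propose: Callable[[Any, str], List[Box]],   # stand-in for Florence-2
    relabel: Callable[[Any, Box, str], str],    # stand-in for Llama3.2-Vision
    segment: Callable[[Any, Box], Any],         # stand-in for SAM2
) -> List[Detection]:
    """Chain the three stages: propose regions, refine labels, segment."""
    detections = []
    for box in propose(image, prompt):
        label = relabel(image, box, prompt)  # label refined per region
        mask = segment(image, box)           # mask prompted by the box
        detections.append(Detection(box=box, label=label, mask=mask))
    return detections
```

Passing the stages in as callables keeps the sketch independent of any particular model wrapper, so each stage could be swapped or tested in isolation.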