
GPT-based Logic Reasoning for Hazard Identification in Construction Site using CCTV Data

Dai Quoc Tran, Yuntae Jeon, Minsoo Park, Seunghee Park
Pages 291-298 (2024 Proceedings of the 41st ISARC, Lille, France, ISBN 978-0-6458322-1-1, ISSN 2413-5844)
Abstract:

Robust deep learning-based surveillance is vital for improving safety at construction sites, and closed-circuit television (CCTV) systems are a pivotal tool for achieving this goal. Despite recent progress in state-of-the-art deep learning models, hazard identification remains difficult because of the complexity of the working environment. This paper presents a novel end-to-end pipeline, termed Image-to-Hazard, that aims to address the disparity between individual single-model predictions. The pipeline incorporates multimodal inputs and uses logical reasoning to establish connections. It integrates a GPT-based model accessed through the OpenAI API and covers tasks such as detection, depth estimation, danger identification, and logical reasoning. First, an actual video dataset was obtained from construction sites and annotated. Next, customized object detection models were trained and optimized. Then, visual features were thoroughly extracted using pre-trained models for tasks such as semantic segmentation and depth estimation. Finally, prompt engineering was conducted to incorporate the visual feature information, and the resulting prompt structures were integrated into OpenAI GPT-based models to enhance their capacity for logical reasoning. The proposed approach showed its robustness in integrating the GPT-based model and the vision model for automated hazard identification and management at construction sites.

Keywords: safety management, deep learning
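
As an illustration of the prompt-engineering step mentioned in the abstract, the following minimal sketch shows one way detection and depth outputs might be serialized into a text prompt for a GPT-based model. The abstract does not give the actual prompt format, so the `Detection` class, its fields, the `build_hazard_prompt` function, and the prompt wording are all hypothetical. No API call is made; the sketch only constructs the prompt string.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object. All field names are hypothetical."""
    label: str                       # e.g. "worker", "excavator"
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels
    depth_m: float                   # assumed depth estimate, in meters


def build_hazard_prompt(detections: List[Detection], scene: str) -> str:
    """Serialize visual features into a text prompt (assumed format)."""
    lines = [
        "You are a construction safety assistant.",
        f"Scene context: {scene}",
        "Detected objects (label, bounding box, estimated depth):",
    ]
    # One numbered line per detected object.
    for i, d in enumerate(detections, start=1):
        lines.append(f"{i}. {d.label}, box={d.box}, depth={d.depth_m:.1f} m")
    lines.append("List any hazards implied by these objects and explain your reasoning.")
    return "\n".join(lines)


# Made-up example values, for illustration only.
detections = [
    Detection("worker", (120, 340, 180, 520), 6.2),
    Detection("excavator", (200, 260, 560, 540), 7.0),
]
prompt = build_hazard_prompt(detections, "earthwork zone")
print(prompt)
```

In a full pipeline, a string like this would be sent to the language model along with any other extracted features. Keeping the serialization in a separate function makes the prompt format easy to inspect and adjust independently of the vision models.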