Publications / 2024 Proceedings of the 41st ISARC, Lille, France

Automated Inspection Report Generation Using Multimodal Large Language Models and Set-of-Mark Prompting

Hongxu Pu, Xincong Yang, Zhongqi Shi, Nan Jin
Pages 1003-1009 (2024 Proceedings of the 41st ISARC, Lille, France, ISBN 978-0-6458322-1-1, ISSN 2413-5844)
Abstract:

As civil engineering projects grow in scale and complexity, construction inspection plays a crucial role in ensuring project quality and safety. The traditional process of writing construction inspection reports relies mainly on manual records kept by on-site inspectors. This process is time-consuming and easily affected by subjective judgment. In today's rapidly evolving construction environment, the traditional method has clear limitations, especially in the accuracy and timeliness of the reports. This study therefore proposes an approach that combines Set-of-Mark (SoM) prompting with multimodal Large Language Models (LLMs) to automate construction inspection report generation and improve the efficiency and effectiveness of on-site inspection. The case study shows that the method fulfills the basic requirements of construction inspection reports and that SoM prompting further improves report quality in complicated scenes. The core of the method is to overlay marks on key areas of the construction inspection images so that the multimodal LLM can capture each region of interest (ROI) and analyze site conditions more accurately, and then to generate a detailed construction inspection report automatically. The approach improves the efficiency of report writing and, through in-depth image analysis and text generation, enhances the quality and credibility of the report content.

Keywords: Construction inspection, multimodal large language model, Set-of-Mark prompting, automated report generation
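
The marking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Region` type, the top-to-bottom numbering rule, the label position at the box centre, and the prompt wording are all assumptions made for illustration, and the actual drawing of marks on the image and the call to a multimodal LLM are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical region of interest: a pixel bounding box in an inspection image."""
    x0: int
    y0: int
    x1: int
    y1: int


def assign_marks(regions):
    """Number the regions 1, 2, ... ordered top-to-bottom, then left-to-right (assumed rule)."""
    ordered = sorted(regions, key=lambda r: (r.y0, r.x0))
    return {i + 1: r for i, r in enumerate(ordered)}


def mark_anchor(region):
    """Pixel position where the mark label would be drawn (the box centre, by assumption)."""
    return ((region.x0 + region.x1) // 2, (region.y0 + region.y1) // 2)


def build_prompt(marks):
    """Compose a text prompt that refers to regions only by their mark numbers."""
    lines = [
        "The attached construction site image has numbered marks overlaid.",
        "For each mark, describe the visible condition and any quality or safety issue:",
    ]
    lines += [f"- Mark {m}" for m in sorted(marks)]
    lines.append("Then summarise the findings as a short inspection report.")
    return "\n".join(lines)


# Illustrative boxes only; real regions would come from an inspector or a detector.
regions = [Region(400, 50, 600, 200), Region(20, 300, 180, 420), Region(30, 40, 200, 160)]
marks = assign_marks(regions)
prompt = build_prompt(marks)
```

In a full pipeline, the marked image and `prompt` would be sent together to a multimodal LLM, so that the generated report can refer to each region by its mark number.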