
RAG-Enhanced Safety Information Retrieval for Construction: Integration of Large Language Models with Domain-Specific Information

Xianxiang Zhao, Advik Mehta, Falak Sethi, Brian Gue, Qipei Mei, Lingzi Wu
Pages 674-682 (2025 Proceedings of the 42nd ISARC, Montreal, Canada, ISBN 978-0-6458322-2-8, ISSN 2413-5844)
Abstract:

In the construction industry, critical safety information is often scattered across numerous documents, standards, and regulations, making it difficult for practitioners to efficiently access and comprehend safety knowledge in their daily operations. To address this challenge, we propose an intelligent and reliable question-answering system that performs information retrieval and response generation over construction health, safety, and environment documents via retrieval-augmented generation (RAG). Specifically, our system combines a fine-tuned LLaMA-3-8B base model with a vector database constructed using embedding models, enabling accurate information retrieval and enhancing the reliability of the generated responses. Initial validation using cosine similarity analysis demonstrates promising results: our system achieves a cosine similarity score of 0.936, outperforming the LLaMA-3-8B base model's score of 0.884 in processing construction safety documentation. The preliminary findings show that 1) our RAG-enhanced system provides access to safety information, and 2) our specialized preprocessing techniques effectively synthesize and retrieve safety information, reducing fragmentation and access time.
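
As a rough illustration of the retrieval step described in the abstract, the sketch below ranks document chunks by cosine similarity to a query and assembles the best matches into a prompt. It is a minimal sketch under stated assumptions, not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and the chunk texts and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical document chunks, invented for illustration only.
CHUNKS = [
    "Workers must wear a harness when working at heights above three metres.",
    "Scaffolds must be inspected by a competent person before each shift.",
    "Hearing protection is required where noise exceeds the permitted level.",
]

if __name__ == "__main__":
    question = "When must a scaffold be inspected?"
    print(build_prompt(question, retrieve(question, CHUNKS, k=1)))
```

The same `cosine_similarity` function could in principle score a generated answer against a reference text, which is one plausible reading of the cosine similarity analysis mentioned above. The abstract does not specify which texts or embeddings were compared, so that use is an assumption.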

Keywords: Construction Safety; Information Retrieval; Large Language Model (LLM); Retrieval Augmented Generation (RAG); Question Answering System; Vector Database