2025 Proceedings of the 42nd ISARC, Montreal, Canada
In the construction industry, critical safety information is often scattered across numerous documents, standards, and regulations, making it challenging for practitioners to access and comprehend safety knowledge efficiently in their daily operations. To address this challenge, we propose an intelligent and reliable question-answering system that retrieves information from construction health, safety, and environment documents and generates responses via retrieval-augmented generation (RAG). Specifically, our system combines a fine-tuned version of the LLaMA-3-8B base model with a vector database built using embedding models, enabling accurate information retrieval and improving the reliability of the generated responses. Initial validation using cosine similarity analysis shows promising results: on construction safety documentation, our system achieves a cosine similarity score of 0.936, outperforming the LLaMA-3-8B base model's score of 0.884. The preliminary findings show that 1) our RAG-enhanced system provides reliable access to safety information, and 2) our specialized preprocessing techniques effectively synthesize and retrieve safety information, reducing fragmentation and access time.
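For readers unfamiliar with the metric, the following minimal Python sketch shows how cosine similarity is computed and how the same measure can rank document chunks stored in a vector database during retrieval. It is an illustration under stated assumptions, not the authors' implementation. The embeddings are tiny hand-made vectors rather than outputs of the embedding models used in the study. The names `cosine_similarity`, `retrieve_top_k`, and `STORE` are hypothetical. The abstract does not specify the vector database, the embedding model, or exactly which texts were compared to produce the 0.936 and 0.884 scores.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the k stored chunks whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy "vector store": (chunk text, embedding) pairs.
# Real systems use embeddings with hundreds of dimensions.
STORE = [
    ("chunk about fall protection", [0.9, 0.1, 0.0]),
    ("chunk about excavation safety", [0.1, 0.9, 0.1]),
    ("chunk about waste disposal", [0.0, 0.2, 0.9]),
]

if __name__ == "__main__":
    query = [0.8, 0.2, 0.0]  # hypothetical embedding of a fall-protection question
    print(retrieve_top_k(query, STORE, k=2))
```

In a RAG pipeline, the retrieved chunks would be passed to the language model as context. When cosine similarity is used as an evaluation score instead, the same function would compare an embedding of a generated answer with an embedding of a reference answer; a score closer to 1 means the two vectors point in more similar directions.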