Publications / CCC 2025 - Zadar, Croatia
The rapid integration of Large Language Models (LLMs) into AI-driven project management systems is transforming the construction industry by improving efficiency, automation and decision-making. However, using LLMs to process sensitive construction documents raises critical privacy and data security concerns. This paper explores the challenges of handling sensitive information, focusing on methods for removing sensitive data from files before they are passed to LLM applications. Pre-processing must take place before text is tokenised and ingested by an LLM: sensitive information, such as financial details, personal data and project-specific proprietary content, must be identified and then removed or masked at document level. Techniques such as Named Entity Recognition (NER) can identify personally identifiable information, which can then be redacted or replaced with anonymised placeholders. Automated text redaction and metadata removal tools further enhance security by preventing the unintentional disclosure of confidential content. By ensuring that sensitive data is removed before documents reach an LLM, the construction industry can use AI-powered tools while adhering to strict data privacy and security standards. This paper evaluates the effectiveness of these pre-processing techniques and discusses their importance for construction project management.
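As a rough illustration of the placeholder-based masking described above, the following Python sketch replaces detected values with anonymised placeholders before the text would be handed to an LLM. It is not the authors' implementation: simple regular expressions stand in for a trained NER model, and the names `PATTERNS` and `redact`, the three categories and the sample text are all assumptions made for this example. A real pipeline would likely need an NER model to catch names and organisations, which these patterns do not cover.

```python
import re

# Hypothetical detection patterns (assumptions for illustration only).
# Regular expressions stand in here for a trained NER model.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    "AMOUNT": re.compile(r"(?:EUR|USD|€|\$)\s?\d[\d,.]*\d"),
}


def redact(text):
    """Replace matched values with placeholders such as [EMAIL_1].

    Returns the redacted text and a placeholder-to-original mapping,
    which should be stored separately and never sent to the LLM.
    """
    mapping = {}   # placeholder -> original value
    seen = {}      # (label, original value) -> placeholder
    counters = {}  # label -> number of distinct values so far

    for label, pattern in PATTERNS.items():
        def replace(match, label=label):
            value = match.group(0)
            key = (label, value)
            # Reuse the same placeholder when a value repeats,
            # so references stay consistent within the document.
            if key not in seen:
                counters[label] = counters.get(label, 0) + 1
                placeholder = f"[{label}_{counters[label]}]"
                seen[key] = placeholder
                mapping[placeholder] = value
            return seen[key]

        text = pattern.sub(replace, text)

    return text, mapping


# Made-up sample sentence, not taken from any real project document.
sample = (
    "Contact ana@example.com or +385 23 555 0199 "
    "about the EUR 1,250,000.00 invoice."
)
redacted_text, placeholder_map = redact(sample)
print(redacted_text)
```

Keeping the mapping outside the LLM workflow is one possible design choice: it would allow authorised users to restore the original values in a model's output, while the model itself only ever sees the placeholders.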