Publications / CCC 2025 - Zadar, Croatia
Large Language Models (LLMs) are increasingly being explored for robotics applications, where they enable more natural and intuitive communication between humans and machines. In the construction industry, where automation has lagged behind other sectors and the workforce often lacks specialized robotics training, LLMs offer a promising way to improve human-robot interaction. However, most existing implementations rely on cloud-based processing, which introduces network latency, depends on connectivity that is often unreliable at remote sites, and raises security and privacy concerns when sensitive project data is transmitted. These limitations make cloud-dependent LLMs impractical for real-world construction environments, which are dynamic and unpredictable and therefore require robust, responsive systems. Recent improvements in model efficiency have made LLMs smaller and more capable, so local execution on edge devices is increasingly feasible. This study investigates the feasibility of deploying LLMs entirely on-device for construction robotics. We propose a locally executed framework that processes multimodal inputs, such as speech and vision, directly on construction robots. By eliminating reliance on external servers, this approach enables near-real-time responses, greater autonomy, and enhanced resilience in bandwidth-constrained settings. We describe the framework, its hardware integration, and the use of straightforward natural language prompts to enable practical multimodal processing in field conditions. Our results demonstrate how locally executed LLMs can provide a robust, responsive, and accessible interface for human-robot collaboration in construction. Additionally, a case study shows how few-shot prompting can be used within our framework to enable a Local Intelligent Safety Assistant (LISA) that inspects work areas, interacts with workers in real time, and helps ensure compliance with on-site safety measures.
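To make the few-shot prompting idea concrete, the following is a minimal sketch of how a prompt for a safety assistant might be assembled before it is passed to a locally running model. It is an illustration under stated assumptions, not the prompt or code used in the study: the `Example` class, the `build_few_shot_prompt` function, the instruction wording, and the example observations are all hypothetical, and the step that sends the prompt to an on-device model is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One worked example shown to the model (hypothetical content)."""
    observation: str  # text description of what the robot perceived
    response: str     # the reply we want the assistant to imitate


# Hypothetical examples: the study's actual prompts are not given in the abstract.
EXAMPLES = [
    Example(
        "Worker near the scaffold is not wearing a hard hat.",
        "Please put on your hard hat before continuing work near the scaffold.",
    ),
    Example(
        "Walkway is clear and all workers are wearing helmets and vests.",
        "No safety issues detected.",
    ),
]


def build_few_shot_prompt(observation: str, examples=EXAMPLES) -> str:
    """Assemble an instruction, the worked examples, and the new observation."""
    lines = [
        "You are a construction site safety assistant.",
        "Respond to each observation with a short safety instruction.",
        "",
    ]
    for ex in examples:
        lines.append(f"Observation: {ex.observation}")
        lines.append(f"Response: {ex.response}")
        lines.append("")
    # The final entry is left open so the model completes the response.
    lines.append(f"Observation: {observation}")
    lines.append("Response:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_few_shot_prompt("Worker is welding without eye protection."))
```

In this sketch the observation is plain text; in a multimodal setup it would presumably be derived from the robot's speech or vision input before being inserted into the prompt.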