2024 Proceedings of the 41st ISARC, Lille, France

Assisting in the identification of ergonomic risks for workers: a large vision-language model approach

Chao Fan, Qipei Mei, Xinming Li
Pages 1010-1017 (2024 Proceedings of the 41st ISARC, Lille, France, ISBN 978-0-6458322-1-1, ISSN 2413-5844)
Abstract:

In the construction industry, workers frequently perform highly physically demanding tasks and use a variety of tools, which often exposes them to ergonomic risks and safety hazards. Both traditional observation-based methods and computer-vision-based artificial intelligence (AI) methods have been applied in construction to assess ergonomic risks. However, assessing ergonomic risks with vision-language models based on Generative Pre-trained Transformers (GPT) has not been thoroughly explored. This study exploits the distinctive ability of such models to interact across images and text, using it to extract ergonomic risk information from images and generate corresponding human-like language descriptions. To test the feasibility and performance of the proposed method, two datasets were created, one for fine-tuning and one for testing, each containing 100 different scenarios with ergonomic risk information. The vision-language model fine-tuned on the fine-tuning dataset outperformed the model before fine-tuning: the fine-tuned model achieved an accuracy of 81%, whereas the model before fine-tuning achieved only 28%. The proposed method therefore offers an automated, real-time, non-traditional AI approach for identifying ergonomic risks and providing human-like language descriptions of them. This broadens the range of approaches to health- and safety-related problem-solving and supports the prevention of work-related musculoskeletal disorders (WMSDs) in the construction industry.

Keywords: Ergonomic Risk Identification, Work Safety, GPT, Vision-Language Model, Construction Safety
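The reported accuracies can be read as the share of test scenarios in which the model identifies the labelled ergonomic risk correctly. The following is a minimal Python sketch under that assumption. The abstract does not state the scoring criterion, so the exact-match comparison, the `Scenario` record, and the `accuracy` and `synthetic_run` helpers are hypothetical. The data is synthetic, chosen only to reproduce the reported 81% and 28% figures on a 100-scenario test set. It is not the authors' evaluation code or data.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    # Hypothetical record for one test image: the annotated ergonomic risk
    # and the risk extracted from the model's generated description.
    true_risk: str
    predicted_risk: str


def accuracy(scenarios: list[Scenario]) -> float:
    """Fraction of scenarios whose predicted risk exactly matches the label."""
    if not scenarios:
        raise ValueError("no scenarios to evaluate")
    correct = sum(s.predicted_risk == s.true_risk for s in scenarios)
    return correct / len(scenarios)


def synthetic_run(n_correct: int, n_total: int = 100) -> list[Scenario]:
    # Synthetic stand-in for a 100-scenario test set (NOT the paper's data):
    # the first n_correct predictions match their labels, the rest do not.
    return [
        Scenario(true_risk="risk", predicted_risk="risk" if i < n_correct else "other")
        for i in range(n_total)
    ]


fine_tuned = accuracy(synthetic_run(81))
baseline = accuracy(synthetic_run(28))
print(f"fine-tuned: {fine_tuned:.0%}, baseline: {baseline:.0%}")
print(f"gain: {round((fine_tuned - baseline) * 100)} percentage points")
```

Under this reading, 81 correct answers out of 100 test scenarios give 81% and 28 give 28%, a gain of 53 percentage points from fine-tuning. A stricter or looser matching rule (for example, judging free-text descriptions by hand) would change only the comparison inside `accuracy`.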