Effect of Missing Data on Machine-Learning Algorithms for Real-time Safety Monitoring in Scaffolds

Laura Alvarez; Mahendra Ghimire; Jee Woong Park

Abstract:

In real-time data acquisition and processing, missing data (MD) is a common challenge that can compromise the quality and effectiveness of machine learning (ML) algorithms. Previous research has focused on developing a real-time safety monitoring system that predicts safety conditions in scaffolds by analyzing strain measurements from sensors placed on the structure's columns. However, that work does not address the effect of sensor failures and the resulting MD. This paper explores how MD caused by faulty sensors affects the performance of eight ML algorithms in a scaffold safety monitoring system: Gaussian naive Bayes (GNB), random forest (RF), multi-layer perceptron (MLP), support vector machine (SVM), decision tree (DT), XGBoost (XGB), logistic regression (LR), and linear support vector classification (LSVC). The study evaluates how each algorithm performs when processing datasets with missing values. As the amount of MD in the datasets increases, the performance of every algorithm degrades consistently, resulting in reduced predictive accuracy. Among the tested ML algorithms, RF and DT proved to be the most sensitive to MD.
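The kind of experiment summarized above can be illustrated with a minimal sketch. It is not the pipeline used in this study. It assumes synthetic two-class "strain" readings, a simple nearest-centroid classifier in place of the eight algorithms listed, training on complete data, and missing test readings that are skipped rather than imputed. All function names, the sensor count, and the missing rates are hypothetical.

```python
import random

def make_dataset(n_samples, n_sensors, seed):
    """Synthetic stand-in for strain data (assumption): two classes, shifted means."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        mean = 1.0 if label == 1 else -1.0  # hypothetical class separation
        X.append([rng.gauss(mean, 1.0) for _ in range(n_sensors)])
        y.append(label)
    return X, y

def inject_missing(X, rate, seed):
    """Simulate sensor dropout: each reading becomes None with probability `rate`."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in X]

def fit_centroids(X, y):
    """Nearest-centroid model: the per-class mean reading of every sensor."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, row):
    """Pick the closest centroid, using only sensors that reported a value."""
    def dist(label):
        return sum((v - c) ** 2
                   for v, c in zip(row, centroids[label]) if v is not None)
    # With no observed sensors all distances are 0 and the lowest label wins.
    return min(sorted(centroids), key=dist)

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == lab for row, lab in zip(X, y))
    return hits / len(y)

def sweep(rates, n_sensors=8):
    """Train once on complete data, then score test sets with increasing MD."""
    X_train, y_train = make_dataset(400, n_sensors, seed=1)
    X_test, y_test = make_dataset(400, n_sensors, seed=2)
    model = fit_centroids(X_train, y_train)
    return {r: accuracy(model, inject_missing(X_test, r, seed=3), y_test)
            for r in rates}

if __name__ == "__main__":
    for rate, acc in sweep([0.0, 0.2, 0.4, 0.6, 0.8]).items():
        print(f"missing rate {rate:.1f}: accuracy {acc:.3f}")
```

Running the script prints one accuracy value per missing rate. Replacing the nearest-centroid model with another classifier, or skipping with an imputation step, would change the numbers but not the structure of the sweep.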