Publications / 2024 Proceedings of the 41st ISARC, Lille, France
Manually monitoring excavator activity to evaluate performance and productivity is laborious, time-consuming, and error-prone. To address these problems, many automated computer vision-based frameworks have been developed for detecting excavators and classifying their activities. Most current methods consist of several separately optimized modules that are applied to the input video sequentially. Recently, single-stage spatiotemporal activity recognition methods have been gaining popularity in the construction community. The You Only Watch Once (YOWO) network and its variant, YOWO53, have proved superior to three-stage approaches for activity recognition of construction workers. This paper investigates the benefits of using YOWO and YOWO53 over three-stage methods for the activity recognition of excavators, using a large custom dataset of 1,060 video clips collected from both local construction sites and YouTube, with different camera angles, illuminations, occlusions, weather conditions, and video resolutions. The results demonstrate 88.9% classification accuracy and an 88.7% F1-score for the YOWO method, compared to 70.4% classification accuracy and a 69.8% F1-score for the three-stage method. This indicates the feasibility and benefits of deploying single-stage methods in near real-time applications.