Automated retrieval of relevant content without human intervention is considered an essential task in the construction industry. Computer vision has attracted attention as a means of extracting rich data from the surrounding environment and automating such critical tasks. Nonetheless, various challenges, including the detection of complicated and changing interactions and the processing of large-scale data, remain unresolved. Deep learning methods have achieved satisfactory results in progress monitoring systems, especially in detecting complex human motions and activities in construction scenes. However, further research on vision-based safety monitoring is required to identify existing limitations and gaps in the construction and infrastructure field. In this paper, bibliographic and scientometric analyses are used to suggest further research directions for the application of computer vision, and especially deep learning, in construction robotics. Moreover, the accuracy of various computer vision methods is analyzed and compared by publication year, country, and institute. The results demonstrate that the application of deep learning is still at an early stage in the construction context, and that current research lacks robust and fast image processing techniques.