Estimating the productivity of construction operations is one of the most challenging tasks for project managers, so the construction industry continually looks to new advancements for automating this process. New automated methods for productivity estimation aim to detect the types, locations, and activities of construction equipment from sensor data. Computer Vision (CV) is one of the most promising of these methods. It offers an affordable way to estimate productivity because it requires only regular surveillance cameras for data collection, which are already available on many construction sites. One of the most widely used CV methods for classifying equipment is the Histogram of Oriented Gradients (HOG). Bag of Words (BoW) and Local Binary Patterns (LBP) are other descriptors widely used for object classification. However, these methods reduce the dimensionality of the image features before training the classifiers for object detection, which may reduce the reliability of the results. Convolutional Neural Networks (CNNs), a special type of Artificial Neural Network (ANN) with a deeper layer structure, provide a better approach to object detection than the conventional methods because they learn richer representations of object features. Furthermore, advancements in Graphics Processing Units (GPUs) have made this computationally heavy method more applicable in practice. This paper aims to evaluate the performance of CNNs for detecting equipment on construction sites. Several CNN configurations are trained to detect multiple types of equipment (i.e., dump trucks, excavators, and loaders). The results of these configurations are compared with those of conventional detectors.