Crane operator fatigue is a significant constraint that must be monitored; otherwise it may lead to inefficient crane operations and safety hazards. Recently, many deep neural networks have been developed to monitor vehicle driver fatigue from image or video data. A key challenge, however, is distinguishing the slight variations in facial features between still and motion frames (e.g., nodding versus head tilting, yawning versus talking). This challenge is exacerbated for crane operators, who constantly move their heads to track the load's position and communicate frequently (talk) with the crane banksman. In contrast to previous approaches, which model spatial information and temporal information separately for sequential processing, this study proposes a hybrid model that not only extracts spatial features with a customized convolutional neural network (CNN) but also models dynamic motion in the temporal dimension with a deep bidirectional long short-term memory (DB-LSTM) network. The hybrid model is trained and evaluated on the widely used NTHU-DDD dataset, achieving 93.6% overall accuracy and outperforming previous models in the literature.
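To make the bidirectional temporal idea concrete, the sketch below runs a toy LSTM over a sequence of per-frame features in both directions and concatenates the two hidden states at each time step. This is only an illustration of the mechanism, not the paper's DB-LSTM: the real model is deep (multi-layer) and operates on CNN feature vectors per video frame, whereas here each frame is reduced to a single scalar and all weights (`PARAMS`) are hypothetical hand-picked values.

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_run(seq, p):
    """Run a scalar LSTM over seq, returning the hidden state at each step."""
    h, c, outs = 0.0, 0.0, []
    for x in seq:
        i = _sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])   # input gate
        f = _sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])   # forget gate
        o = _sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])   # output gate
        g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate cell
        c = f * c + i * g                                    # new cell state
        h = o * math.tanh(c)                                 # new hidden state
        outs.append(h)
    return outs

def bidirectional(seq, p):
    """Pair the forward-pass and backward-pass hidden states per time step,
    so each frame's representation sees both past and future context."""
    fwd = lstm_run(seq, p)
    bwd = list(reversed(lstm_run(list(reversed(seq)), p)))
    return list(zip(fwd, bwd))

# Hypothetical weights; a trained model would learn these from data.
PARAMS = {"wi": 0.5, "ui": 0.1, "bi": 0.0,
          "wf": 0.5, "uf": 0.1, "bf": 1.0,
          "wo": 0.5, "uo": 0.1, "bo": 0.0,
          "wg": 0.9, "ug": 0.2, "bg": 0.0}
```

Bidirectionality matters for the ambiguous cases the abstract names: whether a frame belongs to a nod or a head tilt, or to a yawn or ordinary talking, often depends on frames that come *after* it, which only the backward pass can supply.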