Segmentation tasks in computer vision have been adopted in various civil engineering studies to locate objects in images accurately. However, preparing annotations to train segmentation models is a time-consuming and costly process, which hinders the use of segmentation models in vision-based applications. To address this problem, this study proposes a fusion model that integrates the self-supervised equivariant attention mechanism (SEAM) and sub-category exploration (SC-CAM) to generate pseudo labels in the form of polygon annotations from bounding box annotations, which are relatively easy to obtain. To test the performance of the fusion model, a public dataset for construction object detection, the Advanced Infrastructure Management Group (AIM) dataset, was used to generate pseudo labels; the effectiveness of the pseudo labels was measured by the segmentation performance of a feature pyramid network (FPN) trained with them. The FPN achieved a mean intersection over union (mIoU) of 86.03%, demonstrating the potential of the proposed fusion model to reduce the manual annotation effort required to prepare training data for segmentation models.
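For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU can be computed from a predicted and a ground-truth label mask. It is an illustration only, not the evaluation code used in this study: the function name `mean_iou`, the nested-list mask format, and the choice to average only over classes that appear in either mask are assumptions made for the example.

```python
from typing import List, Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean intersection over union for two label masks of equal shape.

    Assumption: classes absent from both masks are skipped rather than
    counted as IoU = 0 or IoU = 1 (conventions differ between toolkits).
    """
    intersection: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes

    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel is in both the predicted and true region of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the union of class p (predicted)
                # and of class t (ground truth), but to no intersection.
                union[p] += 1
                union[t] += 1

    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x4 masks: 0 = background, 1 = object (hypothetical data).
target_mask = [[0, 0, 1, 1],
               [0, 0, 1, 1]]
pred_mask = [[0, 0, 0, 1],
             [0, 0, 1, 1]]

score = mean_iou(pred_mask, target_mask, num_classes=2)
```

In this toy case the background IoU is 4/5 and the object IoU is 3/4, so the mean is 0.775. Over a dataset, the intersection and union counts are usually accumulated across all images before dividing, instead of averaging per-image scores.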