Video recordings of earthmoving construction operations provide understandable data that can be used for benchmarking and analyzing their performance. These recordings also help project managers take corrective actions on performance deviations and, in turn, improve operational efficiency. Despite these benefits, manual stopwatch studies of previously recorded videos can be labor-intensive, may suffer from observer bias, and become impractical after a substantial period of observation. This research presents a new computer-vision-based algorithm for recognizing single actions of earthmoving construction equipment. This is a particularly challenging task because equipment can be partially occluded in site video streams and usually comes in a wide variety of sizes and appearances. The scale and pose of the equipment actions can also vary significantly with the camera configuration. In the proposed method, a video is initially represented as a collection of spatio-temporal visual features by extracting space–time interest points and describing each feature with a Histogram of Oriented Gradients (HOG). The algorithm automatically learns the distributions of the spatio-temporal features and action categories using a multi-class Support Vector Machine (SVM) classifier. Given a video sequence captured from a fixed camera, the multi-class SVM classifier recognizes and localizes equipment actions. With average accuracies of 86.33% and 98.33%, the experimental results show that our supervised method outperforms previous algorithms for excavator and truck action recognition. These results show promise for the applicability of the proposed method to construction activity analysis.
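To make the feature description step more concrete, the following is a minimal sketch of a HOG-style descriptor for a single 2D image patch, written with NumPy. It is a simplification for illustration only: the method above describes spatio-temporal neighborhoods around space–time interest points, whereas this sketch handles one still patch, and the function name `hog_cell_histogram`, the 9-bin layout, and the 8×8 patch size are assumptions rather than details taken from the paper.

```python
import numpy as np


def hog_cell_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Hypothetical helper: one histogram for one 2D patch, L2-normalized.
    """
    patch = np.asarray(patch, dtype=float)
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into [0, 180) degrees.
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / n_bins
    # Clamp guards against an angle that rounds up to exactly 180.
    bins = np.minimum((orientation // bin_width).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


# Usage: a patch whose intensity increases from left to right
# (assumed toy input, not data from the study).
ramp_x = np.tile(np.arange(8.0), (8, 1))
descriptor = hog_cell_histogram(ramp_x)
```

Descriptors of this kind, one per interest point, would then be aggregated per video and passed to a multi-class SVM; an off-the-shelf implementation such as scikit-learn's `sklearn.svm.SVC` could serve as a stand-in for that step, although the classifier configuration used in the study is not stated here.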
Automated 2D Detection and 3D Localization
As a step towards fully automated performance assessment methods, we focused on automated 2D detection and 3D localization of construction equipment from onsite video streams. In the framework proposed by Memarzadeh et al. (2012)*, a network of fixed, calibrated high-definition cameras records daily construction operations. The video feeds are continuously processed to detect frames that contain construction workers and equipment (hereafter called “resources”). Using low-level features based on Histograms of Oriented Gradients and Hue-Saturation Colors (HOG+C), a new multiple Support Vector Machine (SVM) resource classifier is developed that can recognize the dynamic resources (i.e., worker vs. equipment) and track them in 2D video frames. Next, using a minimum of two cameras, the resources detected in the video frames are localized in 3D using the Direct Linear Transform (DLT) algorithm followed by a non-linear optimization.
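The linear part of the 3D localization step can be sketched as follows. This is a minimal illustration of DLT triangulation from two calibrated views, assuming each camera is given as a 3×4 projection matrix and each detection is reduced to a single image point (for example, the center of a bounding box). The camera intrinsics, the 1 m baseline, and the names `triangulate_dlt` and `project` are hypothetical, and the subsequent non-linear optimization is not included.

```python
import numpy as np


def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: matching (u, v) image points.
    """
    # Each view contributes two linear constraints on the homogeneous point X:
    # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point into an image with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical setup: two identical cameras separated by 1 m along the x axis.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, -0.2, 5.0])   # assumed resource position in metres
x1, x2 = project(P1, X_true), project(P2, X_true)
X_est = triangulate_dlt(P1, P2, x1, x2)
```

With noise-free correspondences the linear estimate reproduces the original point. With real detections it would typically serve only as the starting value for a non-linear refinement, such as minimizing reprojection error.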
- Golparvar-Fard, M., Heydarian, A., and Niebles, JC. (2013). “Automated Action Recognition of Earthmoving Equipment Using Vision Based Spatio-Temporal Features and Support Vector Machine Classifiers.” Elsevier Journal of Advanced Engineering Informatics.
- Heydarian, A., Golparvar-Fard, M., and Niebles, JC. (2012). “Automated visual recognition of construction equipment actions using spatio-temporal features and multiple binary support vector machines.” Proc. Construction Research Congress, West Lafayette, IN.
- Heydarian, A., Memarzadeh, M., and Golparvar-Fard, M. (2012). “Automated benchmarking and monitoring of earthmoving operations carbon footprint using video cameras and GHG estimation model.” The International Workshop on Computing in Civil Engineering, Clearwater Beach, FL.
- Memarzadeh, M., Heydarian, A., Golparvar-Fard, M., and Niebles, JC. (2012). “Real-time and automated 2D recognition and tracking of workers and equipment from site video streams for construction performance assessment.” The International Workshop on Computing in Civil Engineering, Clearwater Beach, FL.
- Heydarian, A., and Golparvar-Fard, M. (2011). “A Visual Monitoring Framework for Integrated Productivity and Carbon Footprint Control of Construction Operations.” ASCE International Workshop on Computing in Civil Engineering.