The Face-to-Face Still-Face (FFSF) task is a validated and commonly used observational measure of mother-infant socio-emotional interactions. With the rise of deep learning-based facial emotion recognition, complex but routine tasks such as the coding of FFSF videos might be performed with a high degree of accuracy by deep neural networks (DNNs). The primary objective of this study was to test the accuracy of four DNN image classification models against the coding of infant engagement by two trained, independent manual raters.