2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017)
Washington, DC, USA
May 30, 2017 to June 3, 2017
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/FG.2017.59
Model selection methods based on stochastic regularization are widely used in deep learning because of their simplicity and effectiveness. The well-known Dropout method treats all units, visible or hidden, in the same way, and so ignores any a priori information about grouping or structure. Such structure is present in multi-modal learning applications such as affect analysis and gesture recognition, where subsets of units may correspond to individual modalities. Here we describe Modout, a model selection method based on stochastic regularization that is particularly useful in the multi-modal setting. Unlike other forms of stochastic regularization, it can learn whether or when to fuse two modalities in a layer, a choice that deep learning researchers and practitioners usually treat as an architectural hyper-parameter. Modout is evaluated on two real multi-modal datasets. The results indicate improved performance compared to other forms of stochastic regularization. On the Montalbano dataset, the fusion structure learned by Modout performs on par with a carefully designed state-of-the-art architecture.
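The contrast with Dropout can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the algorithm, so the function name `modality_block_mask`, the `keep_prob` matrix, and the choice to always keep same-modality connections are assumptions made for illustration. Where Dropout samples one Bernoulli variable per unit, this sketch samples one per pair of modalities, so that all connections between two modalities in a layer are kept or dropped together. Learning the keep probabilities, which is what would let the fusion structure be learned, is omitted here.

```python
import numpy as np


def modality_block_mask(group_ids_in, group_ids_out, keep_prob, rng):
    """Sample a (n_out, n_in) weight mask with one draw per modality pair.

    group_ids_in / group_ids_out: integer modality label of each input /
    output unit. keep_prob[g_out, g_in] is the (assumed, fixed) probability
    of keeping the block of weights from modality g_in to modality g_out.
    """
    n_groups = keep_prob.shape[0]
    # One Bernoulli draw per (output modality, input modality) block.
    block_keep = rng.random((n_groups, n_groups)) < keep_prob
    # Assumption: connections within a single modality are always kept.
    np.fill_diagonal(block_keep, True)
    # Expand the block-level decision to every individual weight.
    return block_keep[np.ix_(group_ids_out, group_ids_in)].astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    group_ids_in = np.array([0, 0, 1])   # two units of modality 0, one of modality 1
    group_ids_out = np.array([0, 1, 1])
    keep_prob = np.full((2, 2), 0.5)     # hypothetical cross-modality keep rate
    weights = rng.standard_normal((3, 3))
    x = rng.standard_normal(3)
    mask = modality_block_mask(group_ids_in, group_ids_out, keep_prob, rng)
    hidden = (weights * mask) @ x        # masked forward pass for one layer
    print(mask)
    print(hidden)
```

With a cross-modality keep probability of 0 the layer reduces to separate per-modality streams, and with 1 it is fully fused; intermediate values interpolate between the two.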
F. Li, N. Neverova, C. Wolf and G. Taylor, "Modout: Learning Multi-Modal Architectures by Stochastic Regularization," 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 2017, pp. 422-429.