DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2017.156
Dense prediction is concerned with predicting a label for each of the input units, such as pixels of an image. Accurate dense prediction for time-varying inputs finds applications in a variety of domains, such as video analysis and medical imaging. Such tasks need to preserve both spatial and temporal structures that are consistent with the inputs. Despite the success of deep learning methods in a wide range of artificial intelligence tasks, time-varying dense prediction is still a less explored domain. Here, we proposed a general encoder-decoder network architecture that aims to addressing time-varying dense prediction problems. Given that there are both intra-image spatial structure information and temporal context information to be processed simultaneously in such tasks, we integrated fully convolutional networks (FCNs) with recurrent neural networks (RNNs) to build a recurrent encoder-decoder network. The proposed network is capable of jointly processing two types of information. Specifically, we developed convolutional RNN (CRNN) to allow dense sequence processing. More importantly, we designed CRNN bottleneck modules for alleviating the excessive computational cost incurred by carrying out multiple convolutions in the CRNN layer. This novel design is shown to be a critical innovation in building very flexible and efficient deep models for time-varying dense prediction. Altogether, the proposed model handles time-varying information with the CRNN layers and spatial structure information with the FCN architectures. The multiple heterogeneous modules can be integrated into the same network, which can be trained end-to-end to perform time-varying dense prediction. Experimental results showed that our model is able to capture both high-resolution spatial information and relatively low-resolution temporal information as compared to other existing models.