Unsupervised Cross-Modal Deep-Model Adaptation for Audio-Visual Re-identification with Wearable Cameras
2017 IEEE International Conference on Computer Vision Workshop (ICCVW)
Oct. 22, 2017 to Oct. 29, 2017
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICCVW.2017.59
Model adaptation is important for the analysis of audio-visual data from body-worn cameras in order to cope with rapidly changing scene conditions, varying object appearance and limited training data. In this paper, we propose a new approach for the online and unsupervised adaptation of deep-learning models for audio-visual target re-identification. Specifically, we adapt each mono-modal model using the unsupervised labelling provided by the other modality. To limit the detrimental effects of erroneous labels, we use a regularisation term based on the Kullback-Leibler divergence between the initial model and the one being adapted. The proposed adaptation strategy complements common audio-visual late fusion approaches and is also beneficial when one modality is no longer reliable. We show the contribution of the proposed strategy in improving the overall re-identification performance on a challenging public dataset captured with body-worn cameras.
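The adaptation objective described above — training one modality on pseudo-labels supplied by the other, while a Kullback-Leibler term penalises drift from the initial model — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `adaptation_loss`, the NumPy formulation, and the weighting factor `lam` are assumptions introduced here for clarity.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def adaptation_loss(logits, pseudo_labels, initial_probs, lam=0.5):
    """Illustrative loss for unsupervised cross-modal adaptation.

    logits        -- (N, C) outputs of the model being adapted
    pseudo_labels -- (N,) class indices predicted by the OTHER modality
    initial_probs -- (N, C) posteriors of the frozen initial model
    lam           -- assumed trade-off weight for the KL regulariser
    """
    p = softmax(logits)
    n = len(pseudo_labels)
    # Cross-entropy against the other modality's (possibly noisy) labels.
    ce = -np.mean(np.log(p[np.arange(n), pseudo_labels] + 1e-12))
    # KL(initial || adapted): keeps the adapted model close to the
    # initial one, limiting the damage of erroneous pseudo-labels.
    kl = np.mean(np.sum(initial_probs *
                        np.log((initial_probs + 1e-12) / (p + 1e-12)),
                        axis=-1))
    return ce + lam * kl
```

When the adapted model still matches the initial one, the KL term vanishes and the loss reduces to the pseudo-label cross-entropy; as the model drifts, the regulariser grows, which is the hedging behaviour the abstract describes.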
Adaptation models, Visualization, Feature extraction, Labeling, Speech recognition, Training, Cameras
A. Brutti and A. Cavallaro, "Unsupervised Cross-Modal Deep-Model Adaptation for Audio-Visual Re-identification with Wearable Cameras," 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), Venice, Italy, 2017, pp. 438-445.