2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698) (2003)
Baltimore, MD, USA
July 6, 2003 to July 9, 2003
ISBN: 0-7803-7965-9
pp: 57-60
A. Mohan , Perceptual Interfaces & Reality Lab., Maryland Univ., College Park, MD, USA
R. Duraiswami , Perceptual Interfaces & Reality Lab., Maryland Univ., College Park, MD, USA
D.N. Zotkin , Perceptual Interfaces & Reality Lab., Maryland Univ., College Park, MD, USA
D. DeMenthon , Perceptual Interfaces & Reality Lab., Maryland Univ., College Park, MD, USA
L.S. Davis , Perceptual Interfaces & Reality Lab., Maryland Univ., College Park, MD, USA
ABSTRACT
Creating high-quality virtual spatial audio over headphones requires real-time head tracking, personalized head-related transfer functions (HRTFs), and customized room response models. Existing solutions rely on costly head trackers and on measured personalized HRTFs and room responses, which makes them unsuitable for widespread or easy deployment. We report on the development of a system that uses computer vision to produce customizable models for both the HRTF and the room response, and to achieve head tracking. The system uses relatively inexpensive cameras and widely available personal computers. Computer-vision-based anthropometric measurements of the head, torso, and external ears are used for HRTF customization. For low-frequency HRTF customization we employ a recently developed simple head-and-torso model [V. R. Algazi et al., 2002]. For high-frequency customization we employ measured pinna characteristics as an index into a database of HRTFs [D. N. Zotkin et al., 2002]. For head tracking we employ an online implementation of the POSIT algorithm [D. DeMenthon and L. Davis, 1995] along with active markers to compute head pose in real time. The system provides an enhanced virtual listening experience at low cost.
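To illustrate what using measured pinna characteristics "as an index into a database of HRTFs" could look like, here is a minimal sketch in Python. The abstract does not state the matching criterion, so the mean relative error used here is an assumption, and the database contents, subject identifiers, measurement values, and the function name `closest_subject` are all hypothetical.

```python
# Hypothetical database: subject id -> pinna measurements in centimetres.
# All values are invented for illustration; they are not from the paper.
PINNA_DB = {
    "subject_a": (1.9, 0.8, 1.6, 6.2),
    "subject_b": (2.1, 0.9, 1.8, 6.6),
    "subject_c": (1.7, 0.7, 1.5, 5.9),
}


def closest_subject(measured, database):
    """Return the id whose measurements have the smallest mean relative error.

    The error measure is an assumed criterion, chosen only so that
    measurements of different sizes contribute comparably.
    """
    def score(reference):
        errors = [abs(m - r) / r for m, r in zip(measured, reference)]
        return sum(errors) / len(errors)

    return min(database, key=lambda subject_id: score(database[subject_id]))


# Measurements of a new listener, e.g. as estimated from camera images.
match = closest_subject((2.0, 0.85, 1.75, 6.5), PINNA_DB)
print(match)
```

In a full system the selected subject's stored HRTF would then be used for the high-frequency part of the rendering; that step is omitted here because the abstract gives no detail on it.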
CITATION

A. Mohan, D. DeMenthon, D. Zotkin, R. Duraiswami and L. Davis, "Using computer vision to generate customized spatial audio," 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)(ICME), Baltimore, MD, USA, 2003, pp. 57-60.
doi:10.1109/ICME.2003.1221247