Issue No. 09 - September (2007 vol. 56)
DOI: http://doi.ieeecomputersociety.org/10.1109/TC.2007.1081
It has been shown recently that the pronunciation characteristics of speakers can be represented by articulatory feature-based conditional pronunciation models (AFCPMs). However, because these pronunciation models are phoneme-dependent, they can yield speaker models with low discriminative power when the amount of enrollment data is limited. This paper proposes to mitigate this problem by grouping similar phonemes into phonetic classes and representing the background and speaker models as phonetic-class-dependent density functions. Phonemes are grouped by (1) vector quantizing the discrete densities in the phoneme-dependent universal background models, (2) using the phone properties specified in the classical phoneme tree, or (3) combining vector quantization with phone properties. Evaluations based on the 2000 NIST Speaker Recognition Evaluation (SRE) show that this phonetic-class approach effectively alleviates the data sparseness problem encountered in conventional AFCPMs and yields better performance when fused with acoustic features.
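Grouping approach (1), vector quantization of the phoneme-dependent background densities, can be illustrated with a minimal Python sketch. It assumes that each phoneme-dependent background model is a discrete probability vector over articulatory-feature combinations, and it uses plain k-means with squared Euclidean distance and deterministic farthest-point seeding; the paper's actual distortion measure, initialization, and smoothing may differ. It then pools enrollment counts within each resulting class to form class-dependent densities. All function names, the probability floor, and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(vec: Vector, centroids: List[List[float]]) -> int:
    """Index of the codeword closest to vec."""
    return min(range(len(centroids)), key=lambda c: _sq_dist(vec, centroids[c]))


def _init_maximin(names: List[str], pmfs: Dict[str, Vector], k: int) -> List[List[float]]:
    """Deterministic farthest-point (maximin) seeding of k codewords."""
    chosen = [names[0]]
    while len(chosen) < k:
        nxt = max(
            (n for n in names if n not in chosen),
            key=lambda n: min(_sq_dist(pmfs[n], pmfs[c]) for c in chosen),
        )
        chosen.append(nxt)
    return [list(pmfs[c]) for c in chosen]


def vq_group_phonemes(pmfs: Dict[str, Vector], k: int, max_iters: int = 50) -> Dict[str, int]:
    """Cluster phoneme-dependent background pmfs into k phonetic classes (k-means)."""
    names = sorted(pmfs)
    if not 1 <= k <= len(names):
        raise ValueError("k must be between 1 and the number of phonemes")
    centroids = _init_maximin(names, pmfs, k)
    assign: Dict[str, int] = {}
    for _ in range(max_iters):
        new_assign = {n: _nearest(pmfs[n], centroids) for n in names}
        if new_assign == assign:
            break  # converged: no phoneme changed class
        assign = new_assign
        for c in range(k):
            members = [pmfs[n] for n in names if assign[n] == c]
            if members:  # keep the old codeword if a class becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def pool_class_densities(
    counts: Dict[str, Vector], assign: Dict[str, int], k: int, floor: float = 1e-3
) -> List[List[float]]:
    """Pool per-phoneme counts within each class, floor them, and normalize to pmfs."""
    dim = len(next(iter(counts.values())))
    pooled = [[0.0] * dim for _ in range(k)]
    for ph, vec in counts.items():
        row = pooled[assign[ph]]
        for i, v in enumerate(vec):
            row[i] += v
    densities = []
    for row in pooled:
        smoothed = [v + floor for v in row]  # hypothetical floor avoids zero probabilities
        total = sum(smoothed)
        densities.append([v / total for v in smoothed])
    return densities


if __name__ == "__main__":
    # Toy background pmfs over three hypothetical articulatory-feature combinations.
    ubm = {
        "p": [0.8, 0.1, 0.1], "b": [0.7, 0.2, 0.1], "t": [0.75, 0.15, 0.1],
        "aa": [0.1, 0.1, 0.8], "ih": [0.1, 0.2, 0.7], "iy": [0.05, 0.15, 0.8],
    }
    classes = vq_group_phonemes(ubm, k=2)
    # Sparse toy enrollment counts: "t" and "iy" are never observed.
    enroll = {"p": [3, 0, 0], "b": [1, 1, 0], "t": [0, 0, 0],
              "aa": [0, 0, 2], "ih": [0, 1, 1], "iy": [0, 0, 0]}
    print(classes)
    print(pool_class_densities(enroll, classes, k=2))
```

Because counts are pooled across all phonemes in a class, phonemes with no enrollment observations (here "t" and "iy") still map to a class-level density estimated from their acoustically similar neighbours, whereas a purely phoneme-dependent model would have no data for them. This is the kind of data-sparseness relief the phonetic-class approach targets, shown here only on toy numbers.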
Speaker verification, pronunciation modeling, articulatory features, phonetic classes, NIST speaker recognition evaluation
H. Meng, S. Zhang and M. Mak, "Speaker Verification via High-Level Feature Based Phonetic-Class Pronunciation Modeling," in IEEE Transactions on Computers, vol. 56, no. 9, pp. 1189-1198, 2007.