Acoustics, Speech, and Signal Processing, IEEE International Conference on (2001)
Salt Lake City, UT, USA
May 7, 2001 to May 11, 2001
G. Saon, IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
We extend the well-known technique of constrained maximum likelihood linear regression (MLLR) to compute a projection (instead of a full-rank transformation) of the feature vectors of the adaptation data. We model the projected features with phone-dependent Gaussian distributions, and we model the complement of the projected space with a single class-independent, speaker-specific Gaussian distribution. We then compute the projection and its complement using maximum likelihood (ML) techniques. The resulting ML transformation is shown to be equivalent to a speaker-dependent heteroscedastic discriminant analysis (HDA) projection. This contrasts with traditional approaches, which use a single speaker-independent projection and perform speaker adaptation in the resulting subspace. Experiments on Switchboard show a 3% relative reduction in word error rate over constrained MLLR applied in the projected subspace only.
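To make the modeling assumptions above concrete, here is a minimal Python sketch of the kind of log-likelihood such a projection would maximize. It is not the authors' implementation. It assumes a hard frame-to-phone alignment, diagonal covariances, and phone-dependent Gaussians in the projected space that are held fixed (for example, taken from a speaker-independent model), while the single complement Gaussian is re-estimated per speaker. The names projection_loglik, log_abs_det, diag_gauss_logpdf and phone_models are hypothetical. The first p rows of the full-rank matrix A form the projection and the remaining rows its complement. The T * log|det A| Jacobian term is what keeps A full rank, as in ML formulations of HDA. The search over A (e.g. by gradient ascent) is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def log_abs_det(a: Matrix) -> float:
    """log|det(a)| via Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] for row in a]  # work on a copy
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return float("-inf")  # singular: not a valid full-rank transform
        m[col], m[pivot] = m[pivot], m[col]  # row swap flips sign only
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return total


def diag_gauss_logpdf(z: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (zi - mi) ** 2 / v
                      for zi, mi, v in zip(z, mean, var))


def projection_loglik(a: Matrix, frames: List[List[float]], labels: List[str],
                      p: int,
                      phone_models: Dict[str, Tuple[List[float], List[float]]],
                      var_floor: float = 1e-6) -> float:
    """Log-likelihood of one speaker's frames under a full-rank transform `a`.

    Rows 0..p-1 of `a` (the projection) are scored with fixed phone-dependent
    diagonal Gaussians; rows p..n-1 (the complement) are scored with one
    speaker-specific diagonal Gaussian whose ML mean/variance are estimated
    from the transformed frames themselves.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n, t = len(a), len(frames)
    ys = [[sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
          for x in frames]
    # Jacobian term: log p(x) = log|det A| + log p(y) for y = A x
    total = t * log_abs_det(a)
    # Projected subspace: each frame scored by its aligned phone's Gaussian
    for y, label in zip(ys, labels):
        mean, var = phone_models[label]
        total += diag_gauss_logpdf(y[:p], mean, var)
    # Complement subspace: one class-independent Gaussian with ML estimates
    for j in range(p, n):
        col = [y[j] for y in ys]
        mu = sum(col) / t
        var = max(sum((v - mu) ** 2 for v in col) / t, var_floor)
        total += sum(diag_gauss_logpdf([v], [mu], [var]) for v in col)
    return total
```

With A set to the identity, the Jacobian term vanishes and the score reduces to the Gaussian log-likelihood of the untransformed features. A singular A scores negative infinity, so it can never be selected.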
G. Zweig, M. Padmanabhan and G. Saon, "Linear feature space projections for speaker adaptation," 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (ICASSP), Salt Lake City, UT, USA, 2001, pp. 325-328.