Acoustics, Speech, and Signal Processing, IEEE International Conference on (2001)
Salt Lake City, UT, USA
May 7, 2001 to May 11, 2001
ISBN: 0-7803-7041-4
TABLE OF CONTENTS

The IBM Personal Speech Assistant (Abstract)

L. Comerford , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 1-4

Speaker- and language-independent speech recognition in mobile communication systems (Abstract)

I. Viikki , Speech & Audio Syst. Lab., Nokia Res. Center, Tampere, Finland
pp. 5-8

Portability of syntactic structure for language modeling (Abstract)

C. Chelba , Microsoft Speech.Net/Res., Redmond, WA, USA
pp. 9-12

Ubiquitous speech processing (Abstract)

S. Furui , Dept. of Comput. Sci., Tokyo Inst. of Technol., Japan
pp. 13-16

Automatic transcription of voicemail at AT&T (Abstract)

M. Bacchiani , AT&T Labs-Research, Florham Park, NJ, USA
pp. 25-28

Error corrective mechanisms for speech recognition (Abstract)

L. Mangu , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 29-32

Explicit word error minimization using word hypothesis posterior probabilities (Abstract)

F. Wessel , Comput. Sci. Dept., RWTH Aachen - Univ. of Technol., Germany
pp. 33-36

From broadcast news to spontaneous dialogue transcription: portability issues (Abstract)

N. Bertoldi , ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Povo di Trento, Italy
pp. 37-40

Improvements in linear transform based speaker adaptation (Abstract)

L.F. Uebel , Dept. of Eng., Cambridge Univ., UK
pp. 49-52

Innovative approaches for large vocabulary name recognition (Abstract)

Yuqing Gao , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 53-56

Recognize tone languages using pitch information on the main vowel of each syllable (Abstract)

C.J. Chen , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 61-64

Gaussian mixture selection using context-independent HMM (Abstract)

A. Lee , Nara Inst. of Sci. & Technol., Ikoma, Japan
pp. 69-72

Computing Mel-frequency cepstral coefficients on the power spectrum (Abstract)

S. Molau , Lehrstuhl fur Inf. VI, Rheinisch-Westfalische Tech. Hochschule Aachen, Germany
pp. 73-76

PLP coefficients can be quantized at 400 bps (Abstract)

W. Gunawan , Dept. of Electr. & Comput. Eng., Illinois Univ., Urbana, IL, USA
pp. 77-80

Robust feature extraction using subband spectral centroid histograms (Abstract)

B. Gajic , Sch. of Microelectron. Eng., Griffith Univ., Brisbane, Qld., Australia
pp. 85-88

A novel syllable duration modeling approach for Mandarin speech (Abstract)

Wen-Hsing Lai , Dept. of Commun. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
pp. 93-96

All-pole modelling of mixed excitation signals (Abstract)

P. Kabal , Dept. of Electr. & Comput. Eng., Concordia Univ., Montreal, Que., Canada
pp. 97-100

On the effect of stress on certain modulation parameters of speech (Abstract)

K. Gopalan , Dept. of Eng., Purdue Univ. Calumet, Hammond, IN, USA
pp. 101-104

The statistical structures of male and female speech signals (Abstract)

Te-Won Lee , Computational Neurobiol. Lab., Salk Inst., La Jolla, CA, USA
pp. 105-108

Pole zero estimation from speech signals by an iterative procedure (Abstract)

K. Schnell , Inst. fur Angewandte Phys., Johann Wolfgang Goethe Univ., Frankfurt, Germany
pp. 109-112

An efficient and scalable 2D DCT-based feature coding scheme for remote speech recognition (Abstract)

Qifeng Zhu , Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA
pp. 113-116

Perceptual harmonic cepstral coefficients for speech recognition in noisy environment (Abstract)

L. Gu , Dept. of Electr. & Comput. Eng., California Univ., Santa Barbara, CA, USA
pp. 125-128

Peripheral features for HMM-based speech recognition (Abstract)

T. Fukuda , Graduate School of Eng., Toyohashi Univ. of Technol., Japan
pp. 129-132

Using phase spectrum information for improved speech recognition performance (Abstract)

R. Schluter , Lehrstuhl fur Informatik VI, RWTH Aachen, Germany
pp. 133-136

A study of two dimensional linear discriminants for ASR (Abstract)

S.S. Kajarekar , Oregon Graduate Inst. of Sci. & Technol., Beaverton, OR, USA
pp. 137-140

Formant weighted cepstral feature for LSP-based speech recognition (Abstract)

H. Hermansky , Dept. of Electron. Eng., Pusan Nat. Univ., South Korea
pp. 141-144

On the use of matrix derivatives in integrated design of dynamic feature parameters for speech recognition (Abstract)

R. Chengalvarayan , Lucent Speech Solutions, Lucent Technol. Inc., Naperville, IL, USA
pp. 145-148

Subband feature extraction using lapped orthogonal transform for speech recognition (Abstract)

Z. Tufekci , Dept. of Electr. & Comput. Eng., Clemson Univ., SC, USA
pp. 149-152

Visual speech synthesis using quadtree splines (Abstract)

Xue-Wen Chen , Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
pp. 153-156

Noise compensation in a multi-modal verification system (Abstract)

C. Sanderson , Sch. of Microelectron. Eng., Griffith Univ., Brisbane, Qld., Australia
pp. 157-160

Optimal weighting of posteriors for audio-visual speech recognition (Abstract)

M. Heckmann , Inst. de la Communication Parlee, Inst. Nat. Polytech. de Grenoble, France
pp. 161-164

Hierarchical discriminant features for audio-visual LVCSR (Abstract)

G. Potamianos , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 165-168

Measuring the relation between speech acoustics and 2D facial motion (Abstract)

A.V. Barbosa , Center for Res. on Speech, Acoustics, Language & Music, Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
pp. 181-184

Microphone array sub-band speech recognition (Abstract)

I.A. McCowan , Speech Res. Lab., Queensland Univ. of Technol., Brisbane, Qld., Australia
pp. 185-188

Speech enhancement by multiple beamforming with reflection signal equalization (Abstract)

T. Nishiura , ATR Spoken Language Translation Res. Labs., Kyoto, Japan
pp. 189-192

Microphone array speech dereverberation using coarse channel modeling (Abstract)

S.M. Griebel , Div. of Eng. & Appl. Sci., Harvard Univ., Cambridge, MA, USA
pp. 201-204

A multi-microphone signal subspace approach for speech enhancement (Abstract)

F. Jabloun , Dept. of Electr. & Comput. Eng., McGill Univ., Montreal, Que., Canada
pp. 205-208

Hierarchical stochastic feature matching for robust speech recognition (Abstract)

Hui Jiang , Multimedia Commun. Res. Lab., Lucent Technol. Bell Labs., Murray Hill, NJ, USA
pp. 217-220

Model-combination-based acoustic mapping (Abstract)

M. Westphal , Interactive Syst. Labs., Karlsruhe Univ., Germany
pp. 221-224

Sequential noise estimation with optimal forgetting for robust speech recognition (Abstract)

M. Afify , Multimedia Commun. Res. Lab, Lucent Technol. Bell Labs., Murray Hill, NJ, USA
pp. 229-232

Robust, real-time endpoint detector with energy normalization for ASR in adverse environments (Abstract)

Qi Li , Multimedia Commun. Res. Lab., Lucent Technol. Bell Labs., Murray Hill, NJ, USA
pp. 233-236

Robust speech/non-speech detection using LDA applied to MFCC (Abstract)

A. Martin , France Telecom R&D, Lannion, France
pp. 237-240

Continuous speech recognition without end-point detection (Abstract)

O. Segawa , Graduate Sch. of Eng., Nagoya Univ., Japan
pp. 245-248

Robust end-of-utterance detection for real-time speech recognition applications (Abstract)

R. Hariharan , Speech & Audio Syst. Lab., Nokia Res. Center, Tampere, Finland
pp. 249-252

Multi-stream ASR trained with heterogeneous reverberant environments (Abstract)

M.L. Shire , Int. Comput. Sci. Inst., Univ. of California at Berkeley, CA, USA
pp. 253-256

Adaptive ML-weighting in multi-band recombination of Gaussian mixture ASR (Abstract)

A. Hagen , LIA, Ecole Polytech. Fed. de Lausanne, Switzerland
pp. 257-260

Robust speech recognition in burst-like packet loss (Abstract)

B. Milner , BT Adastral Park, Martlesham Heath, UK
pp. 261-264

Automatic transcription of compressed broadcast audio (Abstract)

C. Barras , Lab. d'Informatique pour la Mecanique et les Sci. de l'Ingenieur, CNRS, Orsay, France
pp. 265-268

Soft-feature decoding for speech recognition over wireless channels (Abstract)

A. Potamianos , Lucent Technol. Bell Labs., Murray Hill, NJ, USA
pp. 269-272

Speech in Noisy Environments: robust automatic segmentation, feature extraction, and hypothesis combination (Abstract)

R. Singh , Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
pp. 273-276

Adaptive transition bias for robust low complexity speech recognition (Abstract)

K. Koumpis , Digital Signal Process. Group, Nokia Mobile Phones, Copenhagen, Denmark
pp. 277-280

Maximum-likelihood compensation of zero-memory nonlinearities in speech signals (Abstract)

R.W. Morris , Center for Signal & Image Processing, Georgia Inst. of Technol., Atlanta, GA, USA
pp. 289-292

Improved noise robustness by corrective and rival training (Abstract)

C. Meyer , Philips Res. Lab., Aachen, Germany
pp. 293-296

SNR-dependent waveform processing for improving the robustness of ASR front-end (Abstract)

D. Macho , Human Interface Labs., Motorola Inc., Schaumburg, IL, USA
pp. 305-308

MVDR based feature extraction for robust speech recognition (Abstract)

S. Dharanipragada , Human Language Technol., IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 309-312

Duration normalization for improved recognition of spontaneous and read speech via missing feature methods (Abstract)

J.P. Nedel , Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
pp. 313-316

EMAP-based speaker adaptation with robust correlation estimation (Abstract)

Eugene Jon , Sch. of Electr. & Comput. Eng., Seoul Nat. Univ., South Korea
pp. 321-324

Linear feature space projections for speaker adaptation (Abstract)

G. Saon , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 325-328

Online speaker adaptation based on quasi-Bayes linear regression (Abstract)

Jen-Tzung Chien , Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
pp. 329-332

Rapid adaptation using penalized-likelihood methods (Abstract)

H. Erdogan , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 333-336

Hypothesis-driven adaptation (Hydra): a flexible eigenvoice architecture (Abstract)

S.D. Peters , Nuance Commun., Montreal, Que., Canada
pp. 349-352

Multiple-cluster adaptive training schemes (Abstract)

M.J.F. Gales , Dept. of Eng., Cambridge Univ., UK
pp. 361-364

Neural-network-based HMM adaptation for noisy speech (Abstract)

S. Furui , Dept. of Comput. Sci., Tokyo Inst. of Technol., Japan
pp. 365-368

Speaker compensation with sine-log all-pass transforms (Abstract)

J. McDonough , Interactive Syst. Labs., Karlsruhe Univ., Germany
pp. 369-372

Very fast adaptation with a compact context-dependent eigenvoice model (Abstract)

R. Kuhn , Panasonic Speech Technol. Lab., Panasonic Technol. Inc, Santa Barbara, CA, USA
pp. 373-376

A one-pass strategy for keyword spotting and verification (Abstract)

C.S. Lai , Dept. of Electr. & Electron. Eng., Hong Kong Univ. of Sci. & Technol., Kowloon, China
pp. 377-380

A support vector machines-based rejection technique for speech recognition (Abstract)

Changxue Ma , Human Interface Lab., Motorola Labs., Schaumburg, IL, USA
pp. 381-384

On combining recognizers for improved recognition of spelled names (Abstract)

D. Jouvet , France Telecom R&D, Lannion, France
pp. 385-388

Robust confidence annotation and rejection for continuous speech recognition (Abstract)

B. Maison , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 389-392

Confidence measures for spoken dialogue systems (Abstract)

R. San-Segundo , Center for Spoken Language Res., Colorado Univ., Boulder, CO, USA
pp. 393-396

Comparison of different objective functions for optimal linear combination of classifiers for speaker identification (Abstract)

H. Altincay , Dept. of Comput. Eng., Eastern Mediterranean Univ., Gazi Magusa, Cyprus
pp. 401-404

Fractal dimension applied to speaker identification (Abstract)

A. Petry , Instituto de Informatica, Univ. Fed. do Rio Grande do Sul, Porto Alegre, Brazil
pp. 405-408

Source and system features for speaker recognition using AANN models (Abstract)

B. Yegnanarayana , Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Madras, India
pp. 409-412

Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition (Abstract)

K. Mori , Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Aichi, Japan
pp. 413-416

A hybrid GMM/SVM approach to speaker identification (Abstract)

S. Fine , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 417-420

Learning the decision function for speaker verification (Abstract)

S. Bengio , IDIAP, Martigny, Switzerland
pp. 425-428

Learning statistically efficient features for speaker recognition (Abstract)

Gil-Jin Jang , Dept. of Comput. Sci., KAIST, Taejon, South Korea
pp. 437-440

A combination between VQ and covariance matrices for speaker recognition (Abstract)

M. Faundez-Zanuy , Escola Universitaria Politecnica de Mataro, Univ. Politecnica de Catalunya, Barcelona, Spain
pp. 453-456

Text-dependent speaker verification under noisy conditions using parallel model combination (Abstract)

L.P. Wong , Sch. of Electron. & Electr. Eng., Univ. of Birmingham, UK
pp. 457-460

Continuous speech recognition using a hierarchical Bayesian model (Abstract)

F. Mouria-Behi , Artificial Intelligence Group, ENSI/LIA, Tunis, Tunisia
pp. 469-472

Multiple linear transforms (Abstract)

N.K. Goel , LSI Logic, Gaithersburg, MD, USA
pp. 481-484

Discriminative training of HMM using maximum normalized likelihood algorithm (Abstract)

K. Markov , Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Japan
pp. 497-500

Modeling uncertainty of data observation (Abstract)

A. Wendemuth , Philips Res. Lab., Aachen, Germany
pp. 501-504

Multiple mixture segmental HMM and its applications (Abstract)

Bing Xiang , Sch. of Electr. Eng. & Comput. Eng., Cornell Univ., Ithaca, NY, USA
pp. 509-512

Multiple-regression hidden Markov model (Abstract)

K. Fujinaga , Japan Adv. Inst. of Sci. & Technol., Ishikawa, Japan
pp. 513-516

Tandem acoustic modeling in large-vocabulary recognition (Abstract)

D.P.W. Ellis , Dept. of Electr. Eng., Columbia Univ., New York, NY, USA
pp. 517-520

Towards task-independent speech recognition (Abstract)

F. Lefevre , Lab. d'Inf. pour la Mecanique et les Sci. de l'Ingenieur, CNRS, Orsay, France
pp. 521-524

Nonlinear dynamical system based acoustic modeling for ASR (Abstract)

N.D. Warakagoda , Dept. of Telecommun., NTNU, Trondheim, Norway
pp. 525-528

Improving trigram language modeling with the World Wide Web (Abstract)

Xiaojin Zhu , Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
pp. 533-536

Dialog-context dependent language modeling combining n-grams and stochastic context-free grammars (Abstract)

K. Hacioglu , Center for Spoken Language Res., Colorado Univ., Boulder, CO, USA
pp. 537-540

Use of non-negative matrix factorization for language model adaptation in a lecture transcription task (Abstract)

M. Novak , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 541-544

Data augmentation and language model adaptation (Abstract)

D. Janiszek , LIA, Univ. of Avignon, France
pp. 549-552

Classes for fast maximum entropy training (Abstract)

J. Goodman , Microsoft Res., Washington, DC, USA
pp. 561-564

Automatic generation and selection of multiple pronunciations for dynamic vocabularies (Abstract)

S. Deligne , IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 565-568

Sub-lexical modelling using a finite state transducer framework (Abstract)

Xiaolong Mou , Lab. for Comput. Sci., MIT, Cambridge, MA, USA
pp. 573-576

What kind of pronunciation variation is hard for triphones to model? (Abstract)

D. Jurafsky , Center for Spoken Language Res., Colorado Univ., Boulder, CO, USA
pp. 577-580

Bootstrap method for Chinese new words extraction (Abstract)

Shan He , Dept. of Electr. Eng., Shanghai Jiaotong Univ., China
pp. 581-584

Kanji-to-hiragana conversion based on a language model (Abstract)

Wei-Bin Chang , Philips Res. East Asia - Taipei, Taiwan
pp. 585-588

A dynamic semantic model for re-scoring recognition hypotheses (Abstract)

C. Wai , Human-Computer Commun. Lab., Chinese Univ. of Hong Kong, China
pp. 589-592

Advances in automatic meeting record creation and access (Abstract)

A. Waibel , Interactive Syst. Labs, Carnegie Mellon Univ., Pittsburgh, PA, USA
pp. 597-600

Topic Forest: a plan-based dialog management structure (Abstract)

Xiaojun Wu , Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
pp. 617-620

Speech enhancement using the sparse code shrinkage technique (Abstract)

I. Potamitis , Electr. & Comput. Eng. Dept., Patras Univ., Greece
pp. 621-624

STFT-based multi-channel acoustic interference suppressor (Abstract)

C. Avendano , Creative Adv. Technol. Center, Scotts Valley, CA, USA
pp. 625-628

Experimental investigation of delayed instantaneous demixer for speech enhancement (Abstract)

Yong Xiang , Dept. of Electr. & Electron. Eng., Melbourne Univ., Parkville, Vic., Australia
pp. 633-636

Lattice-ladder decorrelation filters developed for co-channel speech separation (Abstract)

Kuan-Chieh Yen , Dept. of CECS, Missouri Univ., Columbia, MO, USA
pp. 637-640

Single channel speech enhancement using MDL-based subspace approach in Bark domain (Abstract)

R. Vetter , Centre Suisse d'Electronique et de Microtech., Neuchatel, Switzerland
pp. 641-644

Recursively updated eigenfilterbank for speech enhancement (Abstract)

M. Jeppesen , Centre for Person Kommunikation, Aalborg Univ., Denmark
pp. 653-656

Statistical speech reconstruction at the phoneme level (Abstract)

M. Savic , ECSE Dept., Rensselaer Polytech. Inst., New York, NY, USA
pp. 657-660

On speech enhancement under signal presence uncertainty (Abstract)

I. Cohen , Lamar Signal Process. Ltd, Israel
pp. 661-664

A cross-correlation technique for enhancing speech corrupted with correlated noise (Abstract)

Yi Hu , Dept. of Electr. Eng., Texas Univ. at Dallas, Richardson, TX, USA
pp. 673-676

Author Index (Abstract)

pp. a-1-a-14