Speech Processing at BBN
January-March 2006 (vol. 28 no. 1)
pp. 32-45
John Makhoul, BBN Technologies
This survey of Bolt Beranek and Newman's speech processing activities covers a period that began around 1971. Areas of importance, technical as well as historical, include speech recognition and understanding, speech coding, speaker recognition, and speech modification. A number of today's best-regarded techniques in speech and language processing stem from BBN's early work.

1. M. Mitchell Waldrop, The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal, Viking, 2001.
2. J. Swets, "The ABC's of BBN: From Acoustics to Behavioral Sciences to Computers," IEEE Annals of the History of Computing, vol. 27, no. 2, Apr.-June 2005, pp. 15-29.
3. A.L. Norberg and J.E. O'Neill, Transforming Computer Technology: Information Processing for the Pentagon, 1962–1986, Johns Hopkins Press, 1996.
4. D. Walden, "Artificial Intelligence at BBN," A Culture of Innovation, D. Walden and R. Nickerson, eds., to be published.
5. I was officially part of the Cognitive Information Processing Group at MIT, and Murray Eden was my thesis advisor. However, I did most of my thesis work in the Speech Communications Group under Stevens' direction.
6. A.L. Norberg and J.E. O'Neill, Transforming Computer Technology: Information Processing for the Pentagon, pp. 232-233.
7. A. Newell et al., Speech-Understanding Systems: Final Reports of a Study Group, North-Holland, 1971.
8. Computer Science and Telecommunications Board, National Research Council, Funding a Revolution: Government Support for Computing Research, National Academy Press, 1999.
9. A.L. Norberg and J.E. O'Neill, Transforming Computer Technology: Information Processing for the Pentagon, p. 233.
10. The SUR program and the results of each of the contracts are documented by D. Klatt in Trends in Speech Recognition, W.A. Lea, ed., Prentice Hall, 1980, pp. 247-421.
11. And Schwartz has been more or less involved in the algorithms of most of BBN's speech projects in the years since.
12. Another student of Ken Stevens who joined BBN is Ray Tomlinson, of @-sign fame, whose contributions are described elsewhere in this issue. Tomlinson did his MS thesis under Stevens' supervision.
13. J.J. Wolf and W.A. Woods, "The HWIM Speech Understanding System," D. Klatt, Trends in Speech Recognition, W.A. Lea, ed., Prentice Hall, 1980, pp. 316-339.
14. J. Makhoul and J. Wolf, Linear Prediction and the Spectral Analysis of Speech, tech. report 2304, BBN, Aug. 1972.
15. J. Makhoul, "Linear Prediction: A Tutorial Review," Proc. IEEE, vol. 63, no. 4, 1975, pp. 561-580. This paper was named a "Citation Classic" by the Institute for Scientific Information in the 22 Mar. 1982 issue of Current Contents.
16. The travel task was to recognize questions and statements from a hypothetical air traveler as if the speech understanding system were a travel agent. The travel task has been used frequently over the years by researchers wishing to test or demonstrate their systems. This task has the advantage of having a reasonably large but constrained vocabulary in a relatively narrow world of discourse, and it addresses a problem that most observers are familiar with, so they can see both the difficulty and the value of its solution by computer.
17. The branching factor is a measure of the average number of word choices at each point in an utterance. The term has since been replaced by the probabilistic measure of "perplexity." See L.R. Bahl, F. Jelinek, and R.L. Mercer, "A Maximum Likelihood Approach to Continuous Speech Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-5, no. 2, 1983, pp. 179-190.
18. D.H. Klatt, "Overview of the ARPA Speech Understanding Project," Trends in Speech Recognition, W.A. Lea, ed., Prentice Hall, 1980, p. 254.
19. S. Baron, "Control Systems R&D at BBN," IEEE Annals of the History of Computing, vol. 27, no. 2, 2005, pp. 52-64.
20. V. Viswanathan et al., "Variable Frame Rate Transmission: A Review of the Methodology and Applications to Narrowband LPC Speech Coding," IEEE Trans. Comm., vol. COM-30, no. 4, 1982, pp. 674-686.
21. C.J. Weinstein and J.W. Forgie, "Experience with Speech Communication in Packet Networks," IEEE J. Selected Areas in Comm., vol. SAC-1, no. 6, 1983, pp. 963-980.
22. W. Lidinsky and D. Vlack eds., Perspectives on Packetized Voice and Data Communication, IEEE Press, 1990.
23. As part of the ARPA-sponsored 1970s multicontractor network speech project, many issues were discovered that fed into follow-on work in another group at BBN and in collaborating groups elsewhere; this work became the basis of the Internet Engineering Task Force effort that underlies voice-over-IP technology today.
24. The speech compression algorithm used in that demo was BBN's.
25. One notable historical comment here is that Texas Instruments, in developing its innovative and popular 1978 Speak & Spell toy, made extensive use of speech compression technology for coding the stored speech. Much of that technology was obtained through reading of various publications, including Natural Communication with Computers: Speech Compression Research at BBN, tech. report 2976, BBN, 1974, on ARPA speech compression work, and through personal communication with BBN and other scientists.
26. J. Makhoul et al., Natural Communication with Computers: Speech Compression Research at BBN, tech. report 2976, vol. II, BBN, Dec. 1974.
27. E. Blackman et al., "Narrowband LPC Speech Transmission over Noisy Channels," IEEE Int'l Conf. Acoustics, Speech, Signal Processing, IEEE Press, 1979, pp. 60-63.
28. V. Viswanathan, A.L. Higgins, and W.H. Russell, "Design of a Robust Base-band LPC Coder for Speech Transmission Over 9.6 kbit/s Noisy Channels," IEEE Trans. Comm., vol. COM-30, no. 4, 1982, pp. 663-673.
29. J. Makhoul and M. Berouti, "Predictive and Residual Encoding of Speech," J. Acoust. Soc. Amer., vol. 66, Dec. 1979, pp. 1633-1641.
30. R. Viswanathan, W. Russell, and A. Higgins, "Noisy-Channel Performance of 16 kb/s APC Coders," Proc. IEEE Int'l Conf. Acoustics, Speech, Signal Processing, IEEE Press, 1981, pp. 615-618.
31. J. Wolf and K. Field, "Real-Time Speech Coder Implementation on an Array Processor," IEEE Trans. Comm., vol. 30, no. 4, 1982, pp. 615-620.
32. V. Viswanathan et al., High Quality 16 kbit/s Voice A/D Implementation, final report no. 6205, BBN, Apr. 1986.
33. J. Makhoul, S. Roucos, and H. Gish, "Vector Quantization in Speech Coding," invited paper, Proc. IEEE, vol. 73, no. 11, Nov. 1985, pp. 1551-1588.
34. S. Roucos, R. Schwartz, and J. Makhoul, "A Segment Vocoder at 150 b/s," Proc. IEEE Int'l Conf. Acoustics, Speech, Signal Processing, IEEE Press, 1983, pp. 61-64.
35. P. Jeanrenaud and P. Peterson, "Segment Vocoder Based on Reconstruction with Natural Segments," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1991, pp. 605-608.
36. R. Schwartz et al., "Diphone Synthesis for Phonetic Vocoder," Proc. IEEE Int'l Conf. Acoustics, Speech, Signal Processing, IEEE Press, 1979, pp. 891-894.
37. V. Viswanathan and J. Makhoul, "Quantization Properties of Transmission Parameters in Linear Predictive Systems," IEEE Trans. Acoustics, Speech, Signal Processing, vol. 23, no. 3, 1975, pp. 309-321.
38. J. Makhoul, "Stable and Efficient Lattice Methods for Linear Prediction," IEEE Trans. Acoustics, Speech, Signal Processing, vol. 25, no. 5, 1977, pp. 423-428. This paper was awarded the IEEE Signal Processing Society Senior Award in 1978.
39. V. Viswanathan, W. Russell, and J. Makhoul, "Objective Speech Quality Evaluation of Narrowband LPC Vocoders," Proc. IEEE Int'l Conf. Acoustics, Speech, Signal Processing, IEEE Press, 1978, pp. 591-594.
40. J. Makhoul et al., "A Mixed-Source Model for Speech Compression and Synthesis," J. Acoust. Soc. Amer., vol. 64, no. 6, 1978, pp. 1577-1581.
41. Wife of Bernie Cosell, who is mentioned elsewhere in this issue in C. Partridge, "Data Networking @ BBN."
42. J. Makhoul, "Methods for Nonlinear Spectral Distortion of Speech Signals," Proc. IEEE Int'l Conf. Acoustics, Speech, Signal Processing, IEEE Press, 1976, pp. 87-90.
43. S. Roucos and A.M. Wilgus, "High Quality Time-Scale Modification for Speech," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1985, pp. 493-496.
44. M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of Speech Corrupted by Acoustic Noise," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1979, pp. 208-211.
45. V. Viswanathan et al., "Multisensor Speech Input for Enhanced Immunity to Acoustic Background Noise," Proc. IEEE Int'l Conf. Acoustics, Speech, Signal Processing, IEEE Press, 1984, pp. 18A.3.1-18A.3.4.
46. V. Viswanathan and C. Henry, "Noise-Immune Multisensor Speech Input: Formal Subjective Testing in Operational Conditions," Proc. IEEE Int'l Conf. Acoustics, Speech, Signal Processing, IEEE Press, 1989, pp. 373-376.
47. R. Schwartz, S. Roucos, and M. Berouti, "The Application of Probability Density Estimation to Text-Independent Speaker Identification," Proc. IEEE Int'l Conf. Acoustics, Speech, Signal Processing, IEEE Press, 1982, pp. 1649-1652.
48. H. Gish et al., "Methods and Experiments for Text-Independent Speaker Recognition over Telephone Channels," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1986, pp. 865-868.
49. R. Schwartz et al., Speaker Verification IR&D Final Report, tech. report 6661, BBN, Nov. 1987.
50. R. Schwartz et al., "Improved Hidden Markov Modeling of Phonemes for Continuous Speech Recognition," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1984, pp. 35.3.1-35.3.4.
51. M. Krasner et al., "Development of a Vidvox Speech Communication Aid," Proc. Applied Voice Input/Output Soc. (AVIOS) Conf., AVIOS, 1984, pp. 15-28.
52. A.L. Norberg and J.E. O'Neill, Transforming Computer Technology: Information Processing for the Pentagon, pp. 275-282.
53. Jim Baker used HMMs in his PhD thesis at CMU, so HMMs were known to CMU, but CMU did not propose their use in its Strategic Computing bid.
54. J. Makhoul and R. Schwartz, "Ignorance Modeling," Invariance and Variability in Speech Processes, J.S. Perkell and D.H. Klatt, eds., Lawrence Erlbaum Associates, 1986, pp. 344-345.
55. Y. Chow et al., "BYBLOS: The BBN Continuous Speech Recognition System," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1987, pp. 89-92.
56. The cepstrum is the inverse Fourier transform of the logarithm of the spectrum. The word cepstrum is an anagram of "spectrum" first introduced in a paper by B.P. Bogert, M.J.R. Healy, and J.W. Tukey, "The Frequency Analysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum, and Saphe Cracking," Proc. Symp. Time Series Analysis, John Wiley & Sons, 1963, pp. 209-243. The first dozen or so coefficients of the cepstrum have been shown to be effective parameters for speech recognition purposes (see S.B. Davis and P. Mermelstein, "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 28, 1980, pp. 357-366).
57. R.M. Schwartz et al., "Context-Dependent Modeling for Acoustic-Phonetic Recognition of Continuous Speech," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1985, pp. 1205-1208.
58. The visitors included Allen Sears (ARPA Program Manager), Ned Neuberg (Department of Defense), and David Pallett (National Institute of Standards and Technology).
59. It was after this demonstration that Ned Neuberg famously told me that this was the first time he could say that, if you poured money into speech recognition research, it would be well worth it.
60. O. Kimball et al., "Efficient Implementation of Continuous Speech Recognition on a Large Scale Parallel Processor," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1987, pp. 852-855.
61. R. Schwartz and Y.L. Chow, "The N-Best Algorithm: An Efficient and Exact Procedure for Finding the N Most Likely Sentence Hypotheses," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1990, pp. 81-84.
62. S. Austin et al., "Spoken Language System Using Commercial Hardware," Proc. Speech and Natural Language Workshop, Morgan Kaufmann, 1990, pp. 72-77.
63. R. Schwartz and L. Nguyen, Single-Tree Method for Grammar-Directed, Very Large Vocabulary Speech Recognizer, US Patent 5,621,859, Patent and Trademark Office, 1997.
64. L. Nguyen and R. Schwartz, "Single-Tree Method for Grammar-Directed Search," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1999, pp. 613-616.
65. Curiously, the description of these events in M. Mitchell Waldrop, The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal, p. 238, makes no mention of BBN's demonstration of an HMM-based system.
66. J.R. Rohlicek et al., "Continuous Hidden Markov Modeling for Speaker-Independent Word Spotting," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1989, pp. 627-630.
67. S. Miller et al., "A Fully Statistical Approach to Natural Language Interfaces," Proc. Assoc. Computational Linguistics, ACL, 1996, pp. 55-61.
68. This arrangement started in 1987, when Professor John Proakis asked me if I would teach a course in speech processing at Northeastern. I later became adjunct professor and was able to supervise students doing their theses at BBN. This arrangement continues today.
69. D. Dahl et al., "Expanding the Scope of the ATIS Task: The ATIS-3 Corpus," Proc. ARPA Human Language Technology Workshop, Morgan Kaufmann, 1994, pp. 3-8.
70. M. Bates et al., "Advances in BBN's Spoken Language System," Proc. ARPA Spoken Language Technology Workshop, Morgan Kaufmann, 1994, pp. 43-47.
71. J. Makhoul, A. El-Jaroudi, and R. Schwartz, "Partitioning Capabilities of Two-Layer Neural Networks," IEEE Trans. Signal Processing, vol. 39, no. 6, 1991, pp. 1435-1440.
72. H. Gish, "A Probabilistic Approach to the Understanding and Training of Neural Network Classifiers," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, 1990, pp. 1361-1364.
73. G. Zavaliagkos et al., "A Hybrid Continuous Speech Recognition System Using Segmental Neural Nets with Hidden Markov Models," Pattern Recognition and Artificial Intelligence, vol. 7, no. 4, 1993, pp. 949-963.
74. J. Makhoul et al., "A Script-Independent Methodology for Optical Character Recognition," Pattern Recognition, vol. 31, Sept. 1998, pp. 1285-1294.
1. L.E. Baum and J.A. Eagon, "An Inequality with Applications to Statistical Estimation for Probabilistic Functions of Markov Processes and to a Model of Ecology," Am. Mathematical Soc. Bull., vol. 73, Am. Mathematical Soc., 1967, pp. 360-363.
2. L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, 1989, pp. 257-285.
3. For a layperson's introduction to HMMs used in the speech recognition application, see R. Comerford, J. Makhoul, and R. Schwartz, "The Voice of the Computer is Heard in the Land (and It Listens Too!)," IEEE Spectrum, vol. 34, no. 12, 1997, pp. 39-47.
4. J.K. Baker, "The Dragon System—An Overview," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 23, Jan. 1975, pp. 24-29.
5. Baker had learned about HMMs while interning at IDA. After his PhD, Baker went to work at IBM. Later, Baker and his wife Janet founded Dragon Systems.
6. F. Jelinek, L.R. Bahl, and R.L. Mercer, "Design of a Linguistic Statistical Decoder for the Recognition of Continuous Speech," IEEE Trans. Information Theory, vol. 21, Mar. 1975, pp. 250-256.
7. L.R. Bahl, F. Jelinek, and R.L. Mercer, "A Maximum Likelihood Approach to Continuous Speech Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-5, no. 2, 1983, pp. 179-190.

Index Terms:
speech processing, speech coding and compression, speech recognition, word spotting, speech modification, speaker recognition, hidden Markov models, speech understanding
Citation:
John Makhoul, "Speech Processing at BBN," IEEE Annals of the History of Computing, vol. 28, no. 1, pp. 32-45, Jan.-March 2006, doi:10.1109/MAHC.2006.19