Issue No.02 - February (1979 vol.1)
pp: 164-172
Ching Y. Suen, SENIOR MEMBER, IEEE, Department of Computer Science, Concordia University, Montreal, P.Q., Canada; Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambri
ABSTRACT
n-gram (n = 1 to 5) statistics and other properties of the English language were derived for applications in natural language understanding and text processing. They were computed from a well-known corpus of 1 million words of text samples. Similar properties were also derived from the 1000 most frequent words of three other corpora. The positional distributions of the n-grams obtained in the present study are discussed. Statistical studies of word length and of trends in n-gram frequencies versus vocabulary are presented. In addition, n-gram statistics found in the literature are surveyed, and a collection of n-gram statistics obtained by other researchers is reviewed and compared.
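As a rough illustration of the kinds of statistics the abstract describes, the following is a minimal Python sketch, not the author's procedure. It assumes letter (character) n-grams counted inside individual words, with position recorded as the 0-based start offset within a word of a given length. The function names `ngram_stats` and `word_length_distribution`, the lowercase alphabetic tokenization, and the sample sentence are all hypothetical and chosen only for illustration; the sample is not the corpus used in the paper.

```python
from collections import Counter
import re


def words_of(text):
    """Split text into lowercase alphabetic words (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_stats(text, max_n=5):
    """Count letter n-grams (n = 1..max_n) within words, overall and by position.

    Returns (totals, positional): totals[n] maps n-gram -> count, and
    positional[n] maps (n-gram, start offset in word, word length) -> count.
    """
    totals = {n: Counter() for n in range(1, max_n + 1)}
    positional = {n: Counter() for n in range(1, max_n + 1)}
    for word in words_of(text):
        for n in range(1, max_n + 1):
            # Slide a window of width n across the word; no n-grams span words.
            for i in range(len(word) - n + 1):
                gram = word[i:i + n]
                totals[n][gram] += 1
                positional[n][(gram, i, len(word))] += 1
    return totals, positional


def word_length_distribution(text):
    """Map word length -> number of word tokens of that length."""
    return Counter(len(w) for w in words_of(text))


if __name__ == "__main__":
    sample = "the cat sat on the mat"  # hypothetical toy input
    totals, positional = ngram_stats(sample)
    print(totals[2].most_common(1))         # [('at', 3)]
    print(positional[2][("at", 1, 3)])      # 3: "at" starts at offset 1 in three 3-letter words
    print(word_length_distribution(sample)) # Counter({3: 5, 2: 1})
```

Keying the positional counts by word length as well as offset keeps, for example, the word-final "at" in three-letter words separate from an "at" at the same offset in longer words. That is one plausible way to tabulate positional distributions. A real study would also need decisions about case, punctuation, and apostrophes, which this sketch sidesteps by keeping only the letters a to z.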
CITATION
Ching Y. Suen, "n-Gram Statistics for Natural Language Understanding and Text Processing", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.1, no. 2, pp. 164-172, February 1979, doi:10.1109/TPAMI.1979.4766902