Statistical Pattern Recognition: A Review
January 2000 (vol. 22 no. 1)
pp. 4-37

Abstract—The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

Index Terms:
Statistical pattern recognition, classification, clustering, feature extraction, feature selection, error estimation, classifier combination, neural networks.
Citation:
Anil K. Jain, Robert P.W. Duin, Jianchang Mao, "Statistical Pattern Recognition: A Review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, Jan. 2000, doi:10.1109/34.824819