On the Algorithmic Implementation of Stochastic Discrimination
May 2000 (vol. 22 no. 5)
pp. 473-490

Abstract: Stochastic discrimination is a general methodology for constructing classifiers appropriate for pattern recognition. It is based on combining arbitrary numbers of very weak components, which are usually generated by some pseudorandom process, and it has the property that the very complex and accurate classifiers produced in this way retain the ability, characteristic of their weak component pieces, to generalize to new data. In fact, it is often observed in practice that classifier performance on test sets continues to rise as more weak components are added, even after performance on training sets seems to have reached a maximum. This is predicted by the underlying theory, for even though the formal error rate on the training set may have reached a minimum, more sophisticated measures intrinsic to this method indicate that classifier performance on both training and test sets continues to improve as complexity increases. In this paper, we begin with a review of the method of stochastic discrimination as applied to pattern recognition. Through a progression of examples keyed to various theoretical issues, we discuss considerations involved with its algorithmic implementation. We then take such an algorithmic implementation and compare its performance, on a large set of standardized pattern recognition problems from the University of California, Irvine (UCI) and Statlog collections, to many other techniques reported on in the literature, including boosting and bagging. In doing these studies, we compare our results to those reported in the literature by the various authors for the other methods, using the same data and study paradigms used by them. Included in this paper is an outline of the underlying mathematical theory of stochastic discrimination and a remark concerning boosting, which provides a theoretical justification for properties of that method observed in practice, including its ability to generalize.
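The combining scheme the abstract describes can be illustrated with a small sketch. This is not Kleinberg's implementation, only a schematic illustration of the two-class idea: generate many random weak components (here, random half-space stumps, a hypothetical choice), keep only those that are "enriched" (they cover class-1 training points at a rate exceeding their class-0 coverage), and classify by averaging their normalized outputs. All function names, the enrichment margin, and the toy data are invented for this sketch.

```python
import random

def train_sd_like(points, labels, n_models=500, margin=0.25, seed=1):
    """Collect many very weak random stumps, in the spirit of two-class
    stochastic discrimination (a schematic sketch, not Kleinberg's code).

    Each candidate weak component is a random half-space on one feature;
    it is kept only if "enriched": its coverage of class-1 training
    points exceeds its class-0 coverage by at least `margin`.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    class1 = [p for p, y in zip(points, labels) if y == 1]
    class0 = [p for p, y in zip(points, labels) if y == 0]
    models = []
    while len(models) < n_models:
        f = rng.randrange(dim)                  # random feature
        lo = min(p[f] for p in points)
        hi = max(p[f] for p in points)
        t = rng.uniform(lo, hi)                 # random threshold
        sign = rng.choice((-1.0, 1.0))          # random orientation
        covered = lambda p: sign * (p[f] - t) > 0
        r1 = sum(map(covered, class1)) / len(class1)  # class-1 coverage
        r0 = sum(map(covered, class0)) / len(class0)  # class-0 coverage
        if r1 - r0 >= margin:                   # enrichment test
            models.append((f, t, sign, r0, r1))
    return models

def discriminant(models, x):
    """Average of the normalized weak outputs (1_M(x) - r0) / (r1 - r0).
    As more components are averaged, this concentrates near 1 on class-1
    points and near 0 on class-0 points; classify by thresholding at 1/2."""
    total = 0.0
    for f, t, sign, r0, r1 in models:
        m = 1.0 if sign * (x[f] - t) > 0 else 0.0
        total += (m - r0) / (r1 - r0)
    return total / len(models)

# Toy two-class data: class 0 near the origin, class 1 near (2, 2).
data_rng = random.Random(0)
points = [(data_rng.uniform(-0.5, 0.5), data_rng.uniform(-0.5, 0.5))
          for _ in range(20)]
points += [(data_rng.uniform(1.5, 2.5), data_rng.uniform(1.5, 2.5))
           for _ in range(20)]
labels = [0] * 20 + [1] * 20

models = train_sd_like(points, labels)
preds = [1 if discriminant(models, p) > 0.5 else 0 for p in points]
acc = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
```

Note that each kept stump is individually barely better than chance; the accuracy of the averaged discriminant comes entirely from combining many of them, which is the phenomenon the paper's theory addresses.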

References:
[1] R. Berlind, “An Alternative Method of Stochastic Discrimination with Applications to Pattern Recognition,” PhD thesis, SUNY/Buffalo, New York, 1994.
[2] L. Breiman, “Bagging Predictors,” Machine Learning, vol. 24, pp. 123-140, 1996.
[3] D. Chen, “Statistical Estimates for Kleinberg's Method of Stochastic Discrimination,” PhD thesis, SUNY/Buffalo, New York, 1998.
[4] Y. Freund and R. E. Schapire, “Experiments with a New Boosting Algorithm,” Proc. 13th Int'l Conf. Machine Learning, pp. 148-156, July 1996.
[5] Y. Freund and R.E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” J. Computer and System Sciences, vol. 55, pp. 119-139, 1997.
[6] T.K. Ho, “The Random Subspace Method for Constructing Decision Forests,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832-844, Aug. 1998.
[7] T.K. Ho, “Random Decision Forests,” Proc. Third Int'l Conf. Document Analysis and Recognition, pp. 278-282, 1995.
[8] E.M. Kleinberg, “Stochastic Discrimination,” Annals of Math. and Artificial Intelligence, pp. 207-239, 1990.
[9] E.M. Kleinberg, “An Overtraining-Resistant Stochastic Modeling Method for Pattern Recognition,” Annals of Statistics, pp. 2319-2349, 1996.
[10] E.M. Kleinberg, “A Mathematically Rigorous Foundation for Supervised Learning,” Proc. First Int'l Workshop on Multiple Classifier Systems, to appear.
[11] E.M. Kleinberg, “A Note on the Mathematics Underlying Boosting,” preprint, to appear.
[12] D. Michie, D. Spiegelhalter, and C.C. Taylor, Machine Learning, Neural and Statistical Classification. Ellis Horwood, 1994.
[13] J.R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann, 1992.
[14] V.N. Vapnik, Estimation of Dependences Based on Empirical Data. Springer-Verlag, 1982.

Index Terms:
Pattern recognition, classification algorithms, stochastic discrimination, SD.
Eugene M. Kleinberg, "On the Algorithmic Implementation of Stochastic Discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473-490, May 2000, doi:10.1109/34.857004