On Optimal Pairwise Linear Classifiers for Normal Distributions: The Two-Dimensional Case
February 2002 (vol. 24 no. 2)
pp. 274-280

Abstract—Optimal Bayesian linear classifiers have been studied in the literature for many decades. In this paper, we demonstrate that all the known results consider only the scenario in which the quadratic polynomial has coincident roots. Indeed, we present a complete analysis of the case in which the optimal classifier between two normally distributed classes is pairwise and linear. To the best of our knowledge, this is a pioneering work on the use of such classifiers in any area of statistical Pattern Recognition (PR). We focus on some special cases of the normal distribution with unequal covariance matrices, and we determine the conditions that the mean vectors and covariance matrices must satisfy in order to obtain the optimal pairwise linear classifier. As opposed to the state of the art, in all the cases discussed here, the linear classifier is given by a pair of straight lines, which is a particular case of the general second-degree equation. One of these cases arises when the two classes overlap and have equal means, which resolves the general case of Minsky's paradox for the perceptron. We also provide empirical results, using synthetic data for the Minsky's-paradox case, and demonstrate that the linear classifier achieves very good performance. Finally, we test our approach on real-life data obtained from the UCI machine learning repository. The empirical results show the superiority of our scheme over the traditional Fisher's discriminant classifier.
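The equal-means case described in the abstract can be sketched numerically. The construction below, including the specific covariance values, is our own illustrative assumption rather than code from the paper: for two zero-mean normal classes with equal priors and diagonal covariances whose entries are swapped, the log-determinant terms of the Bayes discriminant cancel, and the quadratic discriminant factors into a multiple of x² − y² = (x − y)(x + y), i.e., the pair of lines y = ±x. This is precisely a pairwise linear classifier that separates an XOR-like configuration no single hyperplane can.

```python
# Hypothetical illustration (not the paper's code): for two zero-mean
# Gaussian classes with "crossed" diagonal covariances, the optimal
# Bayes boundary degenerates into a pair of straight lines.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Class 1: elongated along the x-axis; Class 2: elongated along the y-axis.
# Equal means, equal determinants, equal priors (assumed for this sketch).
cov1 = np.diag([4.0, 0.25])
cov2 = np.diag([0.25, 4.0])
X1 = rng.multivariate_normal([0.0, 0.0], cov1, n)
X2 = rng.multivariate_normal([0.0, 0.0], cov2, n)

# Bayes discriminant g(x) = 0.5 * x^T (inv(cov2) - inv(cov1)) x;
# the log-determinant terms cancel because |cov1| = |cov2|.
# Here g is a positive multiple of x^2 - y^2 = (x - y)(x + y),
# so the decision boundary is the pair of lines y = x and y = -x.
def g(X):
    return X[:, 0] ** 2 - X[:, 1] ** 2

accuracy = 0.5 * ((g(X1) > 0).mean() + (g(X2) <= 0).mean())
print(f"pairwise-linear Bayes accuracy: {accuracy:.3f}")
```

Note that because the class means coincide, Fisher's linear discriminant performs at chance level on this configuration, whereas the pair-of-lines classifier does well; this mirrors the comparison reported in the abstract.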


Index Terms:
Pattern classification, statistical pattern recognition, optimal Bayesian classification, linear classifiers.
Citation:
Luis Rueda, B. John Oommen, "On Optimal Pairwise Linear Classifiers for Normal Distributions: The Two-Dimensional Case," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 274-280, Feb. 2002, doi:10.1109/34.982905