Partial Classification: The Benefit of Deferred Decision
August 1998 (vol. 20, no. 8)
pp. 769-776

Abstract—It is shown that partial classification, which allows for indecision in certain regions of the data space, can increase a benefit function, defined as the difference between the probabilities of correct and incorrect decisions, each taken jointly with the event that a decision is made. This is particularly true for small data samples, which may cause a large deviation of the estimated separation surface from the intersection surface between the corresponding probability density functions. Employing a particular density estimation method, an indecision domain is naturally defined by a single parameter, whose optimal size, maximizing the benefit function, is derived from the data. The benefit function is shown to translate into profit in stock trading. Employing medical and economic data, it is shown that partial classification produces, on average, higher benefit values than full classification, which assigns every new object to a class, and that the marginal benefit of partial classification diminishes as the data size increases.
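The benefit criterion and the single-parameter indecision region admit a compact numerical illustration. The following Python sketch is illustrative only: it assumes a one-dimensional classifier score with a symmetric reject band of half-width t, and it chooses t by a simple grid search over the empirical benefit. The names empirical_benefit and choose_threshold, the Gaussian score model, and the grid are assumptions for this sketch, not the paper's density-estimation procedure.

import numpy as np

# Sketch of partial classification with a reject (indecision) region.
# Benefit = P(correct, decide) - P(incorrect, decide), estimated on a sample.

def empirical_benefit(scores, labels, t):
    """Estimate the benefit when a decision is made only if |score| > t."""
    decided = np.abs(scores) > t          # points outside the indecision band
    predicted = np.sign(scores)           # class +1 or -1 from the score sign
    correct = decided & (predicted == labels)
    incorrect = decided & (predicted != labels)
    # Means over all samples estimate the joint probabilities with "decide".
    return correct.mean() - incorrect.mean()

def choose_threshold(scores, labels, grid=np.linspace(0.0, 2.0, 101)):
    """Pick the indecision-band half-width that maximizes the empirical benefit."""
    benefits = [empirical_benefit(scores, labels, t) for t in grid]
    return grid[int(np.argmax(benefits))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two classes (+1 / -1) with overlapping one-dimensional score distributions.
    labels = rng.choice([-1, 1], size=500)
    scores = labels * 1.0 + rng.normal(scale=1.5, size=500)
    t_star = choose_threshold(scores, labels)
    print("optimal indecision threshold:", t_star)
    print("benefit at t*:", empirical_benefit(scores, labels, t_star))
    print("benefit with full classification (t=0):", empirical_benefit(scores, labels, 0.0))

On such overlapping data the selected threshold is typically positive; deferring the decision on ambiguous points yields a higher empirical benefit than full classification, which is the effect the paper reports for small samples.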


Index Terms:
Classification, pattern recognition, hypothesis testing, decision making, machine learning, stock trading, medical diagnosis.
Citation:
Yoram Baram, "Partial Classification: The Benefit of Deferred Decision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 769-776, Aug. 1998, doi:10.1109/34.709564