IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 4, April 2008

pp. 658-669

ABSTRACT

This paper presents a deterministic solution to an approximated classification-error-based objective function. In the formulation, we propose a quadratic approximation as the function for achieving smooth error counting. The solution is subsequently found to be related to weighted least-squares, whereby a robust tuning process can be incorporated. The tuning traverses between the least-squares estimate and the approximated total-error-rate estimate to cater for various situations of unbalanced attribute distributions. By adopting a linear parametric classifier model, the proposed classification-error-based learning formulation is empirically shown to be superior to that using the original least-squares-error cost function. Finally, the performance of the proposed formulation is shown to be comparable to that of other classification-error-based and state-of-the-art classifiers without sacrificing computational simplicity.
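The paper's exact quadratic error-counting approximation is not reproduced in this abstract, but the weighted least-squares connection it describes can be illustrated with a rough sketch: fit a linear parametric classifier by weighted least squares, where per-sample weights (here, a simple minority-class up-weighting chosen for illustration) stand in for the paper's tuning between the plain least-squares estimate and an error-rate-oriented estimate. All function and variable names below are my own, not the authors'.

```python
import numpy as np

def weighted_least_squares_fit(X, y, w=None, ridge=1e-8):
    """Solve for linear parameters alpha minimizing
    sum_i w_i * (y_i - x_i^T alpha)^2, with a tiny ridge
    term added for numerical stability."""
    n, d = X.shape
    if w is None:
        w = np.ones(n)                         # uniform weights = plain LS
    W = np.diag(w)
    A = X.T @ W @ X + ridge * np.eye(d)
    b = X.T @ W @ y
    return np.linalg.solve(A, b)

# Toy two-class problem with unbalanced class sizes (90 vs. 10).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-1.0, size=(90, 2))
X1 = rng.normal(loc=+1.0, size=(10, 2))
X = np.vstack([X0, X1])
X = np.hstack([np.ones((100, 1)), X])          # prepend a bias column
y = np.array([0.0] * 90 + [1.0] * 10)          # regression targets per class

# Plain least squares vs. a class-balanced weighting of the rare class.
alpha_ls = weighted_least_squares_fit(X, y)
w = np.where(y == 1, 90 / 10, 1.0)             # up-weight the minority class
alpha_w = weighted_least_squares_fit(X, y, w)

def error_rate(alpha, thr=0.5):
    """Classification error of the thresholded linear output."""
    return np.mean((X @ alpha > thr) != (y == 1))
```

Comparing `error_rate(alpha_ls)` and `error_rate(alpha_w)` on such unbalanced data shows how reweighting shifts the least-squares fit; the paper's contribution is a principled, error-counting-derived choice of this weighting rather than the ad hoc class-ratio weights used above.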

INDEX TERMS

Pattern Classification, Classification Error Rate, Discriminant Functions, Polynomials, Machine Learning

CITATION

Kar-Ann Toh and How-Lung Eng, "Between Classification-Error Approximation and Weighted Least-Squares Learning," *IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 30, no. 4, pp. 658-669, April 2008, doi:10.1109/TPAMI.2007.70730.