Issue No. 4 - April 2008 (vol. 30)
pp. 658-669
This paper presents a deterministic solution to an approximated classification-error based objective function. In the formulation, we propose a quadratic approximation as the function for achieving smooth error counting. The solution is subsequently found to be related to the weighted least-squares, whereby a robust tuning process can be incorporated. The tuning traverses between the least-squares estimate and the approximated total-error-rate estimate to cater for various situations of unbalanced attribute distributions. By adopting a linear parametric classifier model, the proposed classification-error based learning formulation is empirically shown to be superior to that using the original least-squares-error cost function. Finally, it will be seen that the performance of the proposed formulation is comparable to other classification-error based and state-of-the-art classifiers without sacrificing computational simplicity.
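The weighted least-squares estimate that the abstract relates the solution to can be sketched as follows. This is a minimal illustration only, assuming a linear model with a bias term and per-sample weights; the toy data, the `weighted_least_squares_fit` helper, and the uniform weighting are illustrative assumptions, not the paper's tuning procedure.

```python
import numpy as np

def weighted_least_squares_fit(X, y, w):
    """Solve (X^T W X) a = X^T W y for the parameter vector a,
    where W = diag(w) holds per-sample weights.
    (Hypothetical helper for illustration, not the paper's method.)"""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend bias column
    W = np.diag(w)
    return np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)

# Toy two-class data with targets in {0, 1}.
X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Uniform weights reduce to the ordinary least-squares estimate;
# non-uniform weights (e.g. inverse class frequencies) are one way
# to accommodate unbalanced attribute distributions.
w = np.ones(len(y))
a = weighted_least_squares_fit(X, y, w)

# Classify by thresholding the linear output at 0.5.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
pred = (Xb @ a > 0.5).astype(int)
```

Replacing the uniform `w` with weights derived from an error-counting criterion is the kind of tuning between the least-squares and total-error-rate estimates that the abstract describes.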
Pattern Classification, Classification Error Rate, Discriminant Functions, Polynomials and Machine Learning
Kar-Ann Toh, How-Lung Eng, "Between Classification-Error Approximation and Weighted Least-Squares Learning", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.30, no. 4, pp. 658-669, April 2008, doi:10.1109/TPAMI.2007.70730