LESS: A Model-Based Classifier for Sparse Subspaces
September 2005 (vol. 27 no. 9)
pp. 1496-1500
In this paper, we specifically focus on high-dimensional data sets for which the number of dimensions is an order of magnitude higher than the number of objects. From a classifier design standpoint, such small sample size problems pose some interesting challenges. The first challenge is to find, among all hyperplanes that separate the classes, a separating hyperplane that generalizes well to future data. A second important task is to determine which features are required to distinguish the classes. To address these problems, we propose the LESS (Lowest Error in a Sparse Subspace) classifier, which efficiently finds linear discriminants in a sparse subspace. In contrast with most classifiers for high-dimensional data sets, the LESS classifier incorporates a (simple) data model. Further, by means of a regularization parameter, the classifier establishes a suitable trade-off between subspace sparseness and classification accuracy. In the experiments, we show how LESS performs on several high-dimensional data sets and compare its performance to related state-of-the-art classifiers such as linear ridge regression with the LASSO and the Support Vector Machine. It turns out that LESS performs competitively while using fewer dimensions.
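The trade-off the abstract describes, a regularization parameter balancing subspace sparseness against classification accuracy, is the same mechanism that drives L1-penalized linear models such as the LASSO mentioned above. The sketch below is not the authors' LESS formulation (which is model-based and solved as a linear program); it is a minimal illustration, using coordinate descent on an L1-penalized least-squares objective, of how increasing the regularization weight drives feature weights to exactly zero and thereby selects a sparse subspace. All names and the toy data are our own assumptions for illustration.

```python
import numpy as np

def l1_linear_fit(X, y, lam, n_iter=200):
    """Coordinate descent for min_w 0.5*||y - Xw||^2 + lam*||w||_1.
    Illustrative only: the soft-thresholding update below is what
    produces exact zeros in w, i.e. a sparse feature subspace."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)  # per-feature squared norms
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0:
                continue
            # residual with feature j's current contribution removed
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r
            # soft-thresholding: weights with |rho| <= lam become exactly 0
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

# Toy small-sample-size setting: 20 objects, 100 features, 3 informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
w_true = np.zeros(100)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(20)

for lam in (0.1, 5.0):
    w = l1_linear_fit(X, y, lam)
    print(lam, int((np.abs(w) > 1e-8).sum()))  # larger lam -> fewer features
```

Sweeping the penalty weight traces out the sparseness/accuracy trade-off: a small penalty uses many features and fits the training data closely, while a large penalty retains only the few features that carry most of the class information.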

[1] U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, and A.J. Levine, “Broad Patterns of Gene Expression Revealed by Clustering of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays,” Proc. Nat'l Academy of Sciences, vol. 96, no. 12, pp. 6745-6750, 1999.
[2] C.G. Atkeson, A.W. Moore, and S. Schaal, “Locally Weighted Learning,” Artificial Intelligence Rev., vol. 11, pp. 11-73, 1997.
[3] C. Bhattacharyya, L.R. Grate, A. Rizki, D. Radisky, F.J. Molina, M.I. Jordan, M.J. Bissel, and I.S. Mian, “Simultaneous Classification and Relevant Feature Identification in High-Dimensional Spaces: Application to Molecular Profiling Data,” Signal Processing, vol. 83, pp. 729-743, 2003.
[4] C.L. Blake and C.J. Merz, “UCI Repository of Machine Learning Databases,” 1998.
[5] P.S. Bradley and O.L. Mangasarian, “Feature Selection via Concave Minimization and Support Vector Machines,” Proc. 15th Int'l Conf. Machine Learning, pp. 82-90, 1998.
[6] L. Breiman, “Better Subset Selection Using Non-Negative Garotte,” technical report, Univ. of California, Berkeley, 1993.
[7] C.J.C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[8] J.C.W. Debuse and V.J. Rayward-Smith, “Feature Subset Selection within a Simulated Annealing Data Mining Algorithm,” J. Intelligent Information Systems, vol. 9, no. 1, pp. 57-81, 1997.
[9] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification. New York: John Wiley and Sons, Inc., 2001.
[10] Free Software Foundation, “GNU Linear Programming Kit,” http:/, 2005.
[11] G. Fung and O.L. Mangasarian, “Data Selection for Support Vector Machine Classifiers,” Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, R. Ramakrishnan and S. Stolfo, eds., vol. 2094, pp. 64-70, Aug. 2000.
[12] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring,” Science, vol. 286, no. 5439, pp. 531-537, 1999.
[13] R.P. Gorman and T.J. Sejnowski, “Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets,” Neural Networks, vol. 1, pp. 75-89, 1988.
[14] M. Karzynski, A. Mateos, J. Herrero, and J. Dopazo, “Using a Genetic Algorithm and a Perceptron for Feature Selection and Supervised Class Learning in DNA Microarray Data,” Artificial Intelligence Rev., vol. 20, pp. 39-51, 2003.
[15] R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, pp. 273-324, Dec. 1997.
[16] “Mosek Optimization Toolbox,” http:/, 2005.
[17] M.R. Osborne, B. Presnell, and B.A. Turlach, “On the LASSO and Its Dual,” J. Computational and Graphical Statistics, vol. 9, no. 2, pp. 319-337, 2000.
[18] V.G. Sigillito, S.P. Wing, L.V. Hutton, and K.B. Baker, “Classification of Radar Returns from the Ionosphere Using Neural Networks,” Johns Hopkins APL Technical Digest, vol. 10, pp. 262-266, 1989.
[19] S. Theodoridis and K. Koutroumbas, Pattern Recognition. London: Academic Press, 1999.
[20] R. Tibshirani, “Regression Shrinkage and Selection via the LASSO,” J. Royal Statistical Soc. B, vol. 58, no. 1, pp. 267-288, 1996.
[21] R. Tibshirani, T. Hastie, B. Balasubramanian, and G. Chu, “Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression,” Proc. Nat'l Academy of Sciences, vol. 99, no. 10, pp. 6567-6572, 2002.
[22] L.J. van 't Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A.M. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, and S.H. Friend, “Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer,” Nature, vol. 415, pp. 530-536, Jan. 2002.
[23] V. Vapnik, Statistical Learning Theory. Wiley, 1998.
[24] H. Zhang and G. Sun, “Feature Selection Using Tabu Search Method,” Pattern Recognition, vol. 35, pp. 701-711, 2002.

Index Terms:
Classification, support vector machine, high-dimensional, feature subset selection, mathematical programming.
Cor J. Veenman, David M.J. Tax, "LESS: A Model-Based Classifier for Sparse Subspaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 9, pp. 1496-1500, Sept. 2005, doi:10.1109/TPAMI.2005.182