Tailored Aggregation for Classification
November 2009 (vol. 31 no. 11)
pp. 2098-2105
Tristan Mary-Huard, UMR AgroParisTech/INRIA, Paris
Stéphane Robin, UMR AgroParisTech/INRIA, Paris
Compression and variable selection are two classical strategies to deal with large-dimension data sets in classification. We propose an alternative strategy, called aggregation, which consists of a clustering step of redundant variables and a compression step within each group. We develop a statistical framework to define tailored aggregation methods that can be combined with selection methods to build reliable classifiers that benefit from the information contained in redundant variables. Two algorithms are proposed for ordered and nonordered variables, respectively. Applications to the kNN and CART algorithms are presented.
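To make the two steps named in the abstract concrete, here is a minimal sketch of "cluster redundant variables, then compress within each group". It is not the authors' algorithm: the greedy correlation rule, the 0.9 threshold, the choice of the group mean as compression, and the names `pearson`, `group_redundant_variables`, and `compress_groups` are all illustrative assumptions, and the data are synthetic.

```python
import random
from math import sqrt

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def group_redundant_variables(columns, threshold=0.9):
    """Clustering step (assumed rule): a variable joins the group of the first
    unassigned variable it is strongly correlated with."""
    labels = [None] * len(columns)
    next_label = 0
    for j, col in enumerate(columns):
        if labels[j] is not None:
            continue
        labels[j] = next_label
        for k in range(j + 1, len(columns)):
            if labels[k] is None and abs(pearson(col, columns[k])) >= threshold:
                labels[k] = next_label
        next_label += 1
    return labels

def compress_groups(columns, labels):
    """Compression step (assumed rule): replace each group by its per-sample mean."""
    compressed = []
    for g in range(max(labels) + 1):
        members = [c for c, lab in zip(columns, labels) if lab == g]
        compressed.append([sum(vals) / len(vals) for vals in zip(*members)])
    return compressed

# Synthetic data: 3 noisy copies of one signal and 2 noisy copies of another.
rng = random.Random(0)
n = 200
signal_a = [rng.gauss(0, 1) for _ in range(n)]
signal_b = [rng.gauss(0, 1) for _ in range(n)]
columns = (
    [[s + 0.05 * rng.gauss(0, 1) for s in signal_a] for _ in range(3)]
    + [[s + 0.05 * rng.gauss(0, 1) for s in signal_b] for _ in range(2)]
)

labels = group_redundant_variables(columns)
Z = compress_groups(columns, labels)  # one aggregated variable per group
```

The aggregated variables in `Z` could then be passed to a selection method or to a classifier such as kNN or CART in place of the original columns. For ordered variables, a group would presumably be restricted to contiguous indices, which this sketch does not enforce.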

Index Terms:
Classification, aggregation, selection, large-dimension data, ordered variables.
Tristan Mary-Huard, Stéphane Robin, "Tailored Aggregation for Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 11, pp. 2098-2105, Nov. 2009, doi:10.1109/TPAMI.2009.55