IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 11, November 2009
Tristan Mary-Huard, UMR AgroParisTech/INRIA, Paris
Stéphane Robin, UMR AgroParisTech/INRIA, Paris
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2009.55
Compression and variable selection are two classical strategies to deal with large-dimension data sets in classification. We propose an alternative strategy, called aggregation, which consists of a clustering step of redundant variables and a compression step within each group. We develop a statistical framework to define tailored aggregation methods that can be combined with selection methods to build reliable classifiers that benefit from the information contained in redundant variables. Two algorithms are proposed for ordered and nonordered variables, respectively. Applications to the kNN and CART algorithms are presented.
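The two-step idea described in the abstract (cluster redundant variables, then compress within each group) can be illustrated with a minimal sketch. It is not the authors' tailored aggregation method. The greedy correlation rule, the `threshold` value, the use of the group mean as the compression step, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def cluster_variables(rows, threshold=0.9):
    """Clustering step (hypothetical rule): a variable joins the first group
    whose seed variable it correlates with at or above `threshold`;
    otherwise it starts a new group."""
    columns = list(zip(*rows))
    groups = []
    for j, col in enumerate(columns):
        for group in groups:
            if pearson(columns[group[0]], col) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])
    return groups


def aggregate(rows, groups):
    """Compression step (assumed here to be a plain mean within each group)."""
    return [[mean(row[j] for j in group) for group in groups] for row in rows]


# Toy data: variables 0 and 1 are nearly redundant, variable 2 is not.
rows = [
    [1.0, 1.1, 5.0],
    [2.0, 2.1, 3.0],
    [3.0, 2.9, 4.0],
    [4.0, 4.2, 1.0],
]
groups = cluster_variables(rows)
compressed = aggregate(rows, groups)
```

The compressed rows could then be passed to a classifier such as kNN or CART in place of the original variables. For ordered variables, a variant of this sketch would allow only adjacent variables to be grouped.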
Classification, aggregation, selection, large-dimension data, ordered variables.
Tristan Mary-Huard, Stéphane Robin, "Tailored Aggregation for Classification", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 11, pp. 2098-2105, November 2009, doi:10.1109/TPAMI.2009.55