2014 IEEE International Conference on Data Mining (ICDM) (2014)

Shenzhen, China

Dec. 14, 2014 to Dec. 17, 2014

ISSN: 1550-4786

ISBN: 978-1-4799-4303-6

pp: 510-519

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2014.57

ABSTRACT

How to explore complex data? Often, several representations of each data object are available, the data are described by attributes of heterogeneous types, and/or each data object is characterized by many features. It is difficult to choose a suitable similarity measure and an appropriate data mining technique to get an unbiased overview of the information contained in complex data. In this paper, we introduce Metric Factorization as a novel data mining task. The goal of Metric Factorization is to discover the major alternative views of complex data. Our novel algorithm MF extends matrix factorization techniques to support metric data. We do not need to choose a single similarity measure but can simply input any available metric. Metric Factorization automatically builds interesting basis spaces from a large variety of input metrics. Due to their metric properties, the basis spaces can be further explored with standard techniques like Multidimensional Scaling. We relate the Metric Factorization task to data compression and demonstrate how ideas from information theory (the Minimum Description Length principle) make the parametrization of MF optional. We further introduce the idea of landmark points to effectively compress and thus support large data sets. Extensive experiments demonstrate the benefits of our approach.
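
The abstract notes that a metric space can be explored with standard techniques like Multidimensional Scaling. As an illustration of that one step only, the following is a minimal sketch of classical (Torgerson) MDS applied to a pairwise distance matrix. It is not the MF algorithm from the paper; the function name `classical_mds`, the use of NumPy, and the toy data are assumptions made for this example.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed objects in R^n_components from a symmetric distance matrix.

    Illustrative classical (Torgerson) MDS; not the paper's MF algorithm.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip negative eigenvalues, which can appear for non-Euclidean
    # metrics or through rounding error.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical toy data: the four corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

embedding = classical_mds(dist)
# Pairwise distances recomputed from the embedding.
recovered = np.linalg.norm(
    embedding[:, None, :] - embedding[None, :, :], axis=-1
)
```

For distances that are exactly Euclidean, as in this toy case, the embedding reproduces the input distances up to rotation and reflection. For a general metric, some eigenvalues of the Gram matrix may be negative, and the embedding is then only an approximation.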

INDEX TERMS

Measurement, Encoding, Image color analysis, Feature extraction, Optimization, Data mining, Standards

CITATION

C. Plant, "Metric Factorization for Exploratory Analysis of Complex Data," *2014 IEEE International Conference on Data Mining (ICDM)*, Shenzhen, China, 2014, pp. 510-519. doi:10.1109/ICDM.2014.57
