2012 IEEE 12th International Conference on Data Mining Workshops (2012)
Brussels, Belgium
Dec. 10, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2012.94
In this paper we propose a framework for modeling the intrinsic dimensionality of data sets. The models can be viewed as generalizations of the expansion dimension, which was originally proposed for the analysis of certain similarity search indices using the Euclidean distance metric. Here, we extend the original model to other metric spaces: vector spaces with the $L_p$ or vector angle (cosine similarity) distance measures, as well as product spaces for categorical data. We also provide a practical guide for estimating both local and global intrinsic dimensionality. The estimates of data complexity can subsequently be used in the design and analysis of algorithms for data mining applications such as search, clustering, classification, and outlier detection.
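As a rough illustration of the kind of local estimate the abstract refers to, the sketch below compares how many points fall inside two concentric balls around a query and how the radii of those balls differ. It assumes the commonly used two-ball form, log(k2/k1) / log(r2/r1), for a distance measure in which volume grows polynomially with radius; the function name `expansion_dimension_estimate` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def expansion_dimension_estimate(distances, k1, k2):
    """Two-ball estimate of local intrinsic dimensionality (illustrative sketch).

    distances: distances from one query point to the other points of the data set.
    k1, k2:    neighbor ranks with 1 <= k1 < k2 <= len(distances).

    Assumption (not from the source): the estimate is
    log(k2 / k1) / log(r2 / r1), where r1 and r2 are the distances to the
    k1-th and k2-th nearest neighbors. Ties at those ranks are ignored.
    """
    if not (1 <= k1 < k2 <= len(distances)):
        raise ValueError("need 1 <= k1 < k2 <= number of distances")
    ordered = sorted(distances)
    r1, r2 = ordered[k1 - 1], ordered[k2 - 1]
    if not (0 < r1 < r2):
        raise ValueError("need 0 < r1 < r2 for a finite estimate")
    return math.log(k2 / k1) / math.log(r2 / r1)


# Synthetic check: if the k-th neighbor lies at distance k ** (1 / 3),
# the neighbor count grows like radius ** 3, so the estimate is about 3.
synthetic = [k ** (1.0 / 3.0) for k in range(1, 101)]
estimate = expansion_dimension_estimate(synthetic, k1=10, k2=80)
```

A global estimate could then be formed by aggregating such local values over many query points, for example by averaging; that aggregation choice is likewise an assumption here rather than the paper's stated procedure.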
Vectors, Data mining, Extraterrestrial measurements, Search problems, Complexity theory, Data models
Michael E. Houle, Hisashi Kashima, Michael Nett, "Generalized Expansion Dimension", 2012 IEEE 12th International Conference on Data Mining Workshops, pp. 587-594, 2012, doi:10.1109/ICDMW.2012.94