2011 IEEE 11th International Conference on Data Mining (2011)
Dec. 11, 2011 to Dec. 14, 2011
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2011.70
In today's applications, data analysis tasks are hindered by many attributes per object as well as by faulty data with missing values. Subspace clustering tackles the challenge of many attributes by cluster detection in any subspace projection of the data. However, it poses novel challenges for handling missing values of objects, which are part of multiple subspace clusters in different projections of the data. In this work, we propose a general fault tolerance definition enhancing subspace clustering models to handle missing values. We introduce a flexible notion of fault tolerance that adapts to the individual characteristics of subspace clusters and ensures a robust parameterization. Allowing missing values in our model increases the computational complexity of subspace clustering. Thus, we prove novel monotonicity properties for an efficient computation of fault tolerant subspace clusters. Experiments on real and synthetic data show that our fault tolerance model yields high quality results even in the presence of many missing values. For repeatability, we provide all datasets and executables on our website.
Keywords: subspace clustering, missing values, incomplete data, fault tolerance
T. Seidl, S. Günnemann, E. Müller and S. Raubach, "Flexible Fault Tolerant Subspace Clustering for Data with Missing Values," 2011 IEEE 11th International Conference on Data Mining (ICDM), Vancouver, Canada, 2011, pp. 231-240.
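The abstract notes that an object with missing values can belong to several subspace clusters in different projections of the data. The toy sketch below illustrates only that observation: the same object may be complete in one subspace and incomplete in another. The data, the function names, and the fixed `max_missing` threshold are assumptions made here for illustration. They are not the paper's model, whose fault tolerance is described as adapting to the individual characteristics of each subspace cluster.

```python
import math

# Hypothetical toy data: each object has three attributes (indices 0, 1, 2).
# math.nan marks a missing value.
data = {
    "o1": (1.0, 2.0, 3.0),
    "o2": (math.nan, 2.1, 3.1),   # attribute 0 is missing
    "o3": (1.1, math.nan, 2.9),   # attribute 1 is missing
}

def missing_in_subspace(obj, subspace):
    """Count the missing values of obj among the attribute indices in subspace."""
    return sum(1 for dim in subspace if math.isnan(obj[dim]))

def tolerated(obj, subspace, max_missing):
    """Return True if obj has at most max_missing missing values in subspace.

    A fixed threshold is a simplification assumed for this sketch.
    """
    return missing_in_subspace(obj, subspace) <= max_missing

# The same object can be complete in one projection and incomplete in another.
for subspace in [(0, 1), (1, 2), (0, 1, 2)]:
    for name, obj in data.items():
        count = missing_in_subspace(obj, subspace)
        print(f"subspace={subspace} object={name} missing={count}")
```

In this sketch, `o2` has no missing values in the subspace of attributes 1 and 2 but one missing value in the subspace of attributes 0 and 1, so whether it can join a cluster depends on the projection being considered.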