2011 IEEE 11th International Conference on Data Mining
A Fast and Flexible Clustering Algorithm Using Binary Discretization
Vancouver, Canada
December 11-December 14
ISBN: 978-0-7695-4408-3
BibTeX:
@article{10.1109/ICDM.2011.9,
  author = {Mahito Sugiyama and Akihiro Yamamoto},
  title = {A Fast and Flexible Clustering Algorithm Using Binary Discretization},
  journal = {Data Mining, IEEE International Conference on},
  volume = {0},
  year = {2011},
  issn = {1550-4786},
  pages = {1212-1217},
  doi = {http://doi.ieeecomputersociety.org/10.1109/ICDM.2011.9},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2011.9
We present a new clustering algorithm for multivariate data. The algorithm, called BOOL (Binary coding Oriented clustering), can detect arbitrarily shaped clusters and is noise tolerant. BOOL handles data in two steps: data points are first discretized and represented as binary words; clusters are then constructed iteratively by agglomerating smaller clusters using this representation. The latter step is carried out with linear complexity by sorting the binary representations, which results in dramatic speedups compared with other techniques. Experiments show that BOOL is faster than K-means, and about two to three orders of magnitude faster than two state-of-the-art algorithms that can detect non-convex clusters of arbitrary shapes. We also show that BOOL's results are robust to changes in parameters, whereas most algorithms for arbitrarily shaped clusters are known to be overly sensitive to such changes. The key to the robustness of BOOL is the hierarchical structure of clusters, which is introduced automatically by increasing the accuracy of the discretization.
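The two-step idea in the abstract (discretize, then agglomerate on the discretized representation) can be illustrated with a small Python sketch. This is not the authors' BOOL implementation, and the abstract does not give enough detail to reproduce it. The function names `discretize` and `grid_cluster`, the min-max scaling, the "touching cells" merge rule, and the `min_size` noise threshold are all assumptions made for illustration. The sketch also merges cells through a dictionary lookup over neighbouring cells rather than the sorting-based step the abstract describes, so it does not share BOOL's complexity.

```python
from collections import defaultdict
from itertools import product


def discretize(points, level):
    """Map each point to a tuple of integer cell indices in [0, 2**level - 1].

    Each coordinate is min-max scaled to [0, 1] and truncated to `level`
    binary digits, so every index fits in a `level`-bit binary word.
    (Hypothetical helper; the scaling choice is an assumption.)
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    bins = 2 ** level
    cells = []
    for p in points:
        cell = []
        for d in range(dims):
            span = highs[d] - lows[d]
            scaled = (p[d] - lows[d]) / span if span > 0 else 0.0
            # Clamp so that the maximum value falls in the last bin.
            cell.append(min(int(scaled * bins), bins - 1))
        cells.append(tuple(cell))
    return cells


def grid_cluster(points, level, min_size=1):
    """Label points by merging occupied cells that touch each other.

    Returns one label per point; -1 marks points whose cluster has fewer
    than `min_size` members (treated here as noise, an assumption).
    """
    cells = discretize(points, level)
    occupied = sorted(set(cells))
    parent = {c: c for c in occupied}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    dims = len(points[0])
    # All neighbour offsets except the zero offset (the cell itself).
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    for c in occupied:
        for o in offsets:
            n = tuple(a + b for a, b in zip(c, o))
            if n in parent:
                parent[find(c)] = find(n)

    members = defaultdict(list)
    for i, c in enumerate(cells):
        members[find(c)].append(i)

    labels = [-1] * len(points)
    next_label = 0
    for root in sorted(members):
        idx = members[root]
        if len(idx) >= min_size:
            for i in idx:
                labels[i] = next_label
            next_label += 1
    return labels


points = [(0.0, 0.0), (0.1, 0.1), (0.05, 0.0),
          (1.0, 1.0), (0.9, 0.95), (0.95, 0.9)]
coarse = grid_cluster(points, level=1)  # 2 bins per axis
fine = grid_cluster(points, level=2)    # 4 bins per axis
```

In this toy example the coarse level places the two groups in diagonally touching cells, so they merge into one cluster, while the finer level separates them. That mirrors, in a very simplified way, the hierarchy the abstract attributes to increasing the accuracy of the discretization.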
Index Terms:
Shape-based clustering, Hierarchical clustering, Discretization, Binary encoding, Sorting
Citation:
Mahito Sugiyama, Akihiro Yamamoto, "A Fast and Flexible Clustering Algorithm Using Binary Discretization," icdm, pp.1212-1217, 2011 IEEE 11th International Conference on Data Mining, 2011
