Interpretable Hierarchical Clustering by Constructing an Unsupervised Decision Tree
January 2005 (vol. 17 no. 1)
pp. 121-132
In this paper, we propose a method for hierarchical clustering based on the decision tree approach. As with a supervised decision tree, the unsupervised decision tree is interpretable in terms of rules: each leaf node represents a cluster, and the path from the root node to a leaf node represents a rule. The branching decision at each node of the tree is based on the clustering tendency of the data available at that node. We present four different measures for selecting the most appropriate attribute for splitting the data at each branching node (or decision node), and two different algorithms for splitting the data at each decision node. We provide a theoretical basis for the approach and demonstrate the capability of the unsupervised decision tree for segmenting various data sets. We also compare the performance of the unsupervised decision tree with that of the supervised one.
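
The general shape of such a tree can be illustrated with a minimal Python sketch. It is not the method of the paper: the abstract does not specify the four attribute-selection measures or the two splitting algorithms, so the sketch substitutes a simple stand-in (the widest relative gap between sorted attribute values) as a crude proxy for clustering tendency. The names `Node`, `gap_score`, `build_tree`, and `leaf_rules`, and the parameters `min_size` and `min_score`, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    rows: List[int]                    # indices of the points that reach this node
    attr: Optional[int] = None         # splitting attribute (None for a leaf)
    threshold: Optional[float] = None  # split point on that attribute
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gap_score(values: Sequence[float]) -> Tuple[float, float]:
    """Stand-in measure (an assumption, not from the paper): the widest gap
    between consecutive sorted values, relative to the value range, together
    with the midpoint of that gap as a candidate threshold."""
    ordered = sorted(values)
    span = ordered[-1] - ordered[0]
    if span == 0:
        return 0.0, ordered[0]
    gap, k = max((ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1))
    return gap / span, (ordered[k] + ordered[k + 1]) / 2


def build_tree(data, rows, min_size: int = 2, min_score: float = 0.5) -> Node:
    """Recursively split on the attribute with the highest stand-in score."""
    node = Node(rows=list(rows))
    if len(node.rows) < 2 * min_size:
        return node  # too few points to split further
    n_attrs = len(data[0])
    scored = [(gap_score([data[r][a] for r in node.rows]), a) for a in range(n_attrs)]
    (score, threshold), attr = max(scored)
    if score < min_score:
        return node  # no attribute shows a clear separation
    left = [r for r in node.rows if data[r][attr] <= threshold]
    right = [r for r in node.rows if data[r][attr] > threshold]
    if len(left) < min_size or len(right) < min_size:
        return node
    node.attr, node.threshold = attr, threshold
    node.left = build_tree(data, left, min_size, min_score)
    node.right = build_tree(data, right, min_size, min_score)
    return node


def leaf_rules(node: Node, path: Tuple[str, ...] = ()):
    """Return one (rule, member rows) pair per leaf, i.e. per cluster."""
    if node.attr is None:
        return [(" and ".join(path) or "true", node.rows)]
    lo = path + (f"x[{node.attr}] <= {node.threshold:.2f}",)
    hi = path + (f"x[{node.attr}] > {node.threshold:.2f}",)
    return leaf_rules(node.left, lo) + leaf_rules(node.right, hi)


# Toy data: two groups that separate along attribute 0 only.
data = [(1.0, 5.0), (1.2, 5.1), (0.9, 4.9), (8.0, 5.0), (8.1, 5.2), (7.9, 4.8)]
tree = build_tree(data, range(len(data)))
for rule, members in leaf_rules(tree):
    print(rule, "->", members)
```

Each printed line pairs a root-to-leaf rule with the points in that cluster, which is the sense in which the tree is interpretable. Replacing `gap_score` with an entropy-based measure would bring the sketch closer in spirit to the index terms, but the exact definitions would have to come from the paper itself.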

Index Terms:
Unsupervised decision tree, entropy, data set segmentation.
Jayanta Basak, Raghu Krishnapuram, "Interpretable Hierarchical Clustering by Constructing an Unsupervised Decision Tree," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 1, pp. 121-132, Jan. 2005, doi:10.1109/TKDE.2005.11