2016 International Conference on Frontiers of Information Technology (FIT) (2016)
Islamabad, Pakistan
Dec. 19, 2016 to Dec. 21, 2016
ISBN: 978-1-5090-5300-1
pp: 81-86
A taxonomy is an effective means of organizing, managing, and accessing the large amount of information available in today's digital world. However, data changes frequently in this digitally connected world. A taxonomy needs to evolve when its underlying data set changes; otherwise, it cannot represent the changed data set accurately. Many automatic techniques generate taxonomies effectively for data sets both large and small, but very little attention has been paid to taxonomy evolution. One way to update a taxonomy is to regenerate it from scratch. In this research, we take an alternative route: we propose a novel approach that evolves an existing taxonomy by incrementally updating it whenever the data set changes. In the proposed system, an initial taxonomy is generated using hierarchical clustering and is then incrementally updated whenever the underlying data set changes. We tested our methodology against regeneration on a text data set from the computing domain. We observed that regenerating the taxonomy from scratch when the data set changes consumes more time and resources than evolving the existing taxonomy. The results show that evolution is the better approach as far as time efficiency is concerned. Moreover, the quality of the evolved taxonomy is comparable to that of the regenerated taxonomy. Thus, an updated taxonomy can be obtained in a shorter period of time through evolution.
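The contrast between regenerating and evolving can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify; it assumes bag-of-words vectors, cosine similarity, a threshold-based agglomerative merge, and a single level of groups rather than a full hierarchy. The names `vectorize`, `cosine`, `build_taxonomy`, `evolve`, and the `threshold` value are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_taxonomy(docs, threshold=0.3):
    """Generate from scratch: repeatedly merge the two most similar clusters
    until no pair is more similar than the threshold.
    Each cluster is a (summed term vector, member documents) pair."""
    clusters = [(vectorize(d), [d]) for d in docs]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = cosine(clusters[i][0], clusters[j][0])
                if score > best:
                    best, pair = score, (i, j)
        if pair is None:
            break  # nothing left that is similar enough to merge
        i, j = pair
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in pair]
        clusters.append(merged)
    return clusters


def evolve(clusters, new_doc, threshold=0.3):
    """Incremental update: attach the new document to its most similar
    cluster, or open a new cluster when nothing is similar enough.
    Only one pass over the existing clusters is needed."""
    vec = vectorize(new_doc)
    scores = [cosine(vec, c[0]) for c in clusters]
    if scores and max(scores) > threshold:
        k = scores.index(max(scores))
        clusters[k] = (clusters[k][0] + vec, clusters[k][1] + [new_doc])
    else:
        clusters.append((vec, [new_doc]))
    return clusters
```

In this sketch, regeneration compares every pair of clusters on each merge step, while `evolve` compares the new document against each existing cluster once, which is the kind of saving the abstract attributes to evolution.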
Taxonomy, Data models, Labeling, Merging, Clustering algorithms, Computational modeling, Natural language processing
Rabia Irfan, Sharifullah Khan, "Evolving the Taxonomy Based on Hierarchical Clustering Approach", 2016 International Conference on Frontiers of Information Technology (FIT), pp. 81-86, 2016, doi:10.1109/FIT.2016.023