2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing
Clustering Heterogeneous Web Data using Clustering by Compression. Cluster Validity
Timisoara, Romania
September 26-September 29
ISBN: 978-0-7695-3523-4
The rapid expansion of the Internet has produced a vast quantity of data that is unstructured compared with a conventional database. Applying clustering to the World Wide Web is essential for extracting structured information from this sea of data. In this paper, we test the results of a new clustering technique, clustering by compression, when applied to heterogeneous sets of data. The clustering by compression procedure is based on a parameter-free, universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files (singly and in pairwise concatenation). To validate the results, we calculate several cluster quality indices. If the values obtained indicate a high-quality clustering, we plan to include the clustering by compression technique in a framework for clustering heterogeneous web objects.
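As an illustration of how the NCD can be computed from compressed lengths, the following is a minimal sketch, not the authors' implementation. It assumes the formulation commonly given in the clustering-by-compression literature, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length; the abstract itself does not spell out the formula. The choice of zlib as the compressor and the helper names `compressed_len`, `ncd`, and `distance_matrix` are assumptions made for this example.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Uses the compressed lengths of x and y singly and of their concatenation.
    Assumes both inputs are non-empty.
    """
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD values, suitable as input to a clustering step."""
    return [[ncd(a, b) for b in items] for a in items]
```

Values near 0 suggest that the two objects share most of their information, while values near 1 suggest they share little. With a real compressor the result is only an approximation, so the distance of an object to itself is small but generally not exactly 0.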
Index Terms:
clustering, heterogeneous data, cluster validity
Citation:
Alexandra Cernian, Dorin Carstoiu, Adriana Olteanu, "Clustering Heterogeneous Web Data using Clustering by Compression. Cluster Validity," synasc, pp.123-126, 2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2008