Fourth Latin American Web Congress (LA-WEB'06)
Analysis of Web Search Engine Clicked Documents
Cholula, Mexico
October 25-27, 2006
ISBN: 0-7695-2693-4
David F. Nettleton, University Pompeu Fabra, Spain
Liliana Calderón-Benavides, University Pompeu Fabra, Spain
Ricardo Baeza-Yates, Yahoo! Research, Spain
In this paper we process and analyze web search engine query and click data from the perspective of the documents (URLs) that users selected. We first define possible document categories and select descriptive variables to characterize the documents. The URL dataset is preprocessed and analyzed with some traditional statistical methods, and then processed with the Kohonen self-organizing map (SOM) clustering technique [5], which we use to produce a two-level clustering. The clusters are interpreted in terms of the document categories and variables defined initially. We then apply the C4.5 [9] rule induction algorithm to produce a decision tree for the document category. The objective of the work is to apply a systematic data mining process to click data, contrasting unsupervised (Kohonen SOM) and supervised (C4.5) methods to cluster and model the data, in order to identify document profiles that relate to theoretical user behavior and to document (URL) organization.
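As a rough illustration of the first clustering stage, the sketch below trains a small Kohonen self-organizing map in plain Python and assigns each URL to its best-matching node. It is a minimal sketch under stated assumptions, not the authors' implementation: the grid size, the linear learning-rate and radius schedules, the Gaussian neighborhood, and the three per-URL features (and their values) are all hypothetical, since the abstract does not give the paper's variables or training parameters.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(weights, key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Kohonen self-organizing map on equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialize every node with a copy of a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly toward 0
            sigma = sigma0 * (1.0 - frac) + 1e-3  # neighborhood radius shrinks over time
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighborhood on the grid: nodes near the BMU move the most.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Hypothetical per-URL features, already scaled to [0, 1]:
# [click frequency, average result rank, average dwell time].
urls = {
    "url_a": [0.90, 0.10, 0.80],
    "url_b": [0.85, 0.15, 0.75],
    "url_c": [0.10, 0.90, 0.20],
    "url_d": [0.15, 0.85, 0.10],
}
som = train_som(list(urls.values()), rows=2, cols=2, epochs=50, seed=1)
for name, vec in urls.items():
    print(name, "->", best_matching_unit(som, vec))
```

URLs that land on the same or adjacent nodes form the first-level clusters. The abstract says the SOM is used to produce a two-level clustering; one common way to obtain the second level is to group the trained node vectors themselves (for example with k-means), but the abstract does not say which method the authors used, so that step is omitted here.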
Citation:
David F. Nettleton, Liliana Calderón-Benavides, Ricardo Baeza-Yates, "Analysis of Web Search Engine Clicked Documents," Fourth Latin American Web Congress (LA-WEB'06), pp. 209-219, 2006.