35th Annual Hawaii International Conference on System Sciences (HICSS'02)-Volume 3
Big Island, Hawaii
January 07-January 10
ISBN: 0-7695-1435-9
In the context of information retrieval, traditional collection selection algorithms have been widely studied. These algorithms utilize language models, a representation of the contents of each text collection over which selection is to be performed, but these language models cannot always be easily acquired. Query-based sampling is a technique by which these language models are discovered by interacting with a collection and observing the results. Previous work has shown query-based sampling to be a viable solution to the problem of discovering the contents of text collections when the information cannot be otherwise obtained. However, the characteristics of language models of WWW collections created using query-based sampling have not yet been studied. This work evaluates two query-based sampling techniques for building language models of three World Wide Web collections. Experimental results support the effectiveness of query-based sampling as a solution for building language models of web collections. This work also proposes a metric by which it may be possible to determine the point at which further sampling of a given web collection can cease. This metric is used along with other metrics used in previous work to determine the fidelity of these language models.
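As an illustration only, the general idea of query-based sampling described in the abstract (query a collection, observe the returned documents, and accumulate a term-frequency language model from them) might be sketched as follows. This is not the authors' implementation. The function names, the toy in-memory collection, the random choice of the next query term, and the fixed document budget used as a stopping rule are all assumptions made for this sketch; in particular, the budget is not the stopping metric the paper proposes.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase, whitespace-split tokenizer that keeps alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def make_search(docs):
    """Build a toy search function over an in-memory list of documents.

    Hypothetical stand-in for a real collection's search interface.
    """
    def search(term, k):
        # Return the ids of up to k documents that contain the term.
        return [i for i, d in enumerate(docs) if term in tokenize(d)][:k]
    return search


def query_based_sample(search, fetch, seed_term, max_docs=4,
                       docs_per_query=2, rng=None):
    """Learn a term-frequency model by querying and observing the results.

    Assumptions for this sketch: the next query term is drawn at random
    from the vocabulary learned so far, and sampling stops after max_docs
    distinct documents or when no unused terms remain.
    """
    rng = rng or random.Random(0)
    model = Counter()   # learned language model: term -> frequency
    seen = set()        # ids of documents already sampled
    used = set()        # terms already issued as queries
    term = seed_term
    while term is not None and len(seen) < max_docs:
        used.add(term)
        for doc_id in search(term, docs_per_query):
            if doc_id not in seen and len(seen) < max_docs:
                seen.add(doc_id)
                model.update(tokenize(fetch(doc_id)))
        # Pick the next query term from learned terms not yet used.
        candidates = sorted(set(model) - used)
        term = rng.choice(candidates) if candidates else None
    return model, seen


if __name__ == "__main__":
    docs = ["apple banana", "banana cherry", "cherry date", "date apple"]
    model, seen = query_based_sample(make_search(docs), docs.__getitem__, "apple")
    print(sorted(seen), model.most_common())
```

In a real setting, `search` and `fetch` would wrap a web collection's query interface, and the fixed budget would be replaced by a convergence criterion on the learned model.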
Citation:
G. Monroe, J. French, A. Powell, "Obtaining Language Models of Web Collections Using Query-Based Sampling Techniques," 35th Annual Hawaii International Conference on System Sciences (HICSS'02), vol. 3, p. 67b, 2002