Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)
Automatic Keyword Extraction Using Linguistic Features
Hong Kong, China
December 18-December 22
ISBN: 0-7695-2702-7
Xinghua Hu, University of California, Santa Cruz
Bin Wu, University of California, Santa Cruz
This paper describes a novel keyword extraction algorithm, Position Weight (PW), that uses linguistic features to represent the importance of a word's position in a document. Topical terms and their previous-term and next-term co-occurrence collections are extracted. Three methods measure the degree of correlation between a topical term and its co-occurrence terms: Term Frequency Inverse Term Frequency (TFITF), Position Weight Inverse Position Weight (PWIPW), and chi-square (χ²). The co-occurrence terms that have the highest degree of correlation and exceed a co-occurrence frequency threshold are combined with the original topical term to form a final keyword. Because the algorithm has linear computational complexity, documents in a large corpus or on the open web can be quickly represented in vector space by sets of keywords, which makes fast and effective large-scale information retrieval possible.
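The abstract does not define PW, TFITF, or PWIPW, so the following is only a minimal sketch of the one step that can be illustrated with a standard formula: scoring next-term co-occurrences of a topical term with the textbook chi-square statistic on a 2x2 contingency table, then appending the best-scoring term that meets a frequency threshold. The function names `next_term_chi_square` and `extend_keyword`, the `min_cooccurrence` parameter, and its default value are hypothetical and are not taken from the paper.

```python
from collections import Counter


def next_term_chi_square(tokens, topical_term, min_cooccurrence=2):
    """Score each term that directly follows `topical_term`.

    Uses the standard 2x2 chi-square statistic over adjacent token pairs.
    This is an illustrative assumption, not the paper's exact procedure.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())

    # Marginal counts: how often each term is the first / second of a pair.
    first, second = Counter(), Counter()
    for (a, b), count in bigrams.items():
        first[a] += count
        second[b] += count

    scores = {}
    for (a, b), o11 in bigrams.items():
        # Keep only pairs starting with the topical term that meet the
        # (hypothetical) co-occurrence frequency threshold.
        if a != topical_term or o11 < min_cooccurrence:
            continue
        o12 = first[a] - o11         # topical term followed by another term
        o21 = second[b] - o11        # b preceded by another term
        o22 = n - o11 - o12 - o21    # pairs involving neither
        denom = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
        if denom == 0:
            continue
        scores[b] = n * (o11 * o22 - o12 * o21) ** 2 / denom
    return scores


def extend_keyword(tokens, topical_term, min_cooccurrence=2):
    """Append the best-scoring next term, or return the term unchanged."""
    scores = next_term_chi_square(tokens, topical_term, min_cooccurrence)
    if not scores:
        return topical_term
    best = max(scores, key=scores.get)
    return f"{topical_term} {best}"


if __name__ == "__main__":
    text = ("data mining is fun and data mining is hard "
            "but data analysis is slow")
    print(extend_keyword(text.split(), "data"))
```

Each pass over the tokens is linear in document length, which is in keeping with the linear complexity the abstract claims, although the paper's own scoring may differ. The same pattern would apply to previous-term collections by pairing each token with its predecessor instead.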
Citation:
Xinghua Hu, Bin Wu, "Automatic Keyword Extraction Using Linguistic Features," Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), pp. 19-23, 2006