Sixth IEEE International Conference on Data Mining Workshops (ICDMW 2006)
Hong Kong, China
Dec. 18, 2006 to Dec. 22, 2006
ISBN: 0-7695-2702-7
pp: 19-23
Bin Wu, University of California, Santa Cruz
Xinghua Hu, University of California, Santa Cruz
ABSTRACT
This paper describes a novel keyword extraction algorithm, Position Weight (PW), that uses linguistic features to represent the importance of a word's position in a document. Topical terms are extracted together with their previous-term and next-term co-occurrence collections. Three measures are used to quantify the degree of correlation between a topical term and its co-occurrence terms: Term Frequency Inverse Term Frequency (TFITF), Position Weight Inverse Position Weight (PWIPW), and chi-square (χ²). The co-occurrence terms that have the highest degree of correlation and exceed a co-occurrence frequency threshold are combined with the original topical term to form a final keyword. Because the algorithm has linear computational complexity, documents in a large corpus or on the open web can be quickly represented in vector space by sets of keywords, which makes fast and effective large-scale information retrieval possible.
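The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the chi-square measure (the abstract does not define the PW, TFITF, or PWIPW formulas, so they are omitted), it assumes a pre-tokenized document, and the names chi_square, extend_keyword, and min_cooc are hypothetical.

```python
from collections import Counter


def chi_square(n11, n_first, n_second, n):
    """Chi-square statistic of a 2x2 contingency table for a bigram (a, b).

    n11      -- occurrences of the bigram (a, b)
    n_first  -- bigrams whose first term is a
    n_second -- bigrams whose second term is b
    n        -- total number of bigrams
    """
    n12 = n_first - n11                  # a followed by something other than b
    n21 = n_second - n11                 # b preceded by something other than a
    n22 = n - n_first - n_second + n11   # neither a first nor b second
    denom = n_first * n_second * (n - n_first) * (n - n_second)
    if denom == 0:
        return 0.0
    return n * (n11 * n22 - n12 * n21) ** 2 / denom


def extend_keyword(tokens, topical_term, min_cooc=2):
    """Join a topical term with its best previous-term or next-term neighbour.

    Assumption: "best" means the highest chi-square score among adjacent
    pairs seen at least min_cooc times. If no pair qualifies, the topical
    term is returned unchanged.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # counts of terms in first position
    second = Counter(tokens[1:])   # counts of terms in second position
    n = len(tokens) - 1
    best_score, best_phrase = None, topical_term
    for (a, b), n11 in bigrams.items():
        if n11 < min_cooc or topical_term not in (a, b):
            continue
        score = chi_square(n11, first[a], second[b], n)
        if best_score is None or score > best_score:
            best_score, best_phrase = score, f"{a} {b}"
    return best_phrase


tokens = ("data mining finds patterns data mining scales "
          "big data helps data mining works").split()
keyword = extend_keyword(tokens, "data")
```

Each function makes a single pass over the token list or the bigram table, which is in keeping with the linear complexity the abstract claims, although the real algorithm may organize its passes differently.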
CITATION
Bin Wu, Xinghua Hu, "Automatic Keyword Extraction Using Linguistic Features", Sixth IEEE International Conference on Data Mining Workshops (ICDMW 2006), pp. 19-23, 2006, doi:10.1109/ICDMW.2006.36