2013 8th ChinaGrid Annual Conference (2011)
Dalian, Liaoning, China
Aug. 22, 2011 to Aug. 23, 2011
ISBN: 978-0-7695-4472-4
pp: 60-67
Domain terms play a crucial role in many research areas, which has led to growing demand for automatic domain term extraction. In this paper, we present a two-level evaluation approach, based on termhood and unithood, that automatically extracts Chinese domain compound terms and takes both character-level and word-level information into account. To achieve this, we incorporate semantic features by using word segmentation to recognize single-word terms, then apply an improved C-value together with heuristics such as word formation pattern and word formation power to evaluate candidates at both levels. Validation against several existing dictionaries shows a significant improvement in compound term detection. Experiments on a legal corpus show that our method outperforms the other methods compared.
Domain Term Extraction, Compound Term, Chinese Word Segmentation, CCT C-value
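For readers unfamiliar with the C-value measure named in the abstract and keywords, the following is a minimal sketch of the classic C-value formulation, in which a candidate's frequency is discounted by the average frequency of the longer candidates that contain it, then weighted by the logarithm of its length in words. It is not the improved variant used in the paper, whose details are not given here. The function names (`is_nested`, `c_value`) and the toy frequency counts are assumptions made purely for illustration.

```python
import math
from typing import Dict, Tuple

# A candidate term is a tuple of segmented words (illustrative representation).
Term = Tuple[str, ...]


def is_nested(short: Term, long: Term) -> bool:
    """Return True if `short` occurs as a contiguous word sequence inside the strictly longer `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq: Dict[Term, int]) -> Dict[Term, float]:
    """Classic C-value for each candidate, given candidate frequencies in a corpus.

    Not nested:  log2(|a|) * f(a)
    Nested:      log2(|a|) * (f(a) - mean frequency of longer candidates containing a)

    Note: log2(1) == 0, so single-word candidates always score 0 under this formula.
    """
    scores: Dict[Term, float] = {}
    for term, f in freq.items():
        # Frequencies of all longer candidates that contain this one.
        containers = [f_long for other, f_long in freq.items() if is_nested(term, other)]
        if containers:
            adjusted = f - sum(containers) / len(containers)
        else:
            adjusted = float(f)
        scores[term] = math.log2(len(term)) * adjusted
    return scores


# Toy counts, invented for illustration only (not taken from the paper's legal corpus).
toy_freq: Dict[Term, int] = {
    ("民事", "诉讼"): 10,
    ("诉讼", "法"): 6,
    ("民事", "诉讼", "法"): 4,
}
scores = c_value(toy_freq)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print("".join(term), round(score, 3))
```

In this toy example the three-word candidate ranks highest because it is never nested in a longer candidate and receives the larger length weight, while the two-word candidates are discounted by the frequency of the compound that contains them.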
