2011 Sixth Annual Chinagrid Conference (2011)
Dalian, Liaoning, China
Aug. 22, 2011 to Aug. 23, 2011
Domain terms play a crucial role in many research areas, which has led to rising demand for automatic domain term extraction. In this paper, we present a two-level evaluation approach based on termhood and unithood to extract Chinese domain compound terms automatically, taking both character-level and word-level information into account. To achieve this, we incorporate semantic features by using word segmentation to recognize single-word terms, then leverage an improved C-value and heuristic methods such as word formation pattern and word formation power to evaluate candidates at both levels. Validation against several existing dictionaries shows a significant improvement in compound term detection. Experiments on a legal corpus show that our method is superior to the other compared methods.
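The abstract does not spell out the improved C-value, so the following is only a minimal sketch of the classic C-value termhood measure that such variants usually build on, not the authors' method. It assumes candidates are already segmented into word tuples with known corpus frequencies. The names `is_nested` and `c_value` and the example frequencies are hypothetical, chosen for illustration.

```python
import math
from typing import Dict, Tuple

# A candidate term is a tuple of already-segmented words (assumption).
Term = Tuple[str, ...]


def is_nested(short: Term, long: Term) -> bool:
    """Return True if `short` occurs as a contiguous word sequence inside the longer `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq: Dict[Term, int]) -> Dict[Term, float]:
    """Classic C-value: log2(|a|) * f(a) for non-nested candidates, and
    log2(|a|) * (f(a) - mean frequency of the longer candidates containing a) otherwise."""
    scores: Dict[Term, float] = {}
    for a, f_a in freq.items():
        # Frequencies of all longer candidates that contain `a`.
        containers = [f_b for b, f_b in freq.items() if is_nested(a, b)]
        if containers:
            adjusted = f_a - sum(containers) / len(containers)
        else:
            adjusted = float(f_a)
        # Note: log2(1) = 0, so single-word candidates always score 0 here.
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Made-up frequencies for illustration only (not data from the paper).
example_freq: Dict[Term, int] = {
    ("民事", "诉讼"): 10,        # "civil litigation"
    ("民事", "诉讼", "法"): 4,   # "civil procedure law"
}
example_scores = c_value(example_freq)
```

Because the length factor is zero for one-word candidates, the classic measure cannot rank single-word terms, which is one reason a separate step for recognizing them is needed.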
Domain Term Extraction, Compound Term, Chinese Word Segmentation, CCT C-value
J. Kang, T. Liu, H. Hu and X. Du, "Discovering Chinese Compound Term Using Termhood and Unithood Measures," 2011 Sixth Annual Chinagrid Conference (CHINAGRID), Dalian, Liaoning, China, 2011, pp. 60-67.