2016 International Conference on Big Data and Smart Computing (BigComp) (2016)
Hong Kong, China
Jan. 18, 2016 to Jan. 20, 2016
ISSN: 2375-9356
ISBN: 978-1-4673-8795-8
pp: 199-206
Hancheol Park , Department of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
Gahgene Gweon , Department of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
Jeong Heo , Knowledge Mining Team, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea
ABSTRACT
Paraphrase extraction is the task of extracting pairs of paraphrase expressions from a large-scale corpus. Because existing extraction methods are mostly designed for morphologically poor languages such as English, we present a method suited to agglutinative languages, which are morphologically complex because inflectional affixes are attached to word stems. Specifically, we use Korean as a case study to address two problems that arise because existing methods model each lexical form as a separate word. The first is lexical data sparsity; the second is that the morphological structure of words is not taken into account. To mitigate these problems, we propose a novel phrasal paraphrase extraction method called the affix modification-based bilingual pivoting method (AMBPM), which extends the existing bilingual pivoting method (BPM). Our experiments show that the proposed method significantly outperforms two state-of-the-art paraphrase extraction methods, namely the syntactic constraints-based bilingual pivoting method (SCBPM) and the skip-gram word embedding model, with respect to the meaning preservation and grammaticality of the extracted paraphrase pairs.
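As background on the baseline that AMBPM extends, the sketch below shows the standard bilingual pivoting score in its commonly used form: a source phrase e1 is aligned to foreign pivot phrases f, and each candidate paraphrase e2 is scored as p(e2 | e1) = Σ_f p(e2 | f) · p(f | e1). This is not AMBPM; the abstract does not describe the affix modification step. The function name `pivot_paraphrases`, the toy phrase tables, and all probabilities are hypothetical illustration values, not data from the paper.

```python
from collections import defaultdict

def pivot_paraphrases(p_f_given_e, p_e_given_f, phrase):
    """Score paraphrase candidates for `phrase` by bilingual pivoting:
    p(e2 | e1) = sum over pivot phrases f of p(e2 | f) * p(f | e1).
    Returns a dict of candidates sorted by descending score."""
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(phrase, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != phrase:  # skip the trivial identity "paraphrase"
                scores[e2] += p_e2 * p_f
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical toy phrase tables (Korean -> English pivot -> Korean).
# Each inflected Korean surface form is an atomic key, so forms sharing
# the stem 먹- are counted separately, which is the modeling choice the
# abstract identifies as a source of lexical data sparsity.
P_F_GIVEN_E = {"먹었다": {"ate": 0.7, "had a meal": 0.3}}
P_E_GIVEN_F = {
    "ate": {"먹었다": 0.5, "먹었습니다": 0.4, "먹었어": 0.1},
    "had a meal": {"식사했다": 0.6, "먹었다": 0.4},
}

ranked = pivot_paraphrases(P_F_GIVEN_E, P_E_GIVEN_F, "먹었다")
print(list(ranked))  # ['먹었습니다', '식사했다', '먹었어']
```

With these toy values, 먹었습니다 scores 0.4 × 0.7 = 0.28, 식사했다 scores 0.6 × 0.3 = 0.18, and 먹었어 scores 0.1 × 0.7 = 0.07.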
INDEX TERMS
Semantics, Syntactics, Natural language processing, Data mining, Telecommunications, Knowledge discovery
CITATION

H. Park, G. Gweon and J. Heo, "Affix modification-based bilingual pivoting method for paraphrase extraction in agglutinative languages," 2016 International Conference on Big Data and Smart Computing (BigComp), Hong Kong, China, 2016, pp. 199-206.
doi:10.1109/BIGCOMP.2016.7425914