2016 International Conference on Big Data and Smart Computing (BigComp) (2016)
Hong Kong, China
Jan. 18, 2016 to Jan. 20, 2016
ISSN: 2375-9356
ISBN: 978-1-4673-8795-8
pp: 485-488
HaeYong Shin, Korea University, Seoul, Republic of Korea
Byung-Gul Ryu, Korea University, Seoul, Republic of Korea
Woo-Jong Ryu, Korea University, Seoul, Republic of Korea
GeunJae Lee, Korea University, Seoul, Republic of Korea
SangKeun Lee, Korea University, Seoul, Republic of Korea
ABSTRACT
The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory, and many studies and real-world applications build on ODP-based classifiers. Existing approaches, however, develop these classifiers using the traditional bag-of-words representation of text, and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework that better captures the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method employs a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality of the bag-of-phrases representation. Evaluation results demonstrate that our approach outperforms state-of-the-art methods in classification performance.
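The two steps named in the abstract, phrase extraction from a syntactic tree and phrase selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which parser or selection criterion is used, so the sketch assumes hand-written parse trees encoded as nested tuples, treats noun-phrase (`NP`) subtrees as phrases, and substitutes a simple document-frequency threshold for the phrase selection method. The function names `extract_phrases`, `select_phrases`, and `bag_of_phrases` are hypothetical.

```python
from collections import Counter

# Assumed tree encoding: (label, child, child, ...), where a leaf child is a word string.


def extract_phrases(tree, phrase_labels=("NP",)):
    """Collect the lowercased word span of every subtree labeled as a phrase."""
    phrases = []

    def words(node):
        # Gather the words under a node, left to right.
        if isinstance(node, str):
            return [node]
        collected = []
        for child in node[1:]:
            collected.extend(words(child))
        return collected

    def visit(node):
        if isinstance(node, str):
            return
        if node[0] in phrase_labels:
            phrases.append(" ".join(words(node)).lower())
        for child in node[1:]:
            visit(child)

    visit(tree)
    return phrases


def select_phrases(docs_phrases, min_df=2):
    """Stand-in selection step: keep phrases found in at least min_df documents."""
    doc_freq = Counter()
    for phrases in docs_phrases:
        doc_freq.update(set(phrases))
    return {phrase for phrase, count in doc_freq.items() if count >= min_df}


def bag_of_phrases(phrases, vocabulary):
    """Count only the phrases that survived selection."""
    return Counter(phrase for phrase in phrases if phrase in vocabulary)


# Two toy documents with hand-written parses (illustrative data only).
tree_a = ("S",
          ("NP", ("NN", "Machine"), ("NN", "learning")),
          ("VP", ("VBZ", "improves"),
           ("NP", ("NN", "web"), ("NN", "search"))))
tree_b = ("S",
          ("NP", ("NNS", "Researchers")),
          ("VP", ("VBP", "study"),
           ("NP", ("NN", "machine"), ("NN", "learning"))))

phrases_a = extract_phrases(tree_a)
phrases_b = extract_phrases(tree_b)
vocabulary = select_phrases([phrases_a, phrases_b], min_df=2)
features_a = bag_of_phrases(phrases_a, vocabulary)
```

Here "machine learning" is kept as a single feature instead of two unrelated words, and the threshold discards phrases seen in only one document, which shrinks the feature space. A real system would obtain the trees from a parser and would likely use a more discriminative selection criterion.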
INDEX TERMS
Semantics, Syntactics, Training, Taxonomy, Training data, Silicon, Testing
CITATION

HaeYong Shin, Byung-Gul Ryu, Woo-Jong Ryu, GeunJae Lee and SangKeun Lee, "Bringing bag-of-phrases to ODP-based text classification," 2016 International Conference on Big Data and Smart Computing (BigComp), Hong Kong, China, 2016, pp. 485-488.
doi:10.1109/BIGCOMP.2016.7425975