Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on (2008)
Dec. 9, 2008 to Dec. 12, 2008
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WIIAT.2008.17
Web forums have become an important resource on the Web because of the rich information contributed by millions of Internet users every day. At the same time, they contain thousands of junk or valueless messages. Recognizing high-quality topics should be a fundamental task in search engine and Web mining systems. However, quantifying high-quality topics on a web forum is not a trivial problem, and users face a daunting challenge in identifying the small subset of topics worthy of their attention. In this paper, we present several characteristics for measuring high-quality topics. Based on these characteristics, we propose a novel model to recognize high-quality topics on web forums. Our model consists of three steps. First, time series signals that carry characteristics distinguishing high-quality topics from non-high-quality topics are extracted from the topics. Second, features are obtained from the signals using the Wavelet Packet Transform (WPT). Third, high-quality topics are recognized from these features using a Back-Propagation Neural Network. In experiments on Tencent Message Boards, covering 2,710,994 messages and 189,962 authors from Jan 1, 2005 to Nov 12, 2007, we demonstrate the efficiency of our model: the average accuracy of high-quality topic recognition is 95%, and nearly 50,000 topics can be recognized in one second.
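The second step, turning a per-topic time series into WPT features, can be illustrated with a minimal sketch. The abstract does not state which wavelet, decomposition depth, or feature definition the authors use, so everything here is an assumption made for illustration: a Haar filter, two decomposition levels, normalized sub-band energies as features, and hypothetical function names. The resulting feature vector is the kind of input a back-propagation network could then classify.

```python
import math


def haar_step(signal):
    """Split an even-length signal into Haar approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packet_leaves(signal, levels):
    """Build a full wavelet packet tree and return its leaf nodes.

    Unlike the plain wavelet transform, every node is split at each level,
    approximation and detail alike, giving 2**levels leaves.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def energy_features(signal, levels=2):
    """Return the share of signal energy in each wavelet packet leaf.

    Assumption: normalized sub-band energy is used as the feature; the
    paper's actual feature definition is not given in the abstract.
    """
    leaves = wavelet_packet_leaves(signal, levels)
    energies = [sum(c * c for c in leaf) for leaf in leaves]
    total = sum(energies)
    if total == 0:
        return [0.0] * len(energies)
    return [e / total for e in energies]
```

For example, `energy_features(daily_counts, levels=2)` on a hypothetical list of eight daily reply counts yields four values that sum to 1. A steady signal concentrates its energy in the first leaf, while a rapidly oscillating one shifts energy into the detail leaves, which is the kind of contrast a classifier can exploit.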
High-Quality topic, Web Forum, Wavelet Packet Transform, Feature Extraction
X. Cheng, Y. Huang and Y. Chen, "A Wavelet-Based Model to Recognize High-Quality Topics on Web Forum," Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on (WI-IAT), vol. 01, pp. 343-351, 2008.