Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
A Learning Approach to Discovering Web Page Semantic Structures
Seoul, Korea, August 31-September 01
ISBN: 0-7695-2420-6
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2005.19
This paper proposes a learning approach for discovering the semantic structure of web pages. The task includes partitioning the text on a web page into information blocks and identifying their semantic categories. We employed two machine learning techniques, AdaBoost and SVMs, to learn from a labeled web page corpus. We evaluated our approach on general web pages from the World Wide Web and obtained encouraging results. This work can benefit a number of web-driven applications, such as search engines, web-based question answering, web-based data mining, and voice-enabled web navigation.
Citation:
Junlan Feng, Patrick Haffner, Mazin Gilbert, "A Learning Approach to Discovering Web Page Semantic Structures," icdar, pp. 1055-1059, Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005