Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
A Learning Approach to Discovering Web Page Semantic Structures
Seoul, Korea
August 31-September 1, 2005
ISBN: 0-7695-2420-6
Junlan Feng, AT&T Labs Research
Patrick Haffner, AT&T Labs Research
Mazin Gilbert, AT&T Labs Research
This paper proposes a learning approach for discovering the semantic structure of web pages. The task has two parts: partitioning the text on a web page into information blocks, and identifying the semantic category of each block. We employed two machine learning techniques, AdaBoost and support vector machines (SVMs), to learn from a labeled corpus of web pages. We evaluated our approach on general pages from the World Wide Web and obtained encouraging results. This work can benefit a number of web-driven applications, such as search engines, web-based question answering, web-based data mining, and voice-enabled web navigation.
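The abstract names AdaBoost as one of the two learners. As a rough illustration of that technique only, and not the authors' features, labels, or implementation, the sketch below trains discrete AdaBoost with decision stumps on a toy binary block-classification task. The per-block features (link density, word count), the navigation-versus-content labels, and the helper names `train_adaboost` and `predict` are all hypothetical. The paper identifies several semantic categories, which would need a multi-class extension (for example one-vs-rest); that choice is an assumption, not something the abstract confirms.

```python
import math
from typing import List, Tuple

# A decision stump: (feature index, threshold, polarity).
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    """Return +polarity if the feature reaches the threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity


def train_adaboost(X: List[List[float]], y: List[int], rounds: int = 10):
    """Discrete AdaBoost over decision stumps; labels must be +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, stump) pairs
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best, best_err = None, float("inf")
        for f in range(len(X[0])):
            for threshold in sorted({x[f] for x in X}):
                for polarity in (1, -1):
                    stump = (f, threshold, polarity)
                    err = sum(w for w, x, t in zip(weights, X, y)
                              if stump_predict(stump, x) != t)
                    if err < best_err:
                        best, best_err = stump, err
        if best_err >= 0.5:  # no stump beats chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - best_err) / max(best_err, 1e-10))
        ensemble.append((alpha, best))
        if best_err == 0:  # a perfect stump: further rounds add nothing
            break
        # Up-weight misclassified blocks, down-weight correct ones, renormalize.
        weights = [w * math.exp(-alpha * t * stump_predict(best, x))
                   for w, x, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: [link_density, word_count]; +1 = content, -1 = navigation.
blocks = [[0.9, 12], [0.8, 20], [0.7, 8], [0.05, 250], [0.1, 180], [0.0, 400]]
labels = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(blocks, labels, rounds=10)
print(predict(model, [0.85, 15]))  # -> -1 (navigation-like block)
print(predict(model, [0.02, 300]))  # -> 1 (content-like block)
```

Decision stumps are a common weak learner for AdaBoost because each round reduces to a cheap threshold search over single features; an SVM, the paper's other learner, would instead fit one maximum-margin boundary over all features at once.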
Citation:
Junlan Feng, Patrick Haffner, Mazin Gilbert, "A Learning Approach to Discovering Web Page Semantic Structures," Eighth International Conference on Document Analysis and Recognition (ICDAR'05), pp. 1055-1059, 2005.