2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06)
A Generalized Hidden Markov Model Approach for Web Information Extraction
Hong Kong, China
December 18-December 22
ISBN: 0-7695-2747-7
Ping Zhong, City University of New York, USA
Jinlin Chen, City University of New York, USA
This paper presents a Generalized Hidden Markov Model (GHMM) that extends traditional HMMs with Web-specific information for Web information extraction. The approach uses Web content blocks, rather than content terms, as the basic extraction unit. Instead of the traditional sequential state transition order, the state transition order of a GHMM is detected from the layout structure of the corresponding Web page. In addition, multiple emission features are applied in place of a single emission feature. In this way, GHMMs better accommodate Web information extraction. Experiments show promising results for GHMMs.
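The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea, not the authors' method. It runs standard Viterbi decoding over content blocks instead of terms, and scores each block with several emission features at once. The function name `viterbi_multi_feature`, the feature names (`font`, `length`), the state labels, all probabilities, the conditional-independence assumption used to combine features, and the `1e-6` floor for unseen values are illustrative assumptions. The block order is taken as given; detecting it from page layout is not shown.

```python
import math


def viterbi_multi_feature(blocks, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a list of content blocks.

    blocks : list of dicts (feature name -> observed value), already in the
             traversal order assumed to come from layout analysis
    emit_p : emit_p[state][feature][value] -> probability
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    def emission(state, block):
        # Assumption: features are conditionally independent given the state,
        # so their log-probabilities add. Unseen values get a small floor.
        return sum(log(emit_p[state][f].get(v, 1e-6)) for f, v in block.items())

    if not blocks:
        return []

    # scores[t][s]: best log-probability of any path ending in state s at block t
    scores = [{s: log(start_p[s]) + emission(s, blocks[0]) for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at block t
    for block in blocks[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log(trans_p[r][s]))
            cur[s] = prev[best] + log(trans_p[best][s]) + emission(s, block)
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: three block types, two emission features per block.
states = ["title", "author", "abstract"]
start_p = {"title": 0.8, "author": 0.1, "abstract": 0.1}
trans_p = {
    "title": {"title": 0.1, "author": 0.8, "abstract": 0.1},
    "author": {"title": 0.05, "author": 0.25, "abstract": 0.7},
    "abstract": {"title": 0.05, "author": 0.05, "abstract": 0.9},
}
emit_p = {
    "title": {"font": {"large": 0.9, "normal": 0.1},
              "length": {"short": 0.8, "long": 0.2}},
    "author": {"font": {"large": 0.1, "normal": 0.9},
               "length": {"short": 0.9, "long": 0.1}},
    "abstract": {"font": {"large": 0.05, "normal": 0.95},
                 "length": {"short": 0.1, "long": 0.9}},
}
blocks = [
    {"font": "large", "length": "short"},
    {"font": "normal", "length": "short"},
    {"font": "normal", "length": "long"},
]

path = viterbi_multi_feature(blocks, states, start_p, trans_p, emit_p)
print(path)  # ['title', 'author', 'abstract']
```

Working in log space avoids numeric underflow on long block sequences. Summing per-feature log-probabilities is one simple way to combine multiple emission features; the paper may combine them differently.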
Citation:
Ping Zhong, Jinlin Chen, "A Generalized Hidden Markov Model Approach for Web Information Extraction," wi, pp.709-718, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), 2006