2009 WRI Global Congress on Intelligent Systems
Information Extraction Based on Table Area Locating for E-Commerce Websites
Xiamen, China
May 19-May 21
ISBN: 978-0-7695-3571-5
Efficient extraction of merchandise information is a key technology for e-commerce search engines. By analyzing the characteristics of web tables in the HTML pages of e-commerce websites, this article proposes the notion of table area locating and decomposes merchandise information extraction into three key processes: searching for Preparative Core Areas (PCA), locating the Core Area (CA), and extracting attribute values from the Core Area. It then designs an algorithm for locating the Core Area and an algorithm for extracting attribute names and values. We tested the new approach on HTML pages from various e-commerce websites. The results indicate that the approach locates the merchandise information area and extracts attribute names and values efficiently, with better precision and recall.
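The three processes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes that every `<table>` is a candidate area, that the core area is the table with the most two-cell rows whose first cell ends in a colon, and that those rows hold attribute names and values. The names `TableCollector`, `attribute_pairs`, and `extract_merchandise` are hypothetical.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, in the order they close
        self._stack = []   # tables currently open (supports nesting)
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append([])
            self._cell = None  # simplification: drop text of an enclosing cell
        elif tag == "tr" and self._stack:
            self._stack[-1].append([])
        elif tag in ("td", "th") and self._stack and self._stack[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # normalize whitespace
            self._stack[-1][-1].append(text)
            self._cell = None
        elif tag == "table" and self._stack:
            self.tables.append(self._stack.pop())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def attribute_pairs(table):
    """Assumed heuristic: a row 'Name:' | 'value' is one attribute."""
    pairs = {}
    for row in table:
        if len(row) == 2 and row[0].endswith(":") and row[1]:
            pairs[row[0].rstrip(":").strip()] = row[1]
    return pairs


def extract_merchandise(html):
    """Pick the candidate table with the most attribute rows and return its pairs."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    candidates = [attribute_pairs(t) for t in parser.tables]
    return max(candidates, key=len, default={})
```

Calling `extract_merchandise` on a page whose navigation and product details both sit in tables returns only the name/value pairs of the table that best matches the assumed heuristic. A real system would need a stronger scoring rule than a trailing colon.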
Index Terms:
Web Tables, DOM tree, Area location, Information extraction
Citation:
Liubo Ouyang, Rui Dong, Beiji Zou, "Information Extraction Based on Table Area Locating for E-Commerce Websites," gcis, vol. 4, pp.441-445, 2009 WRI Global Congress on Intelligent Systems, 2009