Los Angeles, CA
March 31, 2009 to April 2, 2009
ISBN: 978-0-7695-3507-4
pp: 134-138
ABSTRACT
This paper proposes a Web data extraction technique, based on a label library, for extracting information from data-intensive Web pages. The technique uses the label library to eliminate conceptual ambiguity in the contents of Web pages, mines data regions with the MDR repeated-pattern discovery algorithm, and then recognizes the structure of those regions and extracts their data with a novel hierarchical pattern recognition and data extraction algorithm. Experiments showed that the technique is effective.
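The pipeline described in the abstract (normalize labels, find a repeated region, extract records) can be illustrated with a minimal sketch. This is not the authors' algorithm and not MDR itself: the repeated-pattern step is replaced by a much simpler stand-in that looks for sibling nodes sharing the same tag signature. The label library contents, the sample page, and every function name below are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical label library: maps labels as they appear on pages
# to one canonical concept, so "Cost" and "Our price" both mean "price".
LABEL_LIBRARY = {
    "price": "price", "cost": "price", "our price": "price",
    "title": "title", "name": "title", "product": "title",
}

def canonical_label(text):
    """Return the canonical concept for a label, or None if it is unknown."""
    return LABEL_LIBRARY.get(text.strip().rstrip(":").lower())

def signature(node):
    """Structural signature: the node's tag plus its children's tags, in order."""
    return (node.tag, tuple(child.tag for child in node))

def find_data_region(root, min_repeats=2):
    """Simplified stand-in for repeated-pattern discovery (NOT MDR):
    return (parent, records) for the parent with the most same-signature children."""
    best = (None, [])
    for parent in root.iter():
        counts = Counter(signature(child) for child in parent)
        if not counts:
            continue  # leaf node, nothing to compare
        sig, n = counts.most_common(1)[0]
        if n >= min_repeats and n > len(best[1]):
            best = (parent, [c for c in parent if signature(c) == sig])
    return best

def extract_records(records):
    """Split each field on ':' and keep fields whose label is in the library."""
    rows = []
    for rec in records:
        row = {}
        for field in rec:
            label, _, value = (field.text or "").partition(":")
            concept = canonical_label(label)
            if concept:
                row[concept] = value.strip()
        rows.append(row)
    return rows

# Hypothetical, well-formed sample page (real pages need a tolerant HTML parser).
PAGE = """
<html><body>
  <div><p>Welcome</p></div>
  <ul>
    <li><span>Name: Widget</span><span>Cost: 9.99</span></li>
    <li><span>Title: Gadget</span><span>Our price: 19.50</span></li>
    <li><span>Product: Gizmo</span><span>Price: 4.25</span></li>
  </ul>
</body></html>
"""

root = ET.fromstring(PAGE)
region, records = find_data_region(root)
rows = extract_records(records)
print(region.tag, rows)
```

In this sketch the three list items use three different words for the same two concepts, and the label library maps them onto the same `title` and `price` keys. How the paper actually builds its label library and recognizes hierarchical structure is not stated in the abstract.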
INDEX TERMS
Web information extraction, label library, data intensive Web pages
CITATION
Shoubiao Tan, Jin Fan, Yuan Jiang, "Web Data Extraction Based on Label Library", 2009 WRI World Congress on Computer Science and Information Engineering (CSIE 2009), pp. 134-138, doi:10.1109/CSIE.2009.595