Mining Web Pages for Data Records
November/December 2004 (vol. 19 no. 6)
pp. 49-55
Bing Liu, University of Illinois at Chicago
Robert Grossman, University of Illinois at Chicago
Yanhong Zhai, University of Illinois at Chicago
Much information on the Web is contained in regularly structured objects, or data records. Data records often present their host pages' essential information, such as lists of products and services. Mining data records to extract this information can help you provide value-added services. Existing approaches to data extraction on the Web include supervised learning and automatic techniques. Supervised learning requires substantial human effort, and current automatic techniques provide poor results. To solve this problem, the MDR (mining data records) system exploits two key observations about the layout of data records in Web pages and employs a string-matching algorithm. Experiments show that this new automatic technique significantly outperforms existing methods. In addition, it mines both contiguous and noncontiguous data records.
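The abstract does not say which string-matching algorithm MDR uses or how it is applied. As a rough illustration only, the sketch below assumes that each candidate region of a page has already been flattened into a sequence of HTML tag names, and that two adjacent regions count as similar when their normalized edit (Levenshtein) distance falls under a threshold. The function names, the 0.3 threshold, and the sample tag sequences are all hypothetical and are not taken from the article.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of tag names)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def similar_runs(tag_strings, threshold=0.3):
    """Return index runs of adjacent tag strings that look alike.

    The threshold is an assumed value chosen for illustration.
    """
    runs, current = [], [0]
    for i in range(1, len(tag_strings)):
        if normalized_distance(tag_strings[i - 1], tag_strings[i]) <= threshold:
            current.append(i)
        else:
            if len(current) > 1:
                runs.append(current)
            current = [i]
    if len(current) > 1:
        runs.append(current)
    return runs


# Hypothetical page fragment: a heading, three product rows, a footer.
page = [
    ["h1"],
    ["tr", "td", "img", "td", "a"],
    ["tr", "td", "img", "td", "a"],
    ["tr", "td", "td", "a"],
    ["p"],
]
candidate_records = similar_runs(page)
```

In this toy input the three table rows form one run of similar neighbors, which is the kind of repeated layout pattern a data-record miner would look for. Handling noncontiguous records, as the abstract mentions, would need additional logic that is not sketched here.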
Index Terms:
data mining, Web mining, Web data extraction, Web data, databases
Citation:
Bing Liu, Robert Grossman, Yanhong Zhai, "Mining Web Pages for Data Records," IEEE Intelligent Systems, vol. 19, no. 6, pp. 49-55, Nov.-Dec. 2004, doi:10.1109/MIS.2004.68