Mining Web Pages for Data Records
November/December 2004 (vol. 19 no. 6)
pp. 49-55
Bing Liu, University of Illinois at Chicago
Robert Grossman, University of Illinois at Chicago
Yanhong Zhai, University of Illinois at Chicago
Much information on the Web is contained in regularly structured objects, or data records. Data records often present their host pages' essential information, such as lists of products and services. Mining data records to extract this information can help you provide value-added services. Existing approaches to data extraction on the Web include supervised learning and automatic techniques. Supervised learning requires substantial human effort, and current automatic techniques provide poor results. To solve this problem, the MDR (mining data records) system exploits two key observations about the layout of data records in Web pages and employs a string-matching algorithm. Experiments show that this new automatic technique significantly outperforms existing methods. In addition, it mines both contiguous and noncontiguous data records.
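The abstract does not say which string-matching algorithm MDR uses or how pages are encoded as strings. As a rough illustration only, the sketch below assumes each candidate region of a page is represented as a sequence of HTML tag names and compares two regions by normalized edit (Levenshtein) distance. The function names, the tag-sequence representation, and the 0.3 threshold are all hypothetical choices made for this example, not details taken from the article.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def looks_similar(a: Sequence[str], b: Sequence[str], threshold: float = 0.3) -> bool:
    """Hypothetical similarity test; the threshold is an assumed value."""
    return normalized_distance(a, b) <= threshold


# Two assumed tag sequences for neighboring table rows.
row_a = ["tr", "td", "td"]
row_b = ["tr", "td", "td", "td"]
similar = looks_similar(row_a, row_b)
```

Under these assumptions, regions whose tag sequences differ by only a few insertions or deletions would be grouped as candidate data records, while regions with very different markup would not.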
Index Terms:
data mining, Web mining, Web data extraction, Web data, databases
Bing Liu, Robert Grossman, Yanhong Zhai, "Mining Web Pages for Data Records," IEEE Intelligent Systems, vol. 19, no. 6, pp. 49-55, Nov.-Dec. 2004, doi:10.1109/MIS.2004.68