Proceedings. 20th International Conference on Data Engineering (2004)
Boston, Massachusetts
Mar. 30, 2004 to Apr. 2, 2004
ISSN: 1063-6382
ISBN: 0-7695-2065-0
pp: 817
Neeraj Agrawal, IBM India Research Lab
Rema Ananthanarayanan, IBM India Research Lab
Rahul Gupta, IBM India Research Lab
Sachindra Joshi, IBM India Research Lab
Raghu Krishnapuram, IBM India Research Lab
Sumit Negi, IBM India Research Lab
ABSTRACT
Data presented on commerce sites runs into thousands of pages and is typically delivered from multiple back-end sources. This makes it difficult to identify incorrect, anomalous, or interesting data such as $9.99 air fares, missing links, drastic changes in prices, and the addition of new products or promotions. In this paper, we describe a system that monitors Web sites automatically and generates various types of reports so that site content can be monitored and its quality maintained. Our solution consists of a site crawler that crawls dynamic pages, an information miner that learns to extract useful information from the pages based on examples provided by the user, and a reporter that the user can configure to answer specific queries. The tool can also be used to identify price trends and new products or promotions at competitor sites. A pilot run of the tool has been successfully completed at the ibm.com site.
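The kind of check a configurable reporter might run can be illustrated with a minimal sketch. It assumes the information miner has already produced structured price records from two crawls. The record fields, the function name `flag_anomalies`, and both thresholds are hypothetical choices made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRecord:
    """One extracted item; the field names are illustrative assumptions."""
    url: str
    product: str
    price: float


def flag_anomalies(previous, current, min_price=10.0, max_change=0.5):
    """Compare two crawl snapshots and return (product, reason) pairs.

    min_price and max_change are hypothetical thresholds: prices below
    min_price are flagged as suspiciously low, and relative changes
    above max_change (0.5 = 50%) are flagged as drastic.
    """
    old_prices = {rec.product: rec.price for rec in previous}
    findings = []
    for rec in current:
        if rec.price < min_price:
            findings.append((rec.product, "suspiciously low price"))
        old = old_prices.get(rec.product)
        if old is None:
            # Not present in the earlier snapshot: a new product or promotion.
            findings.append((rec.product, "new product"))
        elif old > 0 and abs(rec.price - old) / old > max_change:
            findings.append((rec.product, "drastic price change"))
    return findings


previous = [
    PriceRecord("https://example.com/fares/a", "fare-A", 300.0),
    PriceRecord("https://example.com/laptops/1", "laptop", 1000.0),
]
current = [
    PriceRecord("https://example.com/fares/a", "fare-A", 9.99),
    PriceRecord("https://example.com/laptops/1", "laptop", 1020.0),
    PriceRecord("https://example.com/tablets/1", "tablet", 400.0),
]
report = flag_anomalies(previous, current)
```

With this sample data, the $9.99 fare is reported twice (low price and drastic change), the tablet is reported as new, and the laptop, whose price moved by 2%, is not reported.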
CITATION

N. Agrawal, S. Joshi, R. Ananthanarayanan, S. Negi, R. Krishnapuram and R. Gupta, "EShopMonitor: A Web Content Monitoring Tool," Proceedings. 20th International Conference on Data Engineering (ICDE), Boston, Massachusetts, 2004, pp. 817.
doi:10.1109/ICDE.2004.1320055