2003 IEEE International Conference on E-Commerce Technology (CEC'03)
Page Digest for Large-Scale Web Services
Newport Beach, California
June 24-June 27
ISBN: 0-7695-1969-5
We introduce Page Digest, a mechanism for efficient storage and processing of Web documents. The Page Digest design encourages a clean separation of the structural elements of Web documents from their content. Its encoding transformation provides many of the advantages of traditional string digest schemes yet remains invertible without introducing significant additional cost or complexity. Using the Page Digest encoding can provide at least an order of magnitude speedup when traversing a Web document, compared with using a standard Document Object Model implementation. Our experiments show that change detection using Page Digest operates in linear time, offering a 75% improvement in execution performance compared with existing systems. In addition, the Page Digest encoding can reduce the tag name redundancy found in Web documents, allowing a 30% to 50% reduction in document size.
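The abstract does not spell out the encoding itself, so the following is only an illustrative sketch of the general idea it describes: separating a document's structure from its content, storing each tag name once, and keeping the transformation invertible. The names `encode`, `decode`, and `TEXT`, the tuple-based tree representation, and the array layout are all assumptions made for this example and should not be read as the actual Page Digest format.

```python
# Hypothetical sketch, NOT the Page Digest format from the paper.
# A document is modeled as a nested tuple: (tag_name, [children]),
# where each child is either another such tuple or a text string.

TEXT = -1  # marker in the structure array meaning "this entry is a text child"


def encode(root):
    """Split a tree into (tag_names, structure, content)."""
    tag_names = []  # each distinct tag name is stored exactly once
    tag_ids = {}    # tag name -> index into tag_names
    structure = []  # preorder entries: (tag_id, child_count) or (TEXT, content_index)
    content = []    # text strings, in document order

    def visit(node):
        if isinstance(node, str):
            structure.append((TEXT, len(content)))
            content.append(node)
            return
        tag, children = node
        if tag not in tag_ids:
            tag_ids[tag] = len(tag_names)
            tag_names.append(tag)
        structure.append((tag_ids[tag], len(children)))
        for child in children:
            visit(child)

    visit(root)
    return tag_names, structure, content


def decode(tag_names, structure, content):
    """Rebuild the original tree, showing that the encoding is invertible."""
    entries = iter(structure)

    def build():
        kind, value = next(entries)
        if kind == TEXT:
            return content[value]
        # For an element, `value` is the number of children that follow.
        return (tag_names[kind], [build() for _ in range(value)])

    return build()


doc = ("table", [
    ("tr", [("td", ["a"]), ("td", ["b"])]),
    ("tr", [("td", ["c"]), ("td", ["d"])]),
])
tag_names, structure, content = encode(doc)
restored = decode(tag_names, structure, content)
```

In this toy layout the tag name `td` appears four times in the document but only once in `tag_names`, and the whole structure can be walked as a flat array without touching the text. Both properties are meant only to suggest, under the stated assumptions, why such a separation could help with traversal speed and document size.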
Citation:
Daniel Rocco, David Buttler, Ling Liu, "Page Digest for Large-Scale Web Services," 2003 IEEE International Conference on E-Commerce Technology (CEC'03), p. 381, 2003.