2002 International Conference on Parallel Processing Workshops (ICPPW'02)
Using High Performance Systems to Build Collections for a Digital Library
Vancouver, B.C., Canada
August 18-21
ISBN: 0-7695-1680-7
Nothing is more distributed than the Web, with its content spread across thousands of servers. High-performance hardware and software are essential for downloading, analyzing, and organizing this content effectively. We describe our experience using a highly parallel Web crawling system, Mercator, to automatically construct collections of scientific resources for the National Science Digital Library.