Document Image Analysis for Libraries, International Workshop on (2004)
Palo Alto, California
Jan. 23, 2004 to Jan. 24, 2004
ISBN: 0-7695-2088-X
pp: 104
William Barrett, Brigham Young University
Luke Hutchison, Brigham Young University
Dallan Quass, Brigham Young University
Heath Nielson, Brigham Young University
Douglas Kennard, Brigham Young University
Large-scale, multi-terabyte digital libraries are becoming feasible due to decreasing costs of storage, CPU, and bandwidth. However, the cost of preparing content for input into the library remains high because of the amount of human labor required. This paper describes the Digital Microfilm Pipeline, a sequence of image processing operations used to populate a large-scale digital library from a "mountain" of microfilm and to reduce the human labor involved. Essential parts of the pipeline include algorithms for document zoning and labeling, consensus-based template creation, reversal of geometric transformations, and Just-In-Time Browsing, an interactive technique for progressive access of image content over a low-bandwidth medium. We also suggest more automated approaches to cropping, enhancement, and data extraction.

