Issue No. 01 - January/February (2011 vol. 31)
Sean Arietta, University of Virginia
Jason Lawrence, University of Virginia
DOI: http://doi.ieeecomputersociety.org/10.1109/MCG.2010.105
Many example-based image-processing algorithms operate on image patches (small windows of pixels). Such algorithms are commonly used for texture synthesis, resolution enhancement, image denoising, colorization, and hole filling. One barrier to the widespread adoption and performance of these techniques is the lack of access to a large, varied collection of image patches. The authors describe a database of one trillion image patches assembled from one million natural images downloaded from the Internet. They also describe and analyze two systems for performing nearest-neighbor searches over this database, one built on Hadoop and the other on MPI. To demonstrate this database's utility as a research tool, they used it to investigate the fundamental relationships between patch size, amount of training data, and expected accuracy of the closest matches. They report a closed-form analytic expression that relates these three quantities, letting them predict any one from the other two. The findings show that massive databases are necessary to achieve reliable performance for even moderate-size patches. These findings also offer important, previously unavailable guidelines for practitioners and researchers interested in working with and improving such data-driven systems.
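The abstract does not spell out how the nearest-neighbor search is implemented. What follows is a minimal sketch of the general idea, not the authors' Hadoop or MPI code. It assumes grayscale images stored as nested lists, square k x k patches taken at stride 1, and squared Euclidean distance. It uses an exact brute-force scan, chosen only because it is the simplest correct baseline; the keywords list kd-trees and LSH as alternatives. The patch database is split into shards to mimic a map/reduce pattern, and every function name here (`extract_patches`, `map_shard`, `reduce_results`, `nearest_patch`) is hypothetical.

```python
"""Sketch (hypothetical names): exact nearest-neighbor search over image
patches, split into shards in a map/reduce style. Assumes grayscale images
as nested lists, stride-1 k x k patches, and squared L2 distance."""
from typing import List, Sequence, Tuple

Image = List[List[float]]   # grayscale image, row-major
Patch = Tuple[float, ...]   # one k x k window, flattened row by row
Result = Tuple[float, int, int]  # (squared distance, shard id, index in shard)


def extract_patches(img: Image, k: int) -> List[Patch]:
    """Return every k x k window of img (stride 1), flattened row by row."""
    h, w = len(img), len(img[0])
    patches: List[Patch] = []
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            patches.append(tuple(img[r + i][c + j]
                                 for i in range(k) for j in range(k)))
    return patches


def sq_dist(a: Patch, b: Patch) -> float:
    """Sum of squared differences between two equal-length patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_shard(query: Patch, shard: Sequence[Patch], shard_id: int) -> Result:
    """'Map' step: scan one shard and report its closest patch."""
    best = min(range(len(shard)), key=lambda i: sq_dist(query, shard[i]))
    return (sq_dist(query, shard[best]), shard_id, best)


def reduce_results(partials: Sequence[Result]) -> Result:
    """'Reduce' step: the global nearest neighbor has the smallest distance."""
    return min(partials)


def nearest_patch(query: Patch, shards: Sequence[Sequence[Patch]]) -> Result:
    """Run the map step on every non-empty shard, then reduce."""
    partials = [map_shard(query, s, sid) for sid, s in enumerate(shards) if s]
    return reduce_results(partials)


if __name__ == "__main__":
    img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    db = extract_patches(img, 2)            # 9 patches from a 4 x 4 image
    shards = [db[:5], db[5:]]               # two "workers"
    print(nearest_patch((9.0, 10.0, 13.0, 14.0), shards))
```

Each shard is scanned independently, so in a distributed setting only one small `(distance, shard, index)` tuple per shard has to be communicated to the reducer. A scan like this costs time proportional to the database size for every query, which is why indexing structures such as kd-trees or LSH become relevant at the trillion-patch scale the abstract describes.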
Keywords: natural images, image processing, nearest neighbor, image patches, image databases, kd-trees, locality-sensitive hashing, LSH, distributed processing, image search, computer graphics, graphics and multimedia
Sean Arietta, Jason Lawrence, "Building and Using a Database of One Trillion Natural-Image Patches", IEEE Computer Graphics and Applications, vol. 31, no. 1, pp. 9-19, January/February 2011, doi:10.1109/MCG.2010.105