Issue No.12 - December (2007 vol.8)
Published by the IEEE Computer Society
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MDSO.2007.69
Large-scale digital libraries and book digitization projects are poised to go beyond prototypes into the mass market. Making these materials available to end users anywhere in the world is generally accepted as a universal boon. However, whether users will be able to easily access materials across projects is another matter. Much of digitization's promise might diminish unless commercial, nonprofit, and publicly funded resource organizations can cooperate.
"All the published literature of humankind in the next generation will be in digital form," says Brewster Kahle, cofounder of the Internet Archive and one of the driving forces behind the nonprofit Open Content Alliance (OCA, http://www.opencontentalliance.org), an open digitization consortium. "And all the older materials that will be used by younger people—except for a very few—will be online. So, if we want something to be used by the next generation, it has to be online. That's an understood premise. It's now also understood that it's not that expensive to get there."
We have the technology
The OCA already has full-service digitization technology in eight libraries in the US, Canada, and the UK, serving 80 member libraries. It plans to double the number of service centers within a year. Indeed, Kahle says the real bone of contention in advancing digital libraries will be in making communities realize that the technology isn't expensive and that they needn't relinquish the task to commercial interests.
"The technology for digitizing a book at beautiful quality is 10 cents a page," he says. That cost covers optical-character-recognition digitization, compression, packaging in multiple downloadable formats, cataloguing, and hosting at redundant sites in North America and Europe for long-term digital preservation. It includes a PC interface that people can use to print from home. Alternatively, users can download, print, and bind through Amazon or similar book-binding vendor offerings.
"All of that is in full production," Kahle says. "At 10 cents a page, or $30 a book, the idea of taking 1,000 books or 10,000 books that are important to a community, library, or individual is now easy. There are scanning centers all over the world that can help not only scan them but also produce searchable forms, then offer them for joint services to the whole world."
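As a rough illustration of the arithmetic behind those figures, the sketch below works out what collections of different sizes would cost at the quoted rate. The 300-page book length is an assumption implied by "10 cents a page, or $30 a book"; the function and constant names are illustrative and not part of any OCA tool.

```python
COST_PER_PAGE_CENTS = 10  # rate quoted by Kahle
PAGES_PER_BOOK = 300      # assumption: implied by $30 a book at 10 cents a page


def scanning_cost_dollars(num_books,
                          pages_per_book=PAGES_PER_BOOK,
                          cents_per_page=COST_PER_PAGE_CENTS):
    """Estimate the digitization cost in dollars for a collection of books."""
    # Work in whole cents first to avoid floating-point rounding surprises.
    total_cents = num_books * pages_per_book * cents_per_page
    return total_cents / 100


for n in (1, 1_000, 10_000):
    print(f"{n:>6,} books: ${scanning_cost_dollars(n):,.0f}")
```

Under these assumptions, 1,000 books come to $30,000 and 10,000 books to $300,000. A real collection would need its actual page counts in place of the 300-page average.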
Certainly, librarians tackling the new digitization projects contend with complex technological issues. Notable among them is creating metadata schemas that work across multiple technologies and organizations. How best to provide multilingual services is another issue. However, the issue of who will control the digitization process, and its concomitant economic and access ramifications, is far more convoluted.
"We're not just talking about a $15-billion-a-year industry," Kahle says. "We're talking about how people think, how people pass information on to their young, how we as a society conceptualize ourselves. And if we have that dominated by one or two large corporations, we will be living in an Orwellian world. So the stakes are very high, beyond the financials of it."
Global projects under way
The most visible project now is the World Digital Library (http://portal.unesco.org/en/ev.php-URL_ID=40277&URL_DO=DO_TOPIC&URL_SECTION=201.html), a partnership between the US Library of Congress and the United Nations Educational, Scientific, and Cultural Organization (UNESCO). The project aims to digitize unique cultural artifacts from throughout the world and make them globally available via the Internet at no charge. Example materials include manuscripts, maps, rare books, musical scores, recordings, films, prints, photographs, and architectural drawings. The WDL's developers featured a prototype at the October meeting that formalized the agreement; WDL director John Van Oudenaren says the project should be live by late 2008 or early 2009.
The WDL project holds some distinct advantages over some other multinational projects preparing for a public launch. First, the WDL plans to use no copyrighted material, so there will be no delays in negotiating rights with authors, artists, or composers. Second, the WDL partner institutions will all work from the same technological foundation, so interoperability should prove less daunting than what other projects face.
"The WDL is almost an end-to-end process," says Jill Cousins, director of the European Digital Library (http://www.edlproject.eu), a separate European Union-funded digital library project. "They make the selection of material, digitize it, and make the metadata work according to the way they structured the system. Then they give access to it and create mirror sites for the other countries to get back what's been digitized."
The EDL aims to have its prototype working by November 2008. Cousins says the project has to negotiate technological agreement between all EU libraries, archives, and museums that wish to participate. Currently, the European Library, under whose auspices the EDL is being developed, has 47 partner libraries, 32 of which are fully digitized. However, she also says working through these negotiations might make the EDL more scalable in the long term.
"That's probably correct," WDL's Van Oudenaren says, "but the caveat is that scaling in itself is not our objective. Obviously we have to scale to a higher level than we are now, but we don't have any ambition to link all the digital content of the world and they do. We've set the bar a little lower for ourselves."
Whose library is it?
Interoperability poses several difficulties. Text-heavy books can be digitized in several common formats, so developing metadata for them is easier than it is for multimedia materials spread across multiple institutions. Metadata compatibility will likely present both the greatest challenge and the greatest opportunity for developers in this market. Van Oudenaren says the WDL and its partners can agree to a uniform metadata strategy for newly digitized material. However, the WDL can't expect its partner institutions to go back into existing digitized collections and redo their metadata schemes.
"We are looking at automated tools that can massage that metadata, and if vital information is in a different format, then we can look at automated ways to reorganize it into a homogeneous format," Van Oudenaren says. "We did not want to spend years and years and years on meetings about how the world library community will come up with metadata standards. That's not realistic. It's just not going to happen."
Cousins says the EDL will most likely opt for a metadata scheme based on the Dublin Core standard (http://dublincore.org). Presumably, as the EDL work progresses, mapping technologies will evolve to support semantic queries. This, in turn, will enable application-level interoperation without the need for separate, complex, and expensive application-level interoperability profiles.
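To make the idea of automated metadata mapping concrete, here is a minimal sketch of a "crosswalk" that reorganizes records from two differently structured catalogs into Dublin Core elements. Only the 15 Dublin Core element names come from the standard. The institution names, legacy field names, sample records, and function names are hypothetical, and this is not a description of the tools the WDL or EDL actually use.

```python
# The 15 core element names defined by the Dublin Core standard.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical legacy field names from two partner institutions,
# each mapped to the Dublin Core element it most closely matches.
CROSSWALKS = {
    "library_a": {"book_title": "title", "author": "creator", "lang": "language"},
    "museum_b": {"name": "title", "artist": "creator", "medium": "format"},
}


def to_dublin_core(record, source):
    """Return (dc_record, unmapped) for one legacy record."""
    mapping = CROSSWALKS[source]
    dc_record, unmapped = {}, {}
    for field, value in record.items():
        element = mapping.get(field)
        if element in DC_ELEMENTS:
            # Dublin Core elements are repeatable, so collect values in lists.
            dc_record.setdefault(element, []).append(value)
        else:
            # Keep anything the crosswalk does not cover for manual review.
            unmapped[field] = value
    return dc_record, unmapped


# Illustrative records only; "shelf" is a made-up local field.
legacy_records = [
    ("library_a", {"book_title": "Don Quixote", "author": "Miguel de Cervantes",
                   "lang": "es", "shelf": "B-12"}),
    ("museum_b", {"name": "Mona Lisa", "artist": "Leonardo da Vinci",
                  "medium": "oil on panel"}),
]

catalog = [to_dublin_core(record, source) for source, record in legacy_records]


def search_by_creator(catalog, name):
    """Query every record through the shared 'creator' element."""
    return [dc["title"][0] for dc, _ in catalog if name in dc.get("creator", [])]


print(search_by_creator(catalog, "Leonardo da Vinci"))
```

Once both records share the same element names, a single query works across institutions without either one having to redo its original catalog. Fields that have no mapping are set aside instead of being silently dropped.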
Kahle says these integration layers offer a great economic opportunity for a wide range of developers and content providers and aggregators, as long as the open-access model prevails. Mike Mahoney, senior research analyst at Nerac, a Connecticut-based research and consulting firm, agrees. Mahoney says a new model of expanded access and micropayments could increase sales of journal articles, book excerpts, and other materials.
However, a schism has developed between the advocates of open access, such as Kahle, and backers of commercial library-scanning projects. The Google Book Search project (http://books.google.com), for example, greatly reduces scanning costs for its library partners. However, it also restricts third-party access to content scanned under the project. The third parties don't necessarily include other search engines, but they could.
Kahle says the Google restrictions are emblematic of an online culture already represented by monopolistic or duopolistic entities. In the US, for example, Kahle says LexisNexis and Westlaw dominate online scholarly legal material, while Elsevier dominates scientific material. This mishmash of proprietary fiefdoms, he says, is fragmenting access to authoritative digital reference material and literature. Kahle compares it to the incompatible proprietary networks that prevailed in the 1980s and early 1990s.
"Unfortunately, all those Google books may benefit Google, but the public will largely be left behind in the current contracts," he says. "The idea of one search engine is not good enough; the idea of one library is not good enough; the idea of one publisher is not good enough."
The ultimate realization of globally accessible libraries might come down to how digitization projects are funded. Kahle says the early commercially funded ventures (which include Google Books and Microsoft's Live Search Books) might eventually lose momentum as libraries begin to adapt traditional, unrestricted funding mechanisms for their new digital needs. And the success of broader funding will ultimately come down to whether the general population has the will to make it so. In WDL's case, the answer to that question could come soon.
"When we do release this plan later in 2008, it will have all kinds of cost estimates in there," Van Oudenaren says.
Because the WDL is a partnership between the Library of Congress and UNESCO, it has many possible funding streams, depending on different partner institutions' national and philanthropic relationships.
Kahle says the issue goes beyond funding public library projects. He says the debate is actually a pitched battle for the uppermost layers of the Internet itself, including works already in the public domain.
"We have people who are trying to monopolize, to close, the content layer," he says. "If we've spent the last 30 years building an open infrastructure and we lose it at the content layer, it will all be for naught."