Issue No. 10 - October (2008 vol. 41)
ISSN: 0018-9162
pp: 20-22
Finding an Unexpected Use for Captcha Security Technology
An academic researcher has devised a way to use Captcha technology—originally designed to keep hackers' computers from automatically setting up e-mail accounts for use in spam attacks—to help transcribe old manuscripts.
Carnegie Mellon University assistant professor Luis von Ahn, who invented Captcha (completely automated public Turing test to tell computers and humans apart) with CMU professor Manuel Blum in 2000, has refined the technology into what he calls reCaptcha.

A researcher has modified the Captcha security technology so that it can help transcribe old manuscripts. Captcha requires users to transcribe displayed words before entering a website. This keeps hackers' computers from, for example, automatically accessing large numbers of sites to set up e-mail accounts for spam attacks. reCaptcha works the same way but shows website visitors words from scanned documents that optical character recognition can't identify. Users then identify those words as part of passing the test.

The nonprofit Internet Archive is using reCaptcha to help automate the daily digitization of up to 1,000 books, newspapers, and other documents (some dating back to the earliest use of printing) from 70 US universities and libraries, said Brewster Kahle, the information repository's digital librarian and founder.
In addition, von Ahn said, the New York Times is using the technology to digitize its archive, which dates to 1851. Any similar project could work with the technique, he noted.
Captchas are little boxes on web pages that show a distorted set of letters, and sometimes numbers, often with lines or other patterns running through them; users must transcribe them correctly to enter or work with a site. Because computers cannot easily read the distorted characters, they can't perform the necessary transcription. The technology is now widely used, for example by sites that sell tickets to sporting or entertainment events, to keep people from using computers to automatically submit large numbers of purchase requests.
Instead of displaying a computer-generated random collection of letters and numbers, reCaptcha presents someone trying to access a participating website with a word from an old manuscript that an optical-character-recognition (OCR) system scanned but couldn't understand.
reCaptcha shows each word that the OCR system couldn't recognize to a user along with a control word that the system has already identified. If the user types the control word correctly, reCaptcha assumes the user's reading of the unrecognized word is also correct. To improve confidence in the identification, reCaptcha gives the same unrecognized word to several other users. The application then sends the transcriptions back to the organizations that need the information.
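The pairing scheme described above can be sketched in a few lines. This is a minimal illustration, not reCaptcha's actual implementation: the class name `WordPairVerifier`, the case-insensitive comparison, and the threshold of three agreeing users (`VOTES_NEEDED`) are all hypothetical choices, since the article says only that "several" users see each word.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: the article does not say how many users must agree.
VOTES_NEEDED = 3


class WordPairVerifier:
    """Sketch of the control-word scheme: a user's reading of an unknown
    word is recorded only if that user typed the known control word correctly."""

    def __init__(self, control_answers):
        # Maps control-word IDs to text the OCR system already identified.
        self.control_answers = control_answers
        # votes[unknown_id] counts each distinct reading submitted for that word.
        self.votes = defaultdict(Counter)

    def submit(self, control_id, control_text, unknown_id, unknown_text):
        """Return True if the user passes (control word correct), else False."""
        expected = self.control_answers[control_id]
        if control_text.strip().lower() != expected.lower():
            # Failed the control word: discard this user's reading of the unknown word.
            return False
        self.votes[unknown_id][unknown_text.strip().lower()] += 1
        return True

    def transcription(self, unknown_id):
        """Return the most common reading once enough users agree, else None."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        reading, n = counts.most_common(1)[0]
        return reading if n >= VOTES_NEEDED else None


if __name__ == "__main__":
    v = WordPairVerifier({"c1": "upon"})
    v.submit("c1", "upon", "u7", "morrow")
    v.submit("c1", "upom", "u7", "marrow")  # control word wrong: reading ignored
    v.submit("c1", "Upon", "u7", "Morrow")
    v.submit("c1", "upon", "u7", "morrow")
    print(v.transcription("u7"))  # morrow
```

The control word acts as the security test, while the unknown word only accumulates votes; requiring agreement among several users is what lets the system trust a reading that no single person was checked on.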
Von Ahn said he recently made reCaptcha available for general use free of charge and that about 50,000 websites now use it for authentication. Together, these sites transcribe more than 18 million words per day.
Organizations generally use OCR to transcribe large documents and human proofreaders to find mistakes. However, this can be time-consuming because OCR is not always very accurate. In fact, its error rate can reach 20 percent for books published before 1900, because older documents are often smudged or faded, von Ahn explained.
reCaptcha's accuracy rate is 99 percent for all documents, he said.
Fight Flares over White Space for Wireless
The battle over whether the US Federal Communications Commission should open soon-to-be-available spectrum to broadband wireless communications is intensifying as the FCC nears a decision.
The four-year fight is over white space: pieces of largely unused spectrum between TV stations' frequencies that generally act as buffers to keep the channels from interfering with one another. Proponents want to use the spectrum once US television broadcasters move to an all-digital format in February 2009. The end of analog transmissions will free up more white-space frequencies.
The amount will vary by market, and the FCC won't be able to identify exactly which spectrum will be available until after the digital-broadcasting transition, said Lynn Claudy, senior vice president of science and technology for the National Association of Broadcasters (NAB). He said most of the broadcasting would occur in spectrum currently used by UHF stations, in 6-MHz-wide channels at frequencies between 512 and 698 MHz.
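The band figures Claudy cites can be checked with simple arithmetic. This sketch splits the 512-to-698-MHz range into 6-MHz channels. The channel-numbering rule it uses (US UHF channel 14 beginning at 470 MHz, with each channel 6 MHz wide) is an outside assumption, not something the article states, and the sketch says nothing about which channels will actually be vacant in any given market.

```python
# Band edges and channel width from the article.
BAND_LOW_MHZ = 512
BAND_HIGH_MHZ = 698
CHANNEL_WIDTH_MHZ = 6

# Assumption (not from the article): US UHF channel 14 starts at 470 MHz,
# so channel n spans 470 + 6*(n - 14) to 476 + 6*(n - 14) MHz.
FIRST_UHF_CHANNEL = 14
FIRST_UHF_LOWER_MHZ = 470


def uhf_channels(low=BAND_LOW_MHZ, high=BAND_HIGH_MHZ, width=CHANNEL_WIDTH_MHZ):
    """Return (channel_number, lower_edge_mhz, upper_edge_mhz) for each channel in the band."""
    channels = []
    for lower in range(low, high, width):
        number = FIRST_UHF_CHANNEL + (lower - FIRST_UHF_LOWER_MHZ) // width
        channels.append((number, lower, lower + width))
    return channels


chans = uhf_channels()
print(len(chans))            # 31
print(chans[0], chans[-1])   # (21, 512, 518) (51, 692, 698)
```

In other words, the band Claudy describes holds 31 six-megahertz channels; any white space would be the subset of those left unassigned in a particular area.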
Claudy said most of the major and secondary television markets have so many stations that little white space will be available in cities even after February, which will limit the technology's application to less-populated areas. However, he noted, these are precisely the areas that are underserved by current broadband technologies.
About 75 percent of white space is unused today, according to Jake Ward, spokesperson for the Wireless Innovation Alliance (WIA), a consortium of companies, organizations, and individuals that support unlicensed use of white space. Wireless microphones and some TV broadcasts use the other 25 percent.
Proponents say companies could use white space to provide broadband wireless services that would compete with those offered by telephone and cable companies, Ward noted. The spectrum's capacity would let the technology carry signals quickly and with high fidelity, supporting Internet access, high-speed data, telephony, and other services.
According to Ward, white-space usage would add competition to the network-services market, thereby lowering prices and encouraging carriers to provide new and improved services and technological innovation.
Claudy said opponents—including the NAB, individual broadcasters, and wireless-microphone vendor Shure—fear that using white space for wireless services would interfere with their signals.
Proponents—including Dell Computer, Google, and other WIA and White Spaces Coalition members—say this won't happen because the FCC could test and license devices so that they wouldn't cause interference. In addition, Ward said, the FCC is testing devices, developed by white-space proponents, that would minimize interference.
According to Forrester Research analyst Charles Golvin, the FCC may approve white-space use because it wants the wireless spectrum to be better utilized.
However, he added, the FCC recognizes it must set rules that ensure that white-space use won't interfere with TV broadcasts. He also said the agency would likely implement stiff financial penalties against service providers or device makers that violate these rules.
Researchers are still working on technical issues.
Golvin estimated it could take five years before white-space use can begin in earnest.
Material that Slows Light Signals Could Speed Up the Internet
Researchers are looking into ways to increase the Internet's speed by using metamaterials to slow down the light that carries data through the network.
The technique, developed by University of Surrey scientists, would enable transmissions to use optical signals throughout the Internet. This would eliminate the current need to convert the signals to an electronic format and then back to optical, which complicates the transmission process.
Much of the Internet's infrastructure is based on fiber optics, with various data sets being carried at various frequencies within a light beam. The infrastructure uses equipment such as optical demultiplexers to separate out the frequencies so that each data set can be sent to its destination.
However, optical signals are too fast for the system to process further, and they are difficult to control and manipulate. Thus, the system converts them into electrical signals, which are slower and easier to work with.
The system stores each frequency in an electronic component until the processing is finished. At that point, the signals go to a photodetector and then to a semiconductor laser, which recreates light signals with the same frequencies as the original transmission, enabling the system to send them to the intended recipients.
These steps add complexity and cost to the process.
Using devices made of metamaterials could slow down the optical signals enough to eliminate the need for electrical conversion, said University of Surrey professor Ortwin Hess. This would increase transmission speeds and better enable the Internet to handle bandwidth-intensive services such as video-on-demand and real-time video chat.
Metamaterials are synthetic, usually composite, substances. They don't occur naturally and have properties not typically found in natural substances.
One of the properties of the metamaterials that Hess' team uses, which combine metallic and dielectric structures, is a negative refractive index. A material's refractive index measures its optical density and thus the degree to which light slows when passing through it. Natural materials don't have a negative refractive index.
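The link between refractive index and the speed of light can be written out explicitly. These are standard textbook optics relations, not equations taken from Hess' work: c is the speed of light in a vacuum, n the refractive index, v_p the phase velocity, ω the angular frequency, and n_g the group index, which sets the speed v_g at which a pulse (and therefore the data it carries) travels. Slow-light devices are usually characterized by a large group index rather than by n alone.

```latex
v_p = \frac{c}{n}, \qquad
v_g = \frac{c}{n_g}, \qquad
n_g = n + \omega \frac{dn}{d\omega}
```

For example, with n = 1.45, roughly the index of a silica fiber core, v_p ≈ (3.00 × 10^8 m/s)/1.45 ≈ 2.07 × 10^8 m/s. A device with a group index of 100 would, by the same relation, carry a pulse at about 3 × 10^6 m/s.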
Hess said his team sandwiched a solid-state, slow-light metamaterial layer between two other layers. When the researchers sent light through the metamaterial, the signals slowed enough for the system to process them without having to convert them to an electrical format.
Hess said his team is trying to resolve the primary challenges for working with metamaterials in general: making the tiny structures less fragile and reducing signal loss.
News Briefs written by Linda Dailey Paulson, a freelance technology writer based in Ventura, California.