IBM T.J. Watson Research Center
Pages: 4-5
By now, most readers of IC have heard about the disappearance of Jim Gray aboard his sailboat on 28 January. Gray is arguably one of the most famous computer scientists in the world, having won the Turing Award as well as many other accolades, primarily for his pioneering work in database transaction processing.
When he failed to return from a sailing expedition near San Francisco, word spread among the large community of computer scientists whose lives Gray had affected over the years, and they sprang into action. There are many accounts by now of both the disappearance and its aftermath; one that appeared recently in Information Week, by Charles Babcock, is fairly extensive (see www.informationweek.com/news/showArticle.jhtml?articleID=198701579 for the article). At the time of this writing (early April), Gray has not been located despite these Herculean efforts. Nevertheless, I think this is a tale worth telling here. I'll recap the technological aspects of the search for Gray and share some thoughts about one of the systems used in it.
When Gray was reported missing at sea, the US Coast Guard set about searching for him in the same fashion it searches for any missing sailor. After a few days, the Coast Guard gave up, but the CS community realized it had technology available that could potentially guide the searchers in the right direction. As Babcock reported (and as others did earlier), the database community realized it could access satellite imagery and do a more thorough analysis to try to spot Gray's vessel, the Tenacious. Some analyzed ocean currents to predict the area where the sailboat might be if it were adrift. Any time there was a serious suggestion that the sailboat might be spotted, one of the team members would use a private airplane to do a close inspection. None of these efforts proved fruitful, unfortunately.
How did they know where to look? Generally, they used the old divide-and-conquer approach so familiar to computer scientists. Just as projects such as SETI@home have used idle computers to search automatically through a very noisy space for the proverbial needle in a haystack, the search space here was divided into pieces that many volunteers examined individually, in parallel. Interestingly, though, there wasn't sufficient expertise to have a computer program examine a satellite image and decide with confidence whether it showed a sailboat. Instead, the search was done by lots and lots of humans.
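To make the divide-and-conquer step concrete, here is a minimal sketch, with entirely hypothetical function and parameter names, of how a large satellite scene might be carved into small tiles that individual volunteers can review in parallel:

```python
# Hypothetical illustration: split one large satellite scene into
# fixed-size tiles, each of which becomes a separate review task.

def tile_image(width, height, tile_size):
    """Return bounding boxes (left, top, right, bottom) covering the scene."""
    tiles = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tiles.append((left, top,
                          min(left + tile_size, width),
                          min(top + tile_size, height)))
    return tiles

# A 2000 x 1000 pixel scene cut into 500-pixel tiles yields 8 tasks,
# each small enough for one person to scan carefully.
tasks = tile_image(2000, 1000, 500)
print(len(tasks))  # 8
```

Each tile can then be handed to a different volunteer, which is what makes the human search embarrassingly parallel in the same sense that SETI@home's computation is.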
I first learned about the search from the blog of Werner Vogels, CTO of Amazon.com. He reported that Gray was missing, then that people were working together to try to locate him through technology such as satellites. It turned out that Amazon has a system for distributing tasks to human end users, the Amazon Mechanical Turk, subtitled "Artificial Artificial Intelligence" because, like the original "Mechanical Turk" of the 18th century, which purported to be an automated chess player, it has human intelligence behind the façade.
Vogels arranged to present individual satellite images as requests in the Mechanical Turk for end users to examine and respond to. Normally, a payment is associated with each of these requests (or "HITs"), usually around US$0.10 to $1.00, to entice people to respond to them. With the search for the Tenacious, however, the motives were completely altruistic, and thousands of volunteers quickly pored over hundreds of thousands of images for free.
As an aside, I examined some of these images myself, and I felt enormous pressure to Do the Right Thing. What if I looked at an image, declared it uninteresting, and missed the one chance to point the rescuers in the right direction? I asked Vogels for confirmation that several people had to decide something was uninteresting before they were done with it, and although he affirmed that this was the case, Babcock's recent article reported the opposite: "Each tile was viewed and rated by three volunteers, and only those that got a high rating from all three would move to a second stage of review." Now I'm nervous again.
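The review rule Babcock describes can be sketched in a few lines. This is only an illustration, with assumed names and an assumed rating scale (the article doesn't specify one), of a policy in which each tile is rated by three volunteers and advances to a second stage only if all three rate it highly:

```python
# Sketch of a two-stage review rule: a tile advances only if every one
# of the required reviewers gives it a high rating. The 1-5 scale and
# the threshold are assumptions for illustration.

HIGH = 4  # assumed threshold for a "high" rating on a 1-5 scale

def advances(ratings, reviewers=3, threshold=HIGH):
    """True if enough reviewers rated the tile and all rated it highly."""
    return len(ratings) >= reviewers and all(r >= threshold for r in ratings)

print(advances([5, 4, 4]))  # True: unanimous high ratings
print(advances([5, 5, 2]))  # False: one reviewer saw nothing of interest
```

Note the trade-off: requiring unanimity filters out noise, but a single inattentive reviewer can sink a genuine sighting, which is exactly the worry voiced above.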
Gray himself has worked on astronomy and satellite maps in recent years and has made many friends with relevant expertise. Babcock reported that a professor at Johns Hopkins University (JHU), Alex Szalay, effectively repeated the Mechanical Turk experiment with higher-quality images in a high-performance environment (the local network at JHU, using people on site). This approach found some strong possibilities, and even tracked a boat very much like the Tenacious, but it wasn't the boat they wanted.
Readers of my column are by now aware of how appalled I am by the extent to which scammers, spammers, and other ne'er-do-wells have infested the Internet. Unfortunately, it seemed it was only a matter of time before they were attracted to the Mechanical Turk. When I see people willing to pay a nominal fee for bloggers to link to their Web pages with specific text, I see an attempt by people to Googlebomb themselves, that is, to make a Google search for a specific phrase point to them. Perhaps people will see through these requests and ignore them, or perhaps people will jump at the chance for a low-overhead but low-reward edit. It seems it will be necessary to police these environments just as bloggers must watch for comment spam and other problems.
But how to manage the trade-offs between effort and reward? In the past few years, I've seen several sites grow up around the idea of paying people to respond to surveys (e-Rewards is the one I come across the most). Initially, I found these intriguing and was willing to help out, until the time expense seemed to dominate the reward. Paying US$5 for a 15-minute survey, say, is a whole lot better than minimum wage, but it's not remarkable for a computer scientist. Thus I'm more likely to respond to a survey when I think my response will have an impact on someone or something, not because of the reward I'm offered.
What does this mean for systems like the Mechanical Turk? Internet-scale systems that help with altruistic problems, such as finding a cure for cancer or searching for extraterrestrial intelligence, are nothing new. People can participate at essentially no cost, and they choose to do so in large numbers. When human time is involved, something else is needed. The experience with the Mechanical Turk and the search for Gray demonstrates the willingness of complete strangers to sacrifice their time, not just their compute cycles, to help others. We need to identify other situations in which human interactions can be put to such good use and apply systems such as the Mechanical Turk to them. In the meantime, let's keep out the riff-raff. And let's be prepared, so the next time someone goes missing, someone who isn't nearly as famous as Jim Gray but is desperate nevertheless, we can use this technology quickly and efficiently to bring him or her home.
Oliver Spatscheck is a principal member of the technical staff at AT&T Labs - Research, where he has been actively involved in building AT&T's content distribution infrastructure in addition to his work on network monitoring and network-monitoring tools. He has published 34 refereed papers spanning the areas of network security, operating systems security, cryptography, network monitoring, protocol optimization, Web caching, content routing, machine learning applied to traffic identification, and stream databases in the context of network monitoring. Spatscheck has a PhD in computer science from the University of Arizona. He co-authored Web Caching and Replication (Pearson Education, 2001) and was actively involved in multiple IETF efforts. He's also served as guest editor of the IEEE Journal on Selected Areas in Communications as well as on multiple program committees and NSF panels. Contact him at email@example.com.
The opinions expressed in this column are my personal opinions. I speak neither for my employer nor for IEEE Internet Computing in this regard, and any errors or omissions are my own.