Google and Twitter “Like” Social Indexing
by George Lawton
For the last decade, the dominant approach to finding information on the Web has been Google’s link-based approach. Google indexes material by using page-rank algorithms that use the links between pages. Pages with many links from other sites are rated as more important and, for example, placed at the top of search results.
People use Google not only to search for information but also to find products, books, and so on. Recently, both Google and Twitter have revealed new efforts to integrate preference information into these indexes in the same manner as Facebook’s “like” button, which lets users indicate their preference by clicking on a special website button.
Professor Jon M. Kleinberg at Cornell University said, “The ‘like’ button is a channel that gets used when your actions by themselves are not a rich enough language for expressing your opinion. You get richer information to the extent that you can make this feedback part of the workflow of the site. When you combine these features with methods for personalizing the site, then people see some value in expressing their opinion.”
Facebook launched the “like” button on its public social networking site in April 2009. In April 2010, the button replaced Facebook’s Fan feature for companies, merging the two implementations. By late April 2011, the “like” button had been added to over 250,000 sites. Both Google and Twitter recently released key advances in their respective social-indexing platforms, called “+1” and “Follow.”
Smaller companies are focused on extending social indexing into specific domains. For example, Getglue.com has attracted 1.2 million users, who are creating the largest index of TV shows in the world. “The entertainment space is one of the most social and telling areas when it comes to learning about people’s preferences,” said Getglue.com CEO Alex Iskold.
Proponents say that for some users and in some circumstances, this approach is more effective than one based on links. Network analysis tools, which help understand the significance of metadata, play a key role in social indexing. These tools are better at considering the opinions of friends when looking for restaurants, books, movies, websites, or TV shows than traditional analytic techniques, said Iskold.
The Evolution of Search
A main driver for social-indexing technology is that it makes possible new kinds of analysis that traditional approaches do not support. For example, Kleinberg notes that people can get more useful information when a site uses algorithms that incorporate feedback about people’s preferences.
Using preference metadata in search is the next major evolution of indexing systems, said Marc Smith, chief social scientist at the Connected Action consulting group, who also led the development of NodeXL, the world’s most popular network analysis tool. “I would argue that we’re about to move from the era of page-rank–based search to an era of people-rank–based search, where we’re not just looking for the links between pages, but starting to look at the links between people and new forms of links beyond. Text indexes created a higher value representation of content, while social indexing creates a higher level representation of collections of connections. It’s a natural progression.”
Smith said the need for search originated in the explosion in documents. The first Web search techniques treated all documents as independent islands. The field of search took a major step forward with Google’s introduction of the page-rank algorithm, which linked a Web site’s relevance to the number of other sites pointing to it.
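The link-counting idea behind this step can be illustrated with a minimal sketch of iterative link-based ranking. This is a simplified textbook version, not Google’s production algorithm; the function name `page_rank`, the damping value of 0.85, and the four-page sample web are all assumptions made for illustration.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links using simple power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical sample web: three pages all link to "hub".
example_links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": []}
ranks = page_rank(example_links)
top_page = max(ranks, key=ranks.get)
print(top_page, ranks)
```

In this sample, the page that the other three point to ends up with the highest score, which is the behavior the link-based approach relies on.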
Now the use of tie-relationship types is opening a new era of more precise search. These tie-relationship types include liking, linking, rating, reviewing, commenting, re-tweeting, replying, following, friending, and contacting.
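One way to picture these tie-relationship types is as labeled edges between people and items. The sketch below is a hypothetical illustration, not any vendor’s implementation: the `Tie` record, the sample names, and the `friend_recommendations` function are assumptions introduced here. It uses only two of the tie types listed above, “following” and “liking.”

```python
from collections import Counter
from typing import NamedTuple


class Tie(NamedTuple):
    source: str  # person who acts
    kind: str    # tie type, e.g. "following" or "liking"
    target: str  # person or item acted on


def friend_recommendations(ties, user):
    """Count how many of the people `user` follows have liked each item."""
    followed = {
        t.target for t in ties if t.source == user and t.kind == "following"
    }
    likes = Counter(
        t.target for t in ties if t.kind == "liking" and t.source in followed
    )
    return likes.most_common()


# Hypothetical sample data.
ties = [
    Tie("ann", "following", "bob"),
    Tie("ann", "following", "cat"),
    Tie("bob", "liking", "show_x"),
    Tie("cat", "liking", "show_x"),
    Tie("cat", "liking", "show_y"),
    Tie("dan", "liking", "show_z"),  # ann does not follow dan
]
recommended = friend_recommendations(ties, "ann")
print(recommended)
```

Because each edge carries its type, the same data could be queried by any other tie type, such as rating or replying, without changing the structure.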
Social-Indexing Technology Drivers
The growth of social-indexing technologies is being driven by improvements in processing algorithms, display techniques, and parallel processing techniques.
The network analysis required for social-indexing services is significant. Smith said that a basic desktop computer can process microscale network graphs of fewer than 1,000 nodes. Larger computations on mesoscale graphs of 10,000–1,000,000 nodes require dedicated hardware, and megascale graphs, those beyond one million nodes, require a large data center or cloud service provider.
Smith said desktops offer a lot of opportunity for parallelism. Nvidia’s Tesla represents a new class of GPU cards that enable mesoscale analysis. These cards come with up to 480 GPU cores per card, and a single computer chassis can hold up to four cards.
A second shift has been the evolution of MapReduce tools such as Hadoop. Iskold said these tools let Getglue.com crunch complex algorithms in a matter of minutes by distributing the work across many virtual machines in the cloud.
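The MapReduce pattern itself can be sketched in a few lines. The example below runs in a single process and does not use the Hadoop API; it only shows the map, shuffle, and reduce phases that such tools distribute across machines. The event format and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(event):
    """Emit (item, 1) for each "like" event; ignore other actions."""
    user, action, item = event
    if action == "like":
        yield item, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def count_likes(events):
    pairs = (pair for event in events for pair in map_phase(event))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample events: (user, action, item).
events = [
    ("ann", "like", "show_x"),
    ("bob", "like", "show_x"),
    ("cat", "view", "show_x"),
    ("cat", "like", "show_y"),
]
like_counts = count_likes(events)
print(like_counts)
```

Each call to `map_phase` and `reduce_phase` depends only on its own input, which is what allows a framework to run many of them in parallel on separate machines.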
Until recently, the IT world shunned the analytic techniques used in network analysis because of the higher redundancy and lack of data integrity, said Iskold. “Replication was a big ‘No’ in Wall Street, because data integrity was so important. But what’s important now is accurately representing the preference of users.”
Network analysis techniques have also improved, said Kleinberg. For example, researchers are trying to identify useful ways of scaling a network’s size up or down. In many cases, a developer can scale a complex network graph down to speed up analysis, but in other cases, this process reduces accuracy. More research is required to understand the best practices around scaling.
Another set of challenges lies in identifying the most efficient ways of rendering complex network graphs for analysis, said Cody Dunne, a doctoral student at the University of Maryland. The display screen itself places fundamental limits on how many nodes a user can visualize. Dunne said the biggest, most obvious challenges network analysis tools face are nodes overlapping, edges crossing unnecessarily, and edges tunneling underneath nodes without connecting to them. “There are automated layout algorithms and post-processing approaches for reducing these somewhat,” he said, “but they’re not usually implemented by analysis tools.”
Dunne sees a need to develop readability metrics for these kinds of graphs. “As social network analysis and graph drawing in general become more mainstream,” he explained, “it’s important to provide new entrants guidelines for effective graph-drawing creation. Without them, the graph drawings users produce can be unintelligible or even misleading.”
The first generation of tools is limited to specifying preferences, said Kleinberg. He expects social indexing to open up information classification in two dimensions: respect and agreement. He said these two dimensions often get bundled together in current preference implementations. New approaches such as the Epinions.com rating system, which lets participants rate other participants as well as products, could tease apart information about users’ preferences from their level of respect for others.
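A rough sketch of keeping the two signals separate follows. It is not Epinions.com’s actual algorithm; the data layout, the 0-to-1 respect weights, and the `weighted_score` function are assumptions made for illustration. Product ratings and participant ratings are stored apart, and a product’s score for a given viewer is the average of its ratings weighted by how much that viewer respects each rater.

```python
def weighted_score(product, product_ratings, respect, viewer):
    """Average the ratings of `product`, weighted by the viewer's respect.

    product_ratings: {(rater, product): rating}
    respect: {(viewer, rater): weight between 0 and 1}
    Raters the viewer has not rated get weight 0 and are ignored.
    Returns None when no respected rater has rated the product.
    """
    total = 0.0
    weight_sum = 0.0
    for (rater, rated), rating in product_ratings.items():
        if rated != product:
            continue
        weight = respect.get((viewer, rater), 0.0)
        total += weight * rating
        weight_sum += weight
    return total / weight_sum if weight_sum else None


# Hypothetical sample data: two raters disagree about the same film.
product_ratings = {("bob", "film"): 5, ("cat", "film"): 1}
respect = {("ann", "bob"): 1.0, ("ann", "cat"): 0.25}

score_for_ann = weighted_score("film", product_ratings, respect, "ann")
score_for_dan = weighted_score("film", product_ratings, respect, "dan")
print(score_for_ann, score_for_dan)
```

Here the viewer’s score leans toward the rater she respects more, while a viewer who has rated no one gets no score at all rather than a plain average.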
As Preferences Proliferate...
Experts believe that the proliferation of social indexes will also attract the attention of spammers and others who will try to game the system. “With any system,” Kleinberg warned, “you have to design the features and analysis knowing that spammers and well-intentioned people will take advantage of it in ways you did not expect. The popularity of these systems will incentivize people to adapt their behavior and do well by these metrics.”
There are also concerns about the number of different preference systems the Internet can practically support. Smith said, “I don’t think there will be 20 buttons on every site, although there currently are on many.”
Smith believes that the proliferation of preference buttons could evolve in one of two ways. The number of preference indexes could continue to grow and be balanced and managed by a layer of preference-management tools that can add preferences to or get data from multiple social indexes. The other possibility is a competitive winnowing process that forces the convergence to two or three centralized repositories managed by companies like Google, Facebook, and Amazon.
George Lawton is a freelance researcher based in Guerneville, CA. Contact him at glawton@glawton.com.