• Vulnerability to spam. By letting users create and publish content without much central governance, social media Web sites have been able to amass a rich body of content. Unfortunately, user-generated content is intrinsically more vulnerable to spam and noise because the content isn't filtered by any meaningful editorial process. When no dependable third party verifies the integrity of published content or the author's motivation, a significant portion of such content inevitably exists to promote its author's commercial interests, potentially without benefiting the public. Partly because of this significantly larger fraction of spam, link-based metrics such as PageRank, which work well for "traditional" Web content, are less effective for social media content.
• Short lifespan. The content on social media Web sites tends to have a shorter lifespan because much of it focuses on an ongoing real-world event or a current "hot" topic. Public interest in such content subsides rapidly over time. Thus little user-generated content accumulates many incoming links or user visits before it becomes irrelevant, making it difficult to judge the general "quality" of such content.
• Locality of interest. The large pool of potential content creators on social media sites has produced an explosion of publicly shared content, but much of it is of little interest to the general public. When publication costs are high, Web sites publish only content that's interesting to a general audience. With near-zero publication costs, however, anyone can publish anything: a teenaged boy's daily journal is unlikely to spark the general public's interest, even though it might be interesting to his friends and family.
• Access control. Most user-generated content is "private," meaning it's sent to only a few recipients and isn't visible to anybody else. Recently, a significant middle ground has been emerging. Facebook, for instance, provides differentiated access to all members of a network, and these networks frequently contain tens of thousands of people. Content visible to the network isn't distributed to all members; rather, it's hosted, and Facebook verifies access credentials at access time. Searching in such an environment poses significant new challenges that existing data structures don't effectively address.
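
To make the access-control challenge concrete, the following is a minimal sketch of search over hosted content in which visibility is checked at query time. It is an illustration under stated assumptions, not a description of how Facebook or any real system works: the class `AccessControlledIndex`, its methods, and the one-network-per-item visibility model are all hypothetical.

```python
from collections import defaultdict


class AccessControlledIndex:
    """Toy inverted index that filters results by network membership.

    Hypothetical model: each item is either public (network=None) or
    visible to the members of exactly one named network.
    """

    def __init__(self):
        self._postings = defaultdict(set)     # term -> ids of items containing it
        self._audience = {}                   # item id -> network name, or None if public
        self._memberships = defaultdict(set)  # user -> networks the user belongs to

    def join(self, user, network):
        """Record that a user is a member of a network."""
        self._memberships[user].add(network)

    def add(self, item_id, text, network=None):
        """Index an item; network=None marks it as public."""
        self._audience[item_id] = network
        for term in text.lower().split():
            self._postings[term].add(item_id)

    def _visible(self, user, item_id):
        """Check credentials at access time, as hosted content requires."""
        network = self._audience[item_id]
        return network is None or network in self._memberships.get(user, set())

    def search(self, user, query):
        """Return ids of items that match every query term and that the user may see."""
        terms = query.lower().split()
        if not terms:
            return []
        # Candidate items must contain all query terms.
        candidates = set.intersection(
            *(self._postings.get(term, set()) for term in terms)
        )
        # The access check runs once per candidate, for every query.
        return sorted(i for i in candidates if self._visible(user, i))


index = AccessControlledIndex()
index.join("alice", "campus")
index.add("d1", "concert tickets for sale", network="campus")
index.add("d2", "concert review")

alice_hits = index.search("alice", "concert")  # member of "campus"
bob_hits = index.search("bob", "concert")      # member of no network
```

In this sketch the same query returns different results for different users, and the visibility check is repeated for every candidate item on every query. That per-user, per-item work is one plausible reading of why a conventional index, built on the assumption that every indexed item is visible to every searcher, fits this setting poorly.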