Issue No. 02 - March/April (2010 vol. 14)
DOI: http://doi.ieeecomputersociety.org/10.1109/MIC.2010.39
Nitin Agarwal, University of Arkansas
Torsten Suel, Polytechnic Institute of New York University
Huan Liu, Arizona State University
Philip S. Yu, University of Illinois at Chicago
The widespread phenomenon of blogging demonstrates the power of citizen journalism: anytime information sharing that lets people exchange personal experiences, express likes and dislikes, voice opinions, offer suggestions, and form groups with genuine social activities. Blogs also act as conduits, propagating data at an unprecedented pace. The result is a gigantic, dynamic open source data archive and a unique opportunity for various research activities, including the study of influence, trust, reputation, privacy, search, spam, and group interaction [1].
Unprecedented Challenges and Opportunities
An obvious yet important challenge is how to model and mine this vast pool of data to extract, represent, and exploit meaningful knowledge, and how to leverage the structures and dynamics of the blogosphere's emerging social networks. Social computing, which combines data mining with social network analysis, is an emerging interdisciplinary field that offers unique opportunities to develop novel algorithms and tools, ranging from text and content mining to graph and link mining. An associated challenge is data collection and objective evaluation: How can we effectively collect data and share it for research and benchmark building? How can we develop generally accepted evaluation procedures to objectively measure this emerging field's progress?
The blogosphere's distinctive nature makes it imperative for academics, researchers, and industrial practitioners from disparate disciplines to explore it and collaborate. In pulling together this special issue of IEEE Internet Computing, our objectives were to collect research submissions about the blogosphere that offer interesting ideas and original approaches, to provide a conducive platform for them (this magazine is an ideal venue), and to present state-of-the-art interdisciplinary research on social computing.
We initially received 27 submissions on a great variety of topics, including (but not limited to) popularity, opinion, and novelty mining in the blogosphere; scalable and distributed blog services; blog data visualization; and blog spam detection. Independent experts reviewed the pertinent submissions, and we ultimately selected three articles for inclusion in this special issue. We invited the authors of another four submissions to revise their work based on the expert reviews for inclusion in a future issue. We believe this series of articles about social computing in the blogosphere will accelerate the dissemination of ideas and interdisciplinary collaboration.
In this Issue
The three articles in this issue examine well-established sociological theories in light of large amounts of social media data, survey existing knowledge discovery approaches in the blogosphere and the challenges they encounter, and analyze social-political blogs in regional languages.
In the article "Homophily in the Digital World: A LiveJournal Case Study," Hady W. Lauw, John C. Shafer, Rakesh Agrawal, and Alexandros Ntoulas offer insight into the nature of friendship in an online environment such as LiveJournal, where it may be quite different from conventional friendship in the offline world. The authors also validate homophily as the basis of our relationships along dimensions other than familial and geographical proximity. The article asks whether the differences between the physical and virtual worlds play a role in what governs the formation and maintenance of friendship ties, and it investigates two essential, interconnected questions: Are two users more likely to be friends if they share common interests, and are two users more likely to share interests if they're friends? As the authors note, the study's results could be key to developing a deeper understanding of social ties in online networks and, hence, more accurate models for analyzing them.
Geetika T. Lakshmanan and Martin A. Oberhofer, in their article "Knowledge Discovery in the Blogosphere: Approaches and Challenges," present a translational research survey of existing approaches to knowledge extraction and data mining in the blogosphere, pointing out various challenges and several potential research directions. The article observes that the blogosphere differs significantly from traditional text and even Web documents, and that these differences warrant a special class of data mining algorithms. The authors present an easy-to-understand comparison of existing work on knowledge discovery in the blogosphere in terms of three prominent techniques: clustering, ranking, and matrix factorization. They also give an overall description of knowledge discovery strategies and survey the existing algorithms for each technique, analyzing which blogosphere-specific challenges each approach can or can't handle. In doing so, the authors effectively suggest directions that other researchers can explore and extend. The article's illustrations are particularly useful for readers trying to understand the available algorithms and the relationships among the various techniques.
Finally, in their article "Metrics for Monitoring a Social-Political Blogosphere: A Malaysian Case Study," Brian Ulicny, Christopher J. Matheus, and Mieczyslaw M. Kokar motivate the need for new approaches to monitoring structurally and linguistically different social-political, or sopo, blogs. The authors describe their automated framework and show its utility on the Malaysian blogosphere during and after the 2008 general election. After reviewing classic metrics such as tf*idf, Google's PageRank, and Technorati's Authority, the authors propose new metrics for evaluating sopo blogs' content quality and topic relevance: a credibility metric that combines three distinct measures (blog authority, reader engagement, and blogger accountability), and three further metrics (blog relevance, timeliness, and specificity) that are particularly suited to the nature of blog posts.
These three articles represent only some of the current research on modeling and mining in social media. Other work not included here but worthy of further investigation covers modeling and pattern discovery [2], influence and diffusion [3, 4], security and privacy [5], community detection and evolution [6], and collective behavior learning and prediction [7].
David Lazer and his coauthors observed, "The capacity to collect and analyze massive amounts of data has transformed such fields as biology and physics. But the emergence of a data-driven 'computational social science' has been much slower" [8]. New user-friendly technologies and media such as blogs and microblogs, which have few or no barriers to entry, offer unprecedented means for people to share what they experience, offer their opinions or answers, and report on live events such as presidential elections, natural disasters, and terrorist activities. Social network analysis isn't exactly a young field, but the unprecedented availability of massive amounts of data, a product of social computing's evolution, poses new challenges that require collaborative research across disciplines: social sciences, computer science, psychology, cultural anthropology, and mathematics, to name a few. We hope this special issue contributes to this emerging field's advancement, and we look forward to the innovative ideas and ingenious algorithms to come.
We express our gratitude to the authors, whose submissions made this special issue possible and showed how active and exciting this emerging field is; to the reviewers, who sacrificed their time to read the submissions and offer constructive comments that helped the authors see their work from a different perspective; to the IEEE Internet Computing editorial staff members, who helped us manage the submissions; and to the magazine's editorial board for its guidance throughout the process.
Huan Liu is a professor of computer science and engineering at Arizona State University. His research interests include feature selection, data mining, and computing with social media. Liu has a PhD in computer science from the University of Southern California. He's a member of the IEEE, AAAI, the ACM, and the American Society for Engineering Education. Contact him via www.public.asu.edu/~huanliu.
Philip S. Yu is a professor of computer science at the University of Illinois at Chicago and also holds the Wexler Chair in information technology. His research interests include data mining, social computing, databases, and privacy. Yu has a PhD from Stanford University. He's a fellow of the ACM and the IEEE. Contact him via www.cs.uic.edu/PSYu/.
Nitin Agarwal is an assistant professor in the Information Science Department at the University of Arkansas at Little Rock. His primary research interests are in social computing, data mining, and Semantic Web mining. Agarwal has a PhD in computer science from Arizona State University. His dissertation focused on social computing aspects of the blogosphere. Contact him via www.ualr.edu/nxagarwal/.
Torsten Suel is an associate professor in the Department of Computer Science and Engineering at the Polytechnic Institute of New York University. His research interests are in Web search engines, databases, algorithms, and distributed systems. Suel has a Diploma from the Technical University of Braunschweig, Germany, and a PhD from the University of Texas at Austin. Contact him via http://cis.poly.edu/suel/.