Microsoft China Research and Development Group
University of North Carolina at Chapel Hill
University of Vienna
Pages: pp. 12-13
It gives us great pleasure to present the IEEE MultiMedia readership with a selection of expanded versions of some of the best papers from the 14th ACM International Conference on Multimedia held October 2006 in Santa Barbara, California. As the conference is traditionally organized into three tracks (content, systems, and applications), we selected two articles from each track for this special issue. We gave the authors the opportunity to expand, edit, and otherwise improve their original conference papers.
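Briefly, the articles included here are "Reranking Methods for Visual Search"; "Toward Bridging the Annotation-Retrieval Gap in Image Search"; "Tiling Slideshow: An Audiovisual Presentation Method for Consumer Photos"; "Exploring Music Collections like Exploring Landscapes"; "Progressive Cut: An Image Cutout Algorithm that Models User Intentions"; and "Scalable, Adaptive Streaming for Nonlinear Media."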
Participation at the conference once again reflected the trend of increased attendance seen over the past five years. We believe that this increase results from the growing importance of multimedia systems and applications: mobile phones also take and display pictures and video, our television networks are now also telephone networks and vice versa, voice over Internet Protocol (VoIP) is a reality, video-on-demand is a commonplace cable network feature, and Skype and Joost have demonstrated that peer-to-peer systems can work. For more evidence, we need look no further than Google's $1.65 billion purchase of YouTube.
Having now seen the fruits of this research community tangibly blossoming in the mainstream consumer product market, you might think that we would by now have a clear and settled definition of what exactly multimedia is. And yet, we do not. However, this might not be such a bad thing. In many ways, trying to define multimedia is like trying to define art. No one can really define it, everyone claims to know it when they see it, and yet no two people can agree on what it is. Like the field of multimedia itself, these articles are extremely diverse in the topics addressed, approaches taken, and lessons learned. Undaunted, we will still attempt to draw some common themes across these articles.
One such theme is the construction of semantically meaningful information by combining multiple types of media information. In "Reranking Methods for Visual Search," the authors show how low-level features in image space can be used to improve the relevance of video search results originally obtained using nearby text. Doing so allows the original search and indexing problem to remain in the well-understood and highly effective text domain, but the results of any particular query are made more semantically meaningful to the user by employing image features of the video content. Similarly, in "Toward Bridging the Annotation-Retrieval Gap in Image Search," the authors were motivated by the fact that while annotation and tagging may completely address the image search problem if done ideally, in practical systems annotation will be imperfect and incomplete. By combining annotation-based search with content-based search, they were able to construct more meaningful results.
Semantics and meaning are ultimately constructions within the user's mind. Two of our articles attempt to use low-level media features to drive high-level presentation in semantically meaningful ways. In "Tiling Slideshow: An Audiovisual Presentation Method for Consumer Photos," music analysis, temporal relationships, and image-based features are all combined to automatically construct photo slideshows so that people can use their photos to reflect their real-world experience in a meaningful way. The authors of "Exploring Music Collections like Exploring Landscapes" attempt to translate music features into a visual landscape that reflects to the user the structure and organization of a music collection and, perhaps more interestingly, translates music browsing into a 3D visual navigation.
While these previously noted articles focus on constructing meaningful semantic information from lower-level media features, our last two articles attempt to reflect semantics from the user and application back into the system's lower levels. In "Progressive Cut: An Image Cutout Algorithm that Models User Intentions," an image cutout algorithm is improved by deriving the user's intention from the user's interaction with the image cutout interface. In "Scalable, Adaptive Streaming for Nonlinear Media," the authors construct a general media streaming framework in which the application can reflect and exploit known semantic relationships between media data units that might not have a predetermined natural linear ordering—instead depending on specific client interests at the time of access.
We hope you enjoy and learn from these articles as much as we have. With a collaborative effort, we envision a bright future for the many fields of multimedia.