Rich Media, Poor Media

John R. Smith, IBM Research

Pages: pp. 2-3

Abstract—Authoring of rich media content is not prevalent despite efforts to develop standards, tools, and platforms. Average users prefer to keep it simple. However, growing interest in stylizing content and pinning media objects is putting average users on a new path of creativity that could lead to richer multimedia content.

Keywords—multimedia, multimedia authoring, multimedia standards, social media, pinning

There has long been the promise that multimedia authoring will be as pervasive as word processing, the assumption being that people will want to routinely author rich content, and multimedia authoring tools will make it possible. 1 But it's not turning out that way. People want to keep it simple. Despite technical challenges being addressed from every perspective across tools, standards, and delivery platforms, rich multimedia content authoring for the mainstream is still elusive. Instead, users want to do less complicated and more instantaneous things such as tweeting freshly captured photos, posting unedited video files, applying simple effects to shared pictures, and pinning existing content. Although digital media is clearly becoming the new currency online, rich multimedia authoring is still in the domain of experts.

Substantial efforts have been made to empower average users to do more. One of the early standards for creating multimedia presentations is the Synchronized Multimedia Integration Language (SMIL), developed by the World Wide Web Consortium (W3C). SMIL can create spatially and temporally synchronized presentations of media objects that include animations and transitions. 2 Researchers have developed numerous SMIL authoring tools, but despite efforts to create multiple improved versions of the standard since 1998, SMIL has not been widely adopted.

Similarly, MPEG-4 Part 11 standardized a powerful coding scheme for complex multimedia scenes that supports complex spatio-temporal presentation of media content, 2D and 3D object encoding, and descriptions of complex user-interaction behavior. 3 Originally providing a Binary Format for Scenes (BIFS), MPEG-4 Part 11 later provided a textual format called the Extensible MPEG-4 Textual Format (XMT) that made it easier to author and edit MPEG-4 multimedia presentations. Unfortunately, BIFS and XMT have not attracted much interest.

More recently, the W3C has been creating HTML5, a markup language that promises significant improvements in media handling for the Web. This includes addressing a fundamental gap in Web standards today: multimedia must be handled through opaque HTML objects and proprietary browser plug-ins. Although HTML5's support for complex presentation is not as extensive as that of SMIL and MPEG-4 BIFS/XMT, standardizing how audio, video, and graphics are provided as Web content is an important step forward. Still, it is unclear whether HTML5 will make us all rich multimedia content authors.

It is overwhelmingly apparent, however, that interest from Web users goes beyond just consuming media content. Online and mobile users are extremely active in contributing, manipulating, and redistributing media objects. YouTube recently reported hitting a new high-water mark of receiving on average 72 hours of video content per minute across its tens of millions of channels. There is also tremendous interest in photo-sharing sites such as Flickr, which recently surpassed 6 billion photo uploads.

Although media content is clearly at the center of YouTube and Flickr, social networking sites also receive a tremendous number of media objects. Facebook recently reported that it receives 6 billion photos per month. 4 The total number of photos uploaded to Facebook to date is estimated to be more than 100 billion. Foursquare reported that it has received billions of location-based check-ins. Many of them include geotagged photos from mobile users.

Clearly, average users are contributing media objects in large volumes. They upload everything from photos of friends and family members to videos that capture their travels and even daily routines. But this is poor media: it lacks the dynamics, interactivity, and composition expected to be at the heart of multimedia authoring for average users by now. Nevertheless, people increasingly want to express creativity with media objects, even if it is with simple manipulations. For example, Instagram has captured tens of millions of mobile users simply by making it easy to stylize and share photos. Without a doubt, helping users share the content they like boosts creativity. For example, Pinterest lets users "pin" interesting photos and then automatically composes them into boards that can be shared and commented on. Although the boards are not what we would call rich media, pinning makes for pure, simple, routine authoring.

This growing interest in stylizing content and pinning media objects is blazing a new path of creativity. For now, we'll need to be content with poor content. At least there's a lot of it. And we can keep our hopes alive for a richer future.

New Editorial Board Member



Please welcome Chia-Wen Lin to the IEEE MultiMedia editorial board. He is currently an associate professor in the Department of Electrical Engineering at National Tsing Hua University (EE/NTHU), Taiwan. Prior to joining the EE/NTHU, he worked in the Department of Computer Science and Information Engineering at National Chung Cheng University, Taiwan, and was with the Information and Communications Research Laboratories at the Industrial Technology Research Institute, Taiwan. His research interests include multimedia analysis, indexing, retrieval, ontologies, and semantics; content adaptation and personalization; ubiquitous and mobile media; multimedia communications and streaming; and media authoring.

Lin has a PhD from EE/NTHU and was awarded a PhD thesis award from the Ministry of Education, Taiwan. His paper won the SPIE VCIP 2005 Young Investigator Award. He was also awarded the National Chung Cheng University's Young Faculty Awards (2005–2007) and Taiwan National Science Council's Young Investigator Awards (2006–2009).

Chia-Wen Lin is an associate editor of the IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Multimedia, and Journal of Visual Communication and Image Representation (JVCI) and an area editor of EURASIP Signal Processing: Image Communication. He is a member of the IEEE Signal Processing Society's Multimedia Signal Processing Technical Committee (TC) and Image, Video, and Multidimensional Signal Processing TC as well as the IEEE Communications Society's Multimedia Communications TC. Lin is currently the secretary of the IEEE Circuits and Systems Society's Multimedia Systems and Applications TC.

Contact him at or visit for a more complete curriculum vita and a list of publications.


About the Authors

John R. Smith is a senior manager of Intelligent Information Management at IBM T.J. Watson Research Center. Contact him at