P. Muneesawang and L. Guan, Multimedia Database Retrieval: A Human-Centered Approach, Springer, 2006, $99, 186 pp., hardcover, ISBN 0-387-25627-X.
With tools like mobile phones, Flickr, and YouTube, everyday users can produce, manage, and share images and videos. Yet, even with a thousand words, it's difficult to adequately annotate, describe, or index many images because words can't quite describe our vacation in Paris the way pictures and video clips can. This leads to the question, What is the current status of image and video indexing and retrieval?
Multimedia Database Retrieval is probably one of the first books to address new multimedia retrieval techniques, which let users fully and efficiently use multimedia data libraries. The book focuses on image and video retrieval, using content-based methods and technologies. What makes this book unique is its focus on human factors that affect the way we search for and browse images. Targeted toward professionals, managers, and high-level executives, it's also excellent for anyone interested in emerging multimedia databases, object-relational databases, and digital libraries.
To help readers fully understand and appreciate the challenges of image indexing and retrieval, the authors begin with a brief discussion of state-of-the-art models and systems. There are two broad approaches. Keyword-based indexing uses keywords or descriptive text, which are stored together with images and videos in the databases. Most commercially available systems rely on this technique, retrieving material by matching a query, given in the form of keywords, with the stored keywords. This approach is unsatisfactory, however, because the text-based description tends to be incomplete, imprecise, and inconsistent in specifying visual information. Furthermore, it requires much human effort to develop suitable descriptions.
To overcome this problem, recent research has focused on content-based indexing and retrieval techniques. This approach lets users index and retrieve images and videos from databases using visual content (such as prominent regions, color, shape, size, and texture), motion-related information (movement of objects, enlarging or shrinking, and global camera operation), and similarity-based features. Imagine future versions of YouTube or Flickr where users will be able to request "pictures of dogs" or "pictures of Abraham Lincoln." In another possible scenario, users could draw a rough approximation of the image they're looking for, using general shapes or textures, for example. Other methods include specifying the proportions of colors desired (such as "80 percent red, 20 percent blue") and searching for images that contain an object given in a query image.
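As a rough illustration of the color-proportion query mentioned above, the following Python sketch ranks images by how closely their color makeup matches a request such as "80 percent red, 20 percent blue." It is not taken from the book: the function names, the three-bin color quantization (each pixel is assigned to its strongest RGB channel), the histogram-intersection score, and the toy pixel data are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255

NAMES = ("red", "green", "blue")


def color_proportions(pixels: List[Pixel]) -> Dict[str, float]:
    """Fraction of pixels whose strongest channel is red, green, or blue.

    Assumption: a crude 3-bin quantization, used only for illustration.
    """
    counts = {name: 0 for name in NAMES}
    for p in pixels:
        strongest = max(range(3), key=lambda i: p[i])
        counts[NAMES[strongest]] += 1
    total = len(pixels) or 1  # avoid division by zero for empty images
    return {name: n / total for name, n in counts.items()}


def histogram_intersection(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Overlap of two color distributions; 1.0 means identical proportions."""
    return sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in set(a) | set(b))


def rank_by_color(
    query: Dict[str, float], database: Dict[str, List[Pixel]]
) -> List[Tuple[str, float]]:
    """Return (image name, score) pairs, best match first."""
    scored = [
        (name, histogram_intersection(query, color_proportions(pixels)))
        for name, pixels in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy "images": flat lists of pixels.
database = {
    "sunset": [(250, 30, 20)] * 8 + [(20, 30, 250)] * 2,
    "ocean": [(20, 30, 250)] * 9 + [(250, 30, 20)] * 1,
}
ranking = rank_by_color({"red": 0.8, "blue": 0.2}, database)
print(ranking)  # "sunset" ranks ahead of "ocean"
```

A real system would use finer color histograms and add texture and shape descriptors, but the retrieval step has the same shape: extract features, score them against the query, and sort.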
This book describes techniques based on the second approach, providing newly developed methods that are intelligent enough to advise the adaptation modules to optimize the retrieval process. Furthermore, the book is an application-oriented reference for multimedia search and retrieval. It focuses on the following applications:
• Computer-aided detection systems applied to discovering underwater mines in side-scan sonar images. The method points to particular areas in a sonar image, labeling areas that are potentially dangerous. With this approach, operators can then further investigate a short list of images to quickly and accurately detect mines.
• Texture database applications to browse and interpret aerial photos in geographic information systems (GIS). This approach digitizes and then interprets aerial photography. Using the digital map, management can make decisions with more accuracy and confidence—for instance, designing a more efficient irrigation system or determining a soil sampling regime for fertility assessment.
• News video applications dedicated to indexing and retrieving news videos. The authors show how news headlines are used to retrieve a full news story. This lets users go directly to the full story from the headline of interest.
This focus on applying theoretical concepts in practical scenarios is what makes the book useful for practitioners in the multimedia community. Individuals or organizations in the process of procuring a multimedia retrieval system will also find the book valuable.
Comprising seven chapters, the book begins by introducing the first human-controlled interactive content-based retrieval (HCI-CBR) system. It presents a nonlinear model built on an expansion set of radial basis functions, along with the associated learning algorithms. The authors then demonstrate the system's performance in comparison with other systems. The experiments cover texture retrieval and image retrieval on a general photograph collection, and they also apply image indexing and retrieval techniques to a computer-aided detection system.
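To give a feel for how radial basis functions can drive interactive retrieval, here is a minimal Python sketch. It is a generic illustration, not the authors' model: it assumes a single Gaussian RBF per query, and the update rule (re-center on the mean of the items the user marks as relevant, and set each feature's width to their standard deviation) is a simplification chosen for this example. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rbf_similarity(
    x: Sequence[float], center: Sequence[float], widths: Sequence[float]
) -> float:
    """Gaussian RBF score: exp(-sum((x_i - c_i)^2 / (2 * sigma_i^2)))."""
    distance = sum(
        (xi - ci) ** 2 / (2.0 * s * s) for xi, ci, s in zip(x, center, widths)
    )
    return math.exp(-distance)


def update_from_feedback(
    relevant: List[Sequence[float]], min_width: float = 1e-3
) -> Tuple[List[float], List[float]]:
    """Derive a new RBF center and per-feature widths from relevant items.

    Assumption: features on which the relevant items agree get a narrow
    width (they matter more); features on which they vary get a wide one.
    """
    n = len(relevant)
    dims = len(relevant[0])
    center = [sum(v[i] for v in relevant) / n for i in range(dims)]
    widths = [
        max(math.sqrt(sum((v[i] - center[i]) ** 2 for v in relevant) / n), min_width)
        for i in range(dims)
    ]
    return center, widths


# The user marks two results as relevant; they agree on feature 0 only.
center, widths = update_from_feedback([[0.0, 1.0], [0.0, 3.0]])

# An item that differs on feature 1 still scores well ...
print(rbf_similarity([0.0, 3.0], center, widths))
# ... while one that differs on feature 0 is pushed toward zero.
print(rbf_similarity([1.0, 2.0], center, widths))
```

The nonlinearity is the point of the design: unlike a fixed weighted distance, the score falls off sharply on features the user's feedback has shown to be important and gently on the rest.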
The third chapter describes work on an enhanced HCI-CBR system that uses a local model network. The network implements a mixture of models through relevance feedback learning algorithms. The chapter discusses the theoretical principles and introduces a new learning strategy for applying this network model to interactive retrieval applications.
The really interesting part of this book begins with the fourth chapter, which presents the work on the machine-controlled content-based retrieval system. The system is introduced by incorporating a self-learning architecture with the HCI-CBR methods discussed in chapters 2 and 3. These automatic and semiautomatic techniques are applied to Web-based databases with the use of compressed-domain descriptors. The main advantage of the approach is in the automation of the interaction process. The book's message here is that minimizing user participation provides a more user-friendly environment and avoids errors caused by excessive human involvement.
Chapter 5 describes a new architecture of HCI-CBR on a peer-to-peer network, providing for network resource allocations. The key issue the authors discuss is using knowledge in HCI-CBR and applying it to a photograph collection.
In chapter 6, the authors discuss alternative strategies for incorporating video indexing and retrieval. First, the chapter introduces an adaptive video indexing (AVI) technique to characterize a news program video database. It then presents the integration of AVI with a self-training neural network to implement automatic relevance feedback. It also emphasizes a broad spectrum of new features that might help usher in a new generation of video database applications.
In chapter 7, the authors describe the issues involved in applying AVI to an online system for movie retrieval. The audiovisual fusion model is critical to supporting concept-based queries from movie databases. In this practical application, retrieval accuracy isn't the only concern; a user-friendly environment is also desirable. The chapter describes how researchers have successfully applied these techniques to retrieve movie clips from a large digital collection.
A common pitfall of many books on multimedia systems is their ambition to cover everything under the sun related to multimedia. This usually results in superficially treating each subject. Fortunately, this book sticks to its main focus and gives a thorough discussion of issues concerning multimedia databases. To a large extent, the authors have also done a good job of not showing bias on any subject.
This is an outstanding book on an important emerging area. We definitely recommend it to professionals working in this area who want an update on current research and to students and researchers interested in starting work in this area.