MONTH 2007 (Vol. 8, No. 3) p. 6 1541-4922/07/$26.00 © 2007 IEEE Published by the IEEE Computer Society

Multi-View Video: Get Ready for Next-Generation Television

Many believe that multi-view video is poised to change how people watch television and that it could become a driving force in interactive multimedia entertainment, for both desktop and mobile environments. A multi-view video (MVV) system acquires several video sequences of the same scene simultaneously from more than one angle and transmits these streams to remote viewers. Scenes can be displayed interactively, letting the user change the viewing angle as if the scene were 3D and enjoy the feeling of being in it. Owing to the massive amount of data involved and extensive processing requirements, real-time MVV processing presents research issues that lie at the frontier of video coding, image processing, computer vision, and display technologies. Building a complete end-to-end MVV system also hinges on several additional technologies, such as real-time acquisition, transmission, and display of dynamic scenes that users can view interactively on conventional screens. Several research groups around the world are actively working on MVV.

Applications

MVV technology could lead to exciting new applications in areas such as education, medicine, surveillance, communication, and entertainment. It could also lead to a mass-media shake-up and the birth of a new industry, especially in the mobile domain. Researchers will also need to reexamine surround sound so that the audio can accompany this style of video. MVV can also profoundly affect telecommunication, given that telecommunication's ultimate goal is highly effective interpersonal information exchange.

Sports coverage is one example of a technology that keeps evolving. In the past, only a few TV channels aired the games that interested people. Now audio and video coverage can be delivered over the Internet or broadcast in HDTV format.
Technology has always dazzled sports fans. Instant replays, introduced in the early 1960s, added a new dimension that in-stadium fans couldn't see, and miniature cameras let viewers see what referees see on the field. As MVV technology matures, we can expect a revolution in the coverage of sports such as car racing, soccer, football, and basketball. With multiple cameras capturing and broadcasting the scene live and viewers free to rotate the viewing angle, sports viewing could become a whole new experience.

Current videoconferencing systems provide a fixed view of the remote scene, so they don't give you the feeling of being there. Multi-view video could have a broad impact on such systems. One important feature of future communications will be interactivity with stereoscopic and 3D vision, which makes you feel more as if you're present in the scene. In a videoconferencing scenario, participants at different geographical sites could meet virtually and see one another in free-viewpoint video or 3DTV style.

Surveillance and remote monitoring of important sites, such as critical infrastructures, traffic, parking lots, and banks, could also benefit from this technology because it can provide coverage of very large areas from multiple angles.

Other potential application areas include entertainment (such as concerts, multiuser games, and movies), education (such as digital libraries and archives, training and instruction manuals with real video, and surgeon training), culture (such as zoos, aquariums, and museums), and archiving (such as scientific archives, national treasures, and traditional entertainment).

Research issues

An MVV system consists of components for data acquisition, compression, and delivery. The acquisition component captures videos from multiple cameras and records the acquisition parameters. The processing component analyzes the acquired data, extracts its features, and compresses it for delivery and storage.
On the receiving side, decoding and display devices reconstruct the view in either two or three dimensions, depending on the devices' capabilities.

Video acquisition and representation

For MVV content generation, numerous scene-acquisition methods are possible. The scene-modeling and real-time processing requirements and the available bandwidth for video transmission determine the number, type, and placement of cameras. For instance, for model-based representation, good-quality 3D video can be rendered using the input from only a limited number of cameras. Image-based correspondence techniques, however, might require a large number of input streams but little processing. Some video-acquisition schemes require static background capture before introducing the scene's dynamic parts. Estimating the setup's extrinsic and intrinsic parameters might require camera calibration.

You can classify acquisition setups on the basis of camera placement geometry, camera type (stationary or moving), distance from the objects of interest, and synchrony of video acquisition. Other parameters, such as the intrinsic parameters of different camera types, also distinguish different setups.

On the basis of the acquisition system setup, MVV scenarios fall into different categories. The camera configuration can be parallel [1], convergent (http://www.immersivemedia.com), or a combination of both [2]. Convergent configurations are generally used with model-based representations of the dynamic scenes captured [3]. Other capturing systems also exist [4-7].

In an MVV system, the video streams must be synchronized to ensure that all the cameras' "shutters" open at the same instant when they're sampling the scene from different angles [3, 8]. Video captured from different cameras is used together with timing information to create novel views in multi-view video. The input from the cameras can be synchronized using external sources such as a light flash at periodic intervals [4]. External synchronization can slow down the frame rate considerably.

One way of representing multi-view video is to use 2D video plus a disparity map and 3D structure. MPEG-4 multiview coding [8] proposed using video streams and a disparity map. Various rendering methods can be used with this scheme on the client side. The blue-c project at ETH (Eidgenössische Technische Hochschule) Zürich has used a 3D hierarchical data point representation [4]. It allowed efficient spatial coding into different data streams (tree structure, color, position, and normal information) and temporal coding using update, insert, and delete operators.

Multi-view video processing

MVV compression involves more than compressing multiple independent streams; it must also carry the information relating the views, without which scene reconstruction wouldn't be possible. Traditional 2D video-coding standards, such as MPEG and H.26x, exploit the human eye's characteristics, including its sensitivity to color. 2D video coding also takes advantage of motion as well as the spatial and statistical redundancies in video data.

In general, MVV is reconstructed from multiple 2D video sequences. More than one view's video sequence must be transmitted or stored, leading to a massive amount of data. MVV compression algorithms should reduce redundancy in information from multiple views as much as possible to provide a high degree of compression, subject to distortion and resource constraints. The redundancy in MVV streams, and the coding technique that addresses each kind, consists of

intraframe redundancy (spatial): intraframe prediction coding;
interframe redundancy (temporal): motion-compensated prediction coding;
inter-view redundancy (geometrical): disparity-compensated prediction coding;
transform redundancy (frequency): DCT (discrete cosine transform) or wavelet transform coding;
redundancy of the human visual system: scalable coding.

3D video compression has the following additional requirements:

Visual quality. Decompressed data should provide good visual quality.
Criteria include subjective quality (that is, how it looks to the human visual system), objective quality, and quality consistency among views (that is, the data should provide perceptually similar visual quality over the different views presented in the same time frame).

Synthesizability for reconstructed video. Decompressed data should support robust generation of a virtual or interpolated view. So, camera calibration information and the depth/disparity map should be compressed along with the view data.

Compatibility. The compressed format should be compatible with current and future video standards [1].

Low delay. The compression algorithms should provide low delay for real-time applications. Such delays include encoding and decoding delays, view change delays, and end-to-end delay.

Camera motion. The compression algorithms should support encoding of video sequences that are subject to camera motion.

Scalability. This includes signal-to-noise ratio scalability, spatial scalability, temporal scalability, complexity scalability, view scalability, and scalability on a multitude of terminals and under different network conditions.

Networking and transportation

Delivering MVV content to end users will pose serious networking challenges, involving protocols, quality of service, channel-delay management, and error concealment and recovery. Depending on their environments and requirements, MVV systems can be built on different architectures (see figure 1).

Figure 1. Various multi-view video system architectures: (a) distributed-acquisition and distributed-viewers model (DADV); (b) local-acquisition and local-viewers model (LALV or Saito's model) [6]; (c) distributed-acquisition and local-viewers model (DALV or Heinrich-Hertz Institute model) [9]; (d) local-data-acquisition and distributed-viewers model (LADV or University of Central Florida model) [10].

Projects

Because multi-view video is a new and widely applicable research area with a broad range of open problems, numerous related research efforts are under way worldwide.
In Europe, the Digital Stereoscopic Imaging and Application (DISTIMA, http://homes.esat.kuleuven.be/~konijn/mirage.html) project addressed the production, presentation, coding, and transmission of digital stereoscopic video signals over integrated broadband communications networks. Another European research project, the Package for New Operational Autostereoscopic Multiview System (PANORAMA, http://www.cordis.lu/infowin/acts/analysys/products/thematic/mpeg4/panorama/panorama.htm), aimed to facilitate the hardware and software development of an MVV autostereoscopic telecommunication system. The Advanced Three-dimensional Television System Technology (ATTEST, http://www.extra.research.philips.com/euprojects/attest/) project aims to design an entire 3D-video chain, including content creation, coding, transmission, and display. Mitsubishi Electric Research Laboratories, Carnegie Mellon University's computer vision lab, Kyoto University, the Heinrich Hertz Institute in Germany, and the blue-c project are pursuing similar endeavors.

Outlook

A workgroup of the International Organization for Standardization's Moving Picture Experts Group (MPEG) has been exploring 3D audiovisual (3DAV) technology. The 3DAV group has discussed various applications and technologies in relation to the term "multi-view video." A multi-view profile (MVP), defined in 1996 as an amendment for stereoscopic TV, is available in the MPEG-2 standard. The MVP extends the well-known hybrid coding toward exploitation of inter-view/channel redundancies by implicitly defining disparity-compensated prediction; however, it doesn't support interactivity.

MPEG-4 version 2 includes the Multiple Auxiliary Component (MAC), defined in 2001. MAC's basic idea is that the grayscale shape is not only used to describe the video object's transparency but can also be defined in a more general way.
MACs are defined for a video object plane on a pixel-by-pixel basis and contain data related to the video object, such as disparity, depth, and additional texture.

Since 2003, MPEG has also accelerated its work on MVV coding standards. The Multiview Video Coding (MVC) initiative has passed MPEG's call-for-proposals stage. The proposals were based on the H.264/AVC video coding standard. Thus, MVC is currently being developed and standardized as an extension of this standard in a joint ad hoc group on MVC (AHG on MVC) in the Joint Video Team (JVT). For more information on the MPEG standardization efforts, see http://www.chiariglione.org/mpeg/working_documents.htm.

Conclusion

MVV-based products are expected to appear in two to three years. Watching television passively might soon be a thing of the past.

References

Ishfaq Ahmad is a professor of computer science and engineering at the University of Texas at Arlington. Contact him at iahmad@cse.uta.edu.
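
As a closing illustration, the disparity-compensated prediction listed earlier under inter-view redundancy can be sketched in a few lines of Python. This is a minimal, hypothetical example and not the method of any standard or project named in this article. It assumes two rectified grayscale views stored as lists of rows, image dimensions that are multiples of the block size, and disparities that are non-negative horizontal shifts. The function names and the sum-of-absolute-differences cost are choices made purely for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of pixel intensities


def block_cost(ref: Image, cur: Image, row: int, col: int, d: int, block: int) -> int:
    """Sum of absolute differences between the block of `cur` at (row, col)
    and the block of `ref` shifted d pixels to the right."""
    total = 0
    for r in range(row, row + block):
        for c in range(col, col + block):
            total += abs(cur[r][c] - ref[r][c + d])
    return total


def estimate_disparity(ref: Image, cur: Image, block: int = 2, max_disp: int = 3) -> List[List[int]]:
    """Return one disparity per block of `cur`: the horizontal shift into
    `ref` with the lowest cost (assumed search range: 0..max_disp)."""
    height, width = len(cur), len(cur[0])
    disparities = []
    for row in range(0, height - block + 1, block):
        row_disp = []
        for col in range(0, width - block + 1, block):
            best_d, best_cost = 0, None
            for d in range(max_disp + 1):
                if col + d + block > width:
                    break  # shifted block would fall outside the reference view
                cost = block_cost(ref, cur, row, col, d, block)
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            row_disp.append(best_d)
        disparities.append(row_disp)
    return disparities


def predict_view(ref: Image, disparities: List[List[int]], block: int = 2) -> Image:
    """Predict the current view by copying disparity-shifted blocks from `ref`."""
    height = len(disparities) * block
    width = len(disparities[0]) * block
    pred = [[0] * width for _ in range(height)]
    for bi, row_disp in enumerate(disparities):
        for bj, d in enumerate(row_disp):
            for r in range(bi * block, (bi + 1) * block):
                for c in range(bj * block, (bj + 1) * block):
                    pred[r][c] = ref[r][c + d]
    return pred


def residual(cur: Image, pred: Image) -> Image:
    """Per-pixel prediction error; this is what an encoder would go on to code."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur, pred)]
```

When the second view is simply the reference view shifted by a constant number of pixels, every block that still has a match inside the reference gets that shift as its disparity and a zero residual. Only the disparity values and the small remaining residual would then need to be coded, which is the saving that inter-view prediction aims for.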