Topology Dictionary for 3D Video Understanding
Aug. 2012 (vol. 34 no. 8)
pp. 1645-1657
T. Tung, Dept. of Intell. Sci. & Technol., Kyoto Univ., Kyoto, Japan
T. Matsuyama, Dept. of Intell. Sci. & Technol., Kyoto Univ., Kyoto, Japan
This paper presents a novel approach to 3D video understanding. 3D video consists of a stream of 3D models of subjects in motion. Acquiring long sequences requires large storage space (2 GB per minute), and browsing such data sets to extract meaningful information is tedious. We propose the topology dictionary to encode and describe 3D video content. The model consists of a dictionary of topology-based shape descriptors, which can be generated either from extracted patterns or from training sequences. It relies on 1) topology description and classification using Reeb graphs, and 2) a Markov motion graph that represents topology change states. We show that Reeb graphs are a relevant high-level topology descriptor: they allow the dictionary to model complex sequences automatically, whereas other strategies would require prior knowledge of the shape and topology of the captured subjects. Our approach serves to encode 3D video sequences and can be applied to content-based description and summarization of 3D video sequences. Furthermore, labeling topology classes during a learning process enables the system to perform content-based event recognition. Experiments were carried out on various 3D videos, and we showcase an application to progressive 3D video summarization using the topology dictionary.
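The abstract describes encoding a 3D video as a sequence of topology classes and modeling changes between classes with a Markov motion graph. The following is a minimal illustrative sketch, not the authors' implementation. It assumes that every frame has already been assigned a dictionary label (for example by Reeb graph matching, which is not reproduced here). Under that assumption, it run-length encodes the label stream, estimates first-order transition probabilities by counting consecutive label pairs, and picks the first frame of each run as a candidate keyframe for a summary. The function names `encode_runs`, `transition_matrix`, and `keyframes` are hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence


def encode_runs(labels: Sequence[Hashable]) -> list[tuple[Hashable, int]]:
    """Collapse per-frame topology class labels into (label, run_length) pairs."""
    runs: list[tuple[Hashable, int]] = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((label, 1))  # a topology change starts a new run
    return runs


def transition_matrix(labels: Sequence[Hashable]) -> dict[Hashable, dict[Hashable, float]]:
    """Maximum-likelihood first-order Markov estimate of P(next = j | current = i),
    computed from consecutive frame labels. A label that only appears as the
    last frame has no outgoing transitions and therefore no row."""
    counts: dict[Hashable, Counter] = defaultdict(Counter)
    for cur, nxt in zip(labels, labels[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for cur, row in counts.items()
    }


def keyframes(labels: Sequence[Hashable]) -> list[int]:
    """Indices of frames where the topology class changes (first frame of each run)."""
    return [i for i, label in enumerate(labels) if i == 0 or labels[i - 1] != label]
```

For a label stream such as `["A", "A", "B", "B", "B", "A", "C"]`, the runs are `[("A", 2), ("B", 3), ("A", 1), ("C", 1)]`, the self-transition probability of `B` is 2/3, and the keyframes are at indices `[0, 2, 5, 6]`. Self-transitions are kept in the matrix so that the expected duration of each state remains recoverable.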

Index Terms:
video signal processing, graph theory, image recognition, image sequences, learning (artificial intelligence), Markov processes, 3D video progressive summarization, 3D video understanding, data sets, 3D video content, topology-based shape descriptor dictionary, pattern extraction, training sequences, Reeb graphs, Markov motion graph, topology change states, 3D video sequences, content-based description, learning process, content-based event recognition, Three dimensional displays, Topology, Dictionaries, Shape, Video sequences, Solid modeling, semantic description, 3D video, dictionary, Reeb graph, topology matching, Markov model, editing, summarization
Citation:
T. Tung, T. Matsuyama, "Topology Dictionary for 3D Video Understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 8, pp. 1645-1657, Aug. 2012, doi:10.1109/TPAMI.2011.258