Issue No.12 - Dec. (2012 vol.18)
pp: 2382-2391
B. Rieck , Interdiscipl. Center for Sci. Comput. (IWR), Heidelberg Univ., Heidelberg, Germany
H. Mara , Interdiscipl. Center for Sci. Comput. (IWR), Heidelberg Univ., Heidelberg, Germany
H. Leitte , Interdiscipl. Center for Sci. Comput. (IWR), Heidelberg Univ., Heidelberg, Germany
ABSTRACT
The extraction of significant structures from arbitrary high-dimensional data sets is a challenging task. Moreover, classifying data points as noise in order to reduce a data set is of particular relevance in many application domains. Standard methods such as clustering reduce problem complexity by providing the user with classes of similar entities. However, they usually do not highlight relations between different entities, and they require a stopping criterion, e.g. the number of clusters to be detected. In this paper, we present a visualization pipeline based on recent advancements in algebraic topology. More precisely, we employ methods from persistent homology that enable topological data analysis on high-dimensional data sets. Our pipeline inherently copes with noisy data and data sets of arbitrary dimensions. It extracts central structures of a data set in a hierarchical manner by using a persistence-based filtering algorithm that is theoretically well-founded. We furthermore introduce persistence rings, a novel visualization technique for a class of topological features (the persistence intervals) of large data sets. Persistence rings provide a unique topological signature of a data set, which helps in recognizing similarities. In addition, we provide interactive visualization techniques that assist the user in evaluating the parameter space of our method in order to extract relevant structures. We describe and evaluate our analysis pipeline by means of two very distinct classes of data sets. First, a class of synthetic data sets containing topological objects is employed to highlight the interaction capabilities of our method. Second, in order to affirm the utility of our technique, we analyse a class of high-dimensional real-world data sets arising from current research in cultural heritage.
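As a rough illustration of what persistence intervals and persistence-based filtering mean, the following minimal sketch computes only the 0-dimensional intervals (connected components) of a Vietoris-Rips filtration with a union-find structure and then keeps the long-lived ones. It is not the algorithm of the paper. The function names `zero_dim_persistence` and `filter_by_persistence`, the sample points, and the threshold value are assumptions made for this example.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Death scales of the 0-dimensional classes of a Vietoris-Rips filtration.

    Every point is born at scale 0. A connected component dies at the length
    of the edge that merges it into another component (Kruskal-style
    union-find). The single component that never dies is reported as math.inf.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # two components merge at this scale
            deaths.append(length)  # one of them dies here
    deaths.append(math.inf)        # the last component persists forever
    return deaths


def filter_by_persistence(deaths, threshold):
    """Keep intervals [0, death) whose persistence is at least `threshold`."""
    return [d for d in deaths if d >= threshold]


# Hypothetical toy data: two well-separated groups of nearby points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
deaths = zero_dim_persistence(points)
significant = filter_by_persistence(deaths, threshold=1.0)
print(len(significant))  # 2
```

In this toy case the short intervals come from points merging within a group, while the two surviving intervals correspond to the two groups themselves. Unlike a fixed cluster count, the threshold can be varied to inspect structure at different scales.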
INDEX TERMS
topology, data analysis, data structures, data visualisation, history, information filtering, pattern classification, cultural heritage, multivariate data analysis, topological signatures, significant structures extraction, arbitrary high-dimensional data sets, data points classification, problem complexity, stopping criterion, visualization pipeline, algebraic topology, persistent homology, topological data analysis, noisy data, central structures, hierarchical manner, persistence-based filtering algorithm, persistence rings, topological features, large data sets, interactive visualization techniques, parameter space evaluation, relevant structures, analysis pipeline, synthetic data sets, topological objects, interaction capability, high-dimensional real-world data sets, Network topology, Clustering methods, Multivariate data sets, clustering, Topological persistence, multivariate data
CITATION
B. Rieck, H. Mara, H. Leitte, "Multivariate Data Analysis Using Persistence-Based Filtering and Topological Signatures", IEEE Transactions on Visualization & Computer Graphics, vol. 18, no. 12, pp. 2382-2391, Dec. 2012, doi:10.1109/TVCG.2012.248