Issue No. 12, Dec. 2012 (vol. 18)
pp: 2506-2515
Alexander Pilhofer , University of Augsburg
Alexander Gribov , University of Augsburg
Antony Unwin , University of Augsburg
ABSTRACT
Classifying a set of objects into clusters can be done in numerous ways, producing different results. These results can be compared visually using contingency tables [27], mosaicplots [13], fluctuation diagrams [15], tableplots [20], (modified) parallel coordinates plots [28], Parallel Sets plots [18] or circos diagrams [19]. Unfortunately, the interpretability of all these graphical displays decreases rapidly as the number of categories and clusterings grows. In his famous book A Semiology of Graphics [5], Bertin writes “the discovery of an ordered concept appears as the ultimate point in logical simplification since it permits reducing to a single instant the assimilation of series which previously required many instants of study”. In more everyday language: with good orderings you can see results immediately that other orderings might take a lot of effort to reveal. This is also related to the idea of effect ordering [12], that data should be organised to reflect the effect you want to observe. This paper presents an efficient algorithm based on Bertin’s idea and on concepts related to Kendall’s τ [17], which finds informative joint orders for two or more nominal classification variables. We also show how these orderings improve the various displays and how groups of corresponding categories can be detected using a top-down partitioning algorithm. Different clusterings of data on the environmental performance of cars sold in Germany are used for illustration. All presented methods are available in the R package extracat, which is used to compute the optimized orderings for the example dataset.
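The core idea, comparing two clusterings through their contingency table and searching for joint row and column orders in the spirit of Kendall's τ, can be illustrated with a minimal Python sketch. It rests on stated assumptions: the criterion simply counts discordant pairs of observations (one in cell (i, j), the other in cell (k, l) with i < k but j > l), which is the quantity Kendall's τ penalises; the reordering uses the barycenter heuristic discussed in [24], swapped in for the paper's own algorithm; and all function names (crosstab, discordance, permute, barycenter_order) are hypothetical, not the extracat API.

```python
from collections import Counter


def crosstab(a, b):
    """Contingency table of two clusterings given as equal-length label lists."""
    rows, cols = sorted(set(a)), sorted(set(b))
    counts = Counter(zip(a, b))
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


def discordance(table):
    """Discordant observation pairs: sum of n_ij * n_kl over i < k, j > l.

    Zero means all counts lie on a monotone staircase from top-left to
    bottom-right, i.e. the row and column orders agree perfectly."""
    total = 0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            for lower in table[i + 1:]:          # rows k > i
                total += n_ij * sum(lower[:j])   # columns l < j
    return total


def permute(table, row_order, col_order):
    """Reorder the rows and columns of a table."""
    return [[table[r][c] for c in col_order] for r in row_order]


def mean_position(weights):
    """Weighted mean index of a sequence of counts (assumes a positive total)."""
    return sum(p * w for p, w in enumerate(weights)) / sum(weights)


def barycenter_order(table, n_iter=20):
    """Hypothetical helper: alternately sort rows and columns by the mean
    position of their counts, keeping the least discordant joint order seen."""
    row_order = list(range(len(table)))
    col_order = list(range(len(table[0])))
    best = (discordance(table), row_order, col_order)
    for _ in range(n_iter):
        new_rows = sorted(
            row_order, key=lambda r: mean_position([table[r][c] for c in col_order]))
        new_cols = sorted(
            col_order, key=lambda c: mean_position([table[r][c] for r in new_rows]))
        score = discordance(permute(table, new_rows, new_cols))
        if score < best[0]:
            best = (score, new_rows, new_cols)
        if (new_rows, new_cols) == (row_order, col_order):
            break  # fixed point reached
        row_order, col_order = new_rows, new_cols
    return best[1], best[2]


# Usage: clustering b is a relabelled copy of clustering a.
a = [0, 0, 1, 1, 2, 2, 2, 0]
b = ["z", "z", "x", "x", "y", "y", "y", "z"]
rows, cols, table = crosstab(a, b)
row_order, col_order = barycenter_order(table)
print(discordance(table))                                  # 15
print(discordance(permute(table, row_order, col_order)))   # 0
print([rows[r] for r in row_order], [cols[c] for c in col_order])  # [1, 2, 0] ['x', 'y', 'z']
```

Because the best order seen is kept, the result is never worse than the input order under this criterion, but on larger tables the heuristic can stop at a local optimum.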
INDEX TERMS
Optimization, Graphics, Classification, Stress measurement, Clustering algorithms, seriation, Order optimization, fluctuation diagrams, classification
CITATION
Alexander Pilhofer, Alexander Gribov, Antony Unwin, "Comparing Clusterings Using Bertin's Idea", IEEE Transactions on Visualization & Computer Graphics, vol. 18, no. 12, pp. 2506-2515, Dec. 2012, doi:10.1109/TVCG.2012.207
REFERENCES
[1] ADAC. The ADAC ecotest. http://www.adac.de/infotestrat/tests/eco-test default.aspx, 2012.
[2] Z. Bar-Joseph, E. Demaine, D. Gifford, and T. Jaakkola. A method for chronologically ordering archaeological deposits. American Antiquity, 16: 293-301, 2001.
[3] O. Bastert and C. Matuszewski. Layered drawings of digraphs. In M. Kaufmann and D. Wagner, editors, Drawing Graphs, vol. 2025 of Lecture Notes in Computer Science, pages 87-120. Springer Berlin / Heidelberg, 2001.
[4] A. Ben-Hur, A. Elisseeff, and I. Guyon. A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing, pages 6-17, 2002.
[5] J. Bertin. Graphics and Graphic Information Processing. Walter de Gruyter, Berlin, 1981.
[6] C. Chen. Generalized association plots: Information visualization via iteratively generated correlation matrices. Statistica Sinica, 12: 7-29, 2002.
[7] C. Chen, H. Hwu, W. Jang, C. Kao, Y. Tien, S. Tzeng, and H. Wu. Matrix visualization and information mining. In Proceedings in Computational Statistics, pages 85-140, Heidelberg, 2004. Physica-Verlag.
[8] J. Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1): 37-46, 1960.
[9] A. Dasgupta and R. Kosara. Pargnostics: Screen-space metrics for parallel coordinates. IEEE Transactions on Visualization and Computer Graphics, 16(6): 1017-1026, Nov.-Dec. 2010.
[10] A. de Falguerolles, F. Friedrich, and G. Sawitzki. A tribute to J. Bertin's graphical data analysis. Proceedings of SoftStat 97, pages 11-20, 1997.
[11] M. Friendly. Corrgrams: Exploratory displays for correlation matrices. The American Statistician, 56(4): 316-324, 2002.
[12] M. Friendly and E. Kwan. Effect ordering for data displays. Computational Statistics and Data Analysis, 43(4): 509-539, 2003.
[13] J. Hartigan and B. Kleiner. Mosaics for contingency tables. Computing Science and Statistics: Proceedings of the 13th Symposium on the Interface, Springer-Verlag, 1981.
[14] M. Hahsler, K. Hornik, and C. Buchta. Getting things in order: An introduction to the R package seriation. Journal of Statistical Software, 25(3): 1-34, 2008.
[15] H. Hofmann. Exploring categorical data: Interactive mosaic plots. Metrika, 51(1): 11-26, 2000.
[16] C. Hurley and R. Oldford. Pairwise display of high-dimensional information via Eulerian tours and Hamiltonian decompositions. Journal of Computational and Graphical Statistics, 19(4): 861-886, 2010.
[17] D. Kendall. Seriation from abundance matrices. In Mathematics in the Archaeological and Historical Sciences, pages 214-252, 1971.
[18] R. Kosara and C. Ziemkiewicz. Parallel Sets v2.1: Categorical data visualization. http://eagereyes.org/parallel-sets, 2009.
[19] M. Krzywinski. Circos: An information aesthetic for comparative genomics. Genome Research, 19: 1639-1645, 2009.
[20] E. Kwan, I. Lu, and M. Friendly. Tableplot: A new tool for assessing precise predictions. Journal of Psychology, 217: 38-48, 2009.
[21] J. Lenstra. Clustering a data array and the traveling salesman problem. Operations Research, 22: 413-414, 1974.
[22] A. Lex, M. Streit, C. Partl, K. Kashofer, and D. Schmalstieg. Comparative analysis of multidimensional, quantitative data. IEEE Transactions on Visualization and Computer Graphics, 16(6): 1027-1035, Nov.-Dec. 2010.
[23] E. Mäkinen and H. Siirtola. Reordering the reorderable matrix as an algorithmic problem. In M. Anderson, P. Cheng, and V. Haarslev, editors, Theory and Application of Diagrams, vol. 1889 of Lecture Notes in Computer Science, pages 453-468. Springer Berlin / Heidelberg, 2000.
[24] E. Mäkinen and H. Siirtola. The barycenter heuristic and the reorderable matrix. Informatica, 29: 357-363, 2005.
[25] W. McCormick, P. Schweitzer, and T. White. Problem decomposition and data reorganization by a clustering technique. Operations Research, 20(5): 993-1009, 1972.
[26] S. Niermann. Optimizing the ordering of tables with evolutionary computation. The American Statistician, 59(1): 41, 2005.
[27] K. Pearson. On the Theory of Contingency and Its Relation to Association and Normal Correlation. Drapers’ Company Research Memoirs: Biometric Series. Cambridge University Press, 1904.
[28] A. Pilhöfer and A. Unwin. New approaches in visualization of categorical data: R-package extracat. Journal of Statistical Software, accepted, 2011.
[29] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2011. ISBN 3-900051-07-0.
[30] W. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66: 846-850, 1971.
[31] W. Rand. Performance criteria for graph clustering and Markov cluster experiments. Technical Report INS-R0012, Centrum voor Wiskunde en Informatica, 2000.
[32] M. Schonlau. Visualizing categorical data arising in the health sciences using hammock plots. Technical report, RAND Corporation, 2003.
[33] S. Simonoff. Analysing Categorical Data. Springer-Verlag, 2003.
[34] K. Sugiyama, S. Tagawa, and M. Toda. Methods for visual understanding of hierarchical system structures. IEEE Transactions on Systems, Man, and Cybernetics, 11(2): 109-125, 1981.
[35] M. Theus and S. Urbanek. Interactive Graphics for Data Analysis: Principles and Examples. Chapman & Hall, 2008.
[36] S. Urbanek and M. Theus. iPlots: High interaction graphics for R. In Proceedings of the DSC 2003 Conference, 2003.
[37] L. Wilkinson and M. Friendly. The history of the cluster heat map. The American Statistician, 63(2): 179-184, 2009.