Issue No.12 - Dec. (2012 vol.18)

pp: 2506-2515

Alexander Pilhofer, University of Augsburg

Alexander Gribov, University of Augsburg

Antony Unwin, University of Augsburg

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TVCG.2012.207

ABSTRACT

Classifying a set of objects into clusters can be done in numerous ways, producing different results. They can be visually compared using contingency tables [27], mosaicplots [13], fluctuation diagrams [15], tableplots [20], (modified) parallel coordinates plots [28], Parallel Sets plots [18] or circos diagrams [19]. Unfortunately, the interpretability of all these graphical displays decreases rapidly with the numbers of categories and clusterings. In his famous book A Semiology of Graphics [5] Bertin writes “the discovery of an ordered concept appears as the ultimate point in logical simplification since it permits reducing to a single instant the assimilation of series which previously required many instants of study”. Or in more everyday language, if you use good orderings you can see results immediately that with other orderings might take a lot of effort. This is also related to the idea of effect ordering [12], that data should be organised to reflect the effect you want to observe. This paper presents an efficient algorithm based on Bertin’s idea and concepts related to Kendall’s τ [17], which finds informative joint orders for two or more nominal classification variables. We also show how these orderings improve the various displays and how groups of corresponding categories can be detected using a top-down partitioning algorithm. Different clusterings based on data on the environmental performance of cars sold in Germany are used for illustration. All presented methods are available in the R package extracat, which is used to compute the optimized orderings for the example dataset.
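To make the idea of a joint order concrete, the following is a minimal Python sketch, not the paper's algorithm and not the extracat API. It assumes a two-way contingency table of two clusterings and uses a Kendall-style count of discordant cell pairs as the quantity to minimise; the function names `discordance` and `best_joint_order`, the exhaustive search, and the example counts are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def discordance(table):
    """Weighted count of discordant cell pairs.

    A pair of cells (i, j) and (k, l) is discordant when i < k but j > l;
    each such pair contributes the product of its two cell counts.
    """
    rows, cols = len(table), len(table[0])
    total = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                for l in range(j):
                    total += table[i][j] * table[k][l]
    return total


def best_joint_order(table):
    """Try every row and column permutation (feasible for tiny tables only)
    and return the first ordering with the fewest discordant pairs."""
    rows, cols = len(table), len(table[0])
    best = None
    for rp in permutations(range(rows)):
        for cp in permutations(range(cols)):
            reordered = [[table[r][c] for c in cp] for r in rp]
            score = discordance(reordered)
            if best is None or score < best[0]:
                best = (score, rp, cp, reordered)
    return best


# Hypothetical cross-tabulation of two clusterings of the same 60 objects.
counts = [
    [0, 18, 2],
    [1, 0, 19],
    [20, 0, 0],
]

score, row_order, col_order, reordered = best_joint_order(counts)
for row in reordered:
    print(row)
```

In the reordered table the large counts sit on the diagonal, so corresponding clusters of the two clusterings line up visually. The exhaustive search grows factorially with the number of categories, which is why an efficient algorithm such as the one the paper describes is needed in practice.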

INDEX TERMS

Optimization, Graphics, Classification, Stress measurement, Clustering algorithms, seriation, Order optimization, fluctuation diagrams, classification

CITATION

Alexander Pilhofer, Alexander Gribov, Antony Unwin, "Comparing Clusterings Using Bertin's Idea", *IEEE Transactions on Visualization & Computer Graphics*, vol. 18, no. 12, pp. 2506-2515, Dec. 2012, doi:10.1109/TVCG.2012.207

REFERENCES

- [1] ADAC. The ADAC ecotest. http://www.adac.de/infotestrat/tests/eco-test default.aspx, 2012.
- [2] Z. Bar-Joseph, E. Demaine, D. Gifford, and T. Jaakkola. A method for chronologically ordering archaeological deposits. American Antiquity, 16: 293-301, 2001.
- [3] O. Bastert and C. Matuszewski. Layered drawings of digraphs. In M. Kaufmann and D. Wagner, editors, Drawing Graphs, volume 2025 of Lecture Notes in Computer Science, pages 87-120. Springer Berlin / Heidelberg, 2001.
- [4] A. Ben-Hur, A. Elisseeff, and I. Guyon. A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing, pages 6-17, 2002.
- [5] J. Bertin. Graphics and Graphic Information Processing. Walter de Gruyter, Berlin, 1981.
- [6] C. Chen. Generalized association plots: Information visualization via iteratively generated correlation matrices. Statistica Sinica, 12: 7-29, 2002.
- [7] C. Chen, H. Hwu, W. Jang, C. Kao, Y. Tien, S. Tzeng, and H. Wu. Matrix visualization and information mining. In Proceedings in Computational Statistics, pages 85-140, Heidelberg, 2004. Physika Verlag.
- [8] J. Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1): 37-46, 1960.
- [9] A. Dasgupta and R. Kosara. Pargnostics: screen-space metrics for parallel coordinates. Visualization and Computer Graphics, IEEE Transactions on, 16(6): 1017-1026, Nov.-Dec. 2010.
- [10] A. de Falguerolles, F. Friedrich, and G. Sawitzki. A tribute to J. Bertin's graphical data analysis. Proceedings of the SoftStat 97, pages 11-20, 1997.
- [11] M. Friendly. Corrgrams: Exploratory displays for correlation matrices. The American Statistician, 56(4): 316-324, 2002.
- [12] M. Friendly and E. Kwan. Effect ordering for data displays. Computational Statistics and Data Analysis, 43(4): 509-539, 2003.
- [13] J. Hartigan and B. Kleiner. Mosaics for contingency tables. Computing Science and Statistics, Proceedings of the 13th Symposium on the Interface, Springer-Verlag, 1981.
- [14] M. Hahsler, K. Hornik, and C. Buchta. Getting things in order: An introduction to the R package seriation. Journal of Statistical Software, 25(3): 1-34, 2008.
- [15] H. Hofmann. Exploring categorical data: Interactive mosaic plots. Metrika, 51(1): 11-26, 2000.
- [16] C. Hurley and R. Oldford. Pairwise display of high-dimensional information via Eulerian tours and Hamiltonian decompositions. Journal of Computational and Graphical Statistics, 19(4): 861-886, 2010.
- [17] D. Kendall. Seriation from abundance matrices. Mathematics in the Archaeological and Historical Sciences, pages 214-252, 1971.
- [18] R. Kosara and C. Ziemkiewicz. Parallel sets v2.1: Categorical data visualization. http://eagereyes.orgparallel-sets, 2009.
- [19] M. Krzywinski. Circos: an information aesthetic for comparative genomics. Genome Res, 19: 1639-1645, 2009.
- [20] E. Kwan, I. Lu, and M. Friendly. Tableplot: A new tool for assessing precise predictions. Journal of Psychology, 217: 38-48, 2009.
- [21] J. Lenstra. Clustering a data array and the traveling salesman problem. Operations Research, 22: 413-414, 1974.
- [22] A. Lex, M. Streit, C. Partl, K. Kashofer, and D. Schmalstieg. Comparative analysis of multidimensional, quantitative data. Visualization and Computer Graphics, IEEE Transactions on, 16(6): 1027-1035, Nov.-Dec. 2010.
- [23] E. Mäkinen and H. Siirtola. Reordering the reorderable matrix as an algorithmic problem. In M. Anderson, P. Cheng, and V. Haarslev, editors, Theory and Application of Diagrams, volume 1889 of Lecture Notes in Computer Science, pages 453-468. Springer Berlin / Heidelberg, 2000.
- [24] E. Mäkinen and H. Siirtola. The barycenter heuristic and the reorderable matrix. Informatica, 29: 357-363, 2005.
- [25] W. McCormick, P. Schweitzer, and T. White. Problem decomposition and data reorganization by a clustering technique. Operations Research, 20(5): 993-1009, 1972.
- [26] S. Niermann. Optimizing the ordering of tables with evolutionary computation. The American Statistician, 59(1): 41, 2005.
- [27] K. Pearson. On the Theory of Contingency and Its Relation to Association and Normal Correlation. Drapers’ company research memoirs: Biometric series. Cambridge University Press, 1904.
- [28] A. Pilhöfer and A. Unwin. New approaches in visualization of categorical data: R-package extracat. Journal of Statistical Software, accepted, 2011.
- [29] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2011. ISBN 3-900051-07-0.
- [30] W. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66: 846-850, 1971.
- [31] W. Rand. Performance criteria for graph clustering and Markov cluster experiments. Technical Report INS-R0012, Centrum voor Wiskunde en Informatica, 2000.
- [32] M. Schonlau. Visualizing categorical data arising in the health sciences using hammock plots. Technical report, RAND Corporation, 2003.
- [33] S. Simonoff. Analysing Categorical Data. Springer-Verlag, 2003.
- [34] K. Sugiyama, S. Tagama, and M. Toda. Methods for visual understanding of hierarchical system structures. IEEE Transactions on Systems, Man, and Cybernetics, 11(2): 109-125, 1981.
- [35] M. Theus and S. Urbanek. Interactive Graphics for Data Analysis: Principles and Examples. Chapman & Hall, 2008.
- [36] S. Urbanek and M. Theus. iPlots - high interaction graphics for R. In Proceedings of the DSC 2003 Conference, 2003.
- [37] L. Wilkinson and M. Friendly. The history of the cluster heat map. The American Statistician, 63(2): 179-184, 2009.