Issue No.12 - Dec. (2012 vol.18)

pp: 2849-2858

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TVCG.2012.254

ABSTRACT

Contingency tables summarize the relations between categorical variables and arise in both scientific and business domains. Asymmetrically large two-way contingency tables pose a problem for common visualization methods. The Contingency Wheel has been recently proposed as an interactive visual method to explore and analyze such tables. However, the scalability and readability of this method are limited when dealing with large and dense tables. In this paper we present Contingency Wheel++, new visual analytics methods that overcome these major shortcomings: (1) regarding automated methods, a measure of association based on Pearson's residuals alleviates the bias of the raw residuals originally used, (2) regarding visualization methods, a frequency-based abstraction of the visual elements eliminates overlapping and makes analyzing both positive and negative associations possible, and (3) regarding the interactive exploration environment, a multi-level overview+detail interface enables exploring individual data items that are aggregated in the visualization or in the table using coordinated views. We illustrate the applicability of these new methods with a use case and show how they enable discovering and analyzing nontrivial patterns and associations in large categorical data.
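
As a rough illustration of the first point, the sketch below computes the standard textbook Pearson residuals for a two-way contingency table, next to the raw residuals they are meant to improve on. It is not the association measure defined in the paper, which is only described here as being based on Pearson's residuals. The function name `residuals_for_table` and the toy table are assumptions made for this example.

```python
from math import sqrt

def residuals_for_table(table):
    """Return (raw, pearson) residual matrices for a two-way contingency table.

    raw[i][j]     = observed - expected
    pearson[i][j] = (observed - expected) / sqrt(expected)
    where expected = row_total * col_total / grand_total (independence model).
    This is the textbook definition, not necessarily the paper's measure.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    raw, pearson = [], []
    for i, row in enumerate(table):
        raw_row, pearson_row = [], []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            diff = observed - expected
            raw_row.append(diff)
            # Guard against empty rows or columns, where expected is zero.
            pearson_row.append(diff / sqrt(expected) if expected > 0 else 0.0)
        raw.append(raw_row)
        pearson.append(pearson_row)
    return raw, pearson

# Hypothetical 2x2 table: rows and columns are two categorical variables.
toy_table = [[30, 10], [10, 30]]
raw, pearson = residuals_for_table(toy_table)
```

Dividing by the square root of the expected frequency puts cells with very different expected counts on a comparable scale, so positive and negative associations can be read from the sign of each Pearson residual.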

INDEX TERMS

data visualisation, contingency wheel++, scalable visual analytics, large categorical data, business domains, scientific domains, two-way contingency tables, visualization methods, automated methods, Pearson residuals, frequency-based abstraction, visual elements, positive associations, negative associations, interactive exploration environment, multilevel overview+detail interface, coordinated views, nontrivial patterns, Motion pictures, Histograms, Frequency measurement, Visual analytics, Data visualization, visual analytics, Large categorical data, contingency table analysis, information interfaces and representation

CITATION

B. Alsallakh, W. Aigner, S. Miksch, M. E. Gröller, "Reinventing the Contingency Wheel: Scalable Visual Analytics of Large Categorical Data,"

*IEEE Transactions on Visualization & Computer Graphics*, vol. 18, no. 12, pp. 2849-2858, Dec. 2012, doi:10.1109/TVCG.2012.254