Issue No.12 - Dec. (2012 vol.18)
pp: 2917-2926
Sean Kandel, Stanford University
Andreas Paepcke, Stanford University
Joseph M. Hellerstein, University of California, Berkeley
Jeffrey Heer, Stanford University
Organizations rely on data analysts to model customer engagement, streamline operations, improve production, inform business decisions, and combat fraud. Though numerous analysis and visualization tools have been built to improve the scale and efficiency at which analysts can work, there has been little research on how analysis takes place within the social and organizational context of companies. To better understand the enterprise analysts’ ecosystem, we conducted semi-structured interviews with 35 data analysts from 25 organizations across a variety of sectors, including healthcare, retail, marketing and finance. Based on our interview data, we characterize the process of industrial data analysis and document how organizational features of an enterprise impact it. We describe recurring pain points, outstanding challenges, and barriers to adoption for visual analytic tools. Finally, we discuss design implications and opportunities for visual analysis research.
Organizations, Data visualization, Distributed databases, Collaboration, Computer hacking, enterprise, Data, analysis, visualization
Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, Jeffrey Heer, "Enterprise Data Analysis and Visualization: An Interview Study", IEEE Transactions on Visualization & Computer Graphics, vol.18, no. 12, pp. 2917-2926, Dec. 2012, doi:10.1109/TVCG.2012.219