Knowledge Discovery in High-Dimensional Data: Case Studies and a User Survey for the Rank-by-Feature Framework
May/June 2006 (vol. 12 no. 3)
pp. 311-322
Jinwook Seo, IEEE Computer Society

Abstract—Knowledge discovery in high-dimensional data is a challenging enterprise, but new visual analytic tools appear to offer users remarkable powers if they are ready to learn new concepts and interfaces. Our three-year effort to develop versions of the Hierarchical Clustering Explorer (HCE) began with building an interactive tool for exploring clustering results. It expanded, based on user needs, to include other potent analytic and visualization tools for multivariate data, especially the rank-by-feature framework. Our own successes using HCE provided some testimonial evidence of its utility, but we felt it necessary to get beyond our subjective impressions. This paper presents an evaluation of the Hierarchical Clustering Explorer (HCE) using three case studies and an e-mail user survey (n = 57) to focus on skill acquisition with the novel concepts and interface for the rank-by-feature framework. Knowledgeable and motivated users in diverse fields provided multiple perspectives that refined our understanding of strengths and weaknesses. A user survey confirmed the benefits of HCE, but gave less guidance about improvements. Both evaluations suggested improved training methods.

Index Terms:
Information visualization evaluation, case study, user survey, rank-by-feature framework, hierarchical clustering explorer.
Citation:
Jinwook Seo, Ben Shneiderman, "Knowledge Discovery in High-Dimensional Data: Case Studies and a User Survey for the Rank-by-Feature Framework," IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 3, pp. 311-322, May-June 2006, doi:10.1109/TVCG.2006.50