An Interactive Approach to Mining Gene Expression Data
October 2005 (vol. 17 no. 10)
pp. 1363-1378
Effective identification of coexpressed genes and coherent patterns in gene expression data is an important task in bioinformatics research and biomedical applications. Several clustering methods have recently been proposed to identify coexpressed genes that share similar coherent patterns. However, there is no objective standard for groups of coexpressed genes. The interpretation of co-expression heavily depends on domain knowledge. Furthermore, groups of coexpressed genes in gene expression data are often highly connected through a large number of “intermediate” genes. There may be no clear boundaries to separate clusters. Clustering gene expression data also faces the challenges of satisfying biological domain requirements and addressing the high connectivity of the data sets. In this paper, we propose an interactive framework for exploring coherent patterns in gene expression data. A novel coherent pattern index is proposed to give users highly confident indications of the existence of coherent patterns. To derive a coherent pattern index and facilitate clustering, we devise an attraction tree structure that summarizes the coherence information among genes in the data set. We present efficient and scalable algorithms for constructing attraction trees and coherent pattern indices from gene expression data sets. Our experimental results show that our approach is effective in mining gene expression data and is scalable for mining large data sets.
Index Terms:
Bioinformatics, gene expression (microarray) data, clustering, interactive data mining.
Citation:
Daxin Jiang, Jian Pei, Aidong Zhang, "An Interactive Approach to Mining Gene Expression Data," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 10, pp. 1363-1378, Oct. 2005, doi:10.1109/TKDE.2005.159