Novel Algorithm for Coexpression Detection in Time-Varying Microarray Data Sets
January-March 2008 (vol. 5 no. 1)
pp. 120-135
When analyzing the results of microarray experiments, biologists generally use unsupervised categorization tools. However, such tools treat each time point as an independent dimension and use the Euclidean distance to compute the similarity between expression profiles, so the order of the time points plays no role in the comparison. Furthermore, some of these methods require the number of clusters to be fixed in advance, which is rarely possible for a new dataset. This study therefore proposes a novel scheme, the Variation-based Co-expression Detection (VCD) algorithm, which analyzes expression trends based on their variation over time. The proposed algorithm has two advantages. First, the number of clusters need not be determined in advance, since the algorithm automatically detects genes whose profiles group together and creates patterns for these groups. Second, the algorithm introduces a new measurement criterion that calculates the degree of change in expression between adjacent time points and evaluates the similarity of the resulting trends. Three real-world microarray datasets are used to evaluate the performance of the proposed algorithm.
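To make the contrast in the abstract concrete, the sketch below compares the Euclidean distance with a simple trend measure built from changes between adjacent time points, and then groups profiles without fixing the number of groups beforehand. It is not the VCD algorithm: the cosine-based `trend_similarity`, the greedy `group_by_trend`, the `threshold` parameter, and the toy profiles are all assumptions chosen for illustration, since the abstract does not specify the actual criterion.

```python
import math


def euclidean(a, b):
    """Euclidean distance, treating each time point as an independent dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def variations(profile):
    """Change in expression between each pair of adjacent time points."""
    return [later - earlier for earlier, later in zip(profile, profile[1:])]


def trend_similarity(a, b):
    """Cosine similarity of the two variation vectors.

    Hypothetical stand-in for a trend measure; returns 0.0 when either
    profile is flat (no variation to compare).
    """
    da, db = variations(a), variations(b)
    dot = sum(x * y for x, y in zip(da, db))
    norm = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    return dot / norm if norm else 0.0


def group_by_trend(profiles, threshold):
    """Greedy grouping (an assumption, not the paper's procedure).

    Each gene joins the first group whose first member has a similar trend;
    otherwise it starts a new group. The number of groups is not an input.
    """
    groups = []
    for name, profile in profiles.items():
        for group in groups:
            if trend_similarity(profiles[group[0]], profile) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Toy profiles: gene_b has the same shape as gene_a but is shifted up by 4;
# gene_c is flat yet lies closer to gene_a in Euclidean terms.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 2.0],
    "gene_b": [5.0, 6.0, 7.0, 6.0],
    "gene_c": [2.0, 2.0, 2.0, 2.0],
}

groups = group_by_trend(profiles, threshold=0.9)
```

In this toy data the Euclidean distance places `gene_c` nearer to `gene_a` than `gene_b` is, while the variation-based measure pairs `gene_a` with `gene_b` because they rise and fall together. The greedy grouping still needs a similarity threshold, so it only illustrates the idea of not supplying a cluster count.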


Index Terms:
Pattern analysis, Time series analysis, Bioinformatics, Data mining, Clustering, Gene expression
Citation:
Zong-Xian Yin, Jung-Hsien Chiang, "Novel Algorithm for Coexpression Detection in Time-Varying Microarray Data Sets," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 1, pp. 120-135, Jan.-March 2008, doi:10.1109/tcbb.2007.1052