Issue No.02 - March-April (2013 vol.10)
pp: 286-299
Pradipta Maji, Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Sushmita Paul, Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
ABSTRACT
Gene expression data clustering is an important task in functional genomics, as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes is the basic challenge of the gene clustering problem. In this regard, a gene clustering algorithm, termed robust rough-fuzzy c-means, is proposed that judiciously integrates the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in a noisy environment. The concept of the possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select the initial prototypes of the different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near-optimum solution and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets.
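The ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: it uses the standard fuzzy c-means membership, the standard possibilistic c-means typicality, and a simple threshold rule (difference between the two highest memberships) to place an object in a cluster's lower approximation or in a boundary region. The function names, the fuzzifier m, the scale parameters eta, the threshold delta, and the toy data are all assumptions made for illustration.

```python
import math

def distance(x, v):
    """Euclidean distance between an expression profile and a prototype."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, v)))

def probabilistic_memberships(x, prototypes, m=2.0):
    """Standard fuzzy c-means memberships; they sum to 1 over the clusters."""
    d = [distance(x, v) for v in prototypes]
    # An object that coincides with a prototype belongs fully to it.
    if any(di == 0.0 for di in d):
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exp for dk in d) for di in d]

def possibilistic_memberships(x, prototypes, eta, m=2.0):
    """Standard possibilistic c-means typicalities; no sum-to-one constraint."""
    exp = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (distance(x, v) ** 2 / e) ** exp)
            for v, e in zip(prototypes, eta)]

def rough_assignment(memberships, delta):
    """Assumed rule: lower approximation if the top membership leads by more
    than delta, otherwise the boundary of the two best clusters."""
    ranked = sorted(range(len(memberships)),
                    key=lambda i: memberships[i], reverse=True)
    best, second = ranked[0], ranked[1]
    if memberships[best] - memberships[second] > delta:
        return ("lower", best)
    return ("boundary", (best, second))

# Toy data (hypothetical): two prototypes, one typical gene, one outlier.
prototypes = [[0.0, 0.0], [4.0, 0.0]]
eta = [1.0, 1.0]
typical = [1.0, 0.0]
outlier = [2.0, 50.0]

u_typical = probabilistic_memberships(typical, prototypes)
u_outlier = probabilistic_memberships(outlier, prototypes)
t_outlier = possibilistic_memberships(outlier, prototypes, eta)

print(rough_assignment(u_typical, delta=0.3))
print(rough_assignment(u_outlier, delta=0.3))
print(u_outlier, t_outlier)
```

In this toy setting the outlier is equidistant from both prototypes, so its probabilistic memberships are 0.5 each even though it is far from every cluster, while its possibilistic memberships are close to zero. That contrast is one common motivation for combining the two kinds of membership when the data are noisy.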
INDEX TERMS
Clustering algorithms, Approximation methods, Gene expression, Probabilistic logic, Prototypes, Robustness, Indexes, rough sets, fuzzy sets, Microarray, gene clustering, overlapping clustering
CITATION
Pradipta Maji, Sushmita Paul, "Rough-Fuzzy Clustering for Grouping Functionally Similar Genes from Microarray Data", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 286-299, March-April 2013, doi:10.1109/TCBB.2012.103