A Biologically Inspired Validity Measure for Comparison of Clustering Methods over Metabolic Data Sets
May-June 2012 (vol. 9 no. 3)
pp. 706-716
L. Kamenetzky, Partner Group, Max-Planck Inst. for Mol. Plant Physiol., Castelar, Argentina
D. H. Milone, Res. Center for Signals, Syst. & Comput. Intell., FICH-UNL, Santa Fe, Argentina
G. Stegmayer, Center for R&D of Inf. Syst. (CIDISI), UTN-FRSF, Santa Fe, Argentina
M. G. Lopez, Partner Group, Max-Planck Inst. for Mol. Plant Physiol., Castelar, Argentina
F. Carrari, Partner Group, Max-Planck Inst. for Mol. Plant Physiol., Castelar, Argentina
In the biological domain, clustering rests on the assumption that genes or metabolites involved in a common biological process are coexpressed or coaccumulated under the control of the same regulatory network. A detailed inspection of the grouped patterns, verifying their membership in well-known metabolic pathways, can therefore be very useful for evaluating clusters from a biological perspective. The aim of this work is to propose a novel approach for comparing clustering methods over metabolic data sets that includes prior biological knowledge about the relations among the elements that constitute the clusters. A way of measuring the biological significance of clustering solutions is proposed, based on the usefulness of the clusters for identifying patterns that change in coordination and belong to common pathways of metabolic regulation. The measure summarizes, in a compact way, the objective analysis of clustering methods with respect to coherence and cluster distribution. It also evaluates the internal biological connections of such clusters, considering common pathways. The proposed measure was tested on two biological databases using three clustering methods.
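The abstract does not give the formula of the proposed measure, so the following is only a minimal illustrative sketch of the general idea of checking cluster members against known pathways. It is not the authors' measure. The function names, the toy metabolite names, and the simplifying assumption that each metabolite is annotated with exactly one pathway are all hypothetical.

```python
from collections import Counter


def pathway_coherence(clusters, pathways):
    """Per-cluster fraction of annotated members that fall in the
    cluster's most frequent pathway (hypothetical score, not the
    measure proposed in the paper)."""
    scores = {}
    for cluster_id, members in clusters.items():
        # Keep only members that have a pathway annotation.
        labels = [pathways[m] for m in members if m in pathways]
        if not labels:
            scores[cluster_id] = 0.0
            continue
        # Size of the dominant pathway inside this cluster.
        top_count = Counter(labels).most_common(1)[0][1]
        scores[cluster_id] = top_count / len(labels)
    return scores


def mean_coherence(clusters, pathways):
    """Average of the per-cluster scores, as one way to compare
    clustering solutions with a single number."""
    scores = pathway_coherence(clusters, pathways)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy data (assumed for illustration only).
pathways = {
    "citrate": "TCA cycle",
    "malate": "TCA cycle",
    "fumarate": "TCA cycle",
    "glucose": "glycolysis",
    "pyruvate": "glycolysis",
    "sucrose": "sucrose metabolism",
}
clusters = {
    "c1": ["citrate", "malate", "fumarate"],
    "c2": ["glucose", "pyruvate", "sucrose", "unknown_x"],
}

scores = pathway_coherence(clusters, pathways)
print(scores)
print(mean_coherence(clusters, pathways))
```

In this toy case the first cluster contains only metabolites from one pathway, while the second mixes two pathways and one unannotated element, so the first scores higher. A real comparison would also need to account for metabolites that participate in several pathways and for how elements are distributed across clusters, which this sketch ignores.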

Index Terms:
statistical analysis, biochemistry, biology computing, genetics, molecular biophysics, biological internal connections, biologically inspired validity, clustering methods, metabolic data sets, genes, metabolites, regulatory network, metabolic regulation, coherence, clusters distribution, Clustering algorithms, Couplings, Bioinformatics, Biological processes, metabolic pathways, Clustering, validation measure, biological assessment
L. Kamenetzky, D. H. Milone, G. Stegmayer, M. G. Lopez, F. Carrari, "A Biologically Inspired Validity Measure for Comparison of Clustering Methods over Metabolic Data Sets," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 706-716, May-June 2012, doi:10.1109/TCBB.2012.10