Issue No.02 - April-June (2009 vol.6)
pp: 260-270
ABSTRACT
Clustering genes into groups that share common characteristics is a useful exploratory technique for a number of subsequent computational analyses. A wide range of clustering algorithms have been proposed, in particular to analyze gene expression data, but most of them treat genes as independent entities or include relevant information on gene interactions in a suboptimal way. We propose a probabilistic model that accounts for individual data (e.g., expression) and pairwise data (e.g., interaction information coming from biological networks) simultaneously. Our model is based on hidden Markov random field models in which parametric probability distributions account for the distribution of individual data. Data on pairs, possibly reflecting distance or similarity measures between genes, are then included through a graph in which the nodes represent the genes and the edges are weighted according to the available interaction information. Being probabilistic, the model has many interesting theoretical features. In addition, preliminary experiments on simulated and real data show promising results and point to the gain from using such an approach. Availability: The software used in this work is written in C++ and is available with other supplementary material at http://mistis.inrialpes.fr/people/forbes/transparentia/supplementary.html.
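To make the combination of individual and pairwise data concrete, here is a minimal sketch in Python. It is not the authors' C++ software or their exact estimation procedure. It assumes one-dimensional expression values, Gaussian class distributions with fixed parameters, a Potts-like interaction term scaled by a hypothetical strength `beta`, and a simple synchronous mean-field-like update. All names (`mean_field_step`, `edges`, `beta`) and all data values are illustrative assumptions.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian (the 'individual data' term)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def mean_field_step(x, edges, q, means, variances, beta):
    """One synchronous mean-field-like update of class probabilities.

    x         : list of expression values, one per gene (assumed 1-D)
    edges     : dict {(i, j): weight} describing the weighted gene graph
    q         : current class probabilities, q[i][c] for gene i, class c
    means, variances : fixed Gaussian parameters per class (assumed known)
    beta      : hypothetical interaction strength; beta = 0 ignores the graph
    """
    n, k = len(x), len(means)

    # Pairwise term: weighted sum of the neighbors' current probabilities.
    field = [[0.0] * k for _ in range(n)]
    for (i, j), w in edges.items():
        for c in range(k):
            field[i][c] += w * q[j][c]
            field[j][c] += w * q[i][c]

    new_q = []
    for i in range(n):
        # Combine the individual (expression) and pairwise (graph) evidence.
        logp = [
            gaussian_logpdf(x[i], means[c], variances[c]) + beta * field[i][c]
            for c in range(k)
        ]
        m = max(logp)  # subtract the maximum for numerical stability
        unnorm = [math.exp(v - m) for v in logp]
        total = sum(unnorm)
        new_q.append([u / total for u in unnorm])
    return new_q


# Toy data (made up): gene 4 lies exactly between the two class means,
# but it is linked to gene 3, which clearly belongs to the second class.
x = [0.0, 0.2, 1.9, 2.1, 1.0]
edges = {(0, 1): 1.0, (2, 3): 1.0, (3, 4): 1.0}
means, variances = [0.0, 2.0], [1.0, 1.0]

# Expression only (beta = 0): gene 4 is undecided.
uniform = [[0.5, 0.5] for _ in x]
q = mean_field_step(x, edges, uniform, means, variances, beta=0.0)
print("expression only:", [round(p, 3) for p in q[4]])

# Expression plus graph: iterate a few updates with beta = 1.
for _ in range(10):
    q = mean_field_step(x, edges, q, means, variances, beta=1.0)
print("with graph:     ", [round(p, 3) for p in q[4]])
```

In this toy setting the expression value alone leaves gene 4 undecided between the two classes, while the weighted edge to gene 3 pulls it toward the second class. A full treatment would also estimate the class parameters and the interaction strength rather than fixing them as done here.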
INDEX TERMS
Markov random fields, model-based clustering, metabolic networks, gene expression.
CITATION
Matthieu Vignes, Florence Forbes, "Gene Clustering via Integrated Markov Models Combining Individual and Pairwise Features", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.6, no. 2, pp. 260-270, April-June 2009, doi:10.1109/TCBB.2007.70248