Issue No. 05, September/October 2011 (vol. 8)
pp. 1309-1317
Jong Kyoung Kim, Pohang University of Science and Technology, Pohang
Seungjin Choi, Pohang University of Science and Technology, Pohang
ABSTRACT
Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs) by searching only for patterns that differentiate two sets of sequences (a positive set and a negative set). On the one hand, discriminative methods increase the sensitivity and specificity of motif discovery compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model that makes use of unlabeled sequences within the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained at an intermediate point between the purely generative and the purely discriminative models, and that semisupervised learning improves performance when labeled sequences are limited.
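The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions, not necessarily the authors' formulation. It assumes a toy setup: a positive sequence contains exactly one occurrence of a width-W motif (a position weight matrix) at a uniformly chosen offset, a negative sequence is pure background, and the background is zero-order. The hybrid objective is assumed to be a convex combination weighted by a hypothetical parameter `alpha`: `alpha = 1` is purely generative, with unlabeled sequences contributing their marginal likelihood log p(x); `alpha = 0` is the purely discriminative conditional likelihood log p(c | x), which ignores unlabeled data. The function names and the consensus string are illustrative only.

```python
import numpy as np

BASES = "ACGT"


def encode(seq):
    """Map a DNA string to an integer array (A=0, C=1, G=2, T=3)."""
    return np.array([BASES.index(ch) for ch in seq.upper()], dtype=int)


def random_seq(rng, length, plant=None):
    """Hypothetical helper: uniform random DNA, optionally with one planted motif."""
    s = rng.choice(list(BASES), size=length)
    if plant is not None:
        j = rng.integers(0, length - len(plant) + 1)
        s[j:j + len(plant)] = list(plant)
    return "".join(s)


def log_lik_given_class(x, log_pwm, log_bg):
    """Return [log p(x | c=0), log p(x | c=1)] under the assumed toy model.

    c=0: every base drawn from the background log_bg (shape 4).
    c=1: one motif occurrence drawn from log_pwm (shape W x 4) at a uniform
    offset; all other positions are background.
    """
    W, L = log_pwm.shape[0], len(x)
    if L < W:
        raise ValueError("sequence shorter than motif width")
    bg_only = log_bg[x].sum()
    # Motif-vs-background log-odds score of the window at every offset j.
    window_scores = np.array([
        (log_pwm[np.arange(W), x[j:j + W]] - log_bg[x[j:j + W]]).sum()
        for j in range(L - W + 1)
    ])
    # log of the average over offsets of exp(score_j), added to the background term.
    motif = bg_only + np.logaddexp.reduce(window_scores) - np.log(L - W + 1)
    return np.array([bg_only, motif])


def hybrid_objective(labeled, unlabeled, log_pwm, log_bg, log_prior, alpha):
    """Assumed convex combination of generative and discriminative terms.

    labeled: list of (encoded sequence, label in {0, 1})
    unlabeled: list of encoded sequences (used only by the generative term)
    alpha: 1.0 -> purely generative, 0.0 -> purely discriminative
    """
    gen, disc = 0.0, 0.0
    for x, c in labeled:
        joint = log_prior + log_lik_given_class(x, log_pwm, log_bg)
        gen += joint[c]                                  # log p(x, c)
        disc += joint[c] - np.logaddexp.reduce(joint)    # log p(c | x)
    for x in unlabeled:
        joint = log_prior + log_lik_given_class(x, log_pwm, log_bg)
        gen += np.logaddexp.reduce(joint)                # log p(x), class summed out
    return alpha * gen + (1.0 - alpha) * disc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    motif = "TGACTCA"  # illustrative consensus, not taken from the paper
    pwm = np.full((len(motif), 4), 0.05)
    for k, ch in enumerate(motif):
        pwm[k, BASES.index(ch)] = 0.85                   # each column sums to 1
    log_pwm, log_bg = np.log(pwm), np.log(np.full(4, 0.25))
    log_prior = np.log([0.5, 0.5])
    labeled = [(encode(random_seq(rng, 40, motif)), 1) for _ in range(5)]
    labeled += [(encode(random_seq(rng, 40)), 0) for _ in range(5)]
    unlabeled = [encode(random_seq(rng, 40, motif if i % 2 else None)) for i in range(20)]
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, hybrid_objective(labeled, unlabeled, log_pwm, log_bg, log_prior, alpha))
```

Intermediate values of `alpha` let unlabeled sequences shape the motif parameters through the generative term while the discriminative term still rewards separating positives from negatives, mirroring the generative/discriminative trade-off described in the abstract. Actually fitting the PWM would also require maximizing this objective (for example by gradient ascent on softmax-parameterized columns), which the sketch omits.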
INDEX TERMS
Graphical models, hybrid generative/discriminative models, motif discovery, probabilistic models, semisupervised learning.
CITATION
Jong Kyoung Kim, Seungjin Choi, "Probabilistic Models for Semisupervised Discriminative Motif Discovery in DNA Sequences," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1309-1317, September/October 2011, doi:10.1109/TCBB.2010.84