Issue No. 05 - September/October (2011 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.84
Seungjin Choi, Pohang University of Science and Technology, Pohang
Jong Kyoung Kim, Pohang University of Science and Technology, Pohang
Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs) by searching only for patterns that differentiate two sets of sequences (a positive set and a negative set). On one hand, discriminative methods increase the sensitivity and specificity of motif discovery compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model that makes use of unlabeled sequences within the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained by a model between the purely generative and the purely discriminative extremes, and that semisupervised learning improves performance when labeled sequences are limited.
Graphical models, hybrid generative/discriminative models, motif discovery, probabilistic models, semisupervised learning.
Seungjin Choi, Jong Kyoung Kim, "Probabilistic Models for Semisupervised Discriminative Motif Discovery in DNA Sequences", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1309-1317, September/October 2011, doi:10.1109/TCBB.2010.84