2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2017)
Kansas City, MO, USA
Nov. 13, 2017 to Nov. 16, 2017
Hongbo Zhang, Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
De-Shuang Huang, Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
The rapid development of high-throughput sequencing technology provides unique opportunities for studying transcription factor (TF) binding, while also bringing new computational challenges. Recently, a series of discriminative motif discovery (DMD) methods have been proposed that offer promising solutions to these challenges. However, because of the huge computational cost, most of them resort to approximate schemes that either sacrifice the accuracy of the motif representation or tune the motif parameters only indirectly. In this paper, we propose Soft-bag based Motif Discovery (SMD) to discover motifs from ChIP-seq datasets. SMD naturally formulates the input sequences as a labeled bag. A generalized soft-margin SVM is then applied to construct the objective function, and a DMD-like scheme is designed to solve it. In contrast to those approximate learning strategies, SMD optimizes the motif position weight matrix (PWM) directly in a continuous space and accords more closely with known facts about TF-DNA binding. Experimental results on real ChIP-seq datasets show that SMD substantially outperforms previous DMD methods (including DREME, HOMER and XXmotif).
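The abstract does not give the SMD objective in detail, so the following is only a minimal sketch of the general idea it describes: scoring sequences with a continuous PWM and combining bag-level scores in a soft-margin (hinge-loss) objective. Everything specific here is an assumption made for illustration and not the authors' formulation: each sequence is treated as one bag of its length-k windows, the bag score is a hard maximum over windows (how the paper "softens" the bag is not stated), and the names `one_hot`, `window_scores`, `bag_score`, `hinge_objective`, `bias` and `c` are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as an (L, 4) 0/1 matrix; columns follow BASES."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]


def window_scores(seq, pwm):
    """Score every length-k window of seq against a (k, 4) weight matrix."""
    x = one_hot(seq)
    k = pwm.shape[0]
    return np.array([np.sum(x[i:i + k] * pwm) for i in range(len(seq) - k + 1)])


def bag_score(seq, pwm):
    """Bag-level score: the best-scoring window (hard max; an assumption)."""
    return window_scores(seq, pwm).max()


def hinge_objective(seqs, labels, pwm, bias=0.0, c=1.0):
    """Soft-margin objective: 0.5 * ||W||^2 + c * sum of hinge losses over bags.

    labels are +1 (bound) or -1 (background); pwm is a real-valued (k, 4) matrix,
    so the objective is defined over a continuous parameter space.
    """
    margins = np.array([y * (bag_score(s, pwm) + bias) for s, y in zip(seqs, labels)])
    return 0.5 * np.sum(pwm ** 2) + c * np.sum(np.maximum(0.0, 1.0 - margins))


# Toy example: a weight matrix that rewards the 3-mer "ACG".
pwm = np.zeros((3, 4))
pwm[0, BASES.index("A")] = 1.0
pwm[1, BASES.index("C")] = 1.0
pwm[2, BASES.index("G")] = 1.0

seqs = ["TTACGTT", "TTTTTTT"]   # one sequence with the motif, one without
labels = [1, -1]
objective = hinge_objective(seqs, labels, pwm, bias=-1.5)
```

In this toy setting both sequences sit outside the margin, so only the regularization term contributes to the objective. A discriminative learner would adjust the real-valued entries of `pwm` to reduce this quantity, which is the sense in which the weight matrix is tuned directly rather than through a discrete approximation.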
Pulse width modulation, DNA, Bioinformatics, Linear programming, Training, Support vector machines, Optimization
H. Zhang and D. Huang, "Soft-bag based motif discovery for ChIP-seq datasets," 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 2017, pp. 146-149.