2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2015)
Washington, DC, USA
Nov. 9, 2015 to Nov. 12, 2015
ISBN: 978-1-4673-6798-1
pp: 569-574
Qiang Yu , School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
Hongwei Huo , School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
Ruixing Zhao , School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
Dazheng Feng , School of Electronic Engineering, Xidian University, Xi'an, 710071, China
Jeffrey Scott Vitter , Department of Electrical Engineering & Computer Science, The University of Kansas, Lawrence, 66047, USA
Jun Huan , Department of Electrical Engineering & Computer Science, The University of Kansas, Lawrence, 66047, USA
ABSTRACT
The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k of the t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k input sequences as reference sequences without any elaborate selection process, so their running times may fluctuate sharply, especially for large alphabets. In this paper, we formulate the reference sequence selection problem and propose a method named RefSelect that solves it quickly by evaluating the number of candidate motifs for the reference sequences. RefSelect brings a practical improvement in running time to state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily and efficiently, (2) in particular, lets them achieve speedups of up to about 100 on protein data, and (3) is also suitable for large data sets containing hundreds or more sequences.
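To make the role of reference sequences concrete, here is a minimal brute-force sketch, assuming Hamming distance, a DNA alphabet and toy values of l and d. It is NOT RefSelect: the abstract says RefSelect evaluates candidate counts quickly, whereas this sketch counts them exactly by enumerating d-neighborhoods, which is feasible only for tiny inputs. All function names (`neighbors`, `candidates_from_refs`, `select_refs`, `pms`) are hypothetical illustrations. The point it shows is that any choice of k references yields the same motifs, but different choices can yield very different numbers of candidates to verify.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def neighbors(kmer, d, alphabet):
    """All strings over `alphabet` within Hamming distance d of kmer."""
    result = {kmer}
    for _ in range(d):  # grow the ball by one mismatch per round
        grown = set()
        for s in result:
            for i, c in enumerate(s):
                for a in alphabet:
                    if a != c:
                        grown.add(s[:i] + a + s[i + 1:])
        result |= grown
    return result

def candidates(seq, l, d, alphabet):
    """Union of the d-neighborhoods of every l-mer in seq."""
    cand = set()
    for i in range(len(seq) - l + 1):
        cand |= neighbors(seq[i:i + l], d, alphabet)
    return cand

def candidates_from_refs(refs, l, d, alphabet):
    """Candidates with a d-occurrence in every reference sequence."""
    return set.intersection(*(candidates(r, l, d, alphabet) for r in refs))

def is_motif(m, seqs, d):
    """True if m occurs with at most d mismatches in every sequence."""
    l = len(m)
    return all(any(hamming(m, s[i:i + l]) <= d
                   for i in range(len(s) - l + 1)) for s in seqs)

def select_refs(seqs, k, l, d, alphabet):
    """Hypothetical exhaustive selection: the k sequences giving fewest candidates."""
    return list(min(combinations(range(len(seqs)), k),
                    key=lambda idx: len(candidates_from_refs(
                        [seqs[i] for i in idx], l, d, alphabet))))

def pms(seqs, k, l, d, alphabet, refs_idx=None):
    """Pattern-driven search: generate from references, verify on all sequences."""
    if refs_idx is None:
        refs_idx = list(range(k))  # the common default: first k sequences
    cand = candidates_from_refs([seqs[i] for i in refs_idx], l, d, alphabet)
    return sorted(m for m in cand if is_motif(m, seqs, d))
```

For example, with `seqs = ["ACGTACGT", "TTACGAAA", "GGACGTTT"]`, both `pms(seqs, 2, 4, 1, "ACGT")` and `pms(seqs, 2, 4, 1, "ACGT", select_refs(seqs, 2, 4, 1, "ACGT"))` return the same motif list (which includes `"ACGT"`); only the number of candidates generated before verification can differ.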
INDEX TERMS
reference sequences, planted (l, d) motif search, pattern-driven
CITATION

Q. Yu, H. Huo, R. Zhao, D. Feng, J. S. Vitter and J. Huan, "Reference sequence selection for motif searches," 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, USA, 2015, pp. 569-574.
doi:10.1109/BIBM.2015.7359745