loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
A Uniform Projection Method for Motif Discovery in DNA Sequences
April-June 2004 (vol. 1 no. 2)
pp. 91-94
Buhler and Tompa [5] introduced the random projection algorithm for the motif discovery problem and demonstrated that this algorithm performs well on both simulated and biological samples. We describe a modification of the random projection algorithm, called the uniform projection algorithm, which utilizes a different choice of projections. We replace the random selection of projections by a greedy heuristic that approximately equalizes the coverage of the projections. We show that this change in selection of projections leads to improved performance on motif discovery problems. Furthermore, the uniform projection algorithm is directly applicable to other problems where the random projection algorithm has been used, including comparison of protein sequence databases.

[1] 91 T.L. Bailey and C. Elkan , “Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization,” Machine Learning, vol. 21, nos. 1-2, pp. 51-80, 1995.[2] Y. Barash , G. Elidan , N. Friedman , and T. Kaplan , “Modeling Dependencies in Protein-DNA Binding Sites,” Proc. Seventh Int'l Conf. Research in Computational Molecular Biology (RECOMB), 2003.[3] J. Buhler , “Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing,” Bioinformatics, vol. 17, no. 5, pp. 419-428, 2001. [4] J. Buhler , “Search Algorithms for Biosequences Using Random Projection,” PhD thesis, Univ. of Washington, 2001.[5] J. Buhler and M. Tompa , “Finding Motifs Using Random Projections,” J. Computational Biology, vol. 9, no. 2, pp. 225-242, 2002. [6] V. Chváital , “A Greedy Heuristic for the Set-Covering Problem,” Math. Operations Research, vol. 4, no. 3, pp. 233-235, 1979.[7] Algorithms in Combinatorial Design Theory, C.J. Colbourn and M.J. Colbourn, eds. Amsterdam: North-Holland Publishing, 1985.[8] T.H. Cormen , C.E. Leiserson , and R.L. Rivest , Introduction to Algorithms. Cambridge, Mass.: MIT Press, 1990.[9] E. Eskin and P.A. Pevzner , “Finding Composite Regulatory Patterns in DNA Sequences,” Bioinformatics, vol. 18, no. 1, pp. 354-363, 2002.[10] W. Feller , An Introduction to Probability Theory. New York: Wiley, 1971.[11] E. Halperin , J. Buhler , R. Karp , R. Krauthgamer , and B. Westover , “Detecting Protein Sequence Conservation via Metric Embeddings,” Bioinformatics, vol. 19, no. 1, pp. 122-122, 2003. [12] G. Hertz and G. Stormo , “Identifying DNA and Protein Patterns with Statistically Significant Alignments Of Multiple Sequences,” Bioinformatics, vol. 15, pp. 563-577, 1999. [13] P. Indyk and R. Motwani , “Approximate Nearest Neighbors: towards Removing the Curse of Dimensionality,” Proc. 13th Ann. ACM Symp. Theory of Computing, 1998.[14] U. Keich and P.A. Pevzner , “Finding Motifs in the Twilight Zone,” Bioinformatics, vol. 18, no. 10, pp. 1374-1381, 2002. [15] C. Lawrence , S. Altschul , M. Boguski , J. Liu , A. Neuwald , and J. Wootton , “Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment,” Science, vol. 262, pp. 208-214, 1993.[16] L. Marsan and M.F. Sagot , “Algorithms for Extracting Structured Motifs Using a Suffix Tree with an Application to Promoter and Regulatory Site Consensus Identification,” J. Computational Biology, vol. 7, nos. 3-4, pp. 345-362, 2000. [17] H. Niederreiter , “Random Number Generation and Quasi-Monte Carlo Methods,” Proc. CBMS-NSF Regional Conf. Series in Applied Math., vol. 63, 1992.[18] P.A. Pevzner and S.H. Sze , “Combinatorial Approaches to Finding Subtle Signals in DNA Sequences,” Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology, pp. 269-278, 2000.[19] I. Rigoutsos and A. Floratos , “Combinatorial Pattern Discovery in Biological Sequences: The TEIRESIAS Algorithm,” Bioinformatics, vol. 14, no. 1, pp. 55-67, 1998.[20] D.R. Stinson , Combinatorial Designs: Constructions and Analysis. New York: Springer-Verlag, 2004.[21] J.H. vanLint and R.M. Wilson , A Course in Combinatorics. Cambridge Univ. Press, 1992.

Index Terms:
Motif discovery, transcription factor binding sites, random projection, combinatorial designs, low-discrepancy sequences.
Citation:
Benjamin Raphael, Lung-Tien Liu, George Varghese, "A Uniform Projection Method for Motif Discovery in DNA Sequences," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 1, no. 2, pp. 91-94, Apr.-June 2004, doi:10.1109/TCBB.2004.14
Usage of this product signifies your acceptance of the Terms of Use.