Issue No. 02 - April-June (2006 vol. 3)
DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2006.16
We propose a new algorithm for identifying cis-regulatory modules in genomic sequences. The algorithm, named RISO, uses a new data structure, called box-link, to store information about conserved regions that occur in a well-ordered and regularly spaced manner in the data set sequences. Conserved regions of this type, called structured motifs, are highly relevant to research on gene regulatory mechanisms because they can effectively represent promoter models. The complexity analysis shows a time and space gain over the best known exact algorithms that is exponential in the spacings between binding sites. A full implementation of the algorithm was developed and made available online. Experimental results show that the algorithm is much faster than existing ones, in some cases by more than four orders of magnitude. Applying the method to biological data sets shows its ability to extract relevant consensi.
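The abstract defines structured motifs only informally, as conserved regions that are well ordered and regularly spaced. The brute-force check below is a minimal sketch of that definition under stated assumptions. It assumes that a motif is a list of boxes (short words) separated by gap intervals [dmin, dmax], and that each box may match with at most a fixed number of substitutions (Hamming distance). It tests whether a motif occurs in one sequence and counts how many sequences contain it. It is not RISO and does not use the box-link structure. It only illustrates the kind of pattern RISO extracts, not how RISO finds it. All names (`box_occurrences`, `motif_occurs`, `count_supporting_sequences`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def box_occurrences(seq: str, box: str, max_mismatches: int) -> List[int]:
    """Start positions where `box` matches `seq` with <= max_mismatches substitutions."""
    k = len(box)
    return [i for i in range(len(seq) - k + 1)
            if hamming(seq[i:i + k], box) <= max_mismatches]


def motif_occurs(seq: str, boxes: Sequence[str],
                 gaps: Sequence[Tuple[int, int]], max_mismatches: int) -> bool:
    """True if the boxes occur in order, with each gap (measured from the end
    of box j to the start of box j+1) lying inside its [dmin, dmax] interval.
    Assumption: the mismatch budget applies to each box separately."""
    if len(gaps) != len(boxes) - 1:
        raise ValueError("need exactly one gap interval between consecutive boxes")
    # Start positions of the first box that can begin a valid chain.
    current = set(box_occurrences(seq, boxes[0], max_mismatches))
    for j in range(1, len(boxes)):
        prev_len = len(boxes[j - 1])
        dmin, dmax = gaps[j - 1]
        # Keep only occurrences of box j reachable from some valid box j-1.
        current = {p for p in box_occurrences(seq, boxes[j], max_mismatches)
                   if any(q + prev_len + dmin <= p <= q + prev_len + dmax
                          for q in current)}
        if not current:
            return False
    return bool(current)


def count_supporting_sequences(seqs: Sequence[str], boxes: Sequence[str],
                               gaps: Sequence[Tuple[int, int]],
                               max_mismatches: int) -> int:
    """How many sequences of the data set contain the structured motif."""
    return sum(motif_occurs(s, boxes, gaps, max_mismatches) for s in seqs)


if __name__ == "__main__":
    seqs = ["TTGACAGGGGTATAAT",   # gap of 4: inside [3, 5]
            "TTGACAGGTATAAT",     # gap of 2: too short
            "TTGACTCCCCTATAAT"]   # first box has one substitution
    motif, spacing = ["TTGACA", "TATAAT"], [(3, 5)]
    print(count_supporting_sequences(seqs, motif, spacing, 0))  # 1
    print(count_supporting_sequences(seqs, motif, spacing, 1))  # 2
```

Checking each candidate motif against every sequence this way repeats work across candidates and grows quickly with the gap intervals. That cost is what the complexity analysis summarized above addresses.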
Keywords: Box-link, factor tree, structured motif, promoter, binding site consensus.
Ana T. Freitas, Marie-France Sagot, Alexandra M. Carvalho, Arlindo L. Oliveira, "An Efficient Algorithm for the Identification of Structured Motifs in DNA Promoter Sequences", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 2, pp. 126-140, April-June 2006, doi:10.1109/TCBB.2006.16