IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004, vol. 1, Issue No. 04 - October-December

pp: 159-170

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2004.36

ABSTRACT

We consider the problem of finding the optimal *combination* of string patterns, which characterizes a given set of strings that have a numeric attribute value assigned to each string. Pattern combinations are scored based on the correlation between their occurrences in the strings and the numeric attribute values. The aim is to find the combination of patterns which is best with respect to an appropriate scoring function. We present an $O(N^2)$ time algorithm for finding the optimal *pair* of *substring patterns* combined with Boolean functions, where $N$ is the total length of the sequences. The algorithm looks for all possible Boolean combinations of the patterns, e.g., patterns of the form $p \land \lnot q$, which indicates that the pattern pair is considered to occur in a given string $s$ if $p$ occurs in $s$ AND $q$ does NOT occur in $s$. An efficient implementation using suffix arrays is presented, and we further show that the algorithm can be adapted to find the best $k$-pattern Boolean combination in $O(N^k)$ time. The algorithm is applied to mRNA sequence data sets of moderate size combined with their turnover rates for the purpose of finding regulatory elements that *cooperate, complement, or compete with* each other in enhancing and/or silencing mRNA decay.
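To make the problem statement concrete, the following is a minimal brute-force sketch of the search described in the abstract. It is not the paper's $O(N^2)$ suffix-array algorithm: it simply enumerates all substring pairs and a few Boolean combinations. The scoring function (absolute difference between the mean attribute value of matched and unmatched strings) and all names (`best_pattern_pair`, `BOOLEAN_FUNCS`, `score`) are assumptions made for illustration only.

```python
from itertools import combinations_with_replacement


def substrings(strings):
    """Collect every distinct non-empty substring of the input strings."""
    subs = set()
    for s in strings:
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                subs.add(s[i:j])
    return subs


# Hypothetical selection of Boolean combinations of two occurrence flags.
BOOLEAN_FUNCS = {
    "p AND q": lambda a, b: a and b,
    "p AND NOT q": lambda a, b: a and not b,
    "NOT p AND q": lambda a, b: (not a) and b,
    "p OR q": lambda a, b: a or b,
}


def score(matched, values):
    """Assumed score: |mean(values of matched) - mean(values of unmatched)|."""
    inside = [v for m, v in zip(matched, values) if m]
    outside = [v for m, v in zip(matched, values) if not m]
    if not inside or not outside:
        return 0.0  # a split with an empty side carries no information
    return abs(sum(inside) / len(inside) - sum(outside) / len(outside))


def best_pattern_pair(strings, values):
    """Return (score, p, function name, q, matched flags) for the best pair."""
    # Patterns with identical occurrence vectors score identically,
    # so keep one representative (shortest, then lexicographically smallest).
    by_vector = {}
    for p in substrings(strings):
        vec = tuple(p in s for s in strings)
        if vec not in by_vector or (len(p), p) < (len(by_vector[vec]), by_vector[vec]):
            by_vector[vec] = p
    candidates = sorted(by_vector.items(), key=lambda kv: kv[1])

    best = (0.0, None, None, None, None)
    for (vp, p), (vq, q) in combinations_with_replacement(candidates, 2):
        for name, func in BOOLEAN_FUNCS.items():
            matched = tuple(func(a, b) for a, b in zip(vp, vq))
            sc = score(matched, values)
            if sc > best[0]:  # strict comparison keeps the first best found
                best = (sc, p, name, q, matched)
    return best


if __name__ == "__main__":
    # Toy data: strings containing "a" but not "c" have the high value.
    toy_strings = ["xaxx", "ac", "c", "a", "x"]
    toy_values = [9.0, 1.0, 1.0, 9.0, 1.0]
    print(best_pattern_pair(toy_strings, toy_values))
```

On the toy data, no single substring separates the high-valued strings from the rest, whereas a combination of the form $p \land \lnot q$ does, which is the situation the abstract's Boolean pattern pairs are meant to capture. The enumeration here is far more expensive than the algorithm the paper presents and is intended only to illustrate the search space.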

INDEX TERMS

Pattern discovery, Boolean patterns, suffix tree, suffix array.

CITATION

Hideo Bannai, Heikki Hyyrö, Ayumi Shinohara, Masayuki Takeda, Kenta Nakai, Satoru Miyano, "An O(N^2) Algorithm for Discovering Optimal Boolean Pattern Pairs",

*IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol.1, no. 4, pp. 159-170, October-December 2004, doi:10.1109/TCBB.2004.36