Issue No. 01 - January-March (2008 vol. 5)
The problem of discovering novel motifs of binding sites is important to the understanding of gene regulatory networks. Motifs are generally represented by matrices (PWM or PSSM) or strings. However, these representations cannot model biological binding sites well because they fail to capture nucleotide interdependence. Many researchers have pointed out that the nucleotides of a DNA binding site cannot be treated independently, e.g., in the binding sites of zinc finger proteins. In this paper, we introduce a new representation called Scored Position Specific Pattern (SPSP), a generalization of the matrix and string representations that takes into consideration the dependent occurrences of neighboring nucleotides. Even though the problem of discovering the optimal motif in SPSP representation is proved to be NP-hard, we introduce a heuristic algorithm called SPSP-Finder, which can effectively find optimal motifs in most simulated cases and in some real cases for which existing popular motif-finding software, such as Weeder, MEME, and AlignACE, fails.
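The independence limitation described in the abstract can be illustrated with a small sketch. It is not the SPSP representation or the SPSP-Finder algorithm from the paper; the two binding sites, the function names, and the pattern-frequency table are all hypothetical and chosen only to show how a column-wise matrix loses the dependence between neighboring nucleotides.

```python
from collections import Counter

def build_pwm(sites):
    """Column-wise nucleotide frequencies; each position is treated independently."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pwm.append({base: counts[base] / len(sites) for base in "ACGT"})
    return pwm

def pwm_probability(pwm, seq):
    """Probability of seq under the matrix model (product over columns)."""
    p = 1.0
    for column, base in zip(pwm, seq):
        p *= column[base]
    return p

def pattern_probability(sites, seq):
    """Frequency of seq as a whole pattern, so neighboring positions stay linked."""
    counts = Counter(sites)
    return counts[seq] / len(sites)

# Hypothetical binding sites: position 2 always matches position 1.
sites = ["AA", "TT"]
pwm = build_pwm(sites)

# The matrix model cannot tell real sites from the mixed strings it never saw.
print(pwm_probability(pwm, "AA"))        # 0.25
print(pwm_probability(pwm, "AT"))        # 0.25

# Scoring whole patterns keeps the dependence between the two positions.
print(pattern_probability(sites, "AA"))  # 0.5
print(pattern_probability(sites, "AT"))  # 0.0
```

Under these assumptions the matrix assigns "AT" the same probability as the observed site "AA", which is the kind of failure that motivates representations scoring neighboring nucleotides together.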
Computing Methodologies, Pattern Recognition, Design Methodology, Pattern analysis
Francis Chin, Henry C.M. Leung, "DNA Motif Representation with Nucleotide Dependency", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 1, pp. 110-119, January-March 2008, doi:10.1109/TCBB.2007.70220