A Cluster Refinement Algorithm for Motif Discovery
PrePrint
ISSN: 1545-5963
Gang Li, The Chinese University of Hong Kong, Hong Kong
Tak-Ming Chan, The Chinese University of Hong Kong, Hong Kong
Kwong-Sak Leung, The Chinese University of Hong Kong, Hong Kong
Kin-Hong Lee, The Chinese University of Hong Kong, Hong Kong
Finding Transcription Factor Binding Sites (TFBSs), i.e., motif discovery, is crucial for understanding gene regulatory relationships. Motifs are weakly conserved, and motif discovery is an NP-hard problem. We propose a new approach called Cluster Refinement Algorithm for Motif Discovery (CRMD). CRMD employs a flexible statistical motif model that allows a variable number of motifs and motif instances. CRMD first uses a novel entropy-based clustering to find a complete set of good starting candidate motifs in the DNA sequences. CRMD then uses an effective greedy refinement to search for optimal motifs among the candidates. The refinement is fast, and it changes the number of motif instances based on adaptive thresholds. The performance of CRMD is further enhanced if the problem has exactly one motif instance per sequence. Using an appropriate similarity test of motifs, CRMD is also able to find multiple motifs. CRMD has been tested extensively on synthetic and real datasets. The experimental results verify that CRMD usually outperforms four other state-of-the-art algorithms in solution quality, with competitive computing time. It strikes a good balance between finding true motif instances and screening out false ones, and it is robust on problems of various levels of difficulty.
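As a rough illustration of the kind of computation the abstract refers to, the following is a minimal sketch of an entropy-style motif score and a greedy refinement loop. It is not the CRMD algorithm itself: the abstract does not give the model, the clustering procedure, or the adaptive thresholds. The function names, the pseudocount, the uniform background of 0.25, and the restriction to one instance per sequence are all assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def frequency_matrix(instances, pseudocount=1.0):
    """Per-position base frequencies of equal-length motif instances.

    The pseudocount is an assumption of this sketch; it keeps every
    frequency above zero so that logarithms are defined.
    """
    width = len(instances[0])
    total = len(instances) + pseudocount * len(ALPHABET)
    matrix = []
    for pos in range(width):
        counts = Counter(inst[pos] for inst in instances)
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix


def information_content(matrix, background=0.25):
    """Relative entropy (in bits) of the motif against a uniform background."""
    return sum(p * math.log2(p / background)
               for column in matrix for p in column.values())


def log_odds(kmer, matrix, background=0.25):
    """Log-odds score of one candidate k-mer under the motif model."""
    return sum(math.log2(matrix[i][b] / background) for i, b in enumerate(kmer))


def refine(sequences, instances, max_iter=50):
    """Greedy refinement, assuming exactly one instance per sequence.

    Each round rebuilds the model from the current instances, then picks
    the best-scoring k-mer in every sequence. Stops at a fixed point.
    """
    width = len(instances[0])
    for _ in range(max_iter):
        matrix = frequency_matrix(instances)
        updated = []
        for seq in sequences:
            kmers = [seq[i:i + width] for i in range(len(seq) - width + 1)]
            updated.append(max(kmers, key=lambda k: log_odds(k, matrix)))
        if updated == instances:
            break
        instances = updated
    return instances


# Toy data: the motif TATAAT is planted once in each sequence,
# and the third starting instance is deliberately wrong.
sequences = ["GGGTATAATGGG", "CCTATAATCCCC", "TATAATGCGCGC"]
start = ["TATAAT", "TATAAT", "GCGCGC"]
refined = refine(sequences, start)
```

On this toy input the loop replaces the wrong third instance with the planted one, and the information content of the refined set is higher than that of the starting set. A variable number of instances per sequence, as described in the abstract, would need a score threshold in place of the per-sequence maximum.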
Index Terms:
Bioinformatics (genome or protein) databases, TFBS, motif discovery, CRMD
Citation:
Gang Li, Tak-Ming Chan, Kwong-Sak Leung, Kin-Hong Lee, "A Cluster Refinement Algorithm for Motif Discovery," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 13 Feb. 2009. IEEE Computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.25>