Issue No. 03 - May-June (2013 vol. 10)
pp: 696-707
Tak-Ming Chan, Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Shatin, China
Leung-Yau Lo, Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Shatin, China
Ho-Yin Sze-To, Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Shatin, China
Kwong-Sak Leung, Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Shatin, China
Xinshu Xiao, Dept. of Integrative Biol. & Physiol., Univ. of California Los Angeles, Los Angeles, CA, USA
Man-Hon Wong, Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Shatin, China
ABSTRACT
Understanding protein-DNA interactions, specifically the binding between transcription factors (TFs) and transcription factor binding sites (TFBSs), is crucial to deciphering gene regulation. Recent work on associated TF-TFBS pattern discovery combines one-sided motif discovery on both the TF and the TFBS sides. Using sequences only, it identifies short protein-DNA binding cores that are otherwise available only from high-resolution 3D structures. The discovered patterns lead to promising applications in subtype and disease analysis. However, the related studies rely on either association rule mining or existing TFBS annotations, and none has proposed a formal unified (both-sided) model to prioritize the top verifiable associated patterns. We propose unified scores and develop an effective pipeline for associated TF-TFBS pattern discovery. Our stringent instance-level evaluations show that the patterns with the top unified scores match the binding cores in 3D structures considerably better than those of previous work, with up to 90 percent of the top 20 scored patterns verified. We also introduce extended verification from literature surveys, in which high unified scores correspond to even higher verification percentages. The top-scored patterns are confirmed to match the known WRKY binding cores, for which no 3D structures are available, and agree well with the top binding affinities from in vivo experiments.
INDEX TERMS
Proteins, Three-dimensional displays, Association rules, Pattern matching, DNA, Diseases, TF-TFBS associated pattern discovery, bioinformatics, bonds (chemical), data mining, genetics, molecular biophysics, molecular configurations, in vivo experiment, associated protein-DNA pattern discovery modeling, protein-DNA interaction, transcription factor binding site, gene regulation, one-sided motif discovery, sequence usage, short protein-DNA binding core identification, high resolution 3D structure, subtype analysis application, disease analysis application, association rule mining, existing TFBS annotation, formal unified model, both-sided model, associated TF-TFBS pattern discovery, instance-level evaluation, top unified score pattern, 3D structure binding core, scored pattern verification, literature survey extended verification, high unified score, high verification percentage, top scored pattern, WRKY binding core, top binding affinity, binding rules, protein-DNA interactions, motif discovery
CITATION
Tak-Ming Chan, Leung-Yau Lo, Ho-Yin Sze-To, Kwong-Sak Leung, Xinshu Xiao, Man-Hon Wong, "Modeling Associated Protein-DNA Pattern Discovery with Unified Scores", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 3, pp. 696-707, May-June 2013, doi:10.1109/TCBB.2013.60