Issue No. 03 - July-September (2010 vol. 7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.95
Jia Zeng, Soochow University, Suzhou
Xiao-Yu Zhao, Hong Kong Polytechnic University, Hong Kong
Xiao-Qin Cao, City University of Hong Kong, Hong Kong
Hong Yan, City University of Hong Kong, Hong Kong and The University of Sydney, Sydney
This paper integrates signal, context, and structure features for genome-wide human promoter recognition, which is important for improving genome annotation and analyzing transcriptional regulation without experimental support from ESTs, cDNAs, or mRNAs. First, CpG islands are salient biological signals associated with approximately 50 percent of mammalian promoters. Second, the genomic context of promoters, captured by n-mers (sequences n bases long) and their statistics estimated from training samples, may have biological significance. Third, sequence-dependent DNA flexibility originates from DNA 3D structures and plays an important role in guiding transcription factors to the target site in promoters. Employing decision trees, we combine the above signal, context, and structure features to build a hierarchical promoter recognition system called SCS. Experimental results on controlled data sets and the entire human genome demonstrate that SCS is significantly superior in terms of sensitivity and specificity compared to other state-of-the-art methods. The SCS promoter recognition system is available online as supplemental material for academic use and can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.95.
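As a rough illustration of the signal and context features named in the abstract, the sketch below computes GC content, the observed/expected CpG ratio, and n-mer counts for a DNA string. It is a minimal sketch only: the function names are hypothetical, the observed/expected formula is the commonly used CpG island statistic rather than anything confirmed as the SCS implementation, and the DNA flexibility and decision-tree stages are not covered.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Assumption: this is the widely used CpG island statistic, not
    necessarily the exact signal feature used by SCS.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def nmer_counts(seq, n):
    """Count every overlapping n-mer (substring of length n) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
```

In a classifier, n-mer counts from promoter and non-promoter training samples would typically be turned into frequency statistics, and the resulting scores combined with the CpG and structure features.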
Promoter recognition, feature extraction, classifier combination, genome analysis.
X. Cao, J. Zeng, X. Zhao and H. Yan, "SCS: Signal, Context, and Structure Features for Genome-Wide Human Promoter Recognition," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 3, pp. 550-562, 2010.