2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE) (2016)
Oct. 31, 2016 to Nov. 2, 2016
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/BIBE.2016.68
This paper proposes an ensemble approach based on data partitioning for large-scale DNA motif analysis. Motif prediction on genome-scale datasets is challenging due to high time and space complexity. Existing ensemble approaches, while demonstrated to improve performance, are only applicable to small datasets. Our approach, called ENSPART, first partitions the input dataset into non-overlapping subsets, which serve as input to multiple distinct motif prediction tools. It is assumed that the core motifs of a transcription factor protein exist in all data subsets. We employed seven motif prediction tools to obtain initial candidate motifs, which are then merged according to their sequence content similarity. An alignment-free method is used to establish motif similarity. A novel motif merging method is proposed to merge similar motifs obtained by tools in different data partitions. Ten genome-wide ChIP datasets were collected for evaluation. We compared our approach with MEME-ChIP and obtained improved results for nine out of the ten datasets in terms of Area Under the Curve (AUC). Most datasets showed an AUC improvement of between 5% and 10%. Our approach shows the promise of data-partitioning-based ensemble approaches for large-scale motif prediction.
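The partition-then-merge idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the data are split, which alignment-free measure is used, or how the merging method works, so everything here is an assumption made for illustration: round-robin partitioning, cosine similarity over k-mer counts of consensus strings, and a greedy grouping step. The function names (`partition`, `kmer_profile`, `cosine_similarity`, `group_similar`) and the `threshold` and `k` parameters are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def partition(sequences, n_parts):
    """Split sequences into n_parts non-overlapping subsets (round-robin; an assumption)."""
    return [sequences[i::n_parts] for i in range(n_parts)]


def kmer_profile(motif, k=2):
    """Count the overlapping k-mers in a motif's consensus string."""
    return Counter(motif[i:i + k] for i in range(len(motif) - k + 1))


def cosine_similarity(a, b, k=2):
    """Alignment-free similarity: cosine of the two k-mer count vectors."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    dot = sum(pa[x] * pb[x] for x in pa)
    norm = sqrt(sum(v * v for v in pa.values())) * sqrt(sum(v * v for v in pb.values()))
    return dot / norm if norm else 0.0


def group_similar(motifs, threshold=0.6, k=2):
    """Greedily group motifs by similarity to the first member of each group.

    This stands in for the paper's merging step, whose details the abstract omits.
    """
    groups = []
    for m in motifs:
        for g in groups:
            if cosine_similarity(m, g[0], k) >= threshold:
                g.append(m)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([m])
    return groups
```

In this sketch, each subset returned by `partition` would be passed to the motif prediction tools, and the candidate motifs they report would then be pooled and passed to `group_similar`. Comparing k-mer counts avoids aligning motifs against each other, which is the general property an alignment-free measure provides.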
Bioinformatics, DNA, Prediction algorithms, Genomics, Merging, Partitioning algorithms, Complexity theory
N. K. Lee, A. C. Choong and N. Omar, "ENSPART: An Ensemble Framework Based on Data Partitioning for DNA Motif Analysis," 2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE), Taichung, Taiwan, 2016, pp. 87-94.