2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2017)
Kansas City, MO, USA
Nov. 13, 2017 to Nov. 16, 2017
Vi Dam, College of Information Science and Technology, University of Nebraska at Omaha, Omaha, Nebraska, 68182, USA
Hesham H. Ali, College of Information Science and Technology, University of Nebraska at Omaha, Omaha, Nebraska, 68182, USA
As Next Generation Sequencing (NGS) technologies continue to expand rapidly, NGS data, available in the form of short genomic reads, remain the primary source of biological data in many bioinformatics applications, and the need to assemble and manipulate these data persists. As a result, many assemblers have been developed to assemble NGS short reads into long genomic sequences, or contigs, ready for advanced analysis such as Genome-Wide Association Studies (GWAS). However, the lack of high levels of robustness and reproducibility continues to limit the impact of bioinformatics research, and many biomedical researchers remain skeptical of results obtained from bioinformatics applications. In this study, we compare the performance of several widely used assemblers on NGS datasets associated with various organisms. We highlight the advantages and disadvantages of each assembler and explore the factors that impact the performance of each approach. In addition, we survey the assembly-free, compression-based approaches recently developed to process NGS short reads and analyze their performance in comparing genomic sequences represented by sets of short reads. We use phylogenetic trees obtained from simulated and real datasets to evaluate the accuracy of each assembly-free approach. We test the hypothesis that non-assembly approaches could potentially overcome the limitations and inaccuracies of assembly approaches in comparing sequences, especially for large read sizes. Moreover, we propose a hybrid approach that integrates assembly and non-assembly approaches for classifying genomic sequences. The proposed approach uses the results of partially assembling short reads as input to assembly-free methods to complete the NGS manipulation process. Preliminary results suggest that the hybrid approach has potential for comparing genomic sequences.
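To illustrate what a compression-based, assembly-free comparison of two read sets can look like, the following is a minimal sketch using the normalized compression distance (NCD) with zlib. The abstract does not name the specific measure or compressor used in the study, so NCD, zlib, and the helper names `read_set_to_bytes`, `compressed_size`, and `ncd` are assumptions made for illustration only.

```python
import zlib


def read_set_to_bytes(reads):
    """Serialize a set of short reads into one byte string.

    Sorting makes the result independent of read order (an illustrative
    choice, not something stated in the abstract).
    """
    return "\n".join(sorted(reads)).encode("ascii")


def compressed_size(data):
    """Size in bytes of the zlib-compressed data (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(reads_a, reads_b):
    """Normalized compression distance between two read sets.

    NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b)),
    where C(x) is the compressed size of x. Values near 0 suggest
    similar content; values near 1 suggest unrelated content.
    """
    a = read_set_to_bytes(reads_a)
    b = read_set_to_bytes(reads_b)
    c_a = compressed_size(a)
    c_b = compressed_size(b)
    c_ab = compressed_size(a + b"\n" + b)
    return (c_ab - min(c_a, c_b)) / max(c_a, c_b)


# Hypothetical toy read sets; real NGS inputs would be far larger.
sample_x = ["ACGTACGTGA", "CGTGATTACA", "TTACAGGCAT"]
sample_y = ["ACGTACGTGA", "CGTGATTACC", "TTACCGGCAT"]
print(ncd(sample_x, sample_y))
```

A matrix of such pairwise distances could then be passed to a standard tree-building method to obtain a phylogenetic tree. Under the hybrid idea described above, partially assembled contigs would replace the raw reads as input to the same distance function. Note that zlib's 32 KB window makes it unsuitable for realistic genome-scale inputs; it is used here only to keep the sketch self-contained.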
Bioinformatics, Genomics, Sequential analysis, Phylogeny, Computational complexity, Next generation networking
V. Dam and H. H. Ali, "On the integration of assembly and non-assembly approaches for comparing biological sequences," 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 2017, pp. 2232-2234.