Issue No. 03 - May-June (2012 vol. 9)

ISSN: 1545-5963

pp: 679-692

DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.68

N. A. Hamilton, Inst. for Mol. Biosci., Univ. of Queensland, St. Lucia, QLD, Australia

K. Burrage, Comput. Lab., Oxford Univ., Oxford, UK

A. Bustamam, Dept. of Math., Univ. of Indonesia, Depok, Indonesia

ABSTRACT

Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However, with the increasingly vast amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses the CUDA tool to implement a massively parallel computing environment on the GPU card, is becoming a very powerful, efficient, and low-cost option for achieving substantial performance gains over CPU approaches. Using on-chip memory on the GPU efficiently lowers latency, thus circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We use the ELLPACK-R sparse format to allow effective, fine-grained, massively parallel processing that copes with the sparse nature of interaction network data sets in bioinformatics applications. Our results show that CUDA-MCL is significantly faster than the original MCL running on a CPU. Thus, large-scale parallel computation on off-the-shelf desktop machines, previously possible only on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.
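The abstract names the two operations at the heart of MCL (sparse matrix-matrix multiplication for expansion, and sparse Markov matrix normalization after inflation) and the ELLPACK-R storage format. The pure-Python sketch below is not the authors' CUDA-MCL code; it only illustrates, sequentially, what ELLPACK-R looks like (values and column indices padded to the longest row and stored column-major, plus a per-row length array `rl`) and why one MCL iteration parallelizes well: each output row is computed independently. All helper names (`EllpackR`, `to_ellpack_r`, `row`, `mcl_step`, `mcl`) are hypothetical, and the choices of a row-stochastic convention (the transpose of the usual column-stochastic MCL matrix, so normalization is per row), added self-loops, inflation 2.0, and the pruning threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EllpackR:
    n: int            # number of rows (square matrix)
    width: int        # maximum number of non-zeros in any row
    val: List[float]  # n*width values, column-major: slot k of row i is at k*n + i
    col: List[int]    # matching column indices, same layout as val
    rl: List[int]     # true non-zero count per row (the "-R" in ELLPACK-R)


def to_ellpack_r(rows: List[Dict[int, float]]) -> EllpackR:
    """Pack sparse rows (dicts col -> value) into a hypothetical ELLPACK-R layout."""
    n = len(rows)
    width = max((len(r) for r in rows), default=0)
    val = [0.0] * (n * width)
    col = [0] * (n * width)
    for i, r in enumerate(rows):
        for k, (j, v) in enumerate(sorted(r.items())):
            val[k * n + i] = v
            col[k * n + i] = j
    return EllpackR(n, width, val, col, [len(r) for r in rows])


def row(m: EllpackR, i: int) -> Dict[int, float]:
    """Read row i, touching only its rl[i] stored slots and skipping the padding."""
    return {m.col[k * m.n + i]: m.val[k * m.n + i] for k in range(m.rl[i])}


def mcl_step(m: EllpackR, inflation: float = 2.0, prune: float = 1e-4) -> EllpackR:
    """One MCL iteration (expansion, inflation, normalization) on a row-stochastic matrix."""
    out: List[Dict[int, float]] = []
    for i in range(m.n):  # rows are independent: one GPU thread per row in a CUDA port
        acc: Dict[int, float] = {}
        # Expansion: row i of M*M is the sum over k of M[i, k] times row k of M.
        for k, a in row(m, i).items():
            for j, b in row(m, k).items():
                acc[j] = acc.get(j, 0.0) + a * b
        # Inflation: raise each entry to the power `inflation`, then normalize the row.
        acc = {j: v ** inflation for j, v in acc.items()}
        total = sum(acc.values())
        acc = {j: v / total for j, v in acc.items()}
        # Prune tiny entries to stay sparse (keep everything if all entries are tiny).
        kept = {j: v for j, v in acc.items() if v >= prune} or acc
        total = sum(kept.values())
        out.append({j: v / total for j, v in kept.items()})
    return to_ellpack_r(out)


def mcl(edges: List[Tuple[int, int]], n: int, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Cluster an undirected graph; returns sorted clusters of node ids."""
    adj: List[Dict[int, float]] = [{i: 1.0} for i in range(n)]  # self-loops
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0
    m = to_ellpack_r([{j: v / len(r) for j, v in r.items()} for r in adj])
    for _ in range(max_iter):
        nxt = mcl_step(m, inflation)
        done = all(abs(row(nxt, i).get(j, 0.0) - row(m, i).get(j, 0.0)) <= tol
                   for i in range(n) for j in set(row(nxt, i)) | set(row(m, i)))
        m = nxt
        if done:
            break
    # Nodes that flow to the same set of attractors form one cluster.
    groups: Dict[frozenset, List[int]] = {}
    for i in range(n):
        groups.setdefault(frozenset(row(m, i)), []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

For example, `mcl([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)], 6)` returns `[[0, 1, 2], [3, 4, 5]]` for two disconnected triangles. In a GPU port, the loop over `i` in `mcl_step` is the part that would be spread across threads; the column-major layout means that when neighboring threads read the same slot `k` of their rows, they touch neighboring memory addresses, and `rl` lets each thread stop at its own row length instead of scanning the padding.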

INDEX TERMS

parallel architectures, bioinformatics, graphics processing units, large-scale systems, Markov processes, supercomputing architectures, fast parallel Markov clustering, ELLPACK-R sparse format, biological networks, critical limiting factor, GPU computing, CUDA tool, massively parallel computing environment, on-chip memory, fast Markov clustering algorithm, parallel sparse matrix-matrix computations, parallel sparse Markov matrix normalizations, fine-grain massively parallel processing, interaction networks data sets, large-scale parallel computation, off-the-shelf desktop-machines, Graphics processing unit, Proteins, Instruction sets, Parallel processing, Multicore processing, Markov clustering, graphs and networks, PPI networks, CUDA, scalable parallel programming, parallelism and concurrency, performance evaluation

CITATION

N. A. Hamilton, K. Burrage, A. Bustamam, "Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA and ELLPACK-R Sparse Format",

*IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 9, no. 3, pp. 679-692, May-June 2012, doi:10.1109/TCBB.2011.68