An Efficient Parallel Implementation of the Hidden Markov Methods for Genomic Sequence-Search on a Massively Parallel System
Issue No. 01 - January (2008 vol. 19)
Brian Smith, IEEE
Carlos P. Sosa, IEEE
Bioinformatics databases used for sequence comparison and sequence alignment are growing exponentially. This has popularized programs that carry out database searches. Current implementations of sequence alignment methods based on hidden Markov models (HMMs) have proven to be computationally intensive and are therefore amenable to architectures with multiple processors. In this paper we describe a modified version of the original parallel implementation of hidden Markov models on a massively parallel system. This implementation is part of the HMMER bioinformatics code. HMMER 2.3.2 uses profile HMMs for sensitive database searching based on statistical descriptions of a sequence family's consensus. Two of its nine programs, hmmsearch and hmmpfam, were further parallelized to take advantage of a large number of processors. For our study, we start by porting the parallel virtual machine (PVM) versions of these two programs that are currently available as part of the HMMER suite, and we report the performance of these non-optimized versions as baselines. Our work also introduces an alternate sequence file indexing scheme, a multiple master configuration, dynamic data collection, and, finally, load balancing via the indexed sequence files. This set of optimizations constitutes our modified version for massively parallel systems. Our results show parallel performance improvements of more than one order of magnitude (16x) for hmmsearch and hmmpfam.
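To make the idea of "load balancing via the indexed sequence files" more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a FASTA-formatted sequence database, an index of (byte offset, residue count) pairs per record, and a greedy longest-first heuristic for spreading residues evenly across workers. The helper names (`index_fasta`, `balance`, `read_record`, `group_workers`) are hypothetical, and the contiguous split of workers among several masters is only one assumed way a multiple master configuration could be arranged.

```python
import heapq
from typing import List, Tuple


def index_fasta(data: bytes) -> List[Tuple[int, int]]:
    """Return (byte_offset, residue_count) for every '>' record in a FASTA byte string."""
    index: List[Tuple[int, int]] = []
    offset = 0
    current = None  # [offset of header line, residues seen so far]
    for line in data.splitlines(keepends=True):
        if line.startswith(b">"):
            if current is not None:
                index.append((current[0], current[1]))
            current = [offset, 0]
        elif current is not None:
            current[1] += len(line.strip())  # count residues, ignoring line endings
        offset += len(line)
    if current is not None:
        index.append((current[0], current[1]))
    return index


def balance(index: List[Tuple[int, int]], n_workers: int) -> List[List[int]]:
    """Greedy longest-first assignment: each record goes to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (residues assigned, worker id)
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(n_workers)]
    for offset, residues in sorted(index, key=lambda r: r[1], reverse=True):
        load, w = heapq.heappop(heap)
        assignment[w].append(offset)
        heapq.heappush(heap, (load + residues, w))
    return assignment


def read_record(data: bytes, offset: int) -> Tuple[str, str]:
    """Random access: read one record directly from its indexed byte offset."""
    end = data.find(b"\n>", offset)
    chunk = data[offset:] if end == -1 else data[offset:end + 1]
    lines = chunk.splitlines()
    header = lines[0][1:].decode()
    sequence = b"".join(l.strip() for l in lines[1:]).decode()
    return header, sequence


def group_workers(n_workers: int, n_masters: int) -> List[List[int]]:
    """Hypothetical multi-master layout: split worker ranks into contiguous groups, one per master."""
    base, extra = divmod(n_workers, n_masters)
    groups, start = [], 0
    for m in range(n_masters):
        size = base + (1 if m < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


if __name__ == "__main__":
    db = b">a\nACGT\nAC\n>b\nA\n>c\nACGTACGT\n"
    idx = index_fasta(db)
    print(idx)                      # [(0, 6), (11, 1), (16, 8)]
    print(balance(idx, 2))          # [[16], [0, 11]]
    print(read_record(db, 11))      # ('b', 'A')
    print(group_workers(5, 2))      # [[0, 1, 2], [3, 4]]
```

In this sketch a master would only need to send each worker a short list of offsets, and workers could seek directly to their records instead of receiving the sequence data itself. Balancing by residue count rather than record count reflects the assumption that scoring cost grows roughly with sequence length.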
Hidden Markov models, HMMER, massively parallel systems, multiple master parallelization, parallel implementation, genomic sequence-search, bioinformatics.
Karl Jiang, Amanda Peters, Brian Smith, Carlos P. Sosa, Oystein Thorsen, "An Efficient Parallel Implementation of the Hidden Markov Methods for Genomic Sequence-Search on a Massively Parallel System", IEEE Transactions on Parallel & Distributed Systems, vol. 19, no. 1, pp. 15-23, January 2008, doi:10.1109/TPDS.2007.70712