Identification of Viral Protein Genotypic Determinants Using Combinatorial Filtering and Active Learning
2010 IEEE International Conference on Bioinformatics and Bioengineering (2010)
Philadelphia, Pennsylvania USA
May 31, 2010 to June 3, 2010
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/BIBE.2010.25
RNA viruses such as HIV and influenza impose a very significant disease burden throughout the world. Identifying the key protein residues that determine a given viral phenotype is an important step in learning the genotype-phenotype mapping and in making clinical decisions. This identification is currently done through a laborious experimental process that is arguably inefficient, incomplete, and unreliable. We describe a supervised combinatorial filtering algorithm that systematically and efficiently infers the correct set of key residue positions from all available labeled data. We demonstrate its consistency, validate it on a variety of datasets, show that it is more powerful than conventional identification methods, and describe its use under incremental relaxation of constraints. For cases where more data is needed to fully converge to an answer, we introduce an active learning algorithm that helps choose the most informative experiment from a set of unlabeled candidate strains or mutagenesis experiments, so as to minimize the expected total laboratory time or financial cost. As an example, we demonstrate the savings afforded by this algorithm in identifying the molecular determinants of fusogenicity from a previously published dataset of Feline Immunodeficiency Virus Envelope proteins.
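The abstract does not give the algorithm itself, so the following is only a toy sketch of the general idea, not the authors' method. It assumes that a hypothesis is a set of residue positions, that a hypothesis is kept only if no two labeled sequences agree at those positions yet differ in phenotype, and that the next experiment is the candidate whose worst-case outcome leaves the fewest surviving hypotheses. All names (`consistent`, `filter_hypotheses`, `pick_query`), the binary labels, and the toy sequences are hypothetical, and experiment cost is ignored.

```python
from itertools import combinations

def consistent(positions, labeled):
    """True if no two sequences agree at `positions` but differ in label."""
    seen = {}
    for seq, label in labeled:
        key = tuple(seq[p] for p in positions)
        # setdefault stores the first label seen for this key and returns it
        if seen.setdefault(key, label) != label:
            return False
    return True

def filter_hypotheses(labeled, seq_len, max_size):
    """All position subsets of size 1..max_size that fit the labeled data."""
    return [
        subset
        for k in range(1, max_size + 1)
        for subset in combinations(range(seq_len), k)
        if consistent(subset, labeled)
    ]

def pick_query(candidates, hypotheses, labeled, labels=(0, 1)):
    """Candidate whose worst-case label leaves the fewest hypotheses alive."""
    def worst_case(seq):
        return max(
            sum(consistent(h, labeled + [(seq, lab)]) for h in hypotheses)
            for lab in labels
        )
    return min(candidates, key=worst_case)

# Toy data (hypothetical): three-residue sequences with a binary phenotype.
labeled = [("AAG", 1), ("CCG", 0)]
hypotheses = filter_hypotheses(labeled, seq_len=3, max_size=1)
best = pick_query(["AAT", "ACT"], hypotheses, labeled)
```

In this toy case positions 0 and 1 both explain the labels, and the mutant `ACT` is the more useful experiment because the two surviving hypotheses predict different phenotypes for it, so either outcome eliminates one of them.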
Combinatorial Filtering, Active Learning, Key residues identification
C. Wu, R. Rosenfeld and A. S. Walsh, "Identification of Viral Protein Genotypic Determinants Using Combinatorial Filtering and Active Learning," 2010 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), Philadelphia, Pennsylvania USA, 2010, pp. 162-167.