Faster Mass Spectrometry-Based Protein Inference: Junction Trees Are More Efficient than Sampling and Marginalization by Enumeration
May-June 2012 (vol. 9 no. 3)
pp. 809-817
W. S. Noble, Dept. of Genome Sci., Univ. of Washington, Seattle, WA, USA
O. Serang, Dept. of Pathology, Children's Hosp. Boston, Boston, MA, USA
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The Python code used for this paper is available at http://noble.gs.washington.edu/proj/fido.
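The claim that a few highly connected subgraphs dominate the cost of inference can be illustrated with a minimal sketch. It is not the code released with the paper: the toy peptide-to-protein map, the function names, and the cost model (marginalization by naive enumeration visiting 2^n joint states for a subgraph with n proteins) are all assumptions made for illustration.

```python
from collections import defaultdict


def connected_components(peptide_to_proteins):
    """Split a peptide-protein bipartite graph into connected subgraphs.

    Returns a list of sets of protein names, one set per subgraph.
    """
    parent = {}  # union-find forest over protein names

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for proteins in peptide_to_proteins.values():
        if not proteins:
            continue
        root = find(proteins[0])
        # Proteins that share a peptide belong to the same subgraph.
        for other in proteins[1:]:
            parent[find(other)] = root

    groups = defaultdict(set)
    for protein in parent:
        groups[find(protein)].add(protein)
    return list(groups.values())


def enumeration_cost(components):
    """Joint protein states visited by naive enumeration, per subgraph."""
    return [2 ** len(component) for component in components]


# Hypothetical toy data: peptide -> proteins that could have produced it.
toy_graph = {
    "pepA": ["P1"],
    "pepB": ["P2", "P3"],
    "pepC": ["P4", "P5", "P6", "P7"],
    "pepD": ["P7", "P8", "P9"],
    "pepE": ["P9", "P10"],
}

components = connected_components(toy_graph)
costs = sorted(enumeration_cost(components))
share_of_largest = costs[-1] / sum(costs)

print(costs)  # [2, 4, 128]
print(f"largest subgraph share of total cost: {share_of_largest:.2f}")
```

In this toy example the single seven-protein subgraph accounts for almost all of the enumeration cost, which is the kind of subgraph where a more efficient exact method such as junction tree inference would matter most.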

Index Terms:
trees (mathematics), biology computing, expectation-maximisation algorithm, inference mechanisms, Markov processes, mass spectroscopic chemical analysis, molecular biophysics, Monte Carlo methods, proteins, connected subgraphs, tandem mass spectrometry, mass spectrometry-based protein inference, protein identification method, statistical inference method, graphical models, expectation maximization, Markov chain Monte Carlo model, marginalization, approximation heuristics, junction tree inference, python code, Proteins, Peptides, Junctions, Databases, Computational modeling, Bioinformatics, Complexity theory, Bayesian inference, Mass spectrometry, protein identification
W. S. Noble, O. Serang, "Faster Mass Spectrometry-Based Protein Inference: Junction Trees Are More Efficient than Sampling and Marginalization by Enumeration," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 809-817, May-June 2012, doi:10.1109/TCBB.2012.26