Issue No. 05, Sept.-Oct. 2012 (vol. 9)
pp. 1273-1280
Pengyi Yang, Sch. of Inf. Technol., Univ. of Sydney, Sydney, NSW, Australia
Jie Ma, State Key Lab. of Proteomics, Beijing Inst. of Radiat. Med., Beijing, China
Penghao Wang, Sch. of Stat. & Math., Univ. of Sydney, Sydney, NSW, Australia
Yunping Zhu, State Key Lab. of Proteomics, Beijing Inst. of Radiat. Med., Beijing, China
Bing B. Zhou, Sch. of Inf. Technol., Univ. of Sydney, Sydney, NSW, Australia
Yee Hwa Yang, Sch. of Stat. & Math., Univ. of Sydney, Sydney, NSW, Australia
ABSTRACT
A critical component in mass spectrometry (MS)-based proteomics is an accurate protein identification procedure. Database search algorithms commonly generate a list of peptide-spectrum matches (PSMs). The validity of these PSMs is critical for downstream analysis since proteins that are present in the sample are inferred from those PSMs. A variety of postprocessing algorithms have been proposed to validate and filter PSMs. Among them, the most popular ones include a semi-supervised learning (SSL) approach known as Percolator and an empirical modeling approach known as PeptideProphet. However, they are predominantly designed for commercial database search algorithms, i.e., SEQUEST and MASCOT. Therefore, it is highly desirable to extend and optimize those PSM postprocessing algorithms for open source database search algorithms such as X!Tandem. In this paper, we propose a Self-boosted Percolator for postprocessing X!Tandem search results. We find that the SSL algorithm utilized by Percolator depends heavily on the initial ranking of PSMs. Starting with a poor PSM ranking list may cause Percolator to perform suboptimally. By implementing Percolator in a cascade learning manner, we can progressively improve the performance through multiple boost runs, enabling many more PSM identifications without sacrificing false discovery rate (FDR).
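The cascade idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the algorithmic details, so the code below assumes a target-decoy setup in which FDR is estimated as decoys divided by targets above a score cut, and it replaces Percolator's support vector machine with a much simpler difference-of-class-means linear scorer. It also omits the cross-validation used by the real tool. All function names and parameters (`confident_targets`, `one_run`, `self_boosted_rerank`, `runs`, `iterations`) are hypothetical.

```python
def confident_targets(scores, is_decoy, fdr):
    """Indices of target PSMs accepted at the estimated FDR (decoys / targets)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    seen, decoys, cut = [], 0, 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            seen.append(i)
        # Remember the deepest cut at which the FDR estimate still holds.
        if seen and decoys <= fdr * len(seen):
            cut = len(seen)
    return seen[:cut]


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def one_run(features, is_decoy, scores, fdr, iterations):
    """One Percolator-like run: retrain on confident targets vs. all decoys."""
    decoy_rows = [f for f, d in zip(features, is_decoy) if d]
    for _ in range(iterations):
        positives = confident_targets(scores, is_decoy, fdr)
        if not positives or not decoy_rows:
            break  # nothing to learn from; keep the current ranking
        pos_mean = mean_vector([features[i] for i in positives])
        neg_mean = mean_vector(decoy_rows)
        # Assumption: a stand-in for the SVM, weights = difference of class means.
        w = [p - n for p, n in zip(pos_mean, neg_mean)]
        scores = [sum(wi * xi for wi, xi in zip(w, f)) for f in features]
    return scores


def self_boosted_rerank(features, is_decoy, init_scores,
                        fdr=0.01, runs=3, iterations=5):
    """Cascade: the ranking produced by each run seeds the next run."""
    scores = list(init_scores)
    for _ in range(runs):
        scores = one_run(features, is_decoy, scores, fdr, iterations)
    return scores
```

The point of the sketch is the dependence on the starting ranking: `one_run` can only learn from the targets that the current scores already place above the decoys, so a better initial ranking yields a better training set, and feeding each run's output into the next is one way to improve it progressively.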
INDEX TERMS
unsupervised learning, bioinformatics, mass spectroscopic chemical analysis, proteins, proteomics, public domain software, false discovery rate, X!Tandem, peptide identification, mass spectrometry, self-boosted percolator, protein identification procedure, peptide-spectrum match (PSM), downstream analysis, semi-supervised learning approach, empirical modeling approach, PeptideProphet program, Percolator program, PSM postprocessing algorithm, open source database search algorithm, cascade learning, PSM identification, Databases, Training, Peptides, Support vector machines, Algorithm design and analysis, Computational biology, semi-supervised learning, percolator
CITATION
Pengyi Yang, Jie Ma, Penghao Wang, Yunping Zhu, Bing B. Zhou, Yee Hwa Yang, "Improving X!Tandem on Peptide Identification from Mass Spectrometry by Self-Boosted Percolator", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 5, pp. 1273-1280, Sept.-Oct. 2012, doi:10.1109/TCBB.2012.86