Peptide Reranking with Protein-Peptide Correspondence and Precursor Peak Intensity Information
July-Aug. 2012 (vol. 9 no. 4)
pp. 1212-1219
Can Yang, Yale Sch. of Public Health, Yale Univ., New Haven, CT, USA
Zengyou He, Sch. of Software, Dalian Univ. of Technol., Dalian, China
Chao Yang, Hong Kong Univ. of Sci. & Technol., Kowloon, China
Weichuan Yu, Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Kowloon, China
Searching tandem mass spectra against a protein database has been a mainstream method for peptide identification. Efforts to improve peptide identification results by ranking true Peptide-Spectrum Matches (PSMs) above their false counterparts have led to the development of various reranking algorithms. In peptide reranking, discriminative information is essential for distinguishing true PSMs from false PSMs. Most peptide reranking methods obtain discriminative information directly from database search scores or by training machine learning models; information in the protein database and MS1 spectra (i.e., single-stage MS spectra) is ignored. In this paper, we propose to use information in the protein database and MS1 spectra to rerank peptide identification results. To quantitatively analyze their effects on peptide reranking results, three peptide reranking methods are proposed: PPMRanker, PPIRanker, and MIRanker. PPMRanker uses only Protein-Peptide Map (PPM) information from the protein database, PPIRanker uses only Precursor Peak Intensity (PPI) information, and MIRanker employs both PPM information and PPI information. According to our experiments on a standard protein mixture data set, a human data set, and a mouse data set, PPMRanker and MIRanker achieve better peptide reranking results than PeptideProphet, PeptideProphet+NSP (number of sibling peptides), and a score regularization method, SRPI. The source codes of PPMRanker, PPIRanker, and MIRanker, and all supplementary documents are available at our website: Alternatively, these documents can also be downloaded from:
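
As a rough illustration of what a protein-peptide map can contribute, the following Python sketch digests a toy protein database in silico and counts, for each candidate peptide, how many other candidates map to the same protein (a quantity in the spirit of the number of sibling peptides mentioned above). It is not the PPMRanker algorithm: the function names, the simplified tryptic rule (cleave after K or R unless followed by P, no missed cleavages), and the toy sequences are all assumptions made for the example.

```python
from collections import defaultdict


def tryptic_digest(sequence, min_len=2):
    """Simplified in-silico trypsin digestion (assumed rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Cleave after K or R unless the next residue is proline.
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return [p for p in peptides if len(p) >= min_len]


def build_protein_peptide_map(proteins):
    """Map each peptide to the set of protein IDs that can produce it."""
    ppm = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            ppm[peptide].add(protein_id)
    return ppm


def sibling_counts(candidates, ppm):
    """For each candidate peptide, the largest number of other candidates
    that share one of its proteins."""
    by_protein = defaultdict(set)
    for peptide in candidates:
        for protein_id in ppm.get(peptide, ()):
            by_protein[protein_id].add(peptide)
    return {
        peptide: max(
            (len(by_protein[p]) - 1 for p in ppm.get(peptide, ())),
            default=0,
        )
        for peptide in candidates
    }


# Hypothetical toy database and candidate identifications.
toy_proteins = {"P1": "AAAKCCCRDDDK", "P2": "EEEKCCCR", "P3": "GGGKHHHR"}
toy_ppm = build_protein_peptide_map(toy_proteins)
toy_counts = sibling_counts(["AAAK", "CCCR", "DDDK", "GGGK"], toy_ppm)
```

A reranker could treat such counts as one additional feature alongside the database search score; how PPM information is actually weighted is specific to each method and is not shown here.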

Index Terms:
Web sites, bioinformatics, learning (artificial intelligence), mass spectroscopy, molecular biophysics, proteins, training, peptide reranking, protein-peptide correspondence, precursor peak intensity information, searching tandem mass spectra, protein database, peptide identification, peptide-spectrum matches, training machine learning models, MS1 spectra, PPMRanker methods, PPIRanker methods, MIRanker methods, PPMRanker, protein-peptide map information, standard protein mixture data set, human data set, mouse data set, website, Peptides, Vectors, Databases, Computational biology, Tides, convex optimization, Tandem mass spectrometry, PPM, PPI
Can Yang, Zengyou He, Chao Yang, Weichuan Yu, "Peptide Reranking with Protein-Peptide Correspondence and Precursor Peak Intensity Information," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1212-1219, July-Aug. 2012, doi:10.1109/TCBB.2012.29