Issue No.02 - March/April (2011 vol.8)
pp: 368-380
Zengyou He , Hong Kong University of Science and Technology, Hong Kong
Can Yang , Hong Kong University of Science and Technology, Hong Kong
Weichuan Yu , Hong Kong University of Science and Technology, Hong Kong
ABSTRACT
Protein identification is an essential step in mass spectrometry (MS)-based proteome research. Many existing protein identification strategies search a sequence database using either MS data or MS/MS data. MS-based methods provide wider coverage than MS/MS-based methods, but their identification accuracy is lower because MS data carry less information than MS/MS data. More sophisticated algorithms are therefore needed to achieve higher identification accuracy from MS data. Peptide Mass Fingerprinting (PMF) has been widely used for many years to identify single purified proteins from MS data. In this paper, we extend this technology to protein mixture identification. First, we formulate protein mixture identification as a Partial Set Covering (PSC) problem. Then, we present several algorithms that solve the PSC problem efficiently. Finally, we extend the partial set covering model to MS/MS data and to the combination of MS data and MS/MS data. Experimental results on simulated and real data demonstrate the advantages of our method: 1) it significantly outperforms previous MS-based approaches; 2) it is useful in MS/MS-based protein inference; and 3) it combines MS data and MS/MS data in a unified model, which further improves identification performance.
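To make the partial set covering idea concrete, the sketch below uses the textbook greedy heuristic: repeatedly pick the candidate protein that explains the most still-unexplained peaks, and stop once a required number of peaks is covered. This is an illustration only and is not claimed to be one of the algorithms presented in the paper. The function name `greedy_partial_cover`, the parameter `min_covered`, the protein labels, and the peak indices are all hypothetical, and matching observed peaks to theoretical peptide masses is assumed to have been done beforehand.

```python
from typing import Dict, List, Set


def greedy_partial_cover(candidates: Dict[str, Set[int]], min_covered: int) -> List[str]:
    """Greedy heuristic for partial set cover (illustrative sketch).

    candidates maps a protein label to the set of observed peak indices
    that its theoretical peptides match (assumed precomputed).
    min_covered is the number of peaks that must be explained; the
    remaining peaks are tolerated as noise.
    """
    covered: Set[int] = set()
    selected: List[str] = []
    while len(covered) < min_covered:
        # Iterate in sorted order so ties are broken alphabetically.
        best = max(
            (name for name in sorted(candidates) if name not in selected),
            key=lambda name: len(candidates[name] - covered),
            default=None,
        )
        if best is None or not (candidates[best] - covered):
            break  # No remaining protein explains a new peak.
        selected.append(best)
        covered |= candidates[best]
    return selected


# Hypothetical toy input: 8 observed peaks (0..7), 4 candidate proteins.
toy_candidates = {
    "P1": {0, 1, 2, 3},
    "P2": {3, 4, 5},
    "P3": {5, 6},
    "P4": {7},
}
partial = greedy_partial_cover(toy_candidates, min_covered=6)
full = greedy_partial_cover(toy_candidates, min_covered=8)
```

Requiring only 6 of the 8 peaks lets the heuristic stop after two proteins, whereas insisting on full coverage pulls in proteins that each explain a single extra peak. That is the motivation for a partial rather than full cover when some peaks may be noise.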
INDEX TERMS
Protein identification, proteomics, peptide mass fingerprinting, mass spectrometry, set covering, linear programming, optimization.
CITATION
Zengyou He, Can Yang, Weichuan Yu, "A Partial Set Covering Model for Protein Mixture Identification Using Mass Spectrometry Data", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 2, pp. 368-380, March/April 2011, doi:10.1109/TCBB.2009.54