IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, Issue No. 05, Sept.-Oct. 2012
Pengyi Yang, Sch. of Inf. Technol., Univ. of Sydney, Sydney, NSW, Australia
Jie Ma, State Key Lab. of Proteomics, Beijing Inst. of Radiat. Med., Beijing, China
Penghao Wang, Sch. of Stat. & Math., Univ. of Sydney, Sydney, NSW, Australia
Yunping Zhu, State Key Lab. of Proteomics, Beijing Inst. of Radiat. Med., Beijing, China
Bing B. Zhou, Sch. of Inf. Technol., Univ. of Sydney, Sydney, NSW, Australia
Yee Hwa Yang, Sch. of Stat. & Math., Univ. of Sydney, Sydney, NSW, Australia
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2012.86
A critical component in mass spectrometry (MS)-based proteomics is an accurate protein identification procedure. Database search algorithms commonly generate a list of peptide-spectrum matches (PSMs). The validity of these PSMs is critical for downstream analysis since proteins that are present in the sample are inferred from those PSMs. A variety of postprocessing algorithms have been proposed to validate and filter PSMs. Among them, the most popular ones include a semi-supervised learning (SSL) approach known as Percolator and an empirical modeling approach known as PeptideProphet. However, they are predominantly designed for commercial database search algorithms, i.e., SEQUEST and MASCOT. Therefore, it is highly desirable to extend and optimize those PSM postprocessing algorithms for open source database search algorithms such as X!Tandem. In this paper, we propose a Self-boosted Percolator for postprocessing X!Tandem search results. We find that the SSL algorithm utilized by Percolator depends heavily on the initial ranking of PSMs. Starting with a poor PSM ranking list may cause Percolator to perform suboptimally. By implementing Percolator in a cascade learning manner, we can progressively improve the performance through multiple boost runs, enabling many more PSM identifications without sacrificing false discovery rate (FDR).
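The cascade idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`accepted_targets`, `train_linear`, `self_boost`) are hypothetical, the FDR is estimated with a simple decoys/targets ratio from a target-decoy search, and a centroid-difference linear scorer stands in for the support vector machine that Percolator trains. The sketch only shows how a re-ranking produced by one run can seed the next run.

```python
from typing import List, Sequence


def accepted_targets(scores: Sequence[float], is_decoy: Sequence[bool],
                     fdr: float = 0.01) -> List[int]:
    """Indices of target PSMs accepted at the given FDR.

    Assumption: FDR at a score cutoff is estimated as (#decoys / #targets)
    among PSMs ranked at or above that cutoff.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    targets = decoys = 0
    best_cut = 0  # largest rank at which the estimated FDR is still acceptable
    for rank, i in enumerate(order, start=1):
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr:
            best_cut = rank
    return [i for i in order[:best_cut] if not is_decoy[i]]


def train_linear(features: Sequence[Sequence[float]],
                 positives: Sequence[int],
                 negatives: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for the SVM: weights = positive centroid - negative centroid."""
    dim = len(features[0])

    def centroid(idx: Sequence[int]) -> List[float]:
        return [sum(features[i][d] for i in idx) / len(idx) for d in range(dim)]

    pos_c, neg_c = centroid(positives), centroid(negatives)
    return [p - n for p, n in zip(pos_c, neg_c)]


def self_boost(features: Sequence[Sequence[float]], is_decoy: Sequence[bool],
               initial_scores: Sequence[float], runs: int = 3,
               fdr: float = 0.01) -> List[int]:
    """Run several boost rounds; each round starts from the previous ranking."""
    scores = list(initial_scores)
    decoy_idx = [i for i, d in enumerate(is_decoy) if d]
    for _ in range(runs):
        positives = accepted_targets(scores, is_decoy, fdr)
        if not positives or not decoy_idx:
            break  # nothing to learn from; keep the current ranking
        weights = train_linear(features, positives, decoy_idx)
        # Re-score every PSM; this ranking is the input of the next round.
        scores = [sum(w * x for w, x in zip(weights, f)) for f in features]
    return accepted_targets(scores, is_decoy, fdr)
```

In this toy setting, a poor initial ranking (for example, one raw search-engine score) admits few targets in the first round, and later rounds can recover more targets at the same FDR threshold once the learned weights emphasize more discriminative features.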
unsupervised learning, bioinformatics, mass spectroscopic chemical analysis, proteins, proteomics, public domain software, false discovery rate, X!Tandem, peptide identification, mass spectrometry, self-boosted percolator, protein identification procedure, peptide-spectrum match (PSM), downstream analysis, semi-supervised learning approach, empirical modeling approach, PeptideProphet program, Percolator program, PSM postprocessing algorithm, open source database search algorithm, cascade learning, PSM identification, Databases, Training, Peptides, Support vector machines, Algorithm design and analysis, Computational biology, semi-supervised learning, percolator
Pengyi Yang, Jie Ma, Penghao Wang, Yunping Zhu, Bing B. Zhou, Yee Hwa Yang, "Improving X!Tandem on Peptide Identification from Mass Spectrometry by Self-Boosted Percolator", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 5, pp. 1273-1280, Sept.-Oct. 2012, doi:10.1109/TCBB.2012.86