DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.110
Chao Yang, The Hong Kong University of Science and Technology, Hong Kong
Zengyou He, Dalian University of Technology, Dalian
Weichuan Yu, The Hong Kong University of Science and Technology, Hong Kong
In a shotgun proteomics experiment, proteins are the most biologically meaningful output, and the success of a proteomics study depends on the ability to identify them accurately and efficiently. Many methods have been proposed to infer proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained. In this paper, we take a combinatorial perspective on the protein inference problem. We use combinatorial mathematics to calculate conditional protein probabilities (the protein probability is the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimate of the protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. In experiments on two datasets of standard protein mixtures and two datasets of real samples, our method achieves results comparable to those of ProteinProphet while being more efficient. Based on our model, we study the impact of unique peptides and degenerate peptides (peptides shared by at least two proteins) on protein probabilities. We also study the relationship between our model and ProteinProphet.
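To make the distinction between unique and degenerate peptides concrete, the following minimal sketch splits peptides by the number of parent proteins they map to. It illustrates only the definition given in the abstract, not the paper's combinatorial model; the function name `classify_peptides` and the toy protein and peptide names are hypothetical.

```python
from collections import defaultdict


def classify_peptides(protein_to_peptides):
    """Split peptides into unique (exactly one parent protein)
    and degenerate (shared by at least two proteins)."""
    # Invert the mapping: peptide -> set of proteins that contain it.
    parents = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            parents[peptide].add(protein)

    unique = {pep for pep, prots in parents.items() if len(prots) == 1}
    degenerate = {pep for pep, prots in parents.items() if len(prots) >= 2}
    return unique, degenerate


# Hypothetical toy data, for illustration only.
toy = {
    "ProtA": ["PEPTIDEK", "SHAREDR"],
    "ProtB": ["SHAREDR", "ANOTHERK"],
}

unique, degenerate = classify_peptides(toy)
print(sorted(unique))      # ['ANOTHERK', 'PEPTIDEK']
print(sorted(degenerate))  # ['SHAREDR']
```

Here `SHAREDR` is degenerate because it maps to both toy proteins, so on its own it cannot indicate which of the two is present.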
Proteins, Peptides, Probability, Bioinformatics, Upper bound, Estimation, Equations, Probability Bounds, Protein Identification, Combinatorial Perspective, Analytical Formulation
C. Yang, Z. He and W. Yu, "A Combinatorial Perspective of the Protein Inference Problem," in IEEE/ACM Transactions on Computational Biology and Bioinformatics.