Issue No.01 - January/February (2012 vol.9)
pp: 286-293
Noah Daniels, Tufts University, Medford
Anoop Kumar, Tufts University, Medford
Lenore Cowen, Tufts University, Medford
Matt Menke, Tufts University, Medford
Using the Matt structure alignment program, we take a tour of protein space, producing a hierarchical clustering scheme that divides protein structural domains into clusters based on geometric dissimilarity. While it was known that purely structural, geometric, distance-based measures of structural similarity, such as Dali/FSSP, could largely replicate hand-curated schemes such as SCOP at the family level, it was an open question whether any such scheme could approximate SCOP at the more distant superfamily and fold levels. We partially answer this question in the affirmative by designing a clustering scheme based on Matt that approximately matches SCOP at the superfamily level and demonstrates qualitative differences in performance between Matt and DaliLite. Implications for the debate over the organization of protein fold space are discussed. Based on our clustering of protein space, we introduce the Mattbench benchmark set, a new collection of structural alignments useful for testing sequence aligners on more distantly homologous proteins.
Proteins, Indexes, Benchmark testing, Clustering algorithms, Measurement, Training, Bioinformatics, automated classification, SCOP, hierarchical classification, structure alignment, fold space
Noah Daniels, Anoop Kumar, Lenore Cowen, Matt Menke, "Touring Protein Space with Matt", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 1, pp. 286-293, January/February 2012, doi:10.1109/TCBB.2011.70