Machine Learning Techniques for the Automated Classification of Adhesin-Like Proteins in the Human Protozoan Parasite Trypanosoma cruzi
October-December 2009 (vol. 6 no. 4)
pp. 695-702
Ana M. González, Universidad Autónoma de Madrid, Madrid
Francisco J. Azuaje, Research Center for Public Health (CRP-Santé), Luxembourg
José L. Ramírez, Instituto de Estudios Avanzados, Caracas
José F. da Silveira, Escola Paulista de Medicina, UNIFESP, Brazil
José R. Dorronsoro, Universidad Autónoma de Madrid, Madrid
This paper reports on the evaluation of different machine learning techniques for the automated classification of coding gene sequences obtained from several organisms in terms of their functional role as adhesins. Diverse, biologically-meaningful, sequence-based features were extracted from the sequences and used as inputs to the in silico prediction models. Another contribution of this work is the generation of potentially novel and testable predictions about the surface protein DGF-1 family in Trypanosoma cruzi. Finally, these techniques are potentially useful for the automated annotation of known adhesin-like proteins from the trans-sialidase surface protein family in T. cruzi, the etiological agent of Chagas disease.

Index Terms:
Chagas disease, adhesin-like proteins, genomic data mining, machine learning.
Ana M. González, Francisco J. Azuaje, José L. Ramírez, José F. da Silveira, José R. Dorronsoro, "Machine Learning Techniques for the Automated Classification of Adhesin-Like Proteins in the Human Protozoan Parasite Trypanosoma cruzi," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 4, pp. 695-702, Oct.-Dec. 2009, doi:10.1109/TCBB.2008.125