Fuzzy ARTMAP Prediction of Biological Activities for Potential HIV-1 Protease Inhibitors Using a Small Molecular Data Set
Issue No. 01 - January-February (2011 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.50
Christopher Badi' Abdul-Wahid, Central Washington University, Ellensburg
Grant I. Barker, Central Washington University, Ellensburg
Lukas C. Magill, Central Washington University, Ellensburg
Sarah Abdul-Wahid, Central Washington University, Ellensburg
Răzvan Andonie, Central Washington University, Ellensburg and Transylvania University of Braşov, Romania
Levente Fabry-Asztalos, Central Washington University, Ellensburg
Obtaining satisfactory results with neural networks depends on the availability of large data samples. The use of small training sets generally reduces performance. Most classical Quantitative Structure-Activity Relationship (QSAR) studies for a specific enzyme system have been performed on small data sets. We focus on the neuro-fuzzy prediction of biological activities of HIV-1 protease inhibitory compounds when inferring from small training sets. We propose two computational intelligence prediction techniques that are suitable for small training sets, at the expense of some computational overhead. Both techniques are based on the FAMR model. The FAMR is a Fuzzy ARTMAP (FAM) incremental learning system used for classification and probability estimation. During the learning phase, each sample pair is assigned a relevance factor proportional to the importance of that pair. The two algorithms proposed in this paper are: 1) The GA-FAMR algorithm, which is new and consists of two stages: a) In the first stage, we use a genetic algorithm (GA) to optimize the relevances assigned to the training data. This improves the generalization capability of the FAMR. b) In the second stage, we use the optimized relevances to train the FAMR. 2) The Ordered FAMR, which is derived from a known algorithm. Instead of optimizing relevances, it optimizes the order of data presentation using the algorithm of Dagher et al. In our experiments, we compare these two algorithms with an algorithm not based on the FAM, the previously introduced FS-GA-FNN. We conclude that when inferring from small training sets, both techniques are efficient in terms of generalization capability and execution time. The computational overhead introduced is compensated by better accuracy. Finally, the proposed techniques are used to predict the biological activities of newly designed potential HIV-1 protease inhibitors.
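The two-stage idea behind the GA-FAMR (first search for per-sample relevances with a GA, then train with the best relevances found) can be illustrated with a minimal sketch. This is not the authors' implementation: the FAMR itself is replaced here by a hypothetical stand-in learner (relevance-weighted class centroids), and every name and parameter (`weighted_centroid_predict`, `fitness`, `optimize_relevances`, `pop_size`, `generations`, `mutation_rate`), as well as the use of validation accuracy as the fitness, is an assumption made for illustration only.

```python
import random


def weighted_centroid_predict(train, relevances, x):
    """Stand-in learner (NOT the FAMR): nearest relevance-weighted class centroid."""
    sums, totals = {}, {}
    for (features, label), q in zip(train, relevances):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += q * v
        totals[label] = totals.get(label, 0.0) + q
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        if totals[label] <= 0.0:
            continue  # a class whose samples all have zero relevance is ignored
        centroid = [s / totals[label] for s in acc]
        dist = sum((c - xi) ** 2 for c, xi in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def fitness(train, relevances, validation):
    """Assumed fitness: fraction of validation samples classified correctly."""
    hits = sum(weighted_centroid_predict(train, relevances, x) == y
               for x, y in validation)
    return hits / len(validation)


def optimize_relevances(train, validation, pop_size=20, generations=30,
                        mutation_rate=0.1, seed=0):
    """Stage 1: a simple GA over relevance vectors with entries in [0, 1]."""
    rng = random.Random(seed)
    n = len(train)
    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    population[0] = [1.0] * n  # seed the search with uniform relevances
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda r: fitness(train, r, validation),
                        reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: keep the best individual unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.random() if rng.random() < mutation_rate else g
                     for g in child]  # per-gene mutation
            children.append(child)
        population = children
    # Stage 2 would train the final model with the relevances returned here.
    return max(population, key=lambda r: fitness(train, r, validation))
```

Here `train` and `validation` are lists of `(feature_list, label)` pairs. Because the uniform relevance vector is placed in the initial population and the best individual is always carried over, the returned relevances never score worse on the validation set than uniform weighting does in this sketch.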
Fuzzy neural networks, evolutionary computing and genetic algorithms, computational chemistry, data mining.
Christopher Badi' Abdul-Wahid, Grant I. Barker, Lukas C. Magill, Sarah Abdul-Wahid, Răzvan Andonie, Levente Fabry-Asztalos, "Fuzzy ARTMAP Prediction of Biological Activities for Potential HIV-1 Protease Inhibitors Using a Small Molecular Data Set", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 1, pp. 80-93, January-February 2011, doi:10.1109/TCBB.2009.50