Using Kernel Alignment to Select Features of Molecular Descriptors in a QSAR Study
September/October 2011 (vol. 8 no. 5)
pp. 1373-1384
William W.L. Wong, University of Toronto, Toronto
Forbes J. Burkowski, University of Waterloo, Waterloo
Quantitative structure-activity relationships (QSARs) correlate the biological activities of chemical compounds with their physicochemical descriptors. By modeling the relationship observed between molecular descriptors and their corresponding biological activities, we may predict the behavior of other molecules with similar descriptors. QSAR studies have shown that the quality of the prediction model depends strongly on which features are selected from the molecular descriptors, so methods that select relevant features automatically are highly desirable. In this paper, we present a new feature selection algorithm for QSAR studies based on kernel alignment, which has been used as a measure of similarity between two kernel functions. Our algorithm uses kernel alignment as the evaluation criterion within recursive feature elimination to compute a molecular descriptor containing the most important features needed for a classification application. Empirical results show that the algorithm works well for computing descriptors in applications involving different QSAR data sets: prediction accuracies increase substantially and are comparable to those from earlier studies.
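
As a rough illustration of the idea described in the abstract, the following Python sketch combines an empirical kernel alignment score with a backward (recursive) elimination loop. It is a minimal sketch under stated assumptions, not the authors' implementation: the linear kernel, the target kernel built as the outer product of labels in {-1, +1}, the rule of dropping the feature whose removal leaves the highest alignment, and the names `alignment` and `rfe_by_alignment` are all assumptions made for illustration.

```python
import numpy as np


def alignment(K1, K2):
    """Empirical alignment of two kernel matrices:
    <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F).
    Assumes neither matrix is all zeros."""
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))


def rfe_by_alignment(X, y, n_keep):
    """Backward elimination driven by kernel alignment (illustrative).

    X: (n_samples, n_features) descriptor matrix.
    y: labels in {-1, +1}.
    Returns the indices of the n_keep surviving features.
    """
    target = np.outer(y, y)          # target kernel y y^T (assumption)
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        scores = []
        for f in active:
            cols = [c for c in active if c != f]
            Xs = X[:, cols]
            K = Xs @ Xs.T            # linear kernel (assumption)
            scores.append(alignment(K, target))
        # Drop the feature whose removal leaves the best-aligned kernel.
        active.pop(int(np.argmax(scores)))
    return active


if __name__ == "__main__":
    # Toy data: feature 0 equals the label, features 1 and 2 are orthogonal to it.
    y = np.array([1.0, 1.0, -1.0, -1.0])
    X = np.column_stack([y, [1.0, -1.0, 1.0, -1.0], [1.0, -1.0, -1.0, 1.0]])
    print(rfe_by_alignment(X, y, n_keep=1))
```

Each pass recomputes one kernel matrix per remaining feature, so the cost grows quickly with the number of descriptors; a practical variant might remove several features per pass.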

Index Terms:
Feature selection, kernel alignment, quantitative structure-activity relationship (QSAR).
William W.L. Wong, Forbes J. Burkowski, "Using Kernel Alignment to Select Features of Molecular Descriptors in a QSAR Study," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1373-1384, Sept.-Oct. 2011, doi:10.1109/TCBB.2011.31