
Issue No. 02, February 2009 (vol. 21)

pp. 178-191

Bart Baesens, University of Southampton, Southampton / K.U.Leuven, Leuven

David Martens, K.U.Leuven, Leuven

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.131

ABSTRACT

Support vector machines (SVMs) are currently state of the art for the classification task and generally exhibit good predictive performance owing to their ability to model nonlinearities. However, this strength is also their main weakness, as the resulting nonlinear models are typically regarded as incomprehensible black-box models. In this paper, we propose a new active learning-based approach (ALBA) to extract comprehensible rules from opaque SVM models. Through rule extraction, some insight is provided into the logic of the SVM model. ALBA extracts rules from the trained SVM model by explicitly exploiting two key concepts of the SVM: its support vectors, and the observation that these typically lie close to the decision boundary. Active learning implies a focus on apparent problem areas, which for rule induction techniques are the regions close to the SVM decision boundary, where most of the noise is found. By generating extra data close to these support vectors and labeling them with the trained SVM model, rule induction techniques are better able to discover suitable discrimination rules. This performance increase, in terms of both predictive accuracy and comprehensibility, is confirmed in our experiments, where we apply ALBA to several publicly available data sets.
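The procedure the abstract describes can be sketched in a few lines. The following is a minimal illustration of the idea, not the authors' implementation: train an SVM, generate extra points near its support vectors (which lie close to the decision boundary), label them with the trained SVM, and hand the enlarged data set to a rule-induction learner. The perturbation scale, the decision tree standing in for a rule inducer, and the synthetic data set are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 1. Train the opaque SVM model.
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

# 2. Generate extra data close to the support vectors, i.e. near the
#    decision boundary, where rule induction struggles most.
sv = svm.support_vectors_
extra = sv + rng.normal(scale=0.1, size=(len(sv), X.shape[1]))

# 3. Label both the original and the synthetic points with the trained SVM,
#    so the rule set mimics the SVM rather than the noisy original labels.
X_aug = np.vstack([X, extra])
y_aug = np.concatenate([svm.predict(X), svm.predict(extra)])

# 4. Induce comprehensible rules from the augmented, SVM-labeled data
#    (a shallow decision tree stands in for a rule-induction technique).
tree = DecisionTreeClassifier(max_depth=3).fit(X_aug, y_aug)
fidelity = tree.score(X_aug, y_aug)  # agreement with the SVM's labels
```

The `fidelity` score measures how faithfully the extracted rules reproduce the SVM's behavior; the paper evaluates both this and predictive accuracy on held-out data.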

INDEX TERMS

Support vector machine, rule extraction, active learning, black-box models, ALBA.

CITATION

Bart Baesens, David Martens, "Decompositional Rule Extraction from Support Vector Machines by Active Learning",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 21, no. 2, pp. 178-191, February 2009, doi:10.1109/TKDE.2008.131
