Issue No. 05 - May 2008 (vol. 20)
pp. 589-600
ABSTRACT
We present a method for explaining a model's predictions for individual instances. The approach is general and can be used with any classification model that outputs class probabilities. It is based on decomposing the model's prediction into the individual contributions of each attribute. The method works for so-called black-box models such as support vector machines, neural networks, and nearest-neighbor algorithms, as well as for ensemble methods such as boosting and random forests. We demonstrate that the generated explanations closely follow the learned models, and we present a visualization technique that shows the utility of our approach and enables the comparison of different prediction methods.
INDEX TERMS
Machine learning, Data mining, Data and knowledge visualization, Visualization techniques and methodologies
CITATION
Marko Robnik-Šikonja and Igor Kononenko, "Explaining Classifications for Individual Instances," IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 5, pp. 589-600, May 2008, doi:10.1109/TKDE.2007.190734