Machine Learning: The State of the Art
November/December 2008 (vol. 23, no. 6)
pp. 49–55
Jue Wang, Chinese Academy of Sciences
Qing Tao, Chinese Academy of Sciences
The two fundamental problems in machine learning (ML) are statistical analysis and algorithm design. The former reveals the principles behind the mathematical models we build from observed data; the latter specifies the conditions on which the implementation of those models depends for given data sets. A newly recognized challenge to ML is the Rashomon effect: data may be generated from a mixture of heterogeneous sources. A simple classification standard can shed light on emerging forms of ML. This article is part of a special issue on AI in China.
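The Rashomon effect as the abstract defines it — data drawn from a mixture of heterogeneous sources — can be sketched with a toy experiment. Everything below (the two-source labeling rules, thresholds, and sample sizes) is illustrative and not from the article: each source labels points by an opposite rule, so one pooled model does no better than chance, while a per-source model is near perfect.

```python
import random

random.seed(0)

# Source A labels a point positive when x > 0; source B uses the opposite
# rule. Pooling them yields a heterogeneous mixture (the Rashomon effect).
def sample(n):
    data = []
    for _ in range(n):
        source = random.choice("AB")
        x = random.uniform(-1, 1)
        y = (x > 0) if source == "A" else (x <= 0)
        data.append((source, x, int(y)))
    return data

def accuracy(rule, points):
    return sum(rule(x) == y for _, x, y in points) / len(points)

data = sample(2000)

# One model fit to the pooled data: right on source A, wrong on source B,
# so its accuracy hovers near 0.5.
acc_pooled = accuracy(lambda x: int(x > 0), data)

# Modeling each source separately recovers perfect accuracy.
acc_a = accuracy(lambda x: int(x > 0), [p for p in data if p[0] == "A"])
acc_b = accuracy(lambda x: int(x <= 0), [p for p in data if p[0] == "B"])

print(round(acc_pooled, 2), acc_a, acc_b)
```

The pooled accuracy stays near chance no matter how much data is collected, which is why detecting the mixture, rather than fitting a single model harder, is the key difficulty.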

1. H. Simon, "Why Should Machines Learn?" Machine Learning: An Artificial Intelligence Approach, R. Michalski, J. Carbonell, and T. Mitchell, eds., Tioga Press, 1983, pp. 25–38.
2. R. Solomonoff, "A New Method for Discovering the Grammars of Phrase Structure Languages," Proc. Int'l Conf. Information Processing, UNESCO, 1959, pp. 285–290.
3. E. Hunt, J. Marin, and P. Stone, Experiments in Induction, Academic Press, 1966.
4. A. Samuel, "Some Studies in Machine Learning Using the Game of Checkers, Part II," IBM J. Research and Development, vol. 11, no. 4, 1967, pp. 601–618.
5. J. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, Mar. 1986, pp. 81–106.
6. Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, 1991.
7. F. Rosenblatt, The Perceptron: A Perceiving and Recognizing Automaton, tech. report 85-460-1, Aeronautical Lab., Cornell Univ., 1957.
8. R. Duda and P. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, 1973.
9. D.E. Rumelhart and J.L. McClelland, Parallel Distributed Processing, MIT Press, 1986.
10. L. Breiman, "Statistical Modeling: The Two Cultures," Statistical Science, vol. 16, no. 3, 2001, pp. 199–231.
11. V. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995.
12. V. Vapnik and A. Chervonenkis, "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities," Theory of Probability and Its Applications, vol. 16, Jan. 1971, pp. 264–280.
13. A. Blumer et al., "Learnability and the Vapnik-Chervonenkis Dimension," J. ACM, vol. 36, no. 4, 1989, pp. 929–965.
14. J. Shawe-Taylor et al., "Structural Risk Minimization over Data-Dependent Hierarchies," IEEE Trans. Information Theory, vol. 44, no. 5, 1998, pp. 1926–1940.
15. Q. Tao, G.-W. Wu, and J. Wang, "A New Maximum Margin Algorithm for One-Class Problems and Its Boosting Implementation," Pattern Recognition, vol. 38, no. 7, 2005, pp. 1071–1077.
16. Q. Tao et al., "Posterior Probability Support Vector Machines for Unbalanced Data," IEEE Trans. Neural Networks, vol. 16, no. 6, 2005, pp. 1561–1573.
17. Q. Tao, G.-W. Wu, and J. Wang, "Learning Linear PCA with Convex Semi-Definite Programming," Pattern Recognition, vol. 40, no. 10, 2007, pp. 2633–2640.
18. Q. Tao, D.-J. Chu, and J. Wang, "Recursive Support Vector Machines for Dimensionality Reduction," IEEE Trans. Neural Networks, vol. 19, no. 1, 2008, pp. 189–193.
19. R. Schapire, "The Strength of Weak Learnability," Machine Learning, vol. 5, no. 2, 1990, pp. 197–227.
20. Y. Freund and R.E. Schapire, "A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting," J. Computer and System Sciences, vol. 55, no. 1, 1997, pp. 119–139.
21. R.E. Schapire et al., "Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods," Annals of Statistics, vol. 26, no. 5, 1998, pp. 1651–1686.
22. T. Zhang, "Statistical Behaviour and Consistency of Classification Methods Based on Convex Risk Minimization," Annals of Statistics, vol. 32, no. 1, 2004, pp. 56–85.
23. Y. Lin, "Support Vector Machines and the Bayes Rule in Classification," Data Mining and Knowledge Discovery, vol. 6, no. 3, 2002, pp. 259–275.
24. L. Valiant, "A Theory of the Learnable," Comm. ACM, vol. 27, no. 11, 1984, pp. 1134–1142.
25. L. Breiman, "Prediction Games and Arcing Algorithms," Neural Computation, vol. 11, no. 7, 1999, pp. 1493–1517.
26. J. Friedman, T. Hastie, and R. Tibshirani, "Additive Logistic Regression: A Statistical View of Boosting," Annals of Statistics, vol. 28, no. 2, 2000, pp. 337–407.
27. I. Steinwart, Which Data-Dependent Bounds Are Suitable for SVM's? tech. report, Los Alamos Nat'l Lab., 2002.
28. T. Hastie and J. Zhu, "Comment," Statistical Science, vol. 21, no. 3, 2006, pp. 352–357.
29. M. Minsky and S. Parpert, Perceptron (expanded edition), MIT Press, 1988.
30. J. von Neumann, Mathematical Foundations of Quantum Mechanics, Princeton Univ. Press, 1932.
31. H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic Publishers, 1998.
32. R. Tibshirani, "Regression Shrinkage and Selection via the Lasso," J. Royal Statistical Soc.: Series B, vol. 58, no. 1, 1996, pp. 267–288.
33. B. Efron et al., "Least Angle Regression," Annals of Statistics, vol. 32, no. 2, 2004, pp. 407–499.
34. H. Liang, J. Wang, and Y. Yao, "User-Oriented Feature Selection for Machine Learning," Computer J., vol. 50, no. 4, 2007, pp. 421–434.
35. A. Blum and T. Mitchell, "Combining Labeled and Unlabeled Data with Co-training," Proc. 11th Ann. Conf. Computational Learning Theory, ACM Press, 1998, pp. 92–100.
36. R. Herbrich, T. Graepel, and K. Obermayer, "Support Vector Learning for Ordinal Regression," Proc. 9th Int'l Conf. Artificial Neural Networks, IEEE Press, 1999, pp. 97–102.
37. S. Dzeroski and N. Lavrac, eds., Relational Data Mining, Springer, 2001.
38. H. Seung and D. Lee, "The Manifold Way of Perception," Science, vol. 290, no. 5500, 2000, pp. 2268–2269.
39. S. Roweis and L. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, 2000, pp. 2323–2326.
40. J. Tenenbaum, V. de Silva, and J. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, no. 5500, 2000, pp. 2319–2323.
41. R. Shepard, C. Hovland, and H. Jenkins, "Learning and Memorization of Classifications," Psychological Monographs, vol. 75, no. 13, 1961, pp. 1–42.
42. R. Nosofsky, T. Palmeri, and S. McKinley, "Rule-Plus-Exception Model of Classification Learning," Psychological Rev., vol. 101, no. 1, 1994, pp. 53–79.
43. Y. Yao et al., "Rule + Exception Strategies for Security Information Analysis," IEEE Intelligent Systems, Sept./Oct. 2005, pp. 52–57.
44. J. Wang et al., "Multilevel Data Summarization from Information System: A 'Rule + Exception' Approach," AI Comm., vol. 16, no. 1, 2003, pp. 17–39.
45. E.P. Xing et al., "Distance Metric Learning, with Application to Clustering with Side-Information," Advances in NIPS, vol. 15, Jan. 2003, pp. 505–512.
46. T.G. Dietterich, R.H. Lathrop, and T. Lozano-Pérez, "Solving the Multiple-Instance Problem with Axis-Parallel Rectangles," Artificial Intelligence, vol. 89, nos. 1–2, 1997, pp. 31–71.
47. W. Zhu and F.-Y. Wang, "Reduction and Axiomization of Covering Generalized Rough Sets," Information Sciences, vol. 152, no. 1, 2003, pp. 217–230.

Index Terms:
machine learning, Rashomon effect, perceptron, nonlinear backpropagation, statistical analysis, algorithm design, feature selection, supervised learning, unsupervised learning, semisupervised learning, structural learning, symbolic learning methods, statistical learning methods, manifold learning, relational learning, learning to rank, rule + exception learning, metric learning, multi-instance learning
Jue Wang and Qing Tao, "Machine Learning: The State of the Art," IEEE Intelligent Systems, vol. 23, no. 6, pp. 49–55, Nov.–Dec. 2008, doi:10.1109/MIS.2008.107.