S. Ruggieri, "Efficient C4.5," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438-444, March/April 2002. ISSN 1041-4347. doi:10.1109/69.991727

Keywords: C4.5, decision trees, inductive learning, supervised learning, data mining

We present an analytic evaluation of the runtime behavior of the C4.5 algorithm which highlights some efficiency improvements. Based on the analytic evaluation, we have implemented a more efficient version of the algorithm, called EC4.5. It improves on C4.5 by adopting the best among three strategies for computing the information gain of continuous attributes. All the strategies adopt a binary search of the threshold in the whole training set starting from the local threshold computed at a node. The first strategy computes the local threshold using the algorithm of C4.5, which, in particular, sorts cases by means of the
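
The threshold search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the EC4.5 implementation: it assumes the C4.5 convention that the local threshold at a node is the midpoint between two consecutive attribute values with the highest information gain, and that the final threshold is the largest value in the whole training set that does not exceed that midpoint. The function names `local_threshold` and `global_threshold` and the sample data are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class-count mapping."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def local_threshold(values, labels):
    """Return (gain, midpoint) of the best binary split at a node.

    Cases are sorted by attribute value, then every midpoint between two
    distinct consecutive values is scored by information gain.
    Returns (0.0, None) when no split has positive gain.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left = Counter()
    right = Counter(label for _, label in pairs)
    base = entropy(right)
    best_gain, best_mid = 0.0, None
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no threshold can separate equal values
        n_left = i + 1
        gain = (base
                - (n_left / n) * entropy(left)
                - ((n - n_left) / n) * entropy(right))
        if gain > best_gain:
            best_gain, best_mid = gain, (value + next_value) / 2
    return best_gain, best_mid


def global_threshold(sorted_training_values, midpoint):
    """Binary search for the largest training-set value <= midpoint.

    Assumes sorted_training_values is sorted in ascending order and
    contains the values seen at the node, so a match always exists.
    """
    idx = bisect_right(sorted_training_values, midpoint)
    return sorted_training_values[idx - 1]


if __name__ == "__main__":
    # Hypothetical node: four cases of one continuous attribute.
    node_values = [1.0, 2.0, 8.0, 9.0]
    node_labels = ["a", "a", "b", "b"]
    # Hypothetical sorted attribute values of the whole training set.
    training_values = [1.0, 2.0, 3.5, 4.0, 6.0, 8.0, 9.0]

    gain, mid = local_threshold(node_values, node_labels)
    print(gain, mid, global_threshold(training_values, mid))
```

Sorting the whole training set once per attribute and reusing it at every node is what makes the binary search cheap here; the per-node sort inside `local_threshold` is the cost that the alternative strategies mentioned in the abstract aim to reduce.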

