A Fourier Spectrum-Based Approach to Represent Decision Trees for Mining Data Streams in Mobile Environments
February 2004 (vol. 16 no. 2)
pp. 216-229

Abstract: This paper presents a novel Fourier analysis-based approach to combine, transmit, and visualize decision trees in a mobile environment. The Fourier representation of a decision tree has several properties that are particularly useful for mining data streams from small mobile computing devices connected through limited-bandwidth wireless networks. The paper presents algorithms to compute the Fourier spectrum of a decision tree and outlines a technique to construct a decision tree from its Fourier spectrum. It offers a framework to aggregate decision trees in their Fourier representations. It also describes MobiMine, a mobile data stream mining system that uses the developed techniques to mine stock-market data from handheld devices.
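
To make the idea of a decision tree's Fourier spectrum concrete, the following is a minimal sketch, not the paper's algorithm. It assumes binary features and a numeric class label, and it computes the Walsh-Fourier coefficients by brute-force enumeration of every input, which is only feasible for very small trees. The nested-tuple tree encoding and the names `tree_predict`, `fourier_spectrum`, `reconstruct`, and `toy_tree` are hypothetical and chosen only for illustration.

```python
from itertools import product


def tree_predict(tree, x):
    """Classify x with a tree encoded as (feature, subtree_if_0, subtree_if_1) or a leaf label."""
    while isinstance(tree, tuple):
        feature, if_zero, if_one = tree
        tree = if_one if x[feature] else if_zero
    return tree


def basis(j, x):
    """Walsh basis function psi_j(x) = (-1)^(j . x) for binary vectors j and x."""
    return -1 if sum(a & b for a, b in zip(j, x)) % 2 else 1


def fourier_spectrum(tree, n_features):
    """Brute-force coefficients w_j = 2^-n * sum_x f(x) * psi_j(x) (illustration only)."""
    points = list(product((0, 1), repeat=n_features))
    return {
        j: sum(tree_predict(tree, x) * basis(j, x) for x in points) / len(points)
        for j in points
    }


def reconstruct(spectrum, x):
    """Evaluate f(x) = sum_j w_j * psi_j(x) from the spectrum alone."""
    return sum(w * basis(j, x) for j, w in spectrum.items())


# Hypothetical tree over three binary features: test x0, then x1; x2 is never tested.
toy_tree = (0, 0, (1, 0, 1))  # equivalent to f(x) = x0 AND x1

spectrum = fourier_spectrum(toy_tree, 3)
nonzero = {j: w for j, w in spectrum.items() if w != 0}
print(nonzero)
```

In this toy case, every coefficient whose index involves the untested feature x2 vanishes, so the tree is fully described by a handful of nonzero coefficients. That kind of sparsity is what makes a spectrum-based representation attractive when a model has to be sent over a limited-bandwidth link.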

Index Terms:
Mobile data mining, decision trees, Fourier spectrum.
Citation:
Hillol Kargupta, Byung-Hoon Park, "A Fourier Spectrum-Based Approach to Represent Decision Trees for Mining Data Streams in Mobile Environments," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 2, pp. 216-229, Feb. 2004, doi:10.1109/TKDE.2004.1269599