A Deterministic Annealing Approach for Parsimonious Design of Piecewise Regression Models
February 1999 (vol. 21 no. 2)
pp. 159-173

Abstract—A new learning algorithm is proposed for piecewise regression modeling. It employs the technique of deterministic annealing to design space partition regression functions. While the performance of traditional space partition regression functions such as CART and MARS is limited by a simple tree-structured partition and by a hierarchical approach for design, the deterministic annealing algorithm enables the joint optimization of a more powerful piecewise structure based on a Voronoi partition. The new method is demonstrated to achieve consistent performance improvements over regular CART as well as over its extension to allow arbitrary hyperplane boundaries. Comparison tests, on several benchmark data sets from the regression literature, are provided.
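The model structure named in the abstract can be illustrated with a small sketch. It is not the authors' algorithm or code: it only shows a nearest-prototype (Voronoi) partition with one local model per cell, plus a temperature-controlled soft association of the kind deterministic annealing methods typically use. The constant local models, the squared Euclidean distance, the Gibbs form of the association, and every function and variable name below are assumptions made for illustration.

```python
import math


def sq_dist(x, p):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, p))


def nearest_cell(x, prototypes):
    """Index of the Voronoi cell containing x, i.e. the nearest prototype."""
    return min(range(len(prototypes)), key=lambda j: sq_dist(x, prototypes[j]))


def predict_hard(x, prototypes, local_values):
    """Piecewise-constant prediction: output the value attached to x's cell."""
    return local_values[nearest_cell(x, prototypes)]


def soft_association(x, prototypes, temperature):
    """Gibbs probabilities of assigning x to each cell (assumed form).

    High temperature gives nearly uniform probabilities; as the temperature
    approaches zero the assignment approaches the hard nearest-prototype rule.
    """
    d = [sq_dist(x, p) for p in prototypes]
    d_min = min(d)  # subtract the minimum so the largest weight is exactly 1
    w = [math.exp(-(dj - d_min) / temperature) for dj in d]
    z = sum(w)
    return [wj / z for wj in w]


def expected_squared_error(data, prototypes, local_values, temperature):
    """Average squared error under the soft association (illustrative cost)."""
    total = 0.0
    for x, y in data:
        probs = soft_association(x, prototypes, temperature)
        total += sum(p * (y - v) ** 2 for p, v in zip(probs, local_values))
    return total / len(data)


if __name__ == "__main__":
    # Hypothetical one-dimensional example with two cells.
    prototypes = [(0.0,), (10.0,)]
    local_values = [1.0, 5.0]
    data = [((2.0,), 1.0), ((9.0,), 5.0)]
    for t in (1e6, 10.0, 0.01):
        print(t, expected_squared_error(data, prototypes, local_values, t))
```

In this sketch, lowering the temperature moves the soft association toward the hard Voronoi partition. An annealing procedure would re-optimize the prototypes and local models at each temperature, a step the sketch leaves out.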

Index Terms:
Statistical regression, piecewise regression, deterministic annealing, parsimonious modeling, generalization, nearest-prototype models.
Citation:
Ajit V. Rao, David J. Miller, Kenneth Rose, Allen Gersho, "A Deterministic Annealing Approach for Parsimonious Design of Piecewise Regression Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 2, pp. 159-173, Feb. 1999, doi:10.1109/34.748824