Data Mining for Features Using Scale-Sensitive Gated Experts
December 1999 (vol. 21 no. 12)
pp. 1268-1279

Abstract—This article introduces a new tool for exploratory data analysis and data mining called Scale-Sensitive Gated Experts (SSGE), which can partition a complex nonlinear regression surface into a set of simpler surfaces (which we call features). Each element of this set can be efficiently modeled by a single feedforward neural network. The degree to which the regression surface is partitioned is controlled by an external scale parameter. The SSGE consists of a nonlinear gating network and several competing nonlinear experts. Although the SSGE is similar to the mixture of experts model of Jacobs et al. [10], the mixture of experts model gives only one partitioning of the input-output space, and thus a single set of features, whereas the SSGE lets the user discover families of features: each setting of the scale parameter yields a new member of the family. In this paper, we derive the Scale-Sensitive Gated Experts model and demonstrate its performance on a time series segmentation problem. The main results are: 1) the scale parameter controls the granularity of the features of the regression surface, 2) similar features are modeled by the same expert while different kinds of features are modeled by different experts, and 3) for the time series problem, the SSGE finds different regimes of behavior, each with a specific and interesting interpretation.
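The architecture described above can be illustrated with a minimal sketch: a gating network softly assigns each input to competing experts, and a temperature-like scale parameter controls how sharply the input space is partitioned. This is a generic gated-experts forward pass, not the paper's actual formulation; linear experts and a linear gate stand in for the paper's feedforward networks, and all names (`GatedExperts`, `scale`, etc.) are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class GatedExperts:
    """Toy gated mixture of experts with a scale (temperature) parameter.

    Linear experts and a linear gate stand in for the nonlinear
    feedforward networks used in the paper; this only sketches how a
    scale parameter modulates the softness of the partition.
    """

    def __init__(self, n_inputs, n_experts, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.scale = scale
        self.W_experts = rng.normal(size=(n_experts, n_inputs))
        self.W_gate = rng.normal(size=(n_experts, n_inputs))

    def forward(self, x):
        # Dividing the gate logits by the scale sharpens the partition
        # for small scale (near winner-take-all) and softens it for
        # large scale (experts blend more).
        g = softmax(x @ self.W_gate.T / self.scale)
        y_experts = x @ self.W_experts.T      # each expert's prediction
        y = (g * y_experts).sum(axis=-1)      # gate-weighted combination
        return y, g
```

Varying `scale` while keeping the weights fixed then traces out a family of partitions of the input space, loosely analogous to the families of features the abstract describes.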

[1] M. Basseville and I.V. Nikiforov, Detection of Abrupt Changes: Theory and Application. Prentice Hall, 1993.
[2] T.M. Cover and J.A. Thomas, Elements of Information Theory. John Wiley & Sons, 1991.
[3] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. Wiley, 1973.
[4] R. Durbin and D. Willshaw, “An Analogue Approach to the Travelling Salesman Problem Using an Elastic Net Method,” Nature, pp. 689-691, 1987.
[5] C. Fancourt and J. Principe, “A Neighborhood Map of Competing One Step Predictors for Piecewise Segmentation and Identification of Time Series,” Proc. Int'l Conf. Neural Networks, 1996.
[6] N. Gershenfeld, “Nonlinear Inference and Cluster-Weighted Modeling,” Proc. 1995 Florida Workshop Nonlinear Astronomy, vol. 1, pp. 1-6, 1995.
[7] S. Guiasu, Information Theory with Applications. McGraw-Hill, 1977.
[8] J. Hertz, A. Krogh, and R.G. Palmer, Introduction to the Theory of Neural Computation. Addison-Wesley, 1991.
[9] R.A. Jacobs and M.I. Jordan, “Learning Piecewise Control Strategies in a Modular Network Architecture,” IEEE Trans. Systems, Man, and Cybernetics, 1993.
[10] R.A. Jacobs, M.I. Jordan, S.J. Nowlan, and G.E. Hinton, “Adaptive Mixtures of Local Experts,” Neural Computation, vol. 3, pp. 79-87, 1991.
[11] M.I. Jordan and R.A. Jacobs, “Hierarchical Mixtures of Experts and the EM Algorithm,” Neural Computation, vol. 6, pp. 181-214, 1994.
[12] P. McCullagh and J.A. Nelder, Generalized Linear Models. London: Chapman and Hall, 1989.
[13] K. Pawelzik, J. Kohlmorgen, and K.-R. Müller, “Annealed Competition of Experts for a Segmentation and Classification of Switching Dynamics,” Neural Computation, vol. 8, no. 2, pp. 340-356, 1996.
[14] R.E. Quandt, “The Estimation of the Parameters of a Linear Regression System Obeying Two Separate Regimes,” J. Am. Statistical Assoc., pp. 873-880, 1958.
[15] C.R. Rao, Linear Statistical Inference and its Applications. New York: John Wiley and Sons, 1965.
[16] K. Rose, E. Gurewitz, and G.C. Fox, “Statistical Mechanics and Phase Transitions in Clustering,” Physical Rev. Letters, vol. 65, no. 8, pp. 945-948, 1990.
[17] D.E. Rumelhart, R. Durbin, R. Golden, and Y. Chauvin, “Backpropagation: The Basic Theory,” Backpropagation: Theory, Architectures, and Applications, Y. Chauvin and D.E. Rumelhart, eds., pp. 1-34, Hillsdale, N.J.: Lawrence Erlbaum Assoc., 1995.
[18] S. Shi, “Modeling the Temporal Structure of Time with Hidden Markov Experts,” PhD thesis, Dept. of Computer Science, Univ. of Colorado, 1998.
[19] L.W. Swokowski, Calculus with Analytic Geometry. Prindle Weber and Schmidt, 1984.
[20] F. Takens, “Detecting Strange Attractors in Turbulence,” Dynamical Systems and Turbulence, D.A. Rand and L.S. Young, eds., Lecture Notes in Mathematics, vol. 898, pp. 366-381, Springer, 1981.
[21] Time Series Prediction: Forecasting the Future and Understanding the Past. A.S. Weigend and N.A. Gershenfeld, eds., Reading, Mass.: Addison-Wesley, 1994.
[22] A.S. Weigend, M. Mangeas, and A.N. Srivastava, “Nonlinear Gated Experts for Time Series: Discovering Regimes and Avoiding Overfitting,” Int'l J. Neural Systems, vol. 6, pp. 373-399, 1995.
[23] Y.F. Wong, “Clustering Data by Melting,” Neural Computation, vol. 5, no. 1, pp. 89-104, 1993.

Index Terms:
Mixture of experts, mixture model, classification and regression, time series segmentation, neural networks.
Ashok N. Srivastava, Renjeng Su, Andreas S. Weigend, "Data Mining for Features Using Scale-Sensitive Gated Experts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 12, pp. 1268-1279, Dec. 1999, doi:10.1109/34.817407