Issue No. 3 - March 2010 (vol. 22)
pp. 365-380
Huiwen Zeng, Synopsys, Portland
ABSTRACT
Reducing the dimensionality of a classification problem produces a more computationally efficient system. Since the dimensionality of a classification problem is equivalent to the number of neurons in the input layer of a network, this work shows how to eliminate neurons on that layer and simplify the problem. In cases where the dimensionality cannot be reduced without some degradation in classification performance, we formulate and solve a constrained optimization problem that allows a trade-off between dimensionality and performance. We introduce a novel penalty function and combine it with bilevel optimization to solve the constrained problem. On both synthetic and applied problems, our method outperforms other known penalty functions such as weight decay, weight elimination, and Hoyer's function. An example of dimensionality reduction for hyperspectral image classification demonstrates the practicality of the new method. Finally, we show how the method can be extended to multilayer and multiclass neural network problems.
INDEX TERMS
Pruning, neural networks, penalty function, mixed-norm penalty.
CITATION
Huiwen Zeng, "Constrained Dimensionality Reduction Using a Mixed-Norm Penalty Function with Neural Networks," IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 3, pp. 365-380, March 2010, doi:10.1109/TKDE.2009.107.
REFERENCES
[1] G.E. Hinton, “Connectionist Learning Procedures,” Artificial Intelligence, vol. 40, nos. 1-3, pp. 185-234, 1989.
[2] E.B. Baum, “What Size of Neural Net Gives Valid Generalization?” Neural Computation, vol. 1, no. 1, pp. 151-160, 1989.
[3] J.K. Kruschke and J.R. Movellan, “Benefits of Gain: Speeded Learning and Minimal Hidden Layers in Back-Propagation Networks,” IEEE Trans. Systems, Man, and Cybernetics, vol. 21, no. 1, pp. 273-280, Jan./Feb. 1991.
[4] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford Univ. Press, 1995.
[5] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. Wiley, 2001.
[6] I. Koch and K. Naito, “Dimension Selection for Feature Selection and Dimension Reduction with Principal and Independent Component Analysis,” Neural Computation, vol. 19, pp. 513-545, 2007.
[7] J.B. Tenenbaum, V. de Silva, and J.C. Langford, “A Global Geometric Framework for Nonlinear Dimensionality Reduction,” Science, vol. 290, no. 5500, pp. 2319-2323, Dec. 2000.
[8] I. Borg and J. Lingoes, Multidimensional Similarity Structure Analysis. Springer-Verlag, 1987.
[9] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Academic Press, 1990.
[10] Y. LeCun, J. Denker, S. Solla, R.E. Howard, and L.D. Jackel, “Optimal Brain Damage,” Advances in Neural Information Processing Systems, D.S. Touretzky, ed., vol. 2, pp. 598-605, Morgan Kaufmann, 1990.
[11] B. Hassibi and D.G. Stork, “Second Order Derivatives for Network Pruning: Optimal Brain Surgeon,” Advances in Neural Information Processing Systems, S.J. Hanson, J.D. Cowan, and C.L. Giles, eds., vol. 5, pp. 164-171, Morgan Kaufmann, 1993.
[12] A.S. Weigend, D.E. Rumelhart, and B.A. Huberman, “Generalization by Weight Elimination with Application to Forecasting,” Advances in Neural Information Processing Systems, R.P. Lippmann, J.E. Moody, and D.S. Touretzky, eds., vol. 3, pp. 875-882, Morgan Kaufmann, 1991.
[13] J. Moody and T. Rögnvaldsson, “Smoothing Regularizers for Projective Basis Function Networks,” Advances in Neural Information Processing Systems, M.C. Mozer, M.I. Jordan, and T. Petsche, eds., vol. 9, pp. 585-591, MIT Press, 1997.
[14] P.O. Hoyer, “Non-Negative Matrix Factorization with Sparseness Constraints,” J. Machine Learning Research, vol. 5, no. 9, pp. 1457-1469, 2004.
[15] J. Sietsma and R.J.F. Dow, “Neural Net Pruning—Why and How?” Proc. IEEE Int'l Conf. Neural Networks, vol. 1, pp. 325-332, 1988.
[16] G. Castellano, A.M. Fanelli, and M. Pelillo, “An Iterative Pruning Algorithm for Feedforward Neural Networks,” IEEE Trans. Neural Networks, vol. 8, no. 3, pp. 519-531, May 1997.
[17] M.Y. Chow and J. Teeter, “An Analysis of Weight Decay as a Methodology of Reducing Three-Layer Feedforward Artificial Neural Networks for Classification Problems,” Proc. IEEE Int'l Conf. Neural Networks, pp. 600-605, 1994.
[18] R. Fletcher, Practical Methods of Optimization, second ed. John Wiley & Sons, 1987.
[19] N. Alexandrov and J.E. Dennis, “Algorithms for Bilevel Optimization,” Proc. AIAA/NASA/USAF/ISSMO Symp. Multidisciplinary Analysis and Optimization, pp. 810-816, 1994.
[20] http://color.psych.upenn.edu/hyperspectral/bearfruitgray/bearfruitgray.html, 2009.
[21] P.M. Williams, “Bayesian Regularisation and Pruning Using a Laplace Prior,” technical report, School of Cognitive and Computing Sciences, Univ. of Sussex, 1994.
[22] http://www.mathworks.com/access/helpdesk/help/toolbox/optim/, 2009.
[23] R. Fletcher and M.J.D. Powell, “A Rapidly Convergent Descent Method for Minimization,” Computer J., vol. 6, no. 2, pp. 163-168, 1963.
[24] D. Goldfarb, “A Family of Variable Metric Updates Derived by Variational Means,” Math. of Computation, vol. 24, no. 109, pp. 23-26, 1970.
[25] S.P. Han, “A Globally Convergent Method for Nonlinear Programming,” J. Optimization Theory and Applications, vol. 22, no. 3, pp. 297-309, 1977.
[26] M.J.D. Powell, “A Fast Algorithm for Nonlinearly Constrained Optimization Calculations,” Numerical Analysis, G.A. Watson, ed., pp. 144-157, Springer-Verlag, 1978.
[27] K. Hornik, M. Stinchcombe, and H. White, “Universal Approximation of an Unknown Mapping and Its Derivatives Using Multilayer Feedforward Networks,” Neural Networks, vol. 3, no. 5, pp. 551-560, 1990.
[28] E.D. Sontag, “Feedback Stabilization Using Two-Hidden Layer Nets,” IEEE Trans. Neural Networks, vol. 3, no. 6, pp. 981-990, Nov. 1992.
[29] H. Zeng, “Dimensionality Reduction and Feature Selection Using a Mixed-Norm Penalty Function,” PhD thesis, Electrical Eng. Dept., North Carolina State Univ., 2005.
[30] H. Zeng and H.J. Trussell, “Feature Selection Using a Mixed-Norm Penalty Function,” Proc. IEEE Int'l Conf. Image Processing, Oct. 2006.