Growing and Pruning Neural Tree Networks
March 1993 (vol. 42 no. 3)
pp. 291-299

A pattern classification method called neural tree networks (NTNs) is presented. The NTN consists of neural networks connected in a tree architecture. The neural networks are used to recursively partition the feature space into subregions. Each terminal subregion is assigned a class label that depends on the training data routed to it by the neural networks. The NTN is grown by a learning algorithm, as opposed to multilayer perceptrons (MLPs), where the architecture must be specified before learning can begin. A heuristic learning algorithm based on minimizing the L1 norm of the error is used to grow the NTN. It is shown that this method has better performance in terms of minimizing the number of classification errors than the squared error minimization method used in backpropagation. An optimal pruning algorithm is given to enhance the generalization of the NTN. Simulation results are presented on Boolean function learning tasks and a speaker-independent vowel recognition task. The NTN compares favorably to both neural networks and decision trees.
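
The growing procedure described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes two classes labeled 0 and 1, places a single sigmoid neuron at each internal node, and uses plain online gradient descent on the L1 error |t - y| as a stand-in for the heuristic learning rule. Pruning is omitted. All names (`train_neuron`, `route`, `grow`, `predict`) and the hyperparameters are hypothetical.

```python
import math
from collections import Counter


def _sigmoid(z):
    # Clamp z so math.exp cannot overflow for large |z|.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(X, t, epochs=200, lr=0.5):
    """Fit one sigmoid neuron by online gradient descent on sum |t - y|.

    Assumption: this is ordinary L1 gradient descent, not the paper's heuristic.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # d|t - y|/dz = -sign(t - y) * y * (1 - y), so step along +sign.
            step = lr * math.copysign(1.0, target - y) * y * (1.0 - y)
            w = [wi + step * xi for wi, xi in zip(w, x)]
            b += step
    return w, b


def route(node, x):
    """Return True if the node's neuron sends x to the 'hi' child."""
    return sum(wi * xi for wi, xi in zip(node["w"], x)) + node["b"] > 0.0


def grow(X, t, depth=0, max_depth=8):
    """Recursively partition the data; leaves carry the majority label."""
    majority = Counter(t).most_common(1)[0][0]
    if len(set(t)) == 1 or depth >= max_depth:
        return {"label": majority}
    w, b = train_neuron(X, t)
    node = {"w": w, "b": b}
    side = [route(node, x) for x in X]
    if all(side) or not any(side):
        # The neuron failed to split the data, so stop with a leaf.
        return {"label": majority}
    node["hi"] = grow([x for x, s in zip(X, side) if s],
                      [c for c, s in zip(t, side) if s], depth + 1, max_depth)
    node["lo"] = grow([x for x, s in zip(X, side) if not s],
                      [c for c, s in zip(t, side) if not s], depth + 1, max_depth)
    return node


def predict(node, x):
    """Route x from the root down to a leaf and return that leaf's label."""
    while "label" not in node:
        node = node["hi"] if route(node, x) else node["lo"]
    return node["label"]
```

Calling `grow(X, t)` on a list of feature vectors `X` and labels `t` returns a nested dictionary, and `predict(tree, x)` classifies a new vector. Because every leaf takes the majority label of the training data routed to it, the grown tree never makes more training errors than a single majority-label leaf would; how much better it does depends on whether each neuron finds a useful split.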

Index Terms:
neural tree networks; pattern classification method; feature space; class label; learning algorithm; classification errors; optimal pruning algorithm; function learning tasks; speaker independent vowel recognition; learning (artificial intelligence); pattern recognition; self-organising feature maps; trees (mathematics).
A. Sankar, R.J. Mammone, "Growing and Pruning Neural Tree Networks," IEEE Transactions on Computers, vol. 42, no. 3, pp. 291-299, March 1993, doi:10.1109/12.210172