First-Order Tree-Type Dependence between Variables and Classification Performance
February 2001 (vol. 23, no. 2), pp. 233-239

Abstract—Structuralization of the covariance matrix reduces the number of parameters that must be estimated from the training data and, asymptotically as both the dimensionality and the training sample size grow, does not increase the generalization error. We propose a method that benefits from approximately correct assumptions about first-order tree-type dependence between the components of the feature vector. A structured estimate of the covariance matrix is used to decorrelate and scale the data, and a single-layer perceptron is then trained in the transformed feature space. We show that training the perceptron can reduce the negative effects of inexact a priori information. Experiments with 13 artificial and 10 real-world data sets show that the first-order tree-type dependence model is the most preferable of the two dozen covariance matrix structures investigated.
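For Gaussian variables obeying first-order tree-type dependence, the joint density factorizes over the node and edge marginals of the dependence tree [2], so the inverse covariance (precision) matrix can be assembled from 2x2 pairwise covariances alone, and its Cholesky factor decorrelates and scales the data. The sketch below illustrates this structured estimate on a Markov-chain covariance (the simplest first-order tree); it is an illustration of the model, not the authors' implementation, and the chain example, variable names, and parameter values are assumptions.

```python
import numpy as np

def tree_structured_precision(S, edges):
    """Precision matrix implied by first-order tree-type dependence.

    For a Gaussian Markov tree, the density factorizes over node and
    edge marginals, so the precision is a sum of embedded inverses of
    the 2x2 edge-marginal covariances, with a diagonal correction of
    (degree - 1) / variance at each node.
    """
    p = S.shape[0]
    K = np.zeros((p, p))
    deg = np.zeros(p, dtype=int)
    for i, j in edges:
        sub = S[np.ix_([i, j], [i, j])]          # 2x2 edge marginal
        K[np.ix_([i, j], [i, j])] += np.linalg.inv(sub)
        deg[i] += 1
        deg[j] += 1
    for i in range(p):
        K[i, i] -= (deg[i] - 1) / S[i, i]        # node-degree correction
    return K

# Chain covariance S_ij = rho^|i-j|: a Markov chain is a first-order
# tree, so the structured precision inverts S exactly.
rho, p = 0.6, 5
S = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
edges = [(i, i + 1) for i in range(p - 1)]       # the chain's spanning tree
K = tree_structured_precision(S, edges)

# Whitening: with K = L L^T, the map x -> L^T x decorrelates and
# scales data whose covariance is S, since L^T S L = I.
L = np.linalg.cholesky(K)
```

On real data the tree itself would first be selected, e.g., as a maximum spanning tree over pairwise correlation magnitudes [2], [9], and the whitened data would then be fed to a single-layer perceptron.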

[1] J.J. Atick and A.N. Redlich, “Towards a Theory of Early Visual Processing,” Neural Computation, vol. 2, pp. 308-320, 1990.
[2] C.K. Chow and C.N. Liu, “Approximating Discrete Probability Distributions with Dependence Trees,” IEEE Trans. Information Theory, vol. 14, pp. 462-467, 1968.
[3] A.D. Deev, “Representation of Statistics of Discriminant Analysis and Asymptotic Expansions in Dimensionalities Comparable with Sample Size,” Reports of Academy of Sciences of the USSR, vol. 195, no. 4, pp. 756-762, 1970 (in Russian).
[4] A.D. Deev, “Asymptotic Expansions for Distributions of Statistics W, M, W* in Discriminant Analysis,” Statistical Methods of Classification, J.N. Blagoveshenskij, ed., vol. 31, pp. 6-57, Moscow: Moscow Univ. Press, 1972 (in Russian).
[5] A.D. Deev, “Discriminant Function Designed on Independent Blocks of Variables,” Eng. Cybernetics (Proc. Academy of Sciences of the USSR), no. 12, pp. 153-156, 1974 (in Russian).
[6] J.H. Friedman, “Regularized Discriminant Analysis,” J. Am. Statistical Assoc., vol. 84, pp. 165-175, 1989.
[7] K. Fukunaga, Introduction to Statistical Pattern Recognition, second edition. Academic Press, 1990.
[8] S. Halkjær and O. Winther, “The Effect of Correlated Input Data on the Dynamics of Learning,” Advances in Neural Information Processing Systems, M.C. Mozer, M.I. Jordan, and T. Petsche, eds., vol. 9, pp. 169-175, Cambridge, Mass.: MIT Press, A Bradford Book, 1996.
[9] J.B. Kruskal Jr., “On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem,” Proc. Am. Math. Soc., vol. 7, pp. 48-50, 1956.
[10] Y. Le Cun, I. Kanter, and S. Solla, “Eigenvalues of Covariance Matrices: Application to Neural-Network Learning,” Physical Review Letters, vol. 66, no. 18, pp. 2396-2399, 1991.
[11] G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition. New York: Wiley, 1992.
[12] L.D. Meshalkin, “Assignment of Numerical Values to Nominal Variables,” Statistical Problems Control, S. Raudys and L. Meshalkin, eds., vol. 14, pp. 49-56, Vilnius: Inst. of Math. and Cybernetics Press, 1976 (in Russian).
[13] L.D. Meshalkin and V.I. Serdobolskij, “Errors in Classifying Multivariate Observations,” Theory of Probabilities and Its Applications, vol. 23, no. 4, pp. 772-781, 1978 (in Russian).
[14] S.D. Morgera and D.B. Cooper, “Structurized Estimation: Sample Size Reduction for Adaptive Pattern Classification,” IEEE Trans. Information Theory, vol. 23, pp. 728-741, 1977.
[15] R. Prochorskas, V. Ziuznis, and N. Misiuniene, “Use of Different Classifiers to Predict Outcomes of Heart Attacks,” Problems of Ischemic Heart Diseases, pp. 216-267, Vilnius, Lithuania: Mokslas Publishing House, 1976 (in Russian).
[16] S. Raudys, “On Determining Training Sample Size of Linear Classifier,” Computing Systems, N.G. Zagoruiko, ed., vol. 28, pp. 79-87, Inst. of Math. Press, Novosibirsk: Nauka, 1967 (in Russian).
[17] S. Raudys, “On the Amount of a priori Information in Designing the Classification Algorithm,” Eng. Cybernetics (Proc. Academy of Sciences of the USSR), no. 4, pp. 168-174, 1972 (in Russian).
[18] S. Raudys, “Methods to Overcome Dimensionality Problems in Statistical Pattern Recognition: A Review,” Zavodskaya Laboratorya (Factory Lab., Interdisciplinary USSR J.), no. 3, pp. 45, 49-55, Moscow: Nauka, 1991 (in Russian).
[19] S. Raudys, “Evolution and Generalization of a Single Neurone: I. Single-Layer Perceptron as Seven Statistical Classifiers,” Neural Networks, vol. 11, no. 2, pp. 283-296, 1998.
[20] S. Raudys, “Scaled Rotation Regularization,” Pattern Recognition, vol. 33, pp. 1989-1998, 2000.
[21] S. Raudys and S. Amari, “Effect of Initial Values in Simple Perceptron,” Proc. 1998 IEEE World Congress Computational Intelligence, IJCNN '98, pp. 1530-1535, 1998.
[22] S.J. Raudys and A.K. Jain, “Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, pp. 252-264, 1991.
[23] S. Raudys and V. Pikelis, “On Dimensionality, Sample Size, Classification Error and Complexity of the Classification Algorithm in Pattern Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 2, pp. 242-252, 1980.
[24] S. Raudys and A. Saudargiene, “Structures of the Covariance Matrices in the Classifier Design,” Proc. Joint IAPR Int'l Workshops/SSPR '98 and SPR '98, pp. 583-592, 1998.
[25] A. Saudargiene, “Structurization of the Covariance Matrix by Process Type and Block-Diagonal Models in the Classifier Design,” Informatica, vol. 10, no. 2, pp. 245-269, Vilnius: Inst. of Math. and Informatics Press, 1999.
[26] V.I. Serdobolskij, “The Moments of Discriminant Function and Classification for a Large Number of Variables,” Statistical Problems of Control, S. Raudys, ed., vol. 38, pp. 27-51, Vilnius: Inst. of Math. and Cyb. Press, 1979 (in Russian).
[27] V.I. Zarudskij, “The Use of Models of Simple Dependence Problems in Classification,” Statistical Problems of Control, S. Raudys, ed., vol. 38, pp. 53-75, Vilnius: Inst. of Math. and Cyb. Press, 1979 (in Russian).
[28] V.I. Zarudskij, “Determination of Some Graph Connections for Normal Vectors in Large Dimensional Case,” Algorithmic and Program Supply of Applied Multivariate Statistical Analysis, S.A. Aivazian, ed., pp. 189-208, Moscow: Nauka, 1980 (in Russian).

Index Terms:
First-order tree-type dependence, a priori information, classification, generalization, sample size, dimensionality.
Sarunas Raudys, Ausra Saudargiene, "First-Order Tree-Type Dependence between Variables and Classification Performance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 233-239, Feb. 2001, doi:10.1109/34.908975