Issue No. 7, July 2011 (vol. 33)
pp. 1470-1481
David C. Hoyle, University of Manchester, Manchester
ABSTRACT
For many learning problems, an estimate of the inverse population covariance matrix is required and is often obtained by inverting the sample covariance matrix. Increasingly, for modern scientific data sets, the number of sample points is smaller than the number of features, and so the sample covariance matrix is not invertible. In such circumstances, the Moore-Penrose pseudo-inverse of the sample covariance matrix, constructed from the eigenvectors corresponding to nonzero sample covariance eigenvalues, is often used as an approximation to the inverse population covariance matrix. The reconstruction error of the pseudo-inverse sample covariance matrix in estimating the true inverse covariance can be quantified via the Frobenius norm of the difference between the two. The reconstruction error is dominated by the smallest nonzero sample covariance eigenvalues and diverges as the sample size becomes comparable to the number of features. For high-dimensional data, we use random matrix theory techniques and results to study the reconstruction error for a wide class of population covariance matrices. We also show how bagging and random subspace methods can reduce the reconstruction error, and how the two can be combined to improve the accuracy of classifiers that utilize the pseudo-inverse sample covariance matrix. We test our analysis on both simulated and benchmark data sets.
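To make the quantities in the abstract concrete, the following is a minimal NumPy sketch, not the paper's experimental code: it forms the Moore-Penrose pseudo-inverse of a sample covariance matrix in the n < p regime, measures the Frobenius-norm reconstruction error against a known inverse population covariance, and applies naive bagging and random-subspace averaging. The isotropic Gaussian data model, the sizes p, n, B, S, k, and the helper names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper): p features,
# n < p samples from an isotropic Gaussian, so the population covariance
# is the identity and its inverse is known exactly.
p, n = 200, 50
X = rng.standard_normal((n, p))
inv_true = np.eye(p)  # inverse population covariance for this toy model

def pinv_covariance(X):
    """Moore-Penrose pseudo-inverse of the sample covariance of X (n x p).

    np.linalg.pinv inverts only the nonzero eigenvalues, matching the
    construction described in the abstract.
    """
    Xc = X - X.mean(axis=0)        # centre each feature
    C = Xc.T @ Xc / X.shape[0]     # p x p sample covariance, rank < n
    return np.linalg.pinv(C)

def frob_error(inv_hat):
    """Frobenius-norm reconstruction error ||inv_hat - inv_true||_F."""
    return np.linalg.norm(inv_hat - inv_true, "fro")

# Plain pseudo-inverse estimator.
err_plain = frob_error(pinv_covariance(X))

# Bagging: average the pseudo-inverse over bootstrap resamples of the rows.
B = 25
bagged = np.mean(
    [pinv_covariance(X[rng.integers(0, n, n)]) for _ in range(B)], axis=0
)

# Random subspace method: estimate on random feature subsets of size k,
# embed each k x k pseudo-inverse into the full p x p matrix, then average
# each entry over the subsets that contained it.
S, k = 25, 100
acc = np.zeros((p, p))
cnt = np.zeros((p, p))
for _ in range(S):
    idx = rng.choice(p, size=k, replace=False)
    acc[np.ix_(idx, idx)] += pinv_covariance(X[:, idx])
    cnt[np.ix_(idx, idx)] += 1
subspace = acc / np.maximum(cnt, 1)  # entries never sampled stay zero

print(f"plain pseudo-inverse : {err_plain:.2f}")
print(f"bagged               : {frob_error(bagged):.2f}")
print(f"random subspace      : {frob_error(subspace):.2f}")
```

Whether the averaged estimators actually lower the error in a given run depends on the ratio of sample size to (sub)space dimension and on the population eigenvalue spectrum; quantifying exactly that dependence is what the paper's random matrix theory analysis provides. The sketch only illustrates the estimators and the error measure being analyzed.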
INDEX TERMS
Pseudo-inverse, linear discriminants, peaking phenomenon, random matrix theory, bagging, random subspace method.
CITATION
David C. Hoyle, "Accuracy of Pseudo-Inverse Covariance Learning—A Random Matrix Theory Analysis", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.33, no. 7, pp. 1470-1481, July 2011, doi:10.1109/TPAMI.2010.186