Issue No.07 - July (2013 vol.35)
pp: 1592-1605
V. Y. F. Tan , Inst. for Infocomm Res., A*STAR, Singapore, Singapore
C. Fevotte , Lab. Lagrange, Univ. de Nice Sophia Antipolis, Nice, France
ABSTRACT
This paper addresses the estimation of the latent dimensionality in nonnegative matrix factorization (NMF) with the β-divergence. The β-divergence is a family of cost functions that includes the squared Euclidean distance, Kullback-Leibler (KL) and Itakura-Saito (IS) divergences as special cases. Learning the model order is important as it is necessary to strike the right balance between data fidelity and overfitting. We propose a Bayesian model based on automatic relevance determination (ARD) in which the columns of the dictionary matrix and the rows of the activation matrix are tied together through a common scale parameter in their prior. A family of majorization-minimization (MM) algorithms is proposed for maximum a posteriori (MAP) estimation. A subset of scale parameters is driven to a small lower bound in the course of inference, with the effect of pruning the corresponding spurious components. We demonstrate the efficacy and robustness of our algorithms by performing extensive experiments on synthetic data, the swimmer dataset, a music decomposition example, and a stock price prediction task.
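The special cases named in the abstract can be made concrete with a minimal sketch. It uses the definition of the β-divergence that is standard in the NMF literature and assumes the paper follows the same convention; the abstract itself gives no formula. Under this convention β = 2 yields half the squared Euclidean distance, β = 1 the generalized KL divergence, and β = 0 the IS divergence. The function names `beta_divergence` and `matrix_beta_divergence` are illustrative and do not come from the paper.

```python
import math


def beta_divergence(x, y, beta):
    """Scalar beta-divergence d_beta(x | y) for strictly positive x and y.

    Standard literature definition (assumed, not quoted from the paper).
    """
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be strictly positive")
    if beta == 1:
        # Limit beta -> 1: generalized Kullback-Leibler divergence.
        return x * math.log(x / y) - x + y
    if beta == 0:
        # Limit beta -> 0: Itakura-Saito divergence.
        return x / y - math.log(x / y) - 1
    # General case; beta = 2 gives (x - y)**2 / 2.
    numerator = x**beta + (beta - 1) * y**beta - beta * x * y ** (beta - 1)
    return numerator / (beta * (beta - 1))


def matrix_beta_divergence(V, V_hat, beta):
    """Sum of entrywise beta-divergences between two equal-sized nested lists."""
    return sum(
        beta_divergence(v, v_hat, beta)
        for row_v, row_hat in zip(V, V_hat)
        for v, v_hat in zip(row_v, row_hat)
    )
```

In an NMF setting, `V` would be the nonnegative data matrix and `V_hat` the product of the dictionary and activation matrices; the cost to be minimized is then `matrix_beta_divergence(V, V_hat, beta)`, to which the ARD model described above adds a prior-derived penalty.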
INDEX TERMS
Bayesian methods, Linear programming, Cost function, Data models, Principal component analysis, Algorithm design and analysis, automatic relevance determination, Nonnegative matrix factorization, model order selection, majorization-minimization, group-sparsity
CITATION
V. Y. F. Tan, C. Fevotte, "Automatic Relevance Determination in Nonnegative Matrix Factorization with the β-Divergence", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 7, pp. 1592-1605, July 2013, doi:10.1109/TPAMI.2012.240