Continuous Optimization of Hyper-Parameters
IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000)
Como, Italy
July 24, 2000 to July 27, 2000
ISSN: 1098-7576
ISBN: 0-7695-0619-4
pp: 1305
Yoshua Bengio, Université de Montréal
ABSTRACT
Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyper-parameter. This hyper-parameter is usually chosen by trial and error with a model selection criterion. In this paper, we present a methodology for optimizing several hyper-parameters, based on the computation of the gradient of a model selection criterion with respect to the hyper-parameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyper-parameters is efficiently computed by back-propagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyper-parameter gradient involving second derivatives of the training criterion.
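
As a brief illustration (not taken from the paper), the quadratic case can be sketched as a ridge regression whose weights are obtained through a Cholesky factorization, with the held-out error differentiated with respect to the regularization hyper-parameter by automatic differentiation through that factorization. The JAX sketch below is a minimal stand-in for the paper's hand-derived back-propagation; the array names X_tr, y_tr, X_va, y_va and the hyper-parameter log_lam are illustrative assumptions.

import jax
import jax.numpy as jnp
from jax.scipy.linalg import cho_factor, cho_solve

def fit_ridge(X_tr, y_tr, lam):
    # Quadratic training criterion: ||X_tr w - y_tr||^2 + lam * ||w||^2.
    d = X_tr.shape[1]
    A = X_tr.T @ X_tr + lam * jnp.eye(d)
    c, lower = cho_factor(A)                     # Cholesky decomposition of A
    return cho_solve((c, lower), X_tr.T @ y_tr)  # solve A w = X_tr^T y_tr

def selection_criterion(log_lam, X_tr, y_tr, X_va, y_va):
    # Model selection criterion: squared error on held-out data.
    w = fit_ridge(X_tr, y_tr, jnp.exp(log_lam))
    return jnp.mean((X_va @ w - y_va) ** 2)

# Gradient of the selection criterion with respect to the (log) hyper-parameter,
# obtained by back-propagating through the Cholesky-based solve.
hyper_gradient = jax.grad(selection_criterion, argnums=0)

For the general (non-quadratic) case, one standard way to write the implicit-function-theorem gradient, assuming w*(λ) minimizes the training criterion C(w, λ) and its Hessian is invertible, with E the model selection criterion, is

\frac{d E(w^*(\lambda), \lambda)}{d\lambda}
  = \frac{\partial E}{\partial \lambda}
  - \left(\frac{\partial E}{\partial w}\right)^{\!\top}
    \left(\frac{\partial^2 C}{\partial w \, \partial w^{\top}}\right)^{-1}
    \frac{\partial^2 C}{\partial w \, \partial \lambda},

which involves only second derivatives of the training criterion, as the abstract states.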
CITATION

Y. Bengio, "Continuous Optimization of Hyper-Parameters," IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN), Como, Italy, 2000, p. 1305.
doi:10.1109/IJCNN.2000.857853