Issue No. 04 - April (2006 vol. 28)

ISSN: 0162-8828

pp: 497-508

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2006.77

ABSTRACT

Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given data set of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations. When applied to text document classification, the resulting geodesic distances resemble, but outperform, the tfidf cosine similarity measure.
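
As a rough illustration of the geodesic distance mentioned in the abstract, the Python sketch below computes the standard closed-form Fisher geodesic distance on the multinomial simplex, d(p, q) = 2 arccos(sum_i sqrt(p_i q_i)). The `reweight` helper and its `lam` parameter are hypothetical stand-ins for one member of a parametric family of transformations. The abstract does not give the form of the transformations, so that part is an assumption for illustration and not the paper's definition.

```python
import math


def fisher_geodesic_distance(p, q):
    """Fisher geodesic distance between two points on the multinomial simplex.

    Uses the closed form 2 * arccos(sum_i sqrt(p_i * q_i)).
    """
    # Sum of sqrt(p_i * q_i), often called the Bhattacharyya coefficient.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp to [0, 1] so floating-point rounding cannot break acos.
    bc = min(1.0, max(0.0, bc))
    return 2.0 * math.acos(bc)


def reweight(p, lam):
    """Hypothetical transformation (an assumption, not from the abstract):
    scale each coordinate by a positive weight and renormalize to sum to 1."""
    scaled = [pi * li for pi, li in zip(p, lam)]
    total = sum(scaled)
    return [s / total for s in scaled]


def pullback_distance(p, q, lam):
    """Distance under the pull-back of the Fisher metric through `reweight`:
    transform both points, then measure the Fisher geodesic distance."""
    return fisher_geodesic_distance(reweight(p, lam), reweight(q, lam))


if __name__ == "__main__":
    # Two toy "documents" as normalized term-frequency vectors.
    doc_a = [0.7, 0.2, 0.1]
    doc_b = [0.1, 0.3, 0.6]
    print(fisher_geodesic_distance(doc_a, doc_b))
    # Up-weighting the third term changes how far apart the documents are.
    print(pullback_distance(doc_a, doc_b, [1.0, 1.0, 5.0]))
```

In this sketch the weights `lam` play a role loosely analogous to idf weights in the tfidf representation: they rescale individual terms before the distance is measured. How the weights would actually be learned from data is outside the scope of the sketch.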

INDEX TERMS

Distance learning, text analysis, machine learning.

CITATION

Guy Lebanon, "Metric Learning for Text Documents", *IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 28, no. 4, pp. 497-508, April 2006, doi:10.1109/TPAMI.2006.77