Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007)
Distance Metric Learning through Optimization of Ranking
Omaha, Nebraska, USA
October 28-October 31
ISBN: 0-7695-3033-8
Data preprocessing is important in machine learning, data mining, and pattern recognition. In particular, selecting relevant features in high-dimensional data is often necessary to efficiently construct models that accurately describe the data. For example, many lazy learning algorithms (like k-Nearest Neighbor) rely on feature-based distance metrics to compare input patterns for the purpose of classification or retrieval from a database. In previous work, we introduced Slider, a distance metric learning method that optimizes the weights of features in a protein model-building application (where features are used to describe patterns of electron density around protein macromolecules). In this work, we demonstrate the usefulness of Slider as a general method for classification, ranking and retrieval, with results on several benchmark datasets. We also compare it to other well-known feature selection or weighting methods.
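To illustrate why feature weights matter for a lazy learner such as k-Nearest Neighbor, the following minimal sketch uses a weighted Euclidean distance. It is not the Slider algorithm: the weights are picked by hand rather than learned, and the function names, the toy data, and the choice of weighted Euclidean distance are all assumptions made for illustration.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)^2)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_predict(query, examples, labels, weights, k=3):
    """Majority vote among the k examples closest to the query."""
    ranked = sorted(
        range(len(examples)),
        key=lambda i: weighted_distance(query, examples[i], weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 separates the classes,
# feature 1 is noise on a much larger scale.
examples = [(0.0, 9.0), (0.1, 1.0), (0.2, 5.0), (1.0, 8.5), (1.1, 0.5), (0.9, 5.5)]
labels = ["a", "a", "a", "b", "b", "b"]
query = (0.15, 8.6)

# Equal weights let the noisy feature dominate the neighbor ranking.
uniform = knn_predict(query, examples, labels, weights=(1.0, 1.0), k=3)

# Hand-picked weights (an assumption, not learned) that ignore the noisy feature.
hand_picked = knn_predict(query, examples, labels, weights=(1.0, 0.0), k=3)

print(uniform, hand_picked)
```

On this toy data the prediction changes once the noisy feature is down-weighted, which is the kind of effect a method that learns feature weights aims to produce automatically.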