Proceedings 41st Annual Symposium on Foundations of Computer Science (2000)

Redondo Beach, California

Nov. 12, 2000 to Nov. 14, 2000

ISSN: 0272-5428

ISBN: 0-7695-0850-2

pp: 631

M. Charikar, Dept. of Comput. Sci., Stanford Univ., CA, USA

V. Guruswami, Dept. of Comput. Sci., Stanford Univ., CA, USA

R. Kumar, Dept. of Comput. Sci., Stanford Univ., CA, USA

S. Rajagopalan, Dept. of Comput. Sci., Stanford Univ., CA, USA

A. Sahai, Dept. of Comput. Sci., Stanford Univ., CA, USA

ABSTRACT

Motivated by frequently recurring themes in information retrieval and related disciplines, we define a genre of problems called combinatorial feature selection problems. Given a set S of multidimensional objects, the goal is to select a subset K of relevant dimensions (or features) such that some desired property Π holds for the set S restricted to K. Depending on Π, the goal could be to either maximize or minimize the size of the subset K. Several well-studied feature selection problems can be cast in this form. We study the problems in this class derived from several natural and interesting properties Π, including variants of the classical p-center problem as well as problems akin to determining the VC-dimension of a set system. Our main contribution is a theoretical framework for studying combinatorial feature selection, providing (in most cases essentially tight) approximation algorithms and hardness results for several instances of these problems.
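
The problem template in the abstract (choose a subset K of dimensions so that a property Π holds for S restricted to K) can be made concrete with a small brute-force sketch. This is only an illustration under stated assumptions, not an algorithm from the paper: the property used here ("all objects remain pairwise distinct after restriction") and the names `restrict`, `all_distinct`, and `smallest_feature_subset` are hypothetical choices made for the example, and the exhaustive search takes time exponential in the number of dimensions.

```python
from itertools import combinations


def restrict(obj, dims):
    """Project one object (a tuple of coordinates) onto the chosen dimensions."""
    return tuple(obj[d] for d in dims)


def all_distinct(objects, dims):
    """Example property (an assumption for this sketch): the objects stay
    pairwise distinct once restricted to the dimensions in `dims`."""
    projected = [restrict(o, dims) for o in objects]
    return len(set(projected)) == len(projected)


def smallest_feature_subset(objects, prop):
    """Exhaustively search for a minimum-size subset K of dimensions such
    that prop(objects, K) holds. Returns None if no subset works."""
    n_dims = len(objects[0])
    for size in range(n_dims + 1):
        for dims in combinations(range(n_dims), size):
            if prop(objects, dims):
                return dims
    return None


# Four 3-dimensional objects; the last coordinate is constant and so is
# useless for telling the objects apart.
S = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
K = smallest_feature_subset(S, all_distinct)
print(K)
```

Passing a different predicate as `prop`, or searching from large subsets down to small ones, gives the maximization variants mentioned in the abstract. The point of the sketch is only to show the shape of the problem.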

INDEX TERMS

information retrieval; combinatorial mathematics; feature extraction; optimisation; set theory; computational complexity; combinatorial feature selection problems; multidimensional objects; subset size maximization; subset size minimization; p-center problem; VC-dimension; Vapnik-Chervonenkis dimension; approximation algorithms; hardness results

CITATION

M. Charikar, V. Guruswami, S. Rajagopalan, A. Sahai and R. Kumar, "Combinatorial feature selection problems," *Proceedings 41st Annual Symposium on Foundations of Computer Science (FOCS)*, Redondo Beach, California, 2000, pp. 631.

doi:10.1109/SFCS.2000.892331
