Proceedings 41st Annual Symposium on Foundations of Computer Science (2000)
Redondo Beach, California
Nov. 12, 2000 to Nov. 14, 2000
M. Charikar, Dept. of Comput. Sci., Stanford Univ., CA, USA
V. Guruswami, Dept. of Comput. Sci., Stanford Univ., CA, USA
R. Kumar, Dept. of Comput. Sci., Stanford Univ., CA, USA
S. Rajagopalan, Dept. of Comput. Sci., Stanford Univ., CA, USA
A. Sahai, Dept. of Comput. Sci., Stanford Univ., CA, USA
Motivated by frequently recurring themes in information retrieval and related disciplines, we define a genre of problems called combinatorial feature selection problems. Given a set S of multidimensional objects, the goal is to select a subset K of relevant dimensions (or features) such that some desired property Π holds for the set S restricted to K. Depending on Π, the goal could be to either maximize or minimize the size of the subset K. Several well-studied feature selection problems can be cast in this form. We study the problems in this class derived from several natural and interesting properties Π, including variants of the classical p-center problem as well as problems akin to determining the VC-dimension of a set system. Our main contribution is a theoretical framework for studying combinatorial feature selection, providing (in most cases essentially tight) approximation algorithms and hardness results for several instances of these problems.
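To make the problem statement concrete, the following is a minimal sketch of the minimization variant, not an algorithm from the paper. It assumes objects are equal-length tuples and uses "all restricted objects remain pairwise distinct" as an illustrative stand-in for the desired property; that property and the function names `restrict`, `all_distinct`, and `smallest_feature_subset` are assumptions introduced here. The search is exhaustive and takes exponential time, so it only illustrates the definition.

```python
from itertools import combinations

def restrict(objects, dims):
    """Project each object (a tuple) onto the chosen dimensions K."""
    return [tuple(obj[d] for d in dims) for obj in objects]

def all_distinct(projected):
    """Illustrative property (an assumption): no two restricted objects coincide."""
    return len(set(projected)) == len(projected)

def smallest_feature_subset(objects, holds):
    """Brute force: return a minimum-size K such that holds(S restricted to K).

    Tries subsets in order of increasing size, so the first hit is minimum.
    Returns None if the property fails even on the full set of dimensions.
    """
    n_dims = len(objects[0])
    for size in range(n_dims + 1):
        for dims in combinations(range(n_dims), size):
            if holds(restrict(objects, dims)):
                return dims
    return None

# Four objects over three dimensions; the third dimension is constant.
S = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
K = smallest_feature_subset(S, all_distinct)
print(K)
```

Here no single dimension separates all four objects, while the first two dimensions together do, so the search returns those two indices. A maximization variant would instead scan subset sizes from largest to smallest under a property that is preserved by shrinking K.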
information retrieval; combinatorial mathematics; feature extraction; optimisation; set theory; computational complexity; combinatorial feature selection problems; multidimensional objects; subset size maximization; subset size minimization; p-center problem; VC-dimension; Vapnik-Chervonenkis dimension; approximation algorithms; hardness results
M. Charikar, V. Guruswami, S. Rajagopalan, A. Sahai and R. Kumar, "Combinatorial feature selection problems," Proceedings 41st Annual Symposium on Foundations of Computer Science (FOCS), Redondo Beach, California, 2000, pp. 631.