2012 IEEE 53rd Annual Symposium on Foundations of Computer Science (2012)

New Brunswick, NJ, USA

Oct. 20, 2012 to Oct. 23, 2012

ISSN: 0272-5428

ISBN: 978-1-4673-4383-1

pp: 21-30

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/FOCS.2012.64

ABSTRACT

One motivation for property testing of boolean functions is the idea that testing can provide a fast preprocessing step before learning. However, in most machine learning applications, it is not possible to request labels of arbitrary examples constructed by an algorithm. Instead, the dominant query paradigm in applied machine learning, called *active learning*, is one where the algorithm may query for labels, but only on points in a given (polynomial-sized) unlabeled sample, drawn from some underlying distribution D. In this work, we bring this well-studied model to the domain of testing. We develop both general results for this *active testing* model and efficient testing algorithms for several properties important for learning, demonstrating that testing can still yield substantial benefits in this restricted setting. For example, we show that testing unions of d intervals can be done with O(1) label requests in our setting, whereas it is known to require Omega(d) labeled examples for learning (and Omega(sqrt{d}) for passive testing [KR00], where the algorithm must pay for every example drawn from D). In fact, our results for testing unions of intervals also yield improvements on prior work in both the classic query model (where any point in the domain can be queried) and the passive testing model. For the problem of testing linear separators in R^n over the Gaussian distribution, we show that both active and passive testing can be done with O(sqrt{n}) queries, substantially fewer than the Omega(n) needed for learning, with near-matching lower bounds. We also present a general combination result in this model for building testable properties out of others, which we then use to provide testers for a number of assumptions used in semi-supervised learning.
In addition to the above results, we also develop a general notion of the *testing dimension* of a given property with respect to a given distribution, which we show characterizes (up to constant factors) the intrinsic number of label requests needed to test that property. We develop such notions for both the active and passive testing models. We then use these dimensions to prove a number of lower bounds, including for linear separators and the class of dictator functions.

INDEX TERMS

Unions of intervals, Property testing, Active learning, Boolean functions, Linear threshold functions

CITATION

M. Balcan, E. Blais, A. Blum and L. Yang, "Active Property Testing,"

*2012 IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS)*, New Brunswick, NJ, USA, 2012, pp. 21-30.

doi:10.1109/FOCS.2012.64
