We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is given access to labeled examples drawn from a distribution, without restriction on the labels (e.g. adversarial noise). The algorithm constructs a hypothesis whose error rate on future examples is within an additive \varepsilon of the optimal halfspace, in time poly(n) for any constant \varepsilon > 0, under the uniform distribution over {-1,1}^n or the unit sphere in R^n, as well as under any log-concave distribution over R^n. It also agnostically learns Boolean disjunctions in time b^2 (\sqrt n) with respect to any distribution. The new algorithm, essentially L1 polynomial regression, is a noise-tolerant arbitrary-distribution generalization of the "low-degree" Fourier algorithm of Linial, Mansour, & Nisan. We also give a new algorithm for PAC learning halfspaces under the uniform distribution on the unit sphere with the current best bounds on tolerable rate of "malicious noise."