In this paper, we describe a new distributed community detection algorithm for billion-edge directed graphs that, unlike modularity-based methods, achieves cluster quality on par with the best-known algorithms in the literature. We show that a simple approximation to the best-known serial algorithm dramatically reduces computation and enables distributed evaluation yet incurs only a very small impact on cluster quality. We present three main results: First, we show that the clustering produced by our scalable approximate algorithm compares favorably with prior results on small synthetic benchmarks and small real-world datasets (70 million edges). Second, we evaluate our algorithm on billion-edge directed graphs (a 1.5B edge social network graph, and a 3.7B edge web crawl), and show that the results exhibit the structural properties predicted by analysis of much smaller graphs from similar sources. Third, we show that our algorithm exhibits over 90% parallel efficiency on massive graphs in weak scaling experiments.