2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) (2014)
Chicago, IL, USA
May 26, 2014 to May 29, 2014
In this paper, we present the scaling of BTWorld, our MapReduce-based approach to observing and analyzing the global BitTorrent network, which we have been monitoring for the past 4 years. BTWorld currently provides a comprehensive and complex set of queries implemented in Pig Latin, with data dependencies between them. These queries translate to several MapReduce jobs whose execution times and input sizes both follow heavy-tailed distributions. Processing more than 1 TB of BitTorrent data with our BTWorld workflow required an in-depth analysis of the entire software stack and the design of a complete optimization cycle. We analyze our system from both theoretical and experimental perspectives, and we show how we attained a scale of data processing 15 times larger than in our previous results.
Runtime, Optimization, Peer-to-peer computing, Software, Data mining, Big data, Monitoring
B. Ghit, M. Capota, T. Hegeman, J. Hidders, D. Epema and A. Iosup, "V for Vicissitude: The Challenge of Scaling Complex Big Data Workflows," 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Chicago, IL, USA, 2014, pp. 927-932.