Mars: Accelerating MapReduce with Graphics Processors
April 2011 (vol. 22 no. 4)
pp. 608-620
Wenbin Fang, University of Wisconsin-Madison, Madison
Bingsheng He, Nanyang Technological University, Singapore
Qiong Luo, Hong Kong University of Science and Technology, Hong Kong
Naga K. Govindaraju, Microsoft Corp., Redmond
We design and implement Mars, a MapReduce runtime system accelerated with graphics processing units (GPUs). MapReduce is a simple and flexible parallel programming paradigm, originally proposed by Google to ease large-scale data processing on thousands of CPUs. Compared with CPUs, GPUs offer an order of magnitude higher computation power and memory bandwidth. However, GPUs are designed as special-purpose coprocessors, and their programming interfaces are less familiar to MapReduce programmers than those of CPUs. To harness the power of GPUs for MapReduce, we developed Mars to run on NVIDIA GPUs, AMD GPUs, and multicore CPUs. Furthermore, we integrated Mars into Hadoop, an open-source CPU-based MapReduce system. Mars hides the programming complexity of GPUs behind the simple and familiar MapReduce interface, and automatically manages task partitioning, data distribution, and parallelization on the processors. We have implemented six representative applications on Mars and evaluated their performance on PCs equipped with GPUs and multicore CPUs. The experimental results show that GPU-CPU coprocessing in Mars, on an NVIDIA GTX280 GPU and an Intel quad-core CPU, outperformed Phoenix, the state-of-the-art MapReduce system for multicore CPUs, with speedups of up to 72 times (24 times on average), depending on the application. Additionally, integrating Mars into Hadoop enabled GPU acceleration for a network of PCs.
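
The division of labour behind the MapReduce interface can be illustrated with a minimal, sequential sketch. The driver `run_mapreduce` and its `num_partitions` parameter are hypothetical and do not reflect the actual Mars API; the sketch only shows the split the abstract describes, where the programmer supplies `map_fn` and `reduce_fn` while the runtime partitions the input, collects intermediate key/value pairs, groups them by key, and invokes the reduce step. Mars runs these stages in parallel on GPU or CPU threads; here they run one after another for clarity.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Iterable, List, Tuple

KV = Tuple[Any, Any]


def run_mapreduce(
    records: List[Any],
    map_fn: Callable[[Any], Iterable[KV]],
    reduce_fn: Callable[[Any, List[Any]], Any],
    num_partitions: int = 4,
) -> List[KV]:
    """Hypothetical sequential MapReduce driver (not the Mars API)."""
    # Task partitioning: split the input into at most num_partitions chunks.
    # A parallel runtime would hand each chunk to a different thread;
    # in this sketch the chunks are processed one after another.
    size = max(1, -(-len(records) // num_partitions))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]

    # Map stage: each record may emit any number of intermediate (key, value) pairs.
    intermediate: List[KV] = []
    for chunk in chunks:
        for record in chunk:
            intermediate.extend(map_fn(record))

    # Group stage: sorting by key makes all pairs with the same key adjacent,
    # which is what itertools.groupby requires.
    intermediate.sort(key=itemgetter(0))

    # Reduce stage: call reduce_fn once per distinct key with all of its values.
    return [
        (key, reduce_fn(key, [value for _, value in group]))
        for key, group in groupby(intermediate, key=itemgetter(0))
    ]


# User-defined functions for word count, the classic MapReduce example.
def word_map(line: str) -> Iterable[KV]:
    for word in line.split():
        yield (word, 1)


def sum_reduce(key: str, values: List[int]) -> int:
    return sum(values)


if __name__ == "__main__":
    docs = ["gpu map reduce", "map reduce on gpu", "gpu"]
    print(run_mapreduce(docs, word_map, sum_reduce, num_partitions=2))
    # prints [('gpu', 3), ('map', 2), ('on', 1), ('reduce', 2)]
```

The user code (`word_map`, `sum_reduce`) contains no partitioning or scheduling logic; that separation is what lets a runtime such as Mars swap in a GPU implementation of the stages without changing the application.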

Index Terms:
MapReduce, graphics processor, parallel computing, multicore processor, many-core architecture.
Wenbin Fang, Bingsheng He, Qiong Luo, Naga K. Govindaraju, "Mars: Accelerating MapReduce with Graphics Processors," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, pp. 608-620, April 2011, doi:10.1109/TPDS.2010.158