2013 National Conference on Parallel Computing Technologies (PARCOMPTECH) (2013)
Feb. 21, 2013 to Feb. 23, 2013
Ranajoy Malakar, Corporate Research and Technology, Siemens Technology Services, Bangalore, India
Naga Vydyanathan, Corporate Research and Technology, Siemens Technology Services, Bangalore, India
Hadoop is a map-reduce based distributed processing framework, frequently used in industry today for big data analysis, particularly text analysis. Graphics processing units (GPUs), on the other hand, are massively parallel platforms with attractive performance-to-price and performance-to-power ratios, and have been used extensively in recent years to accelerate data-parallel computations. CUDA (Compute Unified Device Architecture) is a C-based programming model proposed by NVIDIA for leveraging the parallel computing capabilities of the GPU for general-purpose computations. This paper attempts to integrate CUDA acceleration into the Hadoop distributed processing framework to create a heterogeneous, high-performance image processing system. Because Hadoop is primarily used for text analysis, this involves enabling efficient image processing in Hadoop. Our experimental evaluations using an AdaBoost-based face detection algorithm indicate that CUDA-enabling a Hadoop cluster, even with low-end GPUs, can yield a 25% improvement in data processing throughput. This suggests that integrating these two technologies can help build scalable, high-throughput, power- and cost-efficient computing platforms.
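To make the map-reduce structure of such a job concrete, here is a minimal, CPU-only Python sketch. It is not the authors' implementation. It assumes that each map record is one whole grayscale image (rather than one line of text), and it uses a Viola-Jones-style detector: an integral image, a single two-rectangle Haar-like feature, and a weighted vote of decision stumps. The paper only states "AdaBoost-based", so this particular detector is an assumption. The names `map_image`, `run_job`, and `WEAK`, along with the window size and the stump weights and thresholds, are hypothetical. In a CUDA-enabled cluster, the per-window classifier loop inside the mapper is the data-parallel part that would be offloaded to the GPU.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Image = List[List[int]]  # grayscale pixels, row-major


def integral_image(img: Image) -> List[List[int]]:
    # Summed-area table padded with a leading zero row and column, so that
    # ii[y][x] is the sum of img[0..y-1][0..x-1].
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    # Sum of any axis-aligned rectangle using four table lookups.
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


# Hypothetical weak classifiers (alpha, threshold) on one feature:
# "left half minus right half" of the window. Values are made up.
WEAK = [(1.0, 10), (0.5, 50)]


def is_face(ii: List[List[int]], x: int, y: int, s: int) -> bool:
    half = s // 2
    f = rect_sum(ii, x, y, half, s) - rect_sum(ii, x + half, y, half, s)
    score = sum(a for a, t in WEAK if f > t)       # weighted vote of stumps
    return score >= 0.5 * sum(a for a, _ in WEAK)  # AdaBoost decision rule


def map_image(image_id: str, img: Image, s: int = 4) -> Iterable[Tuple[str, int]]:
    # Mapper: one whole image per input record (image_id is the record key).
    # The window loop below is the data-parallel work a GPU kernel could take over.
    ii = integral_image(img)
    faces = sum(
        is_face(ii, x, y, s)
        for y in range(0, len(img) - s + 1, s)
        for x in range(0, len(img[0]) - s + 1, s)
    )
    yield ("faces", faces)
    yield ("images", 1)


def run_job(records: Iterable[Tuple[str, Image]]) -> Dict[str, int]:
    # Simulated shuffle (group values by key) followed by a summing reducer.
    groups: Dict[str, List[int]] = defaultdict(list)
    for image_id, img in records:
        for key, value in map_image(image_id, img):
            groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}
```

For example, with a 4x4 image whose left half is bright (value 10) and an all-black 4x4 image, `run_job([("a", img_a), ("b", img_b)])` returns `{"faces": 1, "images": 2}`. The integral image is what keeps each feature evaluation constant-time regardless of window size, which is also why it suits a per-window GPU kernel.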
GPGPU, Hadoop, Map-reduce, CUDA
R. Malakar and N. Vydyanathan, "A CUDA-enabled Hadoop cluster for fast distributed image processing," 2013 National Conference on Parallel Computing Technologies (PARCOMPTECH), Bangalore, India, 2013, pp. 1-5.