2016 45th International Conference on Parallel Processing (ICPP) (2016)
Philadelphia, PA, USA
Aug. 16, 2016 to Aug. 19, 2016
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPP.2016.69
The increasing main memory capacity and the explosion of big data have fueled the development of in-memory big data management and processing. By offering an efficient in-memory parallel execution model that eliminates the disk I/O bottleneck, existing in-memory cluster computing platforms (e.g., Flink and Spark) have already proven to be outstanding platforms for big data processing. However, these platforms are currently CPU-only systems. This paper proposes GFlink, an in-memory computing architecture on heterogeneous CPU-GPU clusters for big data. The proposed architecture extends the original Flink from CPU clusters to heterogeneous CPU-GPU clusters, greatly improving the computational power of Flink. Furthermore, we propose an abstract GPU-based model named GDST, which hides the programming complexity of GPUs behind simple and familiar high-level interfaces and automatically manages task partitioning, device memory, and parallelization on multi-core GPUs. To achieve high performance and good load balance, we propose an efficient JVM-GPU communication strategy and an adaptive locality-aware scheduling scheme for three-stage pipeline execution. Extensive experimental results indicate that the high computational power of GPUs can be efficiently utilized and that implementations on GFlink outperform those on the original CPU-based Flink.
Graphics processing units, Computational modeling, Programming, Memory management, Big data, Sparks
C. Chen, K. Li, A. Ouyang, Z. Tang and K. Li, "GFlink: An In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data," 2016 45th International Conference on Parallel Processing (ICPP), Philadelphia, PA, USA, 2016, pp. 542-551.