Synchronous Parallel Processing of Big-Data Analytics Services to Optimize Performance in Federated Clouds
Honolulu, HI, USA
June 24, 2012 to June 29, 2012
Parallelization of big-data analytics services over a federation of heterogeneous clouds has been considered as a way to improve performance. However, contrary to common intuition, there is an inherent tradeoff between the level of parallelism and the performance of big-data analytics, principally because of the significant delay incurred in transferring big data over the network. The data transfer delay can be comparable to, or even higher than, the time required to compute the data. To address this tradeoff, this paper determines: (a) how many and which computing nodes in federated clouds should be used for parallel execution of big-data analytics; (b) how to opportunistically apportion big data to these computing nodes so that they complete in synchrony at best-effort performance; and (c) the sequence in which the apportioned, differently sized big-data chunks are computed in each node, so that the transfer of a chunk is overlapped as much as possible with the computation of the previous chunk in that node. To this end, the Maximally Overlapped Bin-packing driven Bursting (MOBB) algorithm is proposed, which improves performance by up to 60% over existing approaches.
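The following is a minimal sketch of the two ideas named in (b) and (c), not the MOBB algorithm itself, whose details are not given here. It assumes each node is described by a transfer bandwidth and a compute rate, splits the data in proportion to each node's bottleneck rate (a simplifying assumption), and models a two-stage pipeline in which the transfer of one chunk overlaps the computation of the previous one. The names `Node`, `apportion`, and `pipeline_makespan` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    bandwidth: float     # data units per second the node can receive (assumed known)
    compute_rate: float  # data units per second the node can process (assumed known)


def apportion(total, nodes):
    """Split `total` data units across nodes in proportion to each node's
    bottleneck rate, min(bandwidth, compute_rate), so that fully pipelined
    nodes finish at roughly the same time. Illustrative assumption only."""
    rates = {n.name: min(n.bandwidth, n.compute_rate) for n in nodes}
    rate_sum = sum(rates.values())
    return {name: total * r / rate_sum for name, r in rates.items()}


def pipeline_makespan(chunks, node):
    """Completion time for one node when chunks are transferred back to back
    and each chunk is computed as soon as it has arrived and the previous
    chunk's computation has finished."""
    transfer_done = 0.0
    compute_done = 0.0
    for size in chunks:
        transfer_done += size / node.bandwidth
        # Computation waits for both the chunk's arrival and the prior chunk.
        compute_done = max(compute_done, transfer_done) + size / node.compute_rate
    return compute_done


if __name__ == "__main__":
    slow_link = Node("A", bandwidth=10.0, compute_rate=20.0)
    fast_link = Node("B", bandwidth=30.0, compute_rate=20.0)
    print(apportion(600.0, [slow_link, fast_link]))
    # The same two chunks in different orders give different makespans.
    print(pipeline_makespan([50.0, 150.0], slow_link))
    print(pipeline_makespan([150.0, 50.0], slow_link))
```

In this simplified model the order of chunks matters: on a transfer-bound node, placing the smaller chunk last shortens the final computation that cannot be overlapped, while on a compute-bound node, placing the smaller chunk first lets computation start sooner.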
Delay, Data mining, Parallel processing, Synchronization, Computational modeling, Estimation, Sorting, parallelization, federated clouds, big-data analytics
Gueyoung Jung, Nathan Gnanasambandam, Tridib Mukherjee, "Synchronous Parallel Processing of Big-Data Analytics Services to Optimize Performance in Federated Clouds", CLOUD, 2012, 2013 IEEE Sixth International Conference on Cloud Computing, pp. 811-818, doi:10.1109/CLOUD.2012.108