Issue No. 01, Jan. 2013 (vol. 62)
pp: 173-185
The Van Luong, Lab. d'Inf. Fondamentale de Lille (LIFL), Univ. de Lille 1, Villeneuve d'Ascq, France
N. Melab, Lab. d'Inf. Fondamentale de Lille (LIFL), Univ. de Lille 1, Villeneuve d'Ascq, France
E. Talbi, Lab. d'Inf. Fondamentale de Lille (LIFL), Univ. de Lille 1, Villeneuve d'Ascq, France
Local search metaheuristics (LSMs) are efficient methods for solving complex problems in science and industry. They significantly reduce both the size of the search space to be explored and the search time. Nevertheless, the resolution time remains prohibitive for large problem instances. GPU-based massively parallel computing is therefore a major complementary way to speed up the search. However, GPU computing for LSMs has rarely been investigated in the literature. In this paper, we introduce a new guideline for the design and implementation of effective LSMs on GPU. Efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of neighboring solutions to GPU threads, and memory management. These approaches have been evaluated on four well-known combinatorial and continuous optimization problems using four GPU configurations. Compared to a CPU-based execution, speedups of up to ×80 are reported for the large combinatorial problems and up to ×240 for a continuous problem. Finally, extensive experiments demonstrate the strong potential of GPU-based LSMs compared to cluster- or grid-based parallel architectures.
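The abstract lists the mapping of neighboring solutions to GPU threads as one of the key design problems. As a minimal sketch, assuming a binary encoding and a neighborhood whose moves flip two distinct bits (a 2-Hamming-distance neighborhood), one way to give every thread a unique neighbor is to map its flat index k onto a pair (i, j) with i < j by inverting triangular numbers. The names `index_to_pair`, `evaluate_neighborhood`, and `best_move` are hypothetical, the Python loop merely stands in for a GPU kernel, and the formula is a standard triangular-index inversion, not necessarily the authors' exact mapping.

```python
import math
from typing import Callable, List, Tuple


def index_to_pair(k: int, n: int) -> Tuple[int, int]:
    """Map a flat thread index k in [0, n*(n-1)/2) to a pair (i, j), 0 <= i < j < n.

    Pairs are enumerated row by row: (0,1), (0,2), ..., (0,n-1), (1,2), ...
    """
    m = n * (n - 1) // 2
    if not 0 <= k < m:
        raise ValueError("k out of range")
    # Count from the last pair so the triangular root directly gives the row.
    r = m - 1 - k
    t = (math.isqrt(8 * r + 1) - 1) // 2  # largest t with t*(t+1)/2 <= r
    i = n - 2 - t
    row_start = m - (t + 1) * (t + 2) // 2  # flat index of the pair (i, i+1)
    j = i + 1 + (k - row_start)
    return i, j


def evaluate_neighborhood(solution: List[int],
                          fitness: Callable[[List[int]], float]) -> List[float]:
    """Hypothetical stand-in for a GPU kernel: 'thread' k evaluates the neighbor
    obtained by flipping bits i and j, and writes one fitness value per thread."""
    n = len(solution)
    m = n * (n - 1) // 2
    results = [0.0] * m
    for k in range(m):  # on a GPU, each k would be handled by its own thread
        i, j = index_to_pair(k, n)
        neighbor = list(solution)
        neighbor[i] ^= 1
        neighbor[j] ^= 1
        results[k] = fitness(neighbor)
    return results


def best_move(solution: List[int],
              fitness: Callable[[List[int]], float]) -> Tuple[Tuple[int, int], float]:
    """Host side: pick the best neighbor (maximization) from the per-thread fitness array."""
    scores = evaluate_neighborhood(solution, fitness)
    k = max(range(len(scores)), key=scores.__getitem__)  # first maximum wins ties
    return index_to_pair(k, len(solution)), scores[k]
```

For example, `best_move([0, 0, 1, 0], sum)` maximizes the number of ones and returns `((0, 1), 3)`. In this sketch's design only the fitness array (one value per neighbor) would have to travel back from the device, which is one way to keep CPU-GPU transfers small; the solution itself stays the only input copied to the device per iteration.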
Graphics processing unit, Instruction sets, Encoding, Optimization, Computer architecture, Parallel processing, Search problems, performance evaluation, Parallel metaheuristics, local search metaheuristics, GPU computing
The Van Luong, N. Melab, E. Talbi, "GPU Computing for Parallel Local Search Metaheuristic Algorithms", IEEE Transactions on Computers, vol.62, no. 1, pp. 173-185, Jan. 2013, doi:10.1109/TC.2011.206