International Symposium on Parallel and Distributed Computing (ISPDC 2010)
July 7, 2010 to July 9, 2010
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ISPDC.2010.22
Today's commercial off-the-shelf computer systems are multicore computing systems combining CPUs, graphics processors (GPUs), and custom devices. In comparison with CPU cores, graphics cards can execute hundreds to thousands of compute units in parallel. To benefit from these GPU computing resources, applications have to be parallelized and adapted to the target architecture. In this paper we report our experience in implementing a solver for the NQueens puzzle on GPUs using Nvidia's CUDA (Compute Unified Device Architecture) technology. Using the example of memory usage and memory access, we demonstrate that optimizations of CUDA programs may have contrary effects on different CUDA architectures. Our evaluation results point out that new programming languages or compilers alone are not sufficient to achieve the best results with emerging graphics card computing.
Keywords: GPGPU, memory access trade-off
Andreas Polze, Martin von Löwis, Frank Feinbube, Bernhard Rabe, "NQueens on CUDA: Optimization Issues", International Symposium on Parallel and Distributed Computing, pp. 63-70, 2010, doi:10.1109/ISPDC.2010.22