P. Heidelberger, A. Norton, J.T. Robinson, "Parallel Quicksort Using Fetch-and-Add," IEEE Transactions on Computers, vol. 39, no. 1, pp. 133-138, January 1990.  
A parallelization of the Quicksort algorithm that is suitable for execution on a shared-memory multiprocessor with an efficient implementation of the fetch-and-add operation is presented. The partitioning phase of Quicksort, which has been considered a serial bottleneck, is cooperatively executed in parallel by many processors through the use of fetch-and-add. The parallel algorithm maintains the in-place nature of Quicksort, thereby allowing internal sorting of large arrays. A class of fetch-and-add-based algorithms for dynamically scheduling processors to subproblems is presented. Adaptive scheduling algorithms in this class have low overhead and achieve effective processor load balancing. The basic algorithm is shown to execute in an average of O(log(N)) time on an N-processor PRAM (parallel random-access machine) assuming a constant-time fetch-and-add. Estimated speedups, based on simulations, are also presented for cases when the number of items to be sorted is much greater than the number of processors.
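
The idea of several processors cooperating on one partitioning step through fetch-and-add can be illustrated with a minimal sketch. This is not the paper's in-place algorithm: it is a simplified out-of-place variant, and every name in it (`FetchAndAdd`, `parallel_partition`, `num_workers`) is hypothetical. Python has no hardware fetch-and-add, so a lock-guarded counter stands in for the primitive. Each worker claims the next input index with one fetch-and-add and the next free output slot with another, so no two workers ever read the same item or write the same slot.

```python
import threading


class FetchAndAdd:
    """Shared counter modelling an atomic fetch-and-add.

    Assumption: a lock stands in for the hardware primitive.
    """

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta):
        # Return the old value and add delta, as one indivisible step.
        with self._lock:
            old = self._value
            self._value += delta
            return old


def parallel_partition(items, pivot, num_workers=4):
    """Out-of-place partition around pivot.

    Returns (out, split) where every element of out[:split] is < pivot
    and every element of out[split:] is >= pivot.
    """
    n = len(items)
    out = [None] * n
    next_input = FetchAndAdd(0)      # next unclaimed input index
    next_low = FetchAndAdd(0)        # next free slot, growing from the left
    next_high = FetchAndAdd(n - 1)   # next free slot, growing from the right

    def worker():
        while True:
            i = next_input.fetch_and_add(1)   # claim one input item
            if i >= n:
                return
            x = items[i]
            if x < pivot:
                out[next_low.fetch_and_add(1)] = x
            else:
                out[next_high.fetch_and_add(-1)] = x

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # After all workers finish, next_low holds the count of small items.
    return out, next_low.fetch_and_add(0)


if __name__ == "__main__":
    data = [5, 3, 8, 1, 9, 2, 7, 5, 0, 6]
    out, split = parallel_partition(data, pivot=5)
    print(out[:split], out[split:])
```

The order of items within each side depends on thread scheduling, so only the split point and the membership of each side are deterministic. The same counter pattern is one way to hand out subproblems to idle workers, which is the role the abstract assigns to fetch-and-add in dynamic scheduling.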
