<p><b>Abstract</b>—We present a hardware-algorithm for sorting <tmath>$N$</tmath> elements using either a <it>p</it>-sorter or a sorting network of fixed I/O size <tmath>$p$</tmath> while strictly enforcing conflict-free memory accesses. To the best of our knowledge, this is the first realistic design that achieves optimal time performance, running in <tmath>$\Theta ( {\frac{N \log N}{p \log p}})$</tmath> time for all ranges of <tmath>$N$</tmath>. Our result completely resolves the problem of designing an implementable, time-optimal algorithm for sorting <tmath>$N$</tmath> elements using a <it>p</it>-sorter. More importantly, however, our result shows that, in order to achieve optimal time performance, all that is needed is a sorting network of depth <tmath>$O(\log^2 p)$</tmath> such as, for example, Batcher's classic bitonic sorting network.</p>
Index Terms: Special-purpose architectures, hardware-algorithms, sorting networks, columnsort, VLSI.
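
To make the two quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' hardware-algorithm. It builds the comparator stages of Batcher's bitonic sorting network on `p` wires, assuming `p` is a power of two, so the depth can be counted directly, and it evaluates the growth term `N log N / (p log p)` with constant factors ignored. The names `bitonic_stages`, `apply_network`, and `growth_term` are hypothetical and chosen only for illustration.

```python
import math
import random


def bitonic_stages(p):
    """Comparator stages of a bitonic sorting network on p wires.

    Assumes p is a power of two. Each stage is a list of
    (i, j, ascending) triples whose comparators can fire in parallel.
    """
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    stages = []
    k = 2
    while k <= p:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared wires
            stage = []
            for i in range(p):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    stage.append((i, partner, ascending))
            stages.append(stage)
            j //= 2
        k *= 2
    return stages


def apply_network(stages, values):
    """Run the values through the network, one stage at a time."""
    out = list(values)
    for stage in stages:
        for i, j, ascending in stage:
            # Swap when the pair is out of order for its direction.
            if (out[i] > out[j]) == ascending:
                out[i], out[j] = out[j], out[i]
    return out


def growth_term(n, p):
    """N log N / (p log p), the growth term of the stated bound (no constants)."""
    return (n * math.log2(n)) / (p * math.log2(p))


if __name__ == "__main__":
    p = 16
    stages = bitonic_stages(p)
    log_p = int(math.log2(p))
    # Depth is log p * (log p + 1) / 2, which is O(log^2 p).
    print("depth:", len(stages), "expected:", log_p * (log_p + 1) // 2)
    data = [random.randint(0, 99) for _ in range(p)]
    print("sorted correctly:", apply_network(stages, data) == sorted(data))
    print("growth term for N=1024, p=32:", growth_term(1024, 32))
```

The sketch only covers the fixed-size device itself. How the paper schedules the device over `N` elements while keeping memory accesses conflict-free is not reproduced here.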

M. C. Pinotti, S. Olariu and S. Q. Zheng, "An Optimal Hardware-Algorithm for Sorting Using a Fixed-Size Parallel Sorting Device," in IEEE Transactions on Computers, vol. 49, no. , pp. 1310-1324, 2000.