2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (2016)
Washington, DC, USA
May 1, 2016 to May 3, 2016
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/FCCM.2016.34
Sorting is heavily used in applications that handle massive amounts of data. Existing techniques accelerate sorting using multiprocessors or GPGPUs, where a data set is partitioned into disjoint subsets so that multiple sorting threads can work in parallel. Hardware sorters implemented in FPGAs have the potential to provide high-speed, low-energy solutions, but the partitioning algorithms used in software systems are so data dependent that they cannot be easily adopted. The speed of most current sequential sorters still hovers around 1 number/cycle. Recently, a new hardware merge sorter broke this speed limit by merging a large number of sorted sequences at a speed proportional to the number of sequences. This paper significantly improves its area and speed scalability by allowing stalls and a variable sorting rate. A 32-port parallel merge tree that merges 32 sequences is implemented in a Virtex-7 FPGA. It merges sequences at an average rate of 31.05 numbers/cycle and reduces the total sorting time by a factor of 160 compared with traditional sequential sorters.
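As a rough software illustration of the merge-tree idea mentioned in the abstract, the sketch below merges 32 sorted sequences through a binary tree of two-input mergers. It is only an analogy and not the paper's hardware design: it runs sequentially, emits one number per step, and does not model ports, stalls, or the variable sorting rate. The names `merge_two` and `merge_tree`, the run length of 8, and the random test data are all assumptions made for this example.

```python
import random


def merge_two(left, right):
    """Lazily merge two ascending iterables into one ascending stream."""
    left, right = iter(left), iter(right)
    done = object()  # sentinel marking an exhausted input
    a, b = next(left, done), next(right, done)
    while a is not done and b is not done:
        if a <= b:
            yield a
            a = next(left, done)
        else:
            yield b
            b = next(right, done)
    # Drain whichever input still has elements.
    while a is not done:
        yield a
        a = next(left, done)
    while b is not done:
        yield b
        b = next(right, done)


def merge_tree(sequences):
    """Merge any number of ascending sequences via a binary tree of merge_two nodes."""
    level = [iter(s) for s in sequences]
    if not level:
        return iter(())
    while len(level) > 1:
        # Pair up neighbouring streams; each pair becomes one node of the next level.
        nxt = [merge_two(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd stream passes through unchanged
        level = nxt
    return level[0]


# Hypothetical input: 32 sorted runs of 8 numbers each (sizes chosen arbitrarily).
rng = random.Random(0)
runs = [sorted(rng.randrange(1000) for _ in range(8)) for _ in range(32)]
merged = list(merge_tree(runs))
assert merged == sorted(x for run in runs for x in run)
```

With 32 inputs the loop builds five levels of two-input mergers. Python's standard library offers `heapq.merge` for the same multi-way merge; the explicit tree is used here only because it mirrors the structure named in the abstract.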
Sorting, Hardware, Corporate acquisitions, Field programmable gate arrays, Servers, Software, Bandwidth
W. Song, D. Koch, M. Lujan and J. Garside, "Parallel Hardware Merge Sorter," 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Washington, DC, USA, 2016, pp. 95-102.