New York, NY, USA
Nov. 12, 1990 to Nov. 16, 1990
Chatterjee, Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Blelloch, Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Zagha, Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
The authors describe an optimized implementation of a set of scan (also called all-prefix-sums) primitives on a single processor of a CRAY Y-MP, and demonstrate that their use leads to greatly improved performance for several applications that cannot be vectorized with existing compiler technology. The algorithm used to implement the scans is based on an algorithm for parallel computers. A set of segmented versions of these scans is only marginally more expensive than the unsegmented versions. The authors describe three applications built on the scans: a radix sorting routine that is 13 times faster than a Fortran version and within 20% of a highly optimized library sort routine; three operations on trees that are between 10 and 20 times faster than the corresponding C versions; and a connectionist learning algorithm that is 10 times faster than the corresponding C version for sparse and irregular networks.
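As a rough illustration of the primitives named in the abstract, the sketch below defines a plus-scan, a segmented plus-scan, and a radix sort driven by scans. It is a sequential reference for the semantics only, not the vectorized CRAY Y-MP implementation the paper describes. The exclusive-scan convention, the use of a flag vector to mark segment starts, the bitwise "split" formulation of radix sort, and all function names (`plus_scan`, `segmented_plus_scan`, `split`, `radix_sort`) are assumptions made for this example.

```python
from typing import List


def plus_scan(values: List[int]) -> List[int]:
    """Exclusive all-prefix-sums (assumed convention): out[i] = sum(values[:i])."""
    out = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out


def segmented_plus_scan(values: List[int], flags: List[int]) -> List[int]:
    """Exclusive plus-scan that restarts wherever flags[i] == 1 (segment start)."""
    out = []
    running = 0
    for v, f in zip(values, flags):
        if f:
            running = 0  # a new segment begins here
        out.append(running)
        running += v
    return out


def split(keys: List[int], bits: List[int]) -> List[int]:
    """Stable partition: keys with bit 0 first, then keys with bit 1.

    Destination indices come from two plus-scans rather than comparisons.
    """
    zeros = [1 - b for b in bits]
    down = plus_scan(zeros)                      # slots for 0-bit keys
    n_zeros = sum(zeros)
    up = [n_zeros + p for p in plus_scan(bits)]  # slots for 1-bit keys
    out = [0] * len(keys)
    for k, b, d, u in zip(keys, bits, down, up):
        out[u if b else d] = k
    return out


def radix_sort(keys: List[int], num_bits: int) -> List[int]:
    """Sort non-negative integers by splitting on each bit, lowest bit first."""
    for bit in range(num_bits):
        keys = split(keys, [(k >> bit) & 1 for k in keys])
    return keys
```

For example, `plus_scan([3, 1, 4, 1, 5])` yields `[0, 3, 4, 8, 9]`, and with flags `[1, 0, 1, 0, 0]` the segmented version yields `[0, 3, 0, 4, 5]`. The loops here are written sequentially for readability; the point of the paper is that such scans can be carried out efficiently with vector operations.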
sparse networks, vector algorithms, scan primitives, segmented scans, tree operations, unsegmented scans, plus-scan, vector computers, all-prefix-sums, CRAY Y-MP, unsegmented versions, radix sorting routine, connectionist learning algorithm, irregular networks
Chatterjee, Blelloch, Zagha, "Scan primitives for vector computers", SC, 1990, SC Conference, SC Conference 1990, pp. 666-675, doi:10.1109/SUPERC.1990.130084