13th Symposium on High Performance Interconnects (HOTI'05)
Optimised Global Reduction on QsNet^II
Stanford, California, USA
August 17-19, 2005
ISBN: 0-7695-2449-4
In this paper we describe how QsNet^II supports reduction, a key collective operation for massively parallel applications. Results from jobs run on a 512-node cluster of quad-CPU nodes show excellent scaling: a global sum across 2048 processes takes 22 microseconds on average.