Transitive closure on the cell broadband engine: A study on self-scheduling in a multicore processor
Parallel and Distributed Processing Symposium, International (2009)
May 23, 2009 to May 29, 2009
Sudhir Vinjamuri, Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue EEB-244, Los Angeles USA 90007
Viktor K. Prasanna, Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue EEB-244, Los Angeles USA 90007
In this paper, we present a mapping methodology and optimizations for solving transitive closure on the Cell multicore processor. Using our approach, it is possible to achieve near-peak performance for transitive closure on the Cell processor. We first parallelize the standard Floyd-Warshall algorithm and show, through analysis and experimental results, that data communication is a bottleneck for performance and scalability. We then parallelize a cache-optimized version of the Floyd-Warshall algorithm to remove the memory bottleneck. As is the case with several scientific computing and industrial applications on a multicore processor, synchronization and scheduling of the cores play a crucial role in determining the performance of this algorithm. We define a self-scheduling mechanism for the cores of a multicore processor and design a self-scheduler for the Blocked Floyd-Warshall algorithm on the Cell multicore processor to remove the scheduling bottleneck. We also present optimizations in scheduling order to remove synchronization points. Our implementations achieved up to 78 GFLOPS.
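For readers unfamiliar with the two algorithm variants named in the abstract, the following is a minimal sequential sketch in Python of transitive closure with the standard Floyd-Warshall loop and with a blocked (tiled) variant. It is not the authors' Cell implementation: the function names `closure_standard`, `closure_blocked`, and `_update_block`, the block size `b`, and the list-of-lists representation are all assumptions made for illustration, and no parallelism or self-scheduling is modeled.

```python
def closure_standard(adj):
    """Transitive closure via the standard Floyd-Warshall triple loop."""
    n = len(adj)
    reach = [row[:] for row in adj]  # work on a copy of the adjacency matrix
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def _update_block(reach, bi, bj, bk, b, n):
    """Relax block (bi, bj) through the intermediate vertices of block bk."""
    for k in range(bk * b, min((bk + 1) * b, n)):
        for i in range(bi * b, min((bi + 1) * b, n)):
            if reach[i][k]:
                for j in range(bj * b, min((bj + 1) * b, n)):
                    if reach[k][j]:
                        reach[i][j] = True


def closure_blocked(adj, b):
    """Transitive closure via a blocked Floyd-Warshall with b x b tiles (b is illustrative)."""
    n = len(adj)
    reach = [row[:] for row in adj]
    nb = (n + b - 1) // b  # number of blocks per dimension
    for bk in range(nb):
        # Phase 1: the diagonal block depends only on itself.
        _update_block(reach, bk, bk, bk, b, n)
        # Phase 2: blocks in the same block row and block column as the diagonal block.
        for x in range(nb):
            if x != bk:
                _update_block(reach, bk, x, bk, b, n)
                _update_block(reach, x, bk, bk, b, n)
        # Phase 3: all remaining blocks; these are independent of one another
        # within a round, so they could in principle be assigned to different cores.
        for bi in range(nb):
            for bj in range(nb):
                if bi != bk and bj != bk:
                    _update_block(reach, bi, bj, bk, b, n)
    return reach
```

Both functions should return the same reachability matrix for any input; the blocked version only changes the order of updates so that each tile is reused while it is resident in fast memory. The dependencies between the three phases are the kind of ordering constraint that a scheduler for the cores would have to respect.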
Sudhir Vinjamuri, Viktor K. Prasanna, "Transitive closure on the cell broadband engine: A study on self-scheduling in a multicore processor", Parallel and Distributed Processing Symposium, International, vol. 00, pp. 1-11, 2009, doi:10.1109/IPDPS.2009.5161072