Issue No. 06 - June (2009 vol. 58)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2009.32
Alex Aletà, UPC, Barcelona
Josep M. Codina, Intel Labs Barcelona, Barcelona
Jesús Sánchez, Intel Labs, Barcelona
Antonio González, Intel Labs Barcelona, Barcelona
David Kaeli, Northeastern University, Boston
This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy to distribute the workload among clusters and reduces the number of intercluster communications at the same time. Partitioning is guided by approximate schedules (i.e., pseudoschedules), which take into account all of the constraints that influence the final schedule. To further reduce the number of intercluster communications, heuristics for instruction replication are included. The proposed scheme is evaluated using the SPECfp95 programs. The described scheme outperforms a state-of-the-art scheduler for all programs and different cluster configurations. For some configurations, the speedup obtained when using this new scheme is greater than 40 percent, and for selected programs, performance can be more than doubled.
Clustered microarchitectures, ILP, instruction replication, modulo scheduling, statically scheduled processors.
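As a rough illustration of the partitioning goal stated in the abstract (spread the workload across clusters while keeping intercluster communications low), the following minimal Python sketch evaluates two candidate cluster assignments for a small data dependence graph. It is not the AGAMOS algorithm: it performs no multilevel coarsening, builds no pseudoschedules, and does no instruction replication. The operation names, the example graph, and the assumption that a value sent to a cluster once can be reused by every consumer on that cluster are all hypothetical choices made for the example.

```python
from collections import Counter


def intercluster_communications(edges, cluster_of):
    """Count the values that must be sent from one cluster to another.

    Assumption for this sketch: a value sent to a cluster once can be
    reused by every consumer there, so each (producer, destination
    cluster) pair is counted a single time.
    """
    transfers = {
        (src, cluster_of[dst])
        for src, dst in edges
        if cluster_of[src] != cluster_of[dst]
    }
    return len(transfers)


def cluster_loads(cluster_of):
    """Number of operations assigned to each cluster."""
    return Counter(cluster_of.values())


# Hypothetical loop body: six operations and their data dependences.
edges = [
    ("ld_a", "mul"), ("ld_b", "mul"), ("mul", "add"),
    ("ld_c", "add"), ("add", "st"), ("mul", "st"),
]

# Two candidate assignments for a two-cluster machine.
grouped = {"ld_a": 0, "ld_b": 0, "mul": 0, "ld_c": 1, "add": 1, "st": 1}
scattered = {"ld_a": 0, "ld_b": 1, "mul": 0, "ld_c": 1, "add": 0, "st": 1}

for name, assignment in (("grouped", grouped), ("scattered", scattered)):
    print(name,
          dict(cluster_loads(assignment)),
          intercluster_communications(edges, assignment))
```

Both assignments place three operations on each cluster, yet the grouped one needs a single intercluster transfer while the scattered one needs four. A partitioner of the kind the abstract describes would have to weigh this sort of cost alongside the scheduling constraints that this sketch ignores.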
A. Aletà, A. González, D. Kaeli, J. Sánchez and J. M. Codina, "AGAMOS: A Graph-Based Approach to Modulo Scheduling for Clustered Microarchitectures," in IEEE Transactions on Computers, vol. 58, no. 6, pp. 770-783, 2009.