
ASCII Text
Georgios Goumas, Nikolaos Drosinos, Nectarios Koziris, "Communication-Aware Supernode Shape," IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 4, pp. 498-511, April 2009.
BibTeX
@article{10.1109/TPDS.2008.114,
  author = {Georgios Goumas and Nikolaos Drosinos and Nectarios Koziris},
  title = {Communication-Aware Supernode Shape},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  volume = {20},
  number = {4},
  issn = {1045-9219},
  year = {2009},
  pages = {498-511},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TPDS.2008.114},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY  - JOUR
JO  - IEEE Transactions on Parallel and Distributed Systems
TI  - Communication-Aware Supernode Shape
IS  - 4
SN  - 1045-9219
SP  - 498
EP  - 511
EPD - 498-511
A1  - Georgios Goumas
A1  - Nikolaos Drosinos
A1  - Nectarios Koziris
PY  - 2009
KW  - I/O and Data Communications
KW  - Load balancing and task assignment
KW  - Parallel processors
KW  - Parallel Architectures
KW  - Scheduling and task partitioning
KW  - Data communications
VL  - 20
JA  - IEEE Transactions on Parallel and Distributed Systems
ER  -
[1] F. Irigoin and R. Triolet, “Supernode Partitioning,” Proc. 15th Ann. ACM SIGACT-SIGPLAN Symp. Principles of Programming Languages (POPL '88), pp. 319-329, Jan. 1988.
[2] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C: The Art of Scientific Computing. Cambridge Univ. Press, 1992.
[3] B. D'Acunto, Computational Methods for PDE in Mechanics. World Scientific, 2004.
[4] K. Morton and D. Mayers, Numerical Solution of Partial Differential Equations. Cambridge Univ. Press, 2005.
[5] J. Ramanujam and P. Sadayappan, “Tiling Multidimensional Iteration Spaces for Multicomputers,” J. Parallel and Distributed Computing, vol. 16, pp. 108-120, 1992.
[6] R. Andonov, S. Balev, S. Rajopadhye, and N. Yanev, “Optimal Semi-Oblique Tiling,” IEEE Trans. Parallel and Distributed Systems, vol. 14, no. 9, pp. 944-960, Sept. 2003.
[7] P. Boulet, A. Darte, T. Risset, and Y. Robert, “(Pen)Ultimate Tiling,” INTEGRATION, The VLSI J., vol. 17, pp. 33-51, 1994.
[8] P. Boulet, J. Dongarra, Y. Robert, and F. Vivien, “Static Tiling for Heterogeneous Computing Platforms,” J. Parallel Computing, vol. 25, no. 5, pp. 547-568, May 1999.
[9] G. Goumas, A. Sotiropoulos, and N. Koziris, “Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping,” Proc. 15th Int'l Parallel and Distributed Processing Symp. (IPDPS '01), Apr. 2001.
[10] G. Goumas, M. Athanasaki, and N. Koziris, “An Efficient Code Generation Technique for Tiled Iteration Spaces,” IEEE Trans. Parallel and Distributed Systems, vol. 14, no. 10, pp. 1021-1034, Oct. 2003.
[11] E. Hodzic and W. Shang, “On Supernode Transformation with Minimized Total Running Time,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 5, pp. 417-428, May 1998.
[12] G. Goumas, N. Drosinos, M. Athanasaki, and N. Koziris, “Message-Passing Code Generation for Non-Rectangular Tiling Transformations,” J. Parallel Computing, vol. 32, no. 10, pp. 711-732, Nov. 2006.
[13] N. Drosinos and N. Koziris, “Performance Comparison of Pure MPI versus Hybrid MPI-OpenMP Parallelization Models on SMP Clusters,” Proc. 18th Int'l Parallel and Distributed Processing Symp. (IPDPS '04), p. 10, Apr. 2004.
[14] E. Hodzic and W. Shang, “On Time Optimal Supernode Shape,” IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 12, pp. 1220-1233, Dec. 2002.
[15] K. Högstedt, L. Carter, and J. Ferrante, “On the Parallel Execution Time of Tiled Loops,” IEEE Trans. Parallel and Distributed Systems, vol. 14, no. 3, pp. 307-321, Mar. 2003.
[16] N. Koziris, A. Sotiropoulos, and G. Goumas, “A Pipelined Schedule to Minimize Completion Time for Loop Tiling with Computation and Communication Overlapping,” J. Parallel and Distributed Computing, vol. 63, no. 11, pp. 1138-1151, Nov. 2003.
[17] H. Ohta, Y. Saito, M. Kainaga, and H. Ono, “Optimal Tile Size Adjustment in Compiling General DOACROSS Loop Nests,” Proc. Ninth Int'l Conf. Supercomputing (ICS '95), pp. 270-279, July 1995.
[18] Y. Song and Z. Li, “Impact of Tile-Size Selection for Skewed Tiling,” Proc. Fifth Workshop Interaction between Compilers and Architectures (INTERACT '01), Jan. 2001.
[19] P. Tang and J. Xue, “Generating Efficient Tiled Code for Distributed Memory Machines,” J. Parallel Computing, vol. 26, no. 11, pp. 1369-1410, 2000.
[20] J. Xue, “On Tiling as a Loop Transformation,” Parallel Processing Letters, vol. 7, no. 4, pp. 409-424, 1997.
[21] J. Xue, “Communication-Minimal Tiling of Uniform Dependence Loops,” J. Parallel and Distributed Computing, vol. 42, no. 1, pp. 42-59, 1997.
[22] J. Xue and W. Cai, “Time-Minimal Tiling When Rise Is Larger Than Zero,” J. Parallel Computing, vol. 28, no. 6, pp. 915-939, 2002.
[23] S. Parsa and S. Lotfi, “A New Genetic Algorithm for Loop Tiling,” J. Supercomputing, vol. 37, no. 3, pp. 249-269, 2006.
[24] L. Renganarayanan, D. Kim, S. Rajopadhye, and M.M. Strout, “Parameterized Tiled Loops for Free,” Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation (PLDI '07), pp. 405-414, 2007.
[25] S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan, “Effective Automatic Parallelization of Stencil Computations,” Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation (PLDI '07), pp. 235-244, 2007.
[26] N. Ahmed, N. Mateev, and K. Pingali, “Tiling Imperfectly-Nested Loop Nests,” Proc. ACM/IEEE Conf. Supercomputing, p. 31, 2000.
[27] R. Andonov, P. Calland, S. Niar, S. Rajopadhye, and N. Yanev, “First Steps towards Optimal Oblique Tile Sizing,” Proc. Eighth Int'l Workshop Compilers for Parallel Computers, pp. 351-366, Jan. 2000.
[28] K. Högstedt, L. Carter, and J. Ferrante, “Selecting Tile Shape for Minimal Execution Time,” Proc. 11th ACM Symp. Parallel Algorithms and Architectures (SPAA '99), pp. 201-211, 1999.
[29] R. Allen, K. Kennedy, and J.R. Allen, Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann, 2001.
[30] E. D'Hollander, “Partitioning and Labeling of Loops by Unimodular Transformations,” IEEE Trans. Parallel and Distributed Systems, vol. 3, no. 4, pp. 465-476, July 1992.
[31] M. Kandemir, R. Bordawekar, A. Choudhary, and J. Ramanujam, “A Unified Tiling Approach for Out-of-Core Computations,” Proc. Sixth Workshop Compilers for Parallel Computers, pp. 323-334, 1996.
[32] G.E. Karniadakis and R.M. Kirby, Parallel Scientific Computing in C++ and MPI: A Seamless Approach to Parallel Algorithms and Their Implementation. Cambridge Univ. Press, 2002.
[33] W. Shang and J. Fortes, “Time Optimal Linear Schedules for Algorithms with Uniform Dependencies,” IEEE Trans. Computers, vol. 40, no. 6, pp. 723-742, June 1991.
[34] A. Darte, L. Khachiyan, and Y. Robert, “Linear Scheduling Is Nearly Optimal,” Parallel Processing Letters, vol. 1, no. 2, pp. 73-81, 1991.
[35] W. Shang and J. Fortes, “On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays,” IEEE Trans. Parallel and Distributed Systems, vol. 3, no. 3, pp. 350-363, May/June 1992.
[36] P. Tang and J. Zigman, “Reducing Data Communication Overhead for DOACROSS Loop Nests,” Proc. Eighth Int'l Conf. Supercomputing (ICS '94), pp. 44-53, July 1994.
[37] M. Wolf and M. Lam, “A Loop Transformation Theory and an Algorithm to Maximize Parallelism,” IEEE Trans. Parallel and Distributed Systems, vol. 2, no. 4, pp. 452-471, Oct. 1991.
[38] G. Rivera and C.-W. Tseng, “Tiling Optimizations for 3D Scientific Computations,” Proc. ACM/IEEE Conf. Supercomputing, p. 32, 2000.