Georgios Goumas, Nikolaos Drosinos, and Nectarios Koziris, “Communication-Aware Supernode Shape,” IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 4, pp. 498-511, April 2009. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 1045-9219. DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2008.114
Keywords: I/O and Data Communications; Load balancing and task assignment; Parallel processors; Parallel Architectures; Scheduling and task partitioning; Data communications
[1] F. Irigoin and R. Triolet, “Supernode Partitioning,” Proc. 15th Ann. ACM SIGACT-SIGPLAN Symp. Principles of Programming Languages (POPL '88), pp. 319-329, Jan. 1988.
[2] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C: The Art of Scientific Computing. Cambridge Univ. Press, 1992.
[3] B. D'Acunto, Computational Methods for PDE in Mechanics. World Scientific, 2004.
[4] K. Morton and D. Mayers, Numerical Solution of Partial Differential Equations. Cambridge Univ. Press, 2005.
[5] J. Ramanujam and P. Sadayappan, “Tiling Multidimensional Iteration Spaces for Multicomputers,” J. Parallel and Distributed Computing, vol. 16, pp. 108-120, 1992.
[6] R. Andonov, S. Balev, S. Rajopadhye, and N. Yanev, “Optimal Semi-Oblique Tiling,” IEEE Trans. Parallel and Distributed Systems, vol. 14, no. 9, pp. 944-960, Sept. 2003.
[7] P. Boulet, A. Darte, T. Risset, and Y. Robert, “(Pen)-Ultimate Tiling,” INTEGRATION, The VLSI J., vol. 17, pp. 33-51, 1994.
[8] P. Boulet, J. Dongarra, Y. Robert, and F. Vivien, “Static Tiling for Heterogeneous Computing Platforms,” J. Parallel Computing, vol. 25, no. 5, pp. 547-568, May 1999.
[9] G. Goumas, A. Sotiropoulos, and N. Koziris, “Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping,” Proc. 15th Int'l Parallel and Distributed Processing Symp. (IPDPS '01), Apr. 2001.
[10] G. Goumas, M. Athanasaki, and N. Koziris, “An Efficient Code Generation Technique for Tiled Iteration Spaces,” IEEE Trans. Parallel and Distributed Systems, vol. 14, no. 10, pp. 1021-1034, Oct. 2003.
[11] E. Hodzic and W. Shang, “On Supernode Transformation with Minimized Total Running Time,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 5, pp. 417-428, May 1998.
[12] G. Goumas, N. Drosinos, M. Athanasaki, and N. Koziris, “Message-Passing Code Generation for Non-Rectangular Tiling Transformations,” J. Parallel Computing, vol. 32, no. 10, pp. 711-732, Nov. 2006.
[13] N. Drosinos and N. Koziris, “Performance Comparison of Pure MPI versus Hybrid MPI-OpenMP Parallelization Models on SMP Clusters,” Proc. 18th Int'l Parallel and Distributed Processing Symp. (IPDPS '04), p. 10, Apr. 2004.
[14] E. Hodzic and W. Shang, “On Time Optimal Supernode Shape,” IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 12, pp. 1220-1233, Dec. 2002.
[15] K. Högstedt, L. Carter, and J. Ferrante, “On the Parallel Execution Time of Tiled Loops,” IEEE Trans. Parallel and Distributed Systems, vol. 14, no. 3, pp. 307-321, Mar. 2003.
[16] N. Koziris, A. Sotiropoulos, and G. Goumas, “A Pipelined Schedule to Minimize Completion Time for Loop Tiling with Computation and Communication Overlapping,” J. Parallel and Distributed Computing, vol. 63, no. 11, pp. 1138-1151, Nov. 2003.
[17] H. Ohta, Y. Saito, M. Kainaga, and H. Ono, “Optimal Tile Size Adjustment in Compiling General DOACROSS Loop Nests,” Proc. Ninth Int'l Conf. Supercomputing (ICS '95), pp. 270-279, July 1995.
[18] Y. Song and Z. Li, “Impact of Tile-Size Selection for Skewed Tiling,” Proc. Fifth Workshop Interaction between Compilers and Architectures (INTERACT '01), Jan. 2001.
[19] P. Tang and J. Xue, “Generating Efficient Tiled Code for Distributed Memory Machines,” J. Parallel Computing, vol. 26, no. 11, pp. 1369-1410, 2000.
[20] J. Xue, “On Tiling as a Loop Transformation,” Parallel Processing Letters, vol. 7, no. 4, pp. 409-424, 1997.
[21] J. Xue, “Communication-Minimal Tiling of Uniform Dependence Loops,” J. Parallel and Distributed Computing, vol. 42, no. 1, pp. 42-59, 1997.
[22] J. Xue and W. Cai, “Time-Minimal Tiling When Rise Is Larger Than Zero,” J. Parallel Computing, vol. 28, no. 6, pp. 915-939, 2002.
[23] S. Parsa and S. Lotfi, “A New Genetic Algorithm for Loop Tiling,” J. Supercomputing, vol. 37, no. 3, pp. 249-269, 2006.
[24] L. Renganarayanan, D. Kim, S. Rajopadhye, and M.M. Strout, “Parameterized Tiled Loops for Free,” Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation (PLDI '07), pp. 405-414, 2007.
[25] S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan, “Effective Automatic Parallelization of Stencil Computations,” Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation (PLDI '07), pp. 235-244, 2007.
[26] N. Ahmed, N. Mateev, and K. Pingali, “Tiling Imperfectly-Nested Loop Nests,” Proc. ACM/IEEE Conf. Supercomputing, p. 31, 2000.
[27] R. Andonov, P. Calland, S. Niar, S. Rajopadhye, and N. Yanev, “First Steps towards Optimal Oblique Tile Sizing,” Proc. Eighth Int'l Workshop Compilers for Parallel Computers, pp. 351-366, Jan. 2000.
[28] K. Högstedt, L. Carter, and J. Ferrante, “Selecting Tile Shape for Minimal Execution Time,” Proc. 11th ACM Symp. Parallel Algorithms and Architectures (SPAA '99), pp. 201-211, 1999.
[29] R. Allen, K. Kennedy, and J.R. Allen, Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann, 2001.
[30] E. D'Hollander, “Partitioning and Labeling of Loops by Unimodular Transformations,” IEEE Trans. Parallel and Distributed Systems, vol. 3, no. 4, pp. 465-476, July 1992.
[31] M. Kandemir, R. Bordawekar, A. Choudhary, and J. Ramanujam, “A Unified Tiling Approach for Out-of-Core Computations,” Proc. Sixth Workshop Compilers for Parallel Computers, pp. 323-334, 1996.
[32] G.E. Karniadakis and R.M. Kirby, Parallel Scientific Computing in C++ and MPI: A Seamless Approach to Parallel Algorithms and Their Implementation. Cambridge Univ. Press, 2002.
[33] W. Shang and J. Fortes, “Time Optimal Linear Schedules for Algorithms with Uniform Dependencies,” IEEE Trans. Computers, vol. 40, no. 6, pp. 723-742, June 1991.
[34] A. Darte, L. Khachiyan, and Y. Robert, “Linear Scheduling Is Nearly Optimal,” Parallel Processing Letters, vol. 1, no. 2, pp. 73-81, 1991.
[35] W. Shang and J. Fortes, “On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays,” IEEE Trans. Parallel and Distributed Systems, vol. 3, no. 3, pp. 350-363, May/June 1992.
[36] P. Tang and J. Zigman, “Reducing Data Communication Overhead for DOACROSS Loop Nests,” Proc. Eighth Int'l Conf. Supercomputing (ICS '94), pp. 44-53, July 1994.
[37] M. Wolf and M. Lam, “A Loop Transformation Theory and an Algorithm to Maximize Parallelism,” IEEE Trans. Parallel and Distributed Systems, vol. 2, no. 4, pp. 452-471, Oct. 1991.
[38] G. Rivera and C.-W. Tseng, “Tiling Optimizations for 3D Scientific Computations,” Proc. ACM/IEEE Conf. Supercomputing, p. 32, 2000.

