Issue No.01 - January (2008 vol.19)
pp: 52-65
ABSTRACT
The processing elements of many modern tightly coupled multicomputers are connected via mesh or toroidal networks. Such interconnects are simple and highly scalable, but suffer from high fragmentation, low utilization, and insufficient fault tolerance when the resources allocated to each job are dedicated. High-dimensional interconnects may be more efficient in certain cases, but they are based on complex and expensive components and scale poorly. We present a novel hardware/software architectural approach that detaches the processing elements of the system from the interconnect and augments the traditional toroidal topology to provide additional connectivity options and additional link redundancy. We explore the properties of the new "multi-toroidal" topology and the improvements it offers in resource utilization and failure tolerance. Extensive simulation studies show that, for practically important types of workloads, resource utilization may be increased by 50%, and in certain cases by as much as 100%, compared to toroidal machines, and is in fact close to the theoretically optimal case of a full crossbar interconnect. The combined hardware/software architectural innovation yields a significant improvement in resource utilization on top of the state of the art in scheduling algorithm research. In addition, multi-toroidal multicomputers are able to work under link failure rates of 0.002 failures per week, which would shut down toroidal machines. A variant of the multi-toroidal architecture is implemented in the Blue Gene/L supercomputer.
INDEX TERMS
parallel architectures, scheduling and task partitioning, network topology
CITATION
Yariv Aridor, Tamar Domany, Oleg Goldshmidt, Yevgeny Kliteynik, Edi Shmueli, Jose E. Moreira, "Multitoroidal Interconnects For Tightly Coupled Supercomputers", IEEE Transactions on Parallel & Distributed Systems, vol.19, no. 1, pp. 52-65, January 2008, doi:10.1109/TPDS.2007.1118