Mesh Partitioning for Efficient Use of Distributed Systems
January 2002 (vol. 13 no. 1)
pp. 67-79

Mesh partitioning for homogeneous systems has been studied extensively; however, mesh partitioning for distributed systems is a relatively new area of research. To ensure efficient execution on a distributed system, the heterogeneities in processor and network performance must be taken into consideration in the partitioning process; equal-sized subdomains and a small cut-set size, which result from conventional mesh partitioning, are no longer the primary goals. In this paper, we address various issues related to mesh partitioning for distributed systems. These issues include the metric used to compare different partitions, the efficiency of the application executing on a distributed system, and the advantage of exploiting heterogeneity in network performance. We present a tool called PART for automatic mesh partitioning for distributed systems. The novel feature of PART is that it considers heterogeneities in both the application and the distributed system. Simulated annealing is used in PART to perform the backtracking search for desired partitions. Because simulated annealing is well known to be computationally intensive, we describe the parallel version of simulated annealing that is used with PART. The results of the parallelization exhibit superlinear speedup in most cases and nearly perfect speedup in the remaining cases. Experimental results are also presented for partitioning regular and irregular finite element meshes for an explicit, nonlinear finite element application, called WHAMS2D, executing on a distributed system consisting of two IBM SPs with different processors. The results from the regular problems indicate a 33 to 46 percent increase in efficiency when processor performance is considered, as compared to conventional even partitioning. They also indicate a 5 to 15 percent increase in efficiency when network performance is considered, as compared to considering only processor performance; this is significant given that the optimal improvement is 15 percent for this application. The results from the irregular problem indicate up to a 36 percent increase in efficiency when processor and network performance are considered, as compared to even partitioning.
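
To make the idea of heterogeneity-aware partitioning concrete, the following is a minimal sketch of simulated annealing applied to a toy version of the problem. It is not the PART implementation: the cost model (`exec_time`, which considers only relative processor speed and ignores network cost and mesh connectivity), the function names, and the annealing parameters are all assumptions chosen for illustration.

```python
import math
import random


def exec_time(sizes, speeds):
    """Toy cost model (assumption): the slowest processor sets the step time.

    sizes[i] is the number of elements assigned to processor i and
    speeds[i] is its relative speed. Communication cost is ignored.
    """
    return max(n / s for n, s in zip(sizes, speeds))


def anneal_partition(total, speeds, steps=20000, t0=10.0, alpha=0.9995, seed=0):
    """Search for subdomain sizes that reduce exec_time on unequal processors."""
    rng = random.Random(seed)
    p = len(speeds)

    # Start from the conventional even partition.
    sizes = [total // p] * p
    for i in range(total % p):
        sizes[i] += 1

    cur = exec_time(sizes, speeds)
    best, best_cost = sizes[:], cur
    temp = t0

    for _ in range(steps):
        # Propose moving one element from one subdomain to another.
        src, dst = rng.sample(range(p), 2)
        if sizes[src] > 1:
            sizes[src] -= 1
            sizes[dst] += 1
            new = exec_time(sizes, speeds)
            # Accept improvements always, and worse moves with a
            # probability that shrinks as the temperature drops.
            if new <= cur or rng.random() < math.exp((cur - new) / temp):
                cur = new
                if cur < best_cost:
                    best, best_cost = sizes[:], cur
            else:
                # Reject the move and restore the previous partition.
                sizes[src] += 1
                sizes[dst] -= 1
        temp *= alpha

    return best, best_cost
```

Calling, for example, `anneal_partition(1000, [1.0, 1.0, 2.0, 2.0])` returns the best subdomain sizes found and their estimated step time. Because the search starts from the even partition and keeps the best state seen, the returned cost is never worse than that of even partitioning under this toy model. A realistic cost function would also account for the network performance between processors, as the abstract describes.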

Index Terms:
Mesh partitioning, simulated annealing, distributed systems.
Jian Chen, Valerie E. Taylor, "Mesh Partitioning for Efficient Use of Distributed Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 1, pp. 67-79, Jan. 2002, doi:10.1109/71.980027