Scalable Global and Local Hashing Strategies for Duplicate Pruning in Parallel A* Graph Search
July 1997 (vol. 8 no. 7)
pp. 738-756

Abstract: For many applications of the A* algorithm, the state space is a graph rather than a tree. The implication for parallel A* algorithms is that different processors may perform significant duplicated work if interprocessor duplicates are not pruned. In this paper, we consider the problem of duplicate pruning in parallel A* graph-search algorithms implemented on distributed-memory machines. A commonly used method for duplicate pruning uses a hash function to associate each distinct node of the search space with a particular processor, to which duplicate nodes arising in different processors are transmitted and thereby pruned. This approach has two major drawbacks. First, load balance is determined solely by the hash function. Second, node transmissions for duplicate pruning are global; this can lead to hot spots and slower message delivery. To overcome these problems, we propose two different duplicate pruning strategies: 1) a scheme that achieves good load balance by decoupling the task of duplicate pruning from load balancing, using a hash function for the former and a load balancing scheme for the latter; and 2) a novel search-space partitioning scheme that allocates disjoint parts of the search space to disjoint subcubes in a hypercube (or disjoint processor groups in the target architecture), so that duplicate pruning is achieved with only intrasubcube or adjacent intersubcube communication. Thus, message latency and hot-spot probability are greatly reduced. The above duplicate pruning schemes were implemented on an nCUBE2 hypercube multicomputer to solve the Traveling Salesman Problem (TSP). For uniformly distributed intercity costs, our strategies yield a speedup improvement of 13 to 35 percent on 1,024 processors over previous methods that do not prune any duplicates, and 13 to 25 percent over the previous hashing-only scheme. For normally distributed data, the corresponding figures are 135 percent and 10 to 155 percent.
Finally, we analyze the scalability of our parallel A* algorithms on k-ary n-cube networks in terms of the isoefficiency metric, and show that they have isoefficiency lower and upper bounds of Θ(P log P) and Θ(Pkn²), respectively.
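
The baseline method the abstract describes, in which a hash function assigns each distinct search node to one processor that then prunes its duplicates, can be illustrated with a minimal single-process sketch. It is not the paper's implementation. The state encoding (a set of visited cities plus the current city), the names `owner`, `Processor`, and `route`, and the use of SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib


def owner(state, num_procs):
    """Map a search node to the processor assumed responsible for it.

    `state` is a hypothetical TSP encoding: (set of visited cities, current city).
    A stable digest is used so that every processor computes the same owner
    for the same state, however that state was generated.
    """
    visited, current = state
    key = (",".join(str(c) for c in sorted(visited)) + "|" + str(current)).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_procs


class Processor:
    """Holds the duplicate table for the states hashed to this processor."""

    def __init__(self, rank):
        self.rank = rank
        self.best_g = {}  # canonical state -> lowest path cost seen so far

    def receive(self, state, g):
        """Return True if the node is kept, False if pruned as a duplicate."""
        key = (frozenset(state[0]), state[1])
        if key in self.best_g and self.best_g[key] <= g:
            return False  # an equal or cheaper copy is already known
        self.best_g[key] = g
        return True


def route(procs, state, g):
    """Simulate transmitting a generated node to its owning processor."""
    return procs[owner(state, len(procs))].receive(state, g)


if __name__ == "__main__":
    procs = [Processor(rank) for rank in range(4)]
    # The same state reached along two different paths with different costs.
    print(route(procs, ({0, 1, 2}, 2), 10.0))
    print(route(procs, ({2, 1, 0}, 2), 12.0))
```

Because every copy of a state is sent to the same owner, duplicates generated on different processors meet in one table. The sketch also makes the first drawback visible: the processor that stores a node is fixed by the hash alone, so the hash function determines load balance.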

Index Terms:
A* algorithm, branch-and-bound search, communication delay, duplicate pruning, graph search, isoefficiency function, k-ary n-cubes, parallel A*, scalability, traveling salesman problem.
Nihar R. Mahapatra, Shantanu Dutt, "Scalable Global and Local Hashing Strategies for Duplicate Pruning in Parallel A* Graph Search," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 7, pp. 738-756, July 1997, doi:10.1109/71.598348