Priority Queues and Sorting Methods for Parallel Simulation
May 2000 (vol. 26 no. 5)
pp. 401-422

Abstract—We examine the design, implementation, and experimental analysis of parallel priority queues for device and network simulation. We consider: 1) distributed splay trees using MPI, 2) concurrent heaps using shared memory atomic locks, and 3) a new, more general concurrent data structure based on distributed sorted lists, which is designed to provide dynamically balanced work allocation (with automatic or manual control) and efficient use of shared memory resources. We evaluate performance for all three data structures on a Cray-T3E900 system at KFA-Jülich. Our comparisons are based on simulations of single buffers and a $64 \times 64$ packet switch that supports multicasting. In all implementations, processing elements (PEs) monitor traffic at their preassigned input/output ports, while priority queue elements are distributed across the Cray-T3E virtual shared memory. Our experiments with up to 60,000 packets and two to 64 PEs indicate that concurrent priority queues perform much better than distributed ones. Both concurrent implementations have comparable performance, while our new data structure uses less memory and has been further optimized. We also consider parallel simulation of symmetric networks by sorting integer conflict functions and implementing a packet indexing scheme. The optimized message-passing network simulator can process $\sim 500$K packet moves per second, with an efficiency that exceeds $\sim 50$ percent for a few thousand packets on the Cray-T3E with 32 PEs. All of the data structures developed now form a parallel library. Although our concurrent implementations use the Cray-T3E ShMem library, portability can be obtained through the OpenMP or MPI-2 standards, which provide support for one-sided communication and shared memory lock mechanisms.
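The abstract names a third structure built from distributed sorted lists with balanced work allocation and shared-memory locks, but gives no algorithmic detail. Purely as an illustration of the general idea, and not the authors' algorithm, the Python sketch below spreads a priority queue over k sorted lists, each guarded by its own lock. Everything here is an assumption: Python threads stand in for Cray-T3E PEs, threading.Lock stands in for ShMem atomic locks, and the class name SortedListsPQ, the "insert into the shortest list" balancing rule, and the lock-everything delete_min are hypothetical choices.

```python
import bisect
import threading


class SortedListsPQ:
    """Hypothetical sketch: a priority queue spread over k sorted lists.

    Each list has its own lock (a stand-in for a shared-memory atomic lock).
    Inserts go to the currently shortest list as a simple stand-in for
    dynamic load balancing. delete_min takes every lock in index order,
    which avoids deadlock and returns the exact global minimum, but
    serializes deletions.
    """

    def __init__(self, k):
        self.lists = [[] for _ in range(k)]
        self.locks = [threading.Lock() for _ in range(k)]

    def insert(self, key):
        # Unlocked length reads may be stale; that only affects balance,
        # not correctness, because the chosen list is locked before use.
        i = min(range(len(self.lists)), key=lambda j: len(self.lists[j]))
        with self.locks[i]:
            bisect.insort(self.lists[i], key)

    def delete_min(self):
        # Acquire all locks in a fixed order so concurrent callers cannot
        # deadlock (insert never holds more than one lock).
        for lock in self.locks:
            lock.acquire()
        try:
            heads = [(lst[0], j) for j, lst in enumerate(self.lists) if lst]
            if not heads:
                return None
            _, j = min(heads)
            return self.lists[j].pop(0)
        finally:
            for lock in reversed(self.locks):
                lock.release()


if __name__ == "__main__":
    pq = SortedListsPQ(k=4)

    def producer(offset):
        # Each simulated PE inserts 250 distinct event timestamps.
        for n in range(250):
            pq.insert(offset + 4 * n)

    threads = [threading.Thread(target=producer, args=(p,)) for p in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    drained = []
    while (x := pq.delete_min()) is not None:
        drained.append(x)
    print(drained == list(range(1000)))  # True
```

Locking every list in delete_min keeps the answer exact but makes deletion a global bottleneck; a more scalable design would need a cheaper way to locate the minimum, which is the kind of trade-off the abstract's comparison of concurrent structures addresses.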

Index Terms:
Concurrent data structure, Cray-T3E, data race, distributed data structure, memory lock, priority queue, parallel simulation, virtual shared memory.
Miltos D. Grammatikakis, Stefan Liesche, "Priority Queues and Sorting Methods for Parallel Simulation," IEEE Transactions on Software Engineering, vol. 26, no. 5, pp. 401-422, May 2000, doi:10.1109/32.846298