Distributed Shared Abstractions (DSA) on Multiprocessors
February 1996 (vol. 22 no. 2)
pp. 132-152

Abstract: Any parallel program has abstractions that are shared by the program's multiple processes, including data structures containing shared data, code implementing operations like global sums or minima, and type instances used for process synchronization or communication. Such shared abstractions can considerably affect the performance of parallel programs on both distributed and shared memory multiprocessors. As a result, their implementation must be efficient, and such efficiency should be achieved without unduly compromising program portability and maintainability. Unfortunately, efficiency and portability can be at cross-purposes, since high performance typically requires changes in the representation of shared abstractions across different parallel machines.

The primary contribution of the DSA library presented and evaluated in this paper is its representation of shared abstractions as objects that may be internally distributed across different nodes of a parallel machine. Such distributed shared abstractions (DSA) are encapsulated so that their implementations are easily changed while maintaining program portability across parallel architectures ranging from small-scale multiprocessors, to medium-scale shared and distributed memory machines, and potentially to networks of computer workstations. The principal results presented in this paper are 1) a demonstration that the fragmentation of object state across different nodes of a multiprocessor machine can significantly improve program performance, and 2) a demonstration that such object fragmentation can be achieved without compromising portability through changes to object interfaces. These results are demonstrated using implementations of the DSA library on several medium-scale multiprocessors, including the BBN Butterfly, Kendall Square Research, and SGI shared memory multiprocessors. The DSA library's evaluation uses synthetic workloads and a parallel implementation of a branch-and-bound algorithm for solving the Traveling Salesperson Problem (TSP).
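The idea of fragmenting an object's state behind an unchanged interface can be illustrated with a minimal sketch. This is not the DSA library's API: the class names `CentralMin` and `FragmentedMin`, the `update`/`read` methods, and the `run` workload are all hypothetical, and Python threads merely stand in for the nodes of a multiprocessor. The shared abstraction is a global minimum, such as the best tour cost found so far in a branch-and-bound search.

```python
import threading


class CentralMin:
    """Hypothetical centralized version: one copy, one lock for all nodes."""

    def __init__(self, num_nodes):
        self._lock = threading.Lock()
        self._value = float("inf")

    def update(self, node, value):
        with self._lock:
            if value < self._value:
                self._value = value

    def read(self, node):
        with self._lock:
            return self._value


class FragmentedMin:
    """Hypothetical fragmented version: same interface, one fragment per node."""

    def __init__(self, num_nodes):
        self._locks = [threading.Lock() for _ in range(num_nodes)]
        self._frags = [float("inf")] * num_nodes

    def update(self, node, value):
        # Update the caller's own fragment first; stop if nothing improved.
        with self._locks[node]:
            if value >= self._frags[node]:
                return
            self._frags[node] = value
        # Propagate the improved bound to the other fragments, one at a time.
        for other in range(len(self._frags)):
            if other != node:
                with self._locks[other]:
                    if value < self._frags[other]:
                        self._frags[other] = value

    def read(self, node):
        # A read touches only the caller's fragment.
        with self._locks[node]:
            return self._frags[node]


def run(shared_min, costs_per_node):
    """Assumed workload: each node reports its candidate tour costs."""
    def worker(node):
        for cost in costs_per_node[node]:
            shared_min.update(node, cost)

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(len(costs_per_node))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [shared_min.read(n) for n in range(len(costs_per_node))]


costs = [[90, 70], [85, 60], [75, 95]]
central_result = run(CentralMin(3), costs)
fragmented_result = run(FragmentedMin(3), costs)
```

The caller in `run` is identical for both classes, which is the portability point: only the internal representation changes. In the fragmented version, reads and non-improving updates stay local to one fragment, which is where a performance benefit would be expected on a machine with non-uniform memory access.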

Index Terms:
Distributed shared memory, application dependent memory consistency, fragmented objects, asynchronous events, topology.
Citation:
Christian Clémençon, Bodhisattwa Mukherjee, Karsten Schwan, "Distributed Shared Abstractions (DSA) on Multiprocessors," IEEE Transactions on Software Engineering, vol. 22, no. 2, pp. 132-152, Feb. 1996, doi:10.1109/32.485223