Hiding Relaxed Memory Consistency with a Compiler
August 2001 (vol. 50 no. 8)
pp. 824-833

Abstract: We present a compiler technique, based on Shasha and Snir's delay set analysis, that hides the underlying relaxed memory consistency model in an optimizing compiler for explicitly parallel programs. The compiler presents programmers with a sequentially consistent view of the underlying machine, irrespective of whether the machine follows a sequentially consistent model or a relaxed model. To hide the relaxed model and guarantee sequential consistency, our algorithm inserts fence instructions by identifying memory-barrier nodes. We reduce the number of fence instructions by exploiting the ordering constraints of the underlying memory consistency model and the property of fence and synchronization operations. We introduce dominators with respect to a node in a control flow graph to identify memory-barrier nodes and show that minimizing the number of memory-barrier nodes is NP-hard.
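
To make the role of fence instructions concrete, the following is a minimal sketch of the classic two-thread "store buffering" situation. It is not the paper's algorithm. The names `T1`, `T2`, `interleavings`, `run`, and `outcomes` are hypothetical, and the memory model is a toy assumption: without a fence, a thread's load may be performed before its own earlier store, and a fence is modeled simply as forbidding that reordering.

```python
# Toy model (an assumption for illustration, not the paper's method).
# Thread T1: x = 1; r1 = y      Thread T2: y = 1; r2 = x
# Each operation is (thread, kind, variable); list order is program order.
T1 = [("T1", "store", "x"), ("T1", "load", "y")]
T2 = [("T2", "store", "y"), ("T2", "load", "x")]


def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global order on shared memory initialised to 0."""
    memory = {"x": 0, "y": 0}
    loaded = {}
    for thread, kind, var in schedule:
        if kind == "store":
            memory[var] = 1
        else:
            loaded[thread] = memory[var]
    return (loaded["T1"], loaded["T2"])


def outcomes(fenced):
    """Collect every reachable (r1, r2) pair.

    With fenced=False, each thread's load may also be performed before
    its own earlier store (the assumed store-to-load relaxation).
    With fenced=True, program order is kept within each thread.
    """
    orders1 = [T1] if fenced else [T1, T1[::-1]]
    orders2 = [T2] if fenced else [T2, T2[::-1]]
    results = set()
    for o1 in orders1:
        for o2 in orders2:
            for schedule in interleavings(o1, o2):
                results.add(run(schedule))
    return results


if __name__ == "__main__":
    print("with fences:   ", sorted(outcomes(fenced=True)))
    print("without fences:", sorted(outcomes(fenced=False)))
```

In this toy model the result `(0, 0)` is reachable only when the reordering is allowed, which is the kind of non-sequentially-consistent outcome a fence between the store and the load rules out. Placing a fence after every shared access would also be safe but costly, which is why the abstract emphasizes reducing the number of fences.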

[1] S. Adve, “Designing Memory Consistency Models for Shared-Memory Multiprocessors,” PhD thesis, Computer Science Technical Report #1198, Univ. of Wisconsin-Madison, Dec. 1993.
[2] S.V. Adve and K. Gharachorloo, “Shared Memory Consistency Models: A Tutorial,” Computer, pp. 66-76, Dec. 1996.
[3] S.V. Adve and M.D. Hill, “Weak Ordering - A New Definition,” Proc. 17th Ann. Int'l Symp. Computer Architecture (ISCA), pp. 2-14, May 1990.
[4] A.V. Aho, J.E. Hopcroft, and J.D. Ullman, The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
[5] A.V. Aho, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
[6] A.W. Appel, Modern Compiler Implementation in Java. New York: Cambridge Univ. Press, 1998.
[7] W. Collier, “Principles of Architecture for Systems of Parallel Processes,” Technical Report TR00.3100, IBM T.J. Watson Research Center, Mar. 1981.
[8] Apple Computer, IBM, and Motorola, PowerPC Microprocessor Common Hardware Reference Platform. Morgan Kaufmann, 1995.
[9] T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms. MIT Press, 1990.
[10] Intel Corp., IA-64 Application Developer's Architecture Guide, Rev. 1.0, May 1999.
[11] M. Dubois, C. Scheurich, and F. Briggs, “Memory Access Buffering in Multiprocessors,” Proc. 13th Ann. Int'l Symp. Computer Architecture (ISCA), pp. 434-442, June 1986.
[12] M.R. Garey and D.S. Johnson, Computers and Intractability. W.H. Freeman, 1979.
[13] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, “Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors,” Proc. 17th Ann. Int'l Symp. Computer Architecture (ISCA), pp. 15-26, May 1990.
[14] C. Gniady, B. Falsafi, and T.N. Vijaykumar, “Is SC + ILP = RC?,” Proc. 26th Ann. Int'l Symp. Computer Architecture (ISCA), pp. 162-171, May 1999.
[15] J.R. Goodman, “Cache Consistency and Sequential Consistency,” Technical Report CS-TR-91-1006, Dept. of Computer Science, Univ. of Wisconsin, Feb. 1991.
[16] J. Gosling, B. Joy, and G. Steele, The Java Language Specification. Addison-Wesley, 1996.
[17] M.D. Hill, “Multiprocessors Should Support Simple Memory-Consistency Models,” Computer, pp. 28-34, Aug. 1998.
[18] J. Knoop, B. Steffen, and J. Vollmer, “Parallelism for Free: Efficient and Optimal Bitvector Analysis for Parallel Programs,” ACM Trans. Programming Languages and Systems, vol. 18, no. 3, pp. 268-299, May 1996.
[19] A. Krishnamurthy and K. Yelick, “Optimizing Parallel SPMD Programs,” Proc. Seventh Ann. Workshop Languages and Compilers for Parallel Computing, Aug. 1994.
[20] A. Krishnamurthy and K. Yelick, “Optimizing Parallel Programs with Explicit Synchronization,” Proc. ACM SIGPLAN 1995 Conf. Programming Language Design and Implementation (PLDI), pp. 196-204, June 1995.
[21] L. Lamport, “How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs,” IEEE Trans. Computers, vol. 28, no. 9, pp. 690-691, Sept. 1979.
[22] D. Lea, Concurrent Programming in Java. Addison-Wesley, 1996.
[23] J. Lee, “Compilation Techniques for Explicitly Parallel Programs,” PhD thesis, Technical Report UIUCDCS-R-99-2112 Dept. of Computer Science, Univ. of Illinois at Urbana-Champaign, Oct. 1999.
[24] J. Lee, S.P. Midkiff, and D.A. Padua, “Concurrent Static Single Assignment Form and Constant Propagation for Explicitly Parallel Programs,” Proc. 10th Int'l Workshop Languages and Compilers for Parallel Computing, pp. 114-130, Aug. 1997.
[25] J. Lee, S.P. Midkiff, and D.A. Padua, “A Constant Propagation Algorithm for Explicitly Parallel Programs,” Int'l J. Parallel Programming, vol. 26, no. 5, pp. 563-589, 1998.
[26] J. Lee, D.A. Padua, and S.P. Midkiff, “Basic Compiler Algorithms for Parallel Programs,” Proc. 1999 ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, pp. 1-12, May 1999.
[27] D.E. Lenoski and W.-D. Weber, Scalable Shared-Memory Multiprocessing. Morgan Kaufmann, 1995.
[28] Z. Li and W. Abu-sufah, “A Technique for Reducing Synchronization Overhead in Large Scale Multiprocessors,” Proc. 12th Ann. Int'l Symp. Computer Architecture (ISCA), pp. 284-291, 1985.
[29] Z. Li and W. Abu-sufah, “On Reducing Data Synchronization in Multiprocessed Loops,” IEEE Trans. Computers, vol. 36, no. 1, pp. 105-109, Jan. 1987.
[30] S.P. Midkiff and D.A. Padua, “Compiler Generated Synchronization for Do Loops,” Proc. 1986 Int'l Conf. Parallel Processing, pp. 19-22, Aug. 1986.
[31] S.P. Midkiff and D.A. Padua, “Compiler Algorithms for Synchronization,” IEEE Trans. Computers, vol. 36, no. 12, pp. 1485-1495, Dec. 1987.
[32] S.P. Midkiff and D.A. Padua, “Issues in the Optimization of Parallel Programs,” Proc. 1990 Int'l Conf. Parallel Processing (ICPP), Vol. II Software, pp. 105-113, Aug. 1990.
[33] S.P. Midkiff, D.A. Padua, and R. Cytron, “Compiling Programs with User Parallelism,” Languages and Compilers for Parallel Computing, pp. 402-422, 1990.
[34] D. Novillo, R. Unrau, and J. Schaeffer, “Concurrent SSA Form in the Presence of Mutual Exclusion,” Proc. 1998 Int'l Conf. Parallel Processing, Aug. 1998.
[35] D.A. Patterson and J.L. Hennessy, Computer Architecture: A Quantitative Approach, second ed. Morgan Kaufmann, 1996.
[36] W. Pugh, “Fixing the Java Memory Model,” Proc. ACM 1999 Java Grande Conf., June 1999.
[37] D. Shasha and M. Snir, “Efficient and Correct Execution of Parallel Programs that Share Memory,” ACM Trans. Programming Languages and Systems, vol. 10, no. 2, pp. 282-312, Apr. 1988.
[38] X. Shen, Arvind, and L. Rudolph, “Commit-Reconcile & Fences (CRF): A New Memory Model for Architects and Compiler Writers,” Proc. 26th Ann. Int'l Symp. Computer Architecture (ISCA), pp. 150-161, May 1999.
[39] R.L. Sites and R.T. Witek, Alpha AXP Architecture Reference Manual, second ed. Digital Press, 1995.
[40] Sun Microsystems Technical Support, personal comm., Dec. 1998.
[41] D.L. Weaver and T. Germond, The SPARC Architecture Manual. Prentice Hall, 1994.

Index Terms:
Sequential consistency, relaxed memory consistency, dominator, synchronization, fence, NP-hard, compiler.
Citation:
J. Lee, D.A. Padua, "Hiding Relaxed Memory Consistency with a Compiler," IEEE Transactions on Computers, vol. 50, no. 8, pp. 824-833, Aug. 2001, doi:10.1109/12.947002