Second International Symposium on High-Performance Computer Architecture (HPCA) (1996)
San Jose, CA
Feb. 3, 1996 to Feb. 7, 1996
ISBN: 0-8186-7237-4
pp: 26
Sandhya Dwarkadas, Rice University
Alan L. Cox, Rice University
Sarita V. Adve, Rice University
Ramakrishnan Rajamony, Rice University
Willy Zwaenepoel, Rice University
This paper compares several implementations of entry consistency (EC) and lazy release consistency (LRC), two relaxed memory models used with software distributed shared memory (DSM) systems. We use six applications in our study: SOR, Quicksort, Water, Barnes-Hut, IS, and 3D-FFT. For these applications, EC's requirement that all shared data be associated with a synchronization object leads to a fair amount of additional programming effort. We identify, in particular, extra synchronization, lock rebinding, and object granularity as sources of extra complexity. In terms of performance, for the set of applications and the computing environment used, neither model is consistently better than the other. For SOR and IS, execution times are about the same, but LRC is faster for Water (33%) and Barnes-Hut (41%), and EC is faster for Quicksort (14%) and 3D-FFT (10%). Among the implementations of EC and LRC, we independently vary the method for write trapping and the method for write collection. Our goal is to separate implementation issues from any particular model. We consider write trapping by compiler instrumentation of the code and by twinning (comparing the current version of shared data with an older version). Write collection is done either by scanning timestamps or by building diffs, which are records of the changes to shared data. For write trapping in EC, twinning is faster if data is shared at the granularity of a single word. For granularities larger than a word, compiler instrumentation is faster. For write trapping in LRC, twinning gives the best performance for all applications. For write collection in EC, timestamping works best in applications dominated by migratory data, while diffing works best for other data. For LRC, increased communication overhead in transmitting timestamps becomes an additional factor working in favor of diffing for applications with fine-grain sharing.
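The two write-collection methods named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the page is modeled as a Python list of words, and the names `make_twin`, `build_diff`, `apply_diff`, and `collect_by_timestamp` are hypothetical helpers introduced only for illustration. Per-word timestamps are assumed to have been set by instrumented writes.

```python
from typing import List, Tuple

# A change record: (word index, new value). Hypothetical representation.
Change = Tuple[int, int]


def make_twin(page: List[int]) -> List[int]:
    """Twinning: save a copy of the page before it is first written."""
    return list(page)


def build_diff(page: List[int], twin: List[int]) -> List[Change]:
    """Diffing: compare the current page with its twin, word by word,
    and record only the words that changed."""
    return [(i, new) for i, (new, old) in enumerate(zip(page, twin)) if new != old]


def apply_diff(page: List[int], diff: List[Change]) -> None:
    """Bring another copy of the page up to date by replaying a diff."""
    for index, value in diff:
        page[index] = value


def collect_by_timestamp(page: List[int], stamps: List[int], since: int) -> List[Change]:
    """Timestamp scanning: assume each write also stored a logical time in
    stamps[i]; collect every word written after time `since`."""
    return [(i, page[i]) for i, t in enumerate(stamps) if t > since]


# Usage: one writer modifies two words; a remote copy is updated from the diff.
page = [0] * 8
twin = make_twin(page)
page[2] = 7
page[5] = 9
diff = build_diff(page, twin)

remote = [0] * 8
apply_diff(remote, diff)

# The same writes, seen through per-word timestamps (assumed logical times 1 and 2).
stamps = [0] * 8
stamps[2] = 1
stamps[5] = 2
recent = collect_by_timestamp(page, stamps, since=1)
```

In this sketch the diff carries only the changed words, whereas the timestamp approach must keep and scan one stamp per word; sending those stamps along with the data is the kind of extra communication the abstract cites as favoring diffing under fine-grain sharing.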
parallel computation, shared memory, networks of workstations, consistency models, performance measurement
Sandhya Dwarkadas, Alan L. Cox, Sarita V. Adve, Ramakrishnan Rajamony, Willy Zwaenepoel, "A Comparison of Entry Consistency and Lazy Release Consistency Implementations", Second International Symposium on High-Performance Computer Architecture (HPCA), p. 26, 1996, doi:10.1109/HPCA.1996.501171