
Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias, Marco Zagha, "Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 9, pp. 943-958, September 1997.
@article{ 10.1109/71.615440,
  author = {Guy E. Blelloch and Phillip B. Gibbons and Yossi Matias and Marco Zagha},
  title = {Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  volume = {8},
  number = {9},
  issn = {1045-9219},
  year = {1997},
  pages = {943--958},
  doi = {http://doi.ieeecomputersociety.org/10.1109/71.615440},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Keywords: Memory bank contention, memory delays, parallel machine models, performance analysis, parallel algorithms, shared memory, multiprocessors.
Abstract—For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several shared-memory multiprocessors consist of more memory banks than processors. The object of this paper is to provide a simple model (with only a few parameters) for the design and analysis of irregular parallel algorithms that will give a reasonable characterization of performance on such machines. For this purpose, we extend Valiant's bulk-synchronous parallel (