Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias, Marco Zagha, "Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 9, pp. 943-958, September 1997. ISSN 1045-9219. DOI: http://doi.ieeecomputersociety.org/10.1109/71.615440. IEEE Computer Society, Los Alamitos, CA, USA.

Keywords: memory bank contention, memory delays, parallel machine models, performance analysis, parallel algorithms, shared memory, multiprocessors.

Abstract: The computation rate of processors has long been much faster than the access rate of memory banks, and this gap has continued to widen in recent years. As a result, several shared-memory multiprocessors have more memory banks than processors. The goal of this paper is to provide a simple model (with only a few parameters) for the design and analysis of irregular parallel algorithms that gives a reasonable characterization of performance on such machines. For this purpose, we extend Valiant's bulk-synchronous parallel (

