
D.T. Harper III, "Block, Multistride Vector, and FFT Accesses in Parallel Memory Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 1, pp. 43-51, January 1991. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 1045-9219. DOI: http://doi.ieeecomputersociety.org/10.1109/71.80188

Index Terms: fast Fourier transform; dynamic storage schemes; parallel memory performance; vector accesses; block accesses; constant-geometry FFT accesses; linear address transformations; XOR schemes; analytical results; quantitative analysis; buffering effects; pipelined memory systems; conflict-free access; memory bank cycle time; fast Fourier transforms; memory architecture

A discussion is presented of the use of dynamic storage schemes to improve parallel memory performance during three important classes of data accesses: vector accesses in which multiple strides are used to access a single vector, block accesses, and constant-geometry FFT accesses. The schemes investigated are based on linear address transformations, also known as XOR schemes. It has been shown that this class of schemes can be implemented more efficiently in hardware and has more flexibility than schemes based on row rotations or other techniques. Several analytical results are shown. These include: quantitative analysis of buffering effects in pipelined memory systems; design rules for storage schemes that provide conflict-free access using multiple strides, blocks, and FFT access patterns; and an analysis of the effects of memory bank cycle time on storage scheme capabilities.
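
To make the idea of an XOR scheme concrete, the following is a minimal sketch and not the specific scheme analyzed in the paper. It assumes 2^m memory banks and compares plain low-order interleaving with one simple linear address transformation that XORs the low m address bits with the next m bits. The function names (`interleaved_bank`, `xor_bank`, `banks_touched`, `is_conflict_free`) and the choice of bit fields are illustrative assumptions.

```python
def interleaved_bank(addr: int, m: int) -> int:
    """Low-order interleaving: the bank index is the low m address bits."""
    return addr & ((1 << m) - 1)


def xor_bank(addr: int, m: int) -> int:
    """Illustrative XOR scheme (an assumption, not the paper's design):
    XOR the low m address bits with the next m address bits."""
    mask = (1 << m) - 1
    return (addr & mask) ^ ((addr >> m) & mask)


def banks_touched(bank_fn, stride: int, m: int, base: int = 0) -> list:
    """Bank indices hit by 2**m consecutive elements of a strided vector."""
    return [bank_fn(base + i * stride, m) for i in range(1 << m)]


def is_conflict_free(banks: list) -> bool:
    """True when every access in the group lands in a different bank."""
    return len(set(banks)) == len(banks)


if __name__ == "__main__":
    m = 3  # 8 banks
    for stride in (1, 2, 4, 8):
        plain = banks_touched(interleaved_bank, stride, m)
        xored = banks_touched(xor_bank, stride, m)
        print(f"stride {stride}: interleaved {plain} "
              f"conflict-free={is_conflict_free(plain)}; "
              f"xor {xored} conflict-free={is_conflict_free(xored)}")
```

With 8 banks, low-order interleaving sends every element of a stride-8 vector to bank 0, while this particular XOR mapping spreads the same eight accesses over all eight banks and still keeps stride-1 access conflict-free. Which strides, blocks, and FFT patterns a given transformation handles depends on the bits it combines, which is the kind of design question the abstract refers to.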