Memory Latency Effects in Decoupled Architectures
October 1994 (vol. 43 no. 10)
pp. 1129-1139

Decoupled computer architectures partition the memory access and execute functions in a computer program and achieve high performance by exploiting the fine-grain parallelism between the two. These architectures use an access processor to fetch data ahead of demand by the execute process, and hence are often less sensitive to memory access delays than conventional architectures. Past performance studies of decoupled computers used interleaved or pipelined memory systems, and in those studies latency effects were partially hidden by the interleaving. This paper undertakes a detailed simulation study of latency effects in decoupled computers. The performance of decoupled architectures is compared with that of single processors with caches, and the memory latency sensitivity of cache-based uniprocessors and decoupled systems is studied. Simulations are also performed to determine the significance of data caches in a decoupled architecture. It is observed that decoupled architectures can reduce the peak memory bandwidth requirement, but not the total bandwidth, whereas data caches can reduce the total bandwidth by capturing locality. It may be concluded that, despite their capability to partially mask the effects of memory latency, decoupled architectures still need a data cache.
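
The latency-hiding idea described in the abstract can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions, not the simulator used in the paper: the function names, the parameters, and the memory model (a pipelined memory that accepts one request per cycle and returns data a fixed `latency` cycles later, with a bounded load queue between the access and execute sides) are all hypothetical choices made for illustration.

```python
def coupled_cycles(n, latency, work):
    """Assumed baseline: each load stalls for the full latency, then computes."""
    return n * (latency + work)


def decoupled_cycles(n, latency, work, queue_depth):
    """Toy decoupled model (hypothetical, not the paper's simulator).

    The access side issues at most one load per cycle and reserves a
    load-queue slot at issue time. The slot is freed when the execute
    side dequeues the operand and starts computing on it.
    """
    issue = []   # cycle at which load i is issued
    start = []   # cycle at which execute starts consuming operand i
    finish = 0   # cycle at which execute finishes the previous operand
    for i in range(n):
        t = issue[-1] + 1 if issue else 0
        if i >= queue_depth:
            # Wait for the slot held by operand i - queue_depth to be freed.
            t = max(t, start[i - queue_depth])
        issue.append(t)
        arrive = t + latency
        s = max(arrive, finish)  # execute waits for data or for itself
        start.append(s)
        finish = s + work
    return finish


if __name__ == "__main__":
    n, latency, work = 100, 10, 2
    print("coupled  :", coupled_cycles(n, latency, work))
    for depth in (1, 2, 4, 8):
        print("decoupled, depth", depth, ":",
              decoupled_cycles(n, latency, work, depth))
```

In this model both variants issue exactly `n` memory requests, which is consistent with the abstract's observation that decoupling hides latency without reducing total memory bandwidth; reducing the number of requests would require something that captures locality, such as a data cache.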

Index Terms:
computer architecture; buffer storage; digital simulation; performance evaluation; memory latency effects; decoupled architectures; fine-grain parallelism; memory access delays; performance studies; interleaving; simulation study; cache based uniprocessors; decoupled systems.
L. Kurian, P.T. Hulina, L.D. Coraor, "Memory Latency Effects in Decoupled Architectures," IEEE Transactions on Computers, vol. 43, no. 10, pp. 1129-1139, Oct. 1994, doi:10.1109/12.324539