David C. Wong, Edward W. Davis, Jeffrey O. Young, "A Software Approach to Avoiding Spatial Cache Collisions in Parallel Processor Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 6, pp. 601-608, June 1998.
ISSN: 1045-9219. DOI: http://doi.ieeecomputersociety.org/10.1109/71.689447. Publisher: IEEE Computer Society, Los Alamitos, CA, USA.
Index Terms: Cache collision, cache offset, direct-mapped cache, highly parallel systems, sequential DO-loops.
Abstract—In parallel processor systems, the performance of individual processors is a key factor in overall performance. Processor performance is strongly affected by the behavior of cache memory in that high hit rates are essential for high performance. Hit rates are lowered when collisions on placing lines in the cache force a cache line to be replaced before it has been used to best effect.
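The collision effect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: it models a direct-mapped cache with an assumed 8 KB capacity and 32-byte lines (illustrative values only), and the helper names `count_misses` and `interleaved_trace` are hypothetical. It compares a loop that reads two arrays whose base addresses differ by exactly the cache size with the same loop after one array is shifted by a one-line offset.

```python
LINE_BYTES = 32        # assumed cache line size (illustrative)
CACHE_BYTES = 8192     # assumed cache capacity (illustrative)
NUM_LINES = CACHE_BYTES // LINE_BYTES


def count_misses(addresses):
    """Simulate a direct-mapped cache and return the number of misses."""
    tags = [None] * NUM_LINES          # one tag slot per cache line
    misses = 0
    for addr in addresses:
        block = addr // LINE_BYTES     # memory block holding this byte
        index = block % NUM_LINES      # the single line it may occupy
        tag = block // NUM_LINES       # distinguishes blocks sharing a line
        if tags[index] != tag:
            tags[index] = tag          # replace whatever was in the line
            misses += 1
    return misses


def interleaved_trace(base_a, base_b, n, elem_bytes=8):
    """Addresses touched by a loop that reads a[i] then b[i] for i < n."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * elem_bytes)
        trace.append(base_b + i * elem_bytes)
    return trace


N = 256
# Bases differ by the cache size: a[i] and b[i] always map to the same line.
colliding = count_misses(interleaved_trace(0, CACHE_BYTES, N))
# Shift b by one cache line so a[i] and b[i] map to different lines.
padded = count_misses(interleaved_trace(0, CACHE_BYTES + LINE_BYTES, N))

print("misses with colliding bases:", colliding)
print("misses with one-line offset:", padded)
```

Under these assumptions every access misses in the colliding layout (512 misses for 512 accesses), because each line is replaced before its remaining elements are used. With the one-line offset only the 128 first-touch misses remain, one per cache line of each array.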

