Cache Conscious Data Layout Organization for Conflict Miss Reduction in Embedded Multimedia Applications
January 2005 (vol. 54, no. 1), pp. 76-81
Cache misses are a major bottleneck for real-time multimedia applications because they trigger off-chip accesses to main memory. These accesses incur both a large bandwidth overhead (with the associated power consumption) and performance penalties. In this paper, we propose a new technique for organizing data in main memory for data-dominated multimedia applications so as to eliminate most conflict cache misses. The paper focuses on the formal and heuristic algorithms we use to steer data layout decisions and on the experimental results obtained with a prototype tool. Experiments on real-life demonstrators show that we remove up to 82 percent of the conflict misses in applications that have already been aggressively transformed at the source level. At the same time, we reduce off-chip data accesses by up to 78 percent. Compared with existing techniques, our approach removes up to 20 percent more conflict misses.
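The abstract does not spell out the layout algorithm itself. As a minimal sketch of the underlying effect, assuming a hypothetical 8 KiB direct-mapped cache with 32-byte lines and 4-byte elements, the Python below counts misses for a loop that reads two arrays in lockstep under two base-address placements. Shifting one array's base address is plain inter-array padding, not the authors' formal or heuristic layout algorithm; all names and parameters (`count_misses`, `lockstep_trace`, `LINE_SIZE`, `NUM_LINES`, `ELEM_SIZE`) are illustrative.

```python
# Hypothetical cache parameters, chosen for illustration only (not from the paper).
LINE_SIZE = 32                       # bytes per cache line
NUM_LINES = 256                      # number of lines -> 8 KiB direct-mapped cache
ELEM_SIZE = 4                        # bytes per array element (e.g. int32)
CACHE_SIZE = LINE_SIZE * NUM_LINES


def count_misses(addresses):
    """Count misses of a direct-mapped cache for a sequence of byte addresses."""
    resident = [None] * NUM_LINES    # memory block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // LINE_SIZE    # memory block containing this address
        index = block % NUM_LINES    # the single cache line this block may occupy
        if resident[index] != block:
            resident[index] = block  # miss: fetch the block, evicting the old one
            misses += 1
    return misses


def lockstep_trace(base_a, base_b, n):
    """Address trace of a loop that reads a[i] and then b[i] on every iteration."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * ELEM_SIZE)
        trace.append(base_b + i * ELEM_SIZE)
    return trace


N = 1024  # elements per array (4 KiB each)

# Layout 1: b starts exactly one cache size after a, so a[i] and b[i] map to the same line.
conflicting = lockstep_trace(0, CACHE_SIZE, N)
# Layout 2: b is shifted by a further half cache, so a[i] and b[i] map to different lines.
padded = lockstep_trace(0, CACHE_SIZE + CACHE_SIZE // 2, N)

print(count_misses(conflicting), count_misses(padded))  # → 2048 256
```

With the arrays one cache size apart, a[i] and b[i] keep evicting each other, so all 2,048 accesses miss; the shifted layout leaves only the 256 compulsory misses (128 lines per array). The paper's approach targets the same effect across many arrays and loop nests, steering placement with formal and heuristic algorithms rather than a fixed hand-picked offset.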

Index Terms:
RISC/CISC, VLIW architectures, VLSI systems.
C. Kulkarni, C. Ghez, M. Miranda, F. Catthoor, H. De Man, "Cache Conscious Data Layout Organization for Conflict Miss Reduction in Embedded Multimedia Applications," IEEE Transactions on Computers, vol. 54, no. 1, pp. 76-81, Jan. 2005, doi:10.1109/TC.2005.2