Enric Gibert, Jesús Sánchez, Antonio González, "Distributed Data Cache Designs for Clustered VLIW Processors," IEEE Transactions on Computers, vol. 54, no. 10, pp. 1227-1241, October 2005.
DOI: http://doi.ieeecomputersociety.org/10.1109/TC.2005.163
ISSN: 0018-9340
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
Index Terms: Single data stream architectures, design styles.

[1] V. Agarwal, M.S. Hrishikesh, S.W. Keckler, and D. Burger, “Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures,” Proc. 27th Int'l Symp. Computer Architecture, June 2000.
[2] A. Aggarwal and M. Franklin, “An Empirical Study of the Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors,” Proc. Int'l Symp. Performance Analysis of Systems and Software, 2001.
[3] O. Avissar, R. Barua, and D. Stewart, “An Optimal Memory Allocation Scheme for Scratch-Pad-Based Embedded Systems,” ACM Trans. Embedded Computing Systems, 2002.
[4] R. Bahar, G. Albera, and S. Manne, “Power and Performance Tradeoffs Using Various Caching Strategies,” Proc. Int'l Symp. Low Power Electronics and Design, 1998.
[5] R. Balasubramonian, S. Dwarkadas, and D. Albonesi, “Dynamically Managing the Communication-Parallelism Trade-Off in Future Clustered Processors,” Proc. 30th Int'l Symp. Computer Architecture, June 2003.
[6] R. Canal, J.M. Parcerisa, and A. González, “Dynamic Cluster Assignment Mechanisms,” Proc. Sixth Int'l Symp. High-Performance Computer Architecture, Jan. 2000.
[7] P.P. Chang, S.A. Mahlke, W.Y. Chen, N.J. Warter, and W.W. Hwu, “IMPACT: An Architectural Framework for Multiple-Instruction-Issue Processors,” Proc. 18th Int'l Symp. Computer Architecture, May 1991.
[8] A. Charlesworth, “An Approach to Scientific Array Processing: The Architectural Design of the AP120B/FPS-164 Family,” Computer, vol. 14, no. 9, Sept. 1981.
[9] B. Cheng, “Compile-Time Memory Disambiguation for C Programs,” PhD thesis, Dept. of Computer Science, Univ. of Illinois, May 2000.
[10] J.M. Codina, J. Sánchez, and A. González, “A Unified Modulo Scheduling and Register Allocation Technique for Clustered Processors,” Proc. Int'l Conf. Parallel Architectures and Compilation Techniques, Sept. 2001.
[11] J.M. Codina, J. Llosa, and A. González, “A Comparative Study of Modulo Scheduling Techniques,” Proc. Int'l Conf. Supercomputing, June 2002.
[12] P. Faraboschi, G. Brown, J. Fisher, G. Desoli, and F. Homewood, “Lx: A Technology Platform for Customizable VLIW Embedded Processing,” Proc. 27th Int'l Symp. Computer Architecture, June 2000.
[13] J. Fridman and Z. Greenfield, “The TigerSharc DSP Architecture,” IEEE Micro, Jan./Feb. 2000.
[14] E. Gibert, J. Sánchez, and A. González, “An Interleaved Cache Clustered VLIW Processor,” Proc. Int'l Conf. Supercomputing, June 2002.
[15] E. Gibert, J. Sánchez, and A. González, “Effective Instruction Scheduling Techniques for an Interleaved Cache Clustered VLIW Processor,” Proc. 35th Int'l Symp. Microarchitecture, Nov. 2002.
[16] E. Gibert, J. Sánchez, and A. González, “Local Scheduling Techniques for Memory Coherence in a Clustered VLIW Processor with a Distributed Data Cache,” Proc. First Int'l Symp. Code Generation and Optimization, Mar. 2003.
[17] E. Gibert, J. Sánchez, and A. González, “Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors,” Proc. 36th Int'l Symp. Microarchitecture, Dec. 2003.
[18] P.N. Glaskowsky, “MAP1000 Unfolds at Equator,” Microprocessor Report, vol. 12, no. 16, Dec. 1998.
[19] L. Gwennap, “Digital 21264 Sets New Standard,” Microprocessor Report, vol. 10, no. 14, Oct. 1996.
[20] G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel, “The Microarchitecture of the Pentium 4 Processor,” Intel Technology J., Q1, Feb. 2001.
[21] R. Huff, “Lifetime-Sensitive Modulo Scheduling,” Proc. ACM SIGPLAN '93 Conf. Programming Languages Design and Implementation, 1993.
[22] K. Kailas, K. Ebcioglu, and A. Agrawala, “CARS: A New Code Generation Framework for Clustered ILP Processors,” Proc. Seventh Int'l Symp. High-Performance Computer Architecture, Jan. 2001.
[23] Y. Kang, W. Huang, S. Yoo, D. Keen, Z. Ge, V. Lam, P. Pattnaik, and J. Torrellas, “FlexRAM: Toward an Advanced Intelligent Memory System,” Proc. Int'l Conf. Computer Design, Oct. 1999.
[24] J. Kin, M. Gupta, and W.H. Mangione-Smith, “The Filter Cache: An Energy Efficient Memory Structure,” Proc. 30th Int'l Symp. Microarchitecture, Dec. 1997.
[25] C.E. Kozyrakis, S. Perissakis, D. Patterson, T. Anderson, K. Asanovic, N. Cardwell, R. Fromm, J. Golbus, B. Gribstad, K. Keeton, R. Thomas, N. Treuhaft, and K. Yelick, “Scalable Processors in the Billion-Transistor Era: IRAM,” Computer, vol. 30, no. 9, Sept. 1997.
[26] C. Lee, M. Potkonjak, and W.H. Mangione-Smith, “MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communication Systems,” Proc. 30th Int'l Symp. Microarchitecture, Dec. 1997.
[27] J. Llosa, A. González, E. Ayguadé, and M. Valero, “Swing Modulo Scheduling,” Proc. Int'l Conf. Parallel Architectures and Compilation Techniques, Oct. 1996.
[28] S.A. Mahlke, D.C. Lin, W.Y. Chen, R.E. Hank, and R.A. Bringmann, “Effective Compiler Support for Predicated Execution Using the Hyperblock,” Proc. 25th Int'l Symp. Microarchitecture, Dec. 1992.
[29] E. Nystrom and A.E. Eichenberger, “Effective Cluster Assignment for Modulo Scheduling,” Proc. 31st Int'l Symp. Microarchitecture, 1998.
[30] M. Oskin, F.T. Chong, and T. Sherwood, “Active Pages: A Computation Model for Intelligent Memory,” Proc. 25th Ann. Int'l Symp. Computer Architecture, June 1998.
[31] E. Özer, S. Banerjia, and T.M. Conte, “Unified Assign and Schedule: A New Approach to Scheduling for Clustered Register File Microarchitectures,” Proc. 31st Int'l Symp. Microarchitecture, Nov. 1998.
[32] S. Palacharla, N.P. Jouppi, and J.E. Smith, “Complexity-Effective Superscalar Processors,” Proc. 24th Int'l Symp. Computer Architecture, June 1997.
[33] P. Panda, N. Dutt, and A. Nicolau, “Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications,” Proc. European Design and Test Conf., Mar. 1997.
[34] P. Racunas and Y. Patt, “Partitioned First-Level Cache Design for Clustered Microarchitecture,” Proc. 17th Int'l Conf. Supercomputing, June 2003.
[35] B.R. Rau, “Iterative Modulo Scheduling: An Algorithm for Software Pipelining Loops,” Proc. 27th Int'l Symp. Microarchitecture, Nov. 1994.
[36] J. Sánchez and A. González, “Cache Sensitive Modulo Scheduling,” Proc. 30th Int'l Symp. Microarchitecture, Dec. 1997.
[37] J. Sánchez and A. González, “The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures,” Proc. 29th Int'l Conf. Parallel Processing, Aug. 2000.
[38] J. Sánchez and A. González, “Modulo Scheduling for a Fully-Distributed Clustered VLIW Architecture,” Proc. 33rd Int'l Symp. Microarchitecture, Dec. 2000.
[39] K. Sankaralingam, R. Nagarajan, H. Liu, J. Huh, C.K. Kim, D. Burger, S.W. Keckler, and C.R. Moore, “Exploiting ILP, TLP, and DLP Using Polymorphism in the TRIPS Architecture,” Proc. 30th Ann. Int'l Symp. Computer Architecture, June 2003.
[40] S. Swanson, K. Michelson, A. Schwerin, and M. Oskin, “WaveScalar,” Proc. 36th Int'l Symp. Microarchitecture, Dec. 2003.
[41] Texas Instruments Inc., “TMS320C62x/67x CPU and Instruction Set Reference Guide,” 1998.
[42] M. Tomasevic and V. Milutinovic, “Hardware Approaches to Cache Coherence in Shared-Memory Multiprocessors,” IEEE Micro, vol. 14, nos. 5-6, pp. 52-59, 61-66, Oct., Dec. 1994.
[43] E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal, “Baring It All to Software: Raw Machines,” Computer, vol. 30, no. 9, Sept. 1997.
[44] Y. Wu, R. Rakvic, L. Chen, C. Miao, G. Chrysos, and J. Fang, “Compiler Managed Micro-Cache Bypassing for High Performance EPIC Processors,” Proc. 35th Int'l Symp. Microarchitecture, Nov. 2002.
[45] L. Zhang, Z. Fang, M. Parker, B. Mathew, L. Schaelicke, J. Carter, W. Hsieh, and S. McKee, “The Impulse Memory Controller,” IEEE Trans. Computers, special issue on advances in high-performance memory systems, vol. 50, no. 11, Nov. 2001.
[46] V.V. Zyuban, “Inherently Lower-Power High-Performance Superscalar Architectures,” PhD thesis, Dept. of Computer Science and Eng., Univ. of Notre Dame, Mar. 2000.

