2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2011)
Porto Alegre, Brazil
Dec. 3, 2011 to Dec. 7, 2011
ISBN: 978-1-5090-6605-6
TABLE OF CONTENTS

Front matter (Abstract)

pp. 1-14

Active management of timing guardband to save energy in POWER7 (Abstract)

Charles R. Lefurgy , IBM, 11501 Burnet Rd., Austin, TX, USA
Alan J. Drake , IBM, 11501 Burnet Rd., Austin, TX, USA
Michael S. Floyd , IBM, 11400 Burnet Rd., Austin, TX, USA
Malcolm S. Allen-Ware , IBM, 11501 Burnet Rd., Austin, TX, USA
Bishop Brock , IBM, 11400 Burnet Rd., Austin, TX, USA
Jose A. Tierno , IBM, 1101 Kitchawan Rd., Yorktown Heights, NY, USA
John B. Carter , IBM, 11501 Burnet Rd., Austin, TX, USA
pp. 1-11

Bundled execution of recurring traces for energy-efficient general purpose processing (Abstract)

Shantanu Gupta , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, USA
Shuguang Feng , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, USA
Amin Ansari , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, USA
Scott Mahlke , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, USA
David August , Department of Computer Science, Princeton University, NJ, USA
pp. 12-23

Minimalist open-page: A DRAM page-mode scheduling policy for the many-core era (Abstract)

Dimitris Kaseridis , Electrical and Computer Engineering, The University of Texas at Austin, USA
Jeffrey Stuecheli , IBM Corp. & Electrical and Computer Engineering, The University of Texas at Austin, USA
Lizy Kurian John , Electrical and Computer Engineering, The University of Texas at Austin, USA
pp. 24-35

The NoX router (Abstract)

Mitchell Hayenga , Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA
Mikko Lipasti , Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA
pp. 36-46

A systematic methodology to develop resilient cache coherence protocols (Abstract)

Konstantinos Aisopos , Princeton University, NJ, USA
Li-Shiuan Peh , Massachusetts Institute of Technology, Cambridge, USA
pp. 47-58

Dataflow execution of sequential imperative programs on multicore architectures (Abstract)

Gagan Gupta , Computer Sciences Department, University of Wisconsin-Madison, USA
Gurindar S. Sohi , Computer Sciences Department, University of Wisconsin-Madison, USA
pp. 59-70

Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication (Abstract)

Tushar Krishna , Department of EECS, MIT, Cambridge, MA, USA
Li-Shiuan Peh , Department of EECS, MIT, Cambridge, MA, USA
Bradford M. Beckmann , AMD Research, Bellevue, WA, USA
Steven K. Reinhardt , AMD Research, Bellevue, WA, USA
pp. 71-82

Packet chaining: Efficient single-cycle allocation for on-chip networks (Abstract)

George Michelogiannakis , Electrical Engineering Dept., Stanford University, CA 94305, USA
Nan Jiang , Electrical Engineering Dept., Stanford University, CA 94305, USA
Daniel Becker , Electrical Engineering Dept., Stanford University, CA 94305, USA
William J. Dally , Electrical Engineering Dept., Stanford University, CA 94305, USA
pp. 83-94

Resilient microring resonator based photonic networks (Abstract)

Christopher J. Nitta , Department of Computer Science, University of California, Davis, One Shields Avenue, USA
Matthew K. Farrens , Department of Computer Science, University of California, Davis, One Shields Avenue, USA
Venkatesh Akella , Department of Electrical & Computer Engineering, University of California, Davis, One Shields Avenue, USA
pp. 95-104

FeatherWeight: Low-cost optical arbitration with QoS support (Abstract)

Yan Pan , Globalfoundries Inc., Malta, NY, USA
John Kim , Web Science Technology Division & Dept. of Computer Science, KAIST, Daejeon, Korea
Gokhan Memik , Dept. of Electrical Eng. and Computer Science, Northwestern University, Evanston, IL, USA
pp. 105-116

A new case for the TAGE branch predictor (Abstract)

Andre Seznec , INRIA/IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France
pp. 117-127

Identifying and predicting timing-critical instructions to boost timing speculation (Abstract)

Jing Xin , Department of EECS, Northwestern University, USA
Russ Joseph , Department of EECS, Northwestern University, USA
pp. 128-139

Idempotent processor architecture (Abstract)

Marc de Kruijf , Vertical Research Group, University of Wisconsin - Madison, USA
Karthikeyan Sankaralingam , Vertical Research Group, University of Wisconsin - Madison, USA
pp. 140-151

Proactive instruction fetch (Abstract)

Michael Ferdman , Computer Architecture Lab (CALCM), Carnegie Mellon University, Pittsburgh, PA, USA
Cansu Kaynak , Parallel Systems Architecture Lab (PARSA), Ecole Polytechnique Fédérale de Lausanne, Switzerland
Babak Falsafi , Parallel Systems Architecture Lab (PARSA), Ecole Polytechnique Fédérale de Lausanne, Switzerland
pp. 152-162

QSCORES: Trading dark silicon for scalable energy efficiency with quasi-specific cores (Abstract)

Ganesh Venkatesh , Department of Computer Science and Engineering, University of California, San Diego, USA
Jack Sampson , Department of Computer Science and Engineering, University of California, San Diego, USA
Nathan Goulding-Hotta , Department of Computer Science and Engineering, University of California, San Diego, USA
Sravanthi Kota Venkata , Department of Computer Science and Engineering, University of California, San Diego, USA
Michael Bedford Taylor , Department of Computer Science and Engineering, University of California, San Diego, USA
Steven Swanson , Department of Computer Science and Engineering, University of California, San Diego, USA
pp. 163-174

Pack & Cap: Adaptive DVFS and thread packing under power caps (Abstract)

Ryan Cochran , School of Engineering, Brown University, Providence, RI 02912, USA
Can Hankendi , ECE Department, Boston University, MA 02215, USA
Ayse K. Coskun , ECE Department, Boston University, MA 02215, USA
Sherief Reda , School of Engineering, Brown University, Providence, RI 02912, USA
pp. 175-185

Preventing PCM banks from seizing too much power (Abstract)

Andrew Hay , Dept. of Comp. Science, University of Auckland, NZ
Karin Strauss , Microsoft Research, Microsoft, Inc., Redmond, WA, USA
Timothy Sherwood , Dept. of Computer Science, University of California, Santa Barbara, USA
Gabriel H. Loh , Microsoft Research, Microsoft, Inc., Redmond, WA, USA
Doug Burger , Microsoft Research, Microsoft, Inc., Redmond, WA, USA
pp. 186-195

CRAM: Coded registers for amplified multiporting (Abstract)

Vignyan Reddy Kothinti Naresh , Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA
David J. Palframan , Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA
Mikko H. Lipasti , Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA
pp. 196-205

ATDetector: Improving the accuracy of a commercial data race detector by identifying address transfer (Abstract)

Jiaqi Zhang , Dept of Computer Science, University of California San Diego, La Jolla, 92093, USA
Weiwei Xiong , Dept of Computer Science, University of Illinois at Urbana-Champaign, 61801, USA
Yang Liu , Dept of Computer Science, University of California San Diego, La Jolla, 92093, USA
Soyeon Park , Dept of Computer Science, University of California San Diego, La Jolla, 92093, USA
Yuanyuan Zhou , Dept of Computer Science, University of California San Diego, La Jolla, 92093, USA
Zhiqiang Ma , Software and Services Group, Intel Corporation, Champaign, IL 61820, USA
pp. 206-215

CoreRacer: A practical memory race recorder for multicore x86 TSO processors (Abstract)

Gilles Pokam , Intel Corporation, USA
Cristiano Pereira , Intel Corporation, USA
Shiliang Hu , Intel Corporation, USA
Ali-Reza Adl-Tabatabai , Intel Corporation, USA
Justin Gottschlich , Intel Corporation, USA
Jungwoo Ha , Google, USA
Youfeng Wu , Intel Corporation, USA
pp. 216-225

Manager-client pairing: A framework for implementing coherence hierarchies (Abstract)

Jesse G. Beu , Georgia Institute of Technology, Atlanta, USA
Michael C. Rosier , Apple Inc., Cupertino, CA, USA
Thomas M. Conte , Georgia Institute of Technology, Atlanta, USA
pp. 226-236

TransCom: Transforming stream communication for load balance and efficiency in networks-on-chip (Abstract)

Ahmed H. Abdel-Gawad , School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA
Mithuna Thottethodi , School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA
pp. 237-247

Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations (Abstract)

Jason Mars , University of Virginia, USA
Lingjia Tang , University of Virginia, USA
Robert Hundt , Google, USA
Kevin Skadron , University of Virginia, USA
Mary Lou Soffa , University of Virginia, USA
pp. 248-259

System-level integrated server architectures for scale-out datacenters (Abstract)

Sheng Li , Hewlett-Packard Labs, USA
Kevin Lim , Hewlett-Packard Labs, USA
Paolo Faraboschi , Hewlett-Packard Labs, USA
Jichuan Chang , Hewlett-Packard Labs, USA
Parthasarathy Ranganathan , Hewlett-Packard Labs, USA
Norman P. Jouppi , Hewlett-Packard Labs, USA
pp. 260-271

Architectural support for secure virtualization under a vulnerable hypervisor (Abstract)

Seongwook Jin , Computer Science Department, KAIST, Daejeon, Korea
Jeongseob Ahn , Computer Science Department, KAIST, Daejeon, Korea
Sanghoon Cha , Computer Science Department, KAIST, Daejeon, Korea
Jaehyuk Huh , Computer Science Department, KAIST, Daejeon, Korea
pp. 272-283

Complementing user-level coarse-grain parallelism with implicit speculative parallelism (Abstract)

Nikolas Ioannou , School of Informatics, University of Edinburgh, UK
Marcelo Cintra , School of Informatics, University of Edinburgh, UK
pp. 284-295

Hardware transactional memory for GPU architectures (Abstract)

Wilson W. L. Fung , Department of Computer and Electrical Engineering, University of British Columbia, Canada
Inderpreet Singh , Department of Computer and Electrical Engineering, University of British Columbia, Canada
Andrew Brownsword , Department of Computer and Electrical Engineering, University of British Columbia, Canada
Tor M. Aamodt , Department of Computer and Electrical Engineering, University of British Columbia, Canada
pp. 296-307

Improving GPU performance via large warps and two-level warp scheduling (Abstract)

Veynu Narasiman , The University of Texas at Austin, USA
Michael Shebanow , Nvidia Corporation, USA
Chang Joo Lee , Intel Corporation, USA
Rustam Miftakhutdinov , The University of Texas at Austin, USA
Onur Mutlu , Carnegie Mellon University, USA
Yale N. Patt , The University of Texas at Austin, USA
pp. 308-317

Pay-As-You-Go: Low-overhead hard-error correction for phase change memories (Abstract)

Moinuddin K. Qureshi , Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA
pp. 318-328

Multi retention level STT-RAM cache designs with a dynamic refresh scheme (Abstract)

Zhenyu Sun , Polytechnic Institute of New York University, 6 Metrotech Center, Brooklyn, USA
Xiuyuan Bi , Polytechnic Institute of New York University, 6 Metrotech Center, Brooklyn, USA
Hai Li , Polytechnic Institute of New York University, 6 Metrotech Center, Brooklyn, USA
Weng-Fai Wong , National University of Singapore, 13 Computing Drive, Singapore
Zhong-Liang Ong , National University of Singapore, 13 Computing Drive, Singapore
Xiaochun Zhu , Qualcomm Incorporated, 5775 Morehouse Drive, San Diego, USA
Wenqing Wu , Qualcomm Incorporated, 5775 Morehouse Drive, San Diego, USA
pp. 329-338

A resistive TCAM accelerator for data-intensive computing (Abstract)

Qing Guo , Department of Computer Science, University of Rochester, NY 14627, USA
Xiaochen Guo , Department of Electrical and Computer Engineering, University of Rochester, NY 14627, USA
Yuxin Bai , Department of Electrical and Computer Engineering, University of Rochester, NY 14627, USA
Engin Ipek , Department of Electrical and Computer Engineering, University of Rochester, NY 14627, USA
pp. 339-350

A register-file approach for row buffer caches in die-stacked DRAMs (Abstract)

Gabriel H. Loh , AMD Research, Advanced Micro Devices, Inc., USA
pp. 351-361

Parallel application memory scheduling (Abstract)

Eiman Ebrahimi , Department of Electrical and Computer Engineering, The University of Texas at Austin, USA
Rustam Miftakhutdinov , Department of Electrical and Computer Engineering, The University of Texas at Austin, USA
Chris Fallin , Carnegie Mellon University, USA
Chang Joo Lee , Intel Corporation, USA
Jose A. Joao , Department of Electrical and Computer Engineering, The University of Texas at Austin, USA
Onur Mutlu , Carnegie Mellon University, USA
Yale N. Patt , Department of Electrical and Computer Engineering, The University of Texas at Austin, USA
pp. 362-373

Reducing memory interference in multicore systems via application-aware memory channel partitioning (Abstract)

Sai Prashanth Muralidhara , Pennsylvania State University, USA
Lavanya Subramanian , Carnegie Mellon University, USA
Onur Mutlu , Carnegie Mellon University, USA
Mahmut Kandemir , Pennsylvania State University, USA
Thomas Moscibroda , Microsoft Research Asia, USA
pp. 374-385

Accelerating microprocessor silicon validation by exposing ISA diversity (Abstract)

Nikos Foutris , Dept. of Informatics & Telecom., University of Athens, Greece
Dimitris Gizopoulos , Dept. of Informatics & Telecom., University of Athens, Greece
Mihalis Psarakis , Dept. of Informatics, University of Piraeus, Greece
Xavier Vera , Intel Barcelona Research Center, Barcelona, Spain
Antonio Gonzalez , Intel Barcelona Research Center, Barcelona, Spain
pp. 386-397

Encore: Low-cost, fine-grained transient fault recovery (Abstract)

Shuguang Feng , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, USA
Shantanu Gupta , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, USA
Amin Ansari , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, USA
Scott A. Mahlke , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, USA
David I. August , Department of Computer Science, Princeton University, NJ, USA
pp. 398-409

Formally enhanced runtime verification to ensure NoC functional correctness (Abstract)

Ritesh Parikh , University of Michigan, Ann Arbor, USA
Valeria Bertacco , University of Michigan, Ann Arbor, USA
pp. 410-419

Residue cache: A low-energy low-area L2 cache architecture via compression and partial hits (Abstract)

Soontae Kim , KAIST, 335 Gwahangno, Yuseong-gu, Daejeon, Korea
Jesung Kim , LG Electronics, Gasan-dong, Geumchun-gu, Seoul, Korea
Jongmin Lee , KAIST, 335 Gwahangno, Yuseong-gu, Daejeon, Korea
Seokin Hong , KAIST, 335 Gwahangno, Yuseong-gu, Daejeon, Korea
pp. 420-429

SHiP: Signature-based Hit Predictor for high performance caching (Abstract)

Carole-Jean Wu , Princeton University, NJ, USA
Aamer Jaleel , Intel Corporation, VSSAD, Hudson, MA, USA
Will Hasenplaugh , Intel Corporation, VSSAD, Hudson, MA, USA
Margaret Martonosi , Princeton University, NJ, USA
Simon C. Steely , Intel Corporation, VSSAD, Hudson, MA, USA
Joel Emer , Intel Corporation, VSSAD, Hudson, MA, USA
pp. 430-441

PACMan: Prefetch-Aware Cache Management for high performance caching (Abstract)

Carole-Jean Wu , Princeton University, NJ, USA
Aamer Jaleel , Intel Corporation, VSSAD, Hudson, MA, USA
Margaret Martonosi , Princeton University, NJ, USA
Simon C. Steely , Intel Corporation, VSSAD, Hudson, MA, USA
Joel Emer , Intel Corporation, VSSAD, Hudson, MA, USA
pp. 442-453

Efficiently enabling conventional block sizes for very large die-stacked DRAM caches (Abstract)

Gabriel H. Loh , AMD Research, Advanced Micro Devices, Inc., USA
Mark D. Hill , Department of Computer Sciences, University of Wisconsin - Madison, USA
pp. 454-464

A compile-time managed multi-level register file hierarchy (Abstract)

Mark Gebhart , The University of Texas at Austin, USA
Stephen W. Keckler , The University of Texas at Austin, USA
William J. Dally , NVIDIA, Santa Clara, CA, USA
pp. 465-476

SIMD re-convergence at thread frontiers (Abstract)

Gregory Diamos , Georgia Institute of Technology, School of ECE, Atlanta, USA
Benjamin Ashbaugh , Intel Visual Computing Group, Folsom, CA, USA
Subramaniam Maiyuran , Intel Visual Computing Group, Folsom, CA, USA
Andrew Kerr , Georgia Institute of Technology, School of ECE, Atlanta, USA
Haicheng Wu , Georgia Institute of Technology, School of ECE, Atlanta, USA
Sudhakar Yalamanchili , Georgia Institute of Technology, School of ECE, Atlanta, USA
pp. 477-488

A data layout optimization framework for NUCA-based multicores (Abstract)

Yuanrui Zhang , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, 16802, USA
Wei Ding , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, 16802, USA
Mahmut Kandemir , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, 16802, USA
Jun Liu , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, 16802, USA
Ohyoung Jang , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, 16802, USA
pp. 489-500

Author index (Abstract)

pp. 501-502