2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA) (2011)
San Antonio, TX USA
Feb. 12, 2011 to Feb. 16, 2011
ISBN: 978-1-4244-9432-3
TABLE OF CONTENTS
Papers

[Front cover] (PDF)

pp. c1
Papers

Fg-STP: Fine-Grain Single Thread Partitioning on Multicores (Abstract)

Antonio Gonzalez , Intel Barcelona Research Center
Rakesh Ranjan , Intel Barcelona Research Center
Fernando Latorre , Intel Barcelona Research Center
Pedro Marcuello , Intel Barcelona Research Center
pp. 15-24

Thread block compaction for efficient SIMT control flow (Abstract)

Tor M. Aamodt , University of British Columbia
Wilson W. L. Fung , University of British Columbia
pp. 25-36

Low-voltage on-chip cache architecture using heterogeneous cell sizes for high-performance processors (Abstract)

Hamid Reza Ghasemi , Department of Electrical and Computer Engineering, University of Wisconsin-Madison
Nam Sung Kim , Department of Electrical and Computer Engineering, University of Wisconsin-Madison
Stark C. Draper , Department of Electrical and Computer Engineering, University of Wisconsin-Madison
pp. 38-49

Relaxing non-volatility for fast and energy-efficient STT-RAM caches (Abstract)

Mircea R. Stan , Department of Electrical and Computer Engineering, University of Virginia
Vidyabhushan Mohan , Department of Computer Science, University of Virginia
Anurag Nigam , Department of Electrical and Computer Engineering, University of Virginia
Sudhanva Gurumurthi , Department of Computer Science, University of Virginia
Clinton W. Smullen , Department of Computer Science, University of Virginia
pp. 50-61

Shared last-level TLBs for chip multiprocessors (Abstract)

Margaret Martonosi , Dept. of Computer Science, Princeton University
Abhishek Bhattacharjee , Dept. of Computer Science, Rutgers University
Daniel Lustig , Dept. of Electrical Engineering, Princeton University
pp. 62-73

Bloom Filter Guided Transaction Scheduling (Abstract)

Trevor Mudge , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor
Ronald G. Dreslinski , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor
Geoffrey Blake , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor
pp. 75-86

Dynamic parallelization of JavaScript applications using an ultra-lightweight speculation mechanism (Abstract)

Mehrzad Samadi , University of Michigan, Ann Arbor
Po-Chun Hsu , University of Michigan, Ann Arbor
Mojtaba Mehrara , University of Michigan, Ann Arbor
Scott Mahlke , University of Michigan, Ann Arbor
pp. 87-98

HAQu: Hardware-accelerated queueing for fine-grained threading on a chip multiprocessor (Abstract)

Devesh Tiwari , Department of Electrical & Computer Engineering, North Carolina State University
James Tuck , Department of Electrical & Computer Engineering, North Carolina State University
Yan Solihin , Department of Electrical & Computer Engineering, North Carolina State University
Sanghoon Lee , Department of Electrical & Computer Engineering, North Carolina State University
pp. 99-110

MOPED: Orchestrating interprocess message data on CMPs (Abstract)

Junli Gu , Institute of Microelectronics, Tsinghua University, Beijing, China
Steven S. Lumetta , Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Rakesh Kumar , Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Yihe Sun , Institute of Microelectronics, Tsinghua University, Beijing, China
pp. 111-120

Addressing system-level trimming issues in on-chip nanophotonic networks (Abstract)

Venkatesh Akella , University of California, Davis, Davis, CA 95616
Matthew Farrens , University of California, Davis, Davis, CA 95616
Christopher Nitta , University of California, Davis, Davis, CA 95616
pp. 122-131

Atomic Coherence: Leveraging nanophotonics to build race-free cache coherence protocols (Abstract)

Nathan Binkert , HP Labs, Palo Alto, CA
Dana Vantrease , Univ of Wisconsin - Madison, Madison, WI
Mikko H. Lipasti , Univ of Wisconsin - Madison, Madison, WI
pp. 132-143

CHIPPER: A low-complexity bufferless deflection router (Abstract)

Chris Craik , Computer Architecture Lab (CALCM), Carnegie Mellon University
Onur Mutlu , Computer Architecture Lab (CALCM), Carnegie Mellon University
Chris Fallin , Computer Architecture Lab (CALCM), Carnegie Mellon University
pp. 144-155

Power shifting in Thrifty Interconnection Network (Abstract)

Kun Wang , IBM Research - China, Beijing, China
Jian Li , IBM Research - Austin, Austin, TX
Richard R. Treumann , IBM System & Technology Group, Poughkeepsie, NY
Wei Huang , IBM Research - Austin, Austin, TX
Lixin Zhang , Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Charles Lefurgy , IBM Research - Austin, Austin, TX
Wolfgang E. Denzel , IBM Research - Zurich, Rueschlikon, Switzerland
pp. 156-167

Cuckoo directory: A scalable directory for many-core systems (Abstract)

Pejman Lotfi-Kamran , Parallel Systems Architecture Lab, École Polytechnique Fédérale de Lausanne
Ken Balet , Parallel Systems Architecture Lab, École Polytechnique Fédérale de Lausanne
Babak Falsafi , Parallel Systems Architecture Lab, École Polytechnique Fédérale de Lausanne
Michael Ferdman , Computer Architecture Lab, Carnegie Mellon University
pp. 169-180

Data-triggered threads: Eliminating redundant computation (Abstract)

Hung-Wei Tseng , Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, U.S.A.
Dean M. Tullsen , Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, U.S.A.
pp. 181-192

Fast thread migration via cache working set prediction (Abstract)

Dean M. Tullsen , University of California, San Diego, La Jolla, CA 92093-0404
Leo Porter , University of California, San Diego, La Jolla, CA 92093-0404
Jeffery A. Brown , University of California, San Diego, La Jolla, CA 92093-0404
pp. 193-204

SolarCore: Solar energy driven multi-core architecture power management (Abstract)

Chang-Burm Cho , Intelligent Design of Efficient Architectures Laboratory (IDEAL), Department of Electrical and Computer Engineering, University of Florida
Chao Li , Intelligent Design of Efficient Architectures Laboratory (IDEAL), Department of Electrical and Computer Engineering, University of Florida
Wangyuan Zhang , Intelligent Design of Efficient Architectures Laboratory (IDEAL), Department of Electrical and Computer Engineering, University of Florida
Tao Li , Intelligent Design of Efficient Architectures Laboratory (IDEAL), Department of Electrical and Computer Engineering, University of Florida
pp. 205-216

Keynote address II: How's the parallel computing revolution going? (Abstract)

Kathryn S. McKinley , The Department of Computer Science, The University of Texas at Austin
pp. 217

CloudCache: Expanding and shrinking private caches (Abstract)

Sangyeun Cho , Computer Science Department, University of Pittsburgh
Bruce R. Childers , Computer Science Department, University of Pittsburgh
Hyunjin Lee , Computer Science Department, University of Pittsburgh
pp. 219-230

MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy (Abstract)

Mahmut Kandemir , The Pennsylvania State University, University Park, PA - 16802
Emre Kultursay , The Pennsylvania State University, University Park, PA - 16802
Mary Jane Irwin , The Pennsylvania State University, University Park, PA - 16802
Yuan Xie , The Pennsylvania State University, University Park, PA - 16802
Shekhar Srikantaiah , The Pennsylvania State University, University Park, PA - 16802
Tao Zhang , The Pennsylvania State University, University Park, PA - 16802
pp. 231-242

NUcache: An efficient multicore cache organization based on Next-Use distance (Abstract)

R Manikantan , Indian Institute of Science, Bangalore, India
Kaushik Rajan , Microsoft Research India, Bangalore, India
R Govindarajan , Indian Institute of Science, Bangalore, India
pp. 243-253

A new server I/O architecture for high speed networks (Abstract)

Xia Zhu , Intel Labs
Laxmi Bhuyan , University of California, Riverside
Guangdeng Liao , University of California, Riverside
pp. 255-265

Essential roles of exploiting internal parallelism of flash memory based solid state drives in high-speed data processing (Abstract)

Xiaodong Zhang , Dept. of Computer Science & Engineering, The Ohio State University
Feng Chen , Dept. of Computer Science & Engineering, The Ohio State University
Rubao Lee , Institute of Computing Technology, Chinese Academy of Sciences
pp. 266-277

I-CASH: Intelligently Coupled Array of SSD and HDD (Abstract)

Jin Ren , Dept. of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881
Qing Yang , Dept. of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881
pp. 278-289

A case for guarded power gating for multi-core processors (Abstract)

Murali Annavaram , University of Southern California
Niti Madan , IBM T. J. Watson Research Center
Alper Buyuktosunoglu , IBM T. J. Watson Research Center
Pradip Bose , IBM T. J. Watson Research Center
pp. 291-300

Beyond block I/O: Rethinking traditional storage primitives (Abstract)

David Nellans , FusionIO
David Flynn , FusionIO
Robert Wipfel , FusionIO
Dhabaleswar K. Panda , The Ohio State University
Xiangyong Ouyang , FusionIO
pp. 301-311

Efficient data streaming with on-chip accelerators: Opportunities and challenges (Abstract)

Hubertus Franke , IBM T. J. Watson Research Center
Xiaotao Chang , IBM China Research Laboratory
Yi Ge , IBM China Research Laboratory
Rui Hou , IBM China Research Laboratory
Kun Wang , IBM China Research Laboratory
Lixin Zhang , National Research Center of High Performance Computers, Institute of Computing Technology, Chinese Academy of Sciences
Michael C. Huang , IBM T. J. Watson Research Center
pp. 312-320

Hardware/software-based diagnosis of load-store queues using expandable activity logs (Abstract)

Tanausu Ramirez , Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
Javier Carretero , Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
Xavier Vera , Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
Antonio Gonzalez , Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
Jaume Abella , Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
Matteo Monchiero , Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
pp. 321-331

Calvin: Deterministic or not? Free will to choose (Abstract)

David A. Wood , Computer Sciences Department, University of Wisconsin-Madison, 1210 W Dayton St, Madison, WI 53706
Mark D. Hill , Computer Sciences Department, University of Wisconsin-Madison, 1210 W Dayton St, Madison, WI 53706
Derek R Hower , Computer Sciences Department, University of Wisconsin-Madison, 1210 W Dayton St, Madison, WI 53706
Polina Dudnik , Computer Sciences Department, University of Wisconsin-Madison, 1210 W Dayton St, Madison, WI 53706
pp. 333-344

Mercury: A fast and energy-efficient multi-level cell based Phase Change Memory system (Abstract)

Wangyuan Zhang , Intelligent Design of Efficient Architectures Laboratory (IDEAL), Department of Electrical and Computer Engineering, University of Florida
Tao Li , Intelligent Design of Efficient Architectures Laboratory (IDEAL), Department of Electrical and Computer Engineering, University of Florida
Madhura Joshi , Intelligent Design of Efficient Architectures Laboratory (IDEAL), Department of Electrical and Computer Engineering, University of Florida
pp. 345-356

Offline symbolic analysis to infer Total Store Order (Abstract)

Dongyoon Lee , University of Michigan, Ann Arbor
Zijiang Yang , Western Michigan University
Satish Narayanasamy , University of Michigan, Ann Arbor
Mahmoud Said , Western Michigan University
pp. 357-368

Safe and efficient supervised memory systems (Abstract)

David A. Wood , University of Wisconsin-Madison
Jayaram Bobba , Intel Corporation
Marc Lupon , Universitat Politècnica de Catalunya
Mark D. Hill , University of Wisconsin-Madison
pp. 369-380

A quantitative performance analysis model for GPU architectures (Abstract)

John D. Owens , Department of Electrical and Computer Engineering, University of California, Davis
Yao Zhang , Department of Electrical and Computer Engineering, University of California, Davis
pp. 382-393

Abstraction and microarchitecture scaling in early-stage power modeling (Abstract)

Emrah Acar , IBM T. J. Watson Research Center, Yorktown Heights, NY
Alper Buyuktosunoglu , IBM T. J. Watson Research Center, Yorktown Heights, NY
Pradip Bose , IBM T. J. Watson Research Center, Yorktown Heights, NY
Richard Eickemeyer , IBM Systems & Technology Group, Rochester, MN
Hans Jacobson , IBM T. J. Watson Research Center, Yorktown Heights, NY
pp. 394-405

HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing (Abstract)

Michel Kinsy , Computation Structures Group, Computer Science and A.I. Lab, Massachusetts Institute of Technology
Michael Pellauer , Computation Structures Group, Computer Science and A.I. Lab, Massachusetts Institute of Technology
Angshuman Parashar , VSSAD Group, Intel Corporation
Michael Adler , VSSAD Group, Intel Corporation
Joel Emer , Computation Structures Group, Computer Science and A.I. Lab, Massachusetts Institute of Technology
pp. 406-417

Checked Load: Architectural support for JavaScript type-checking on mobile processors (Abstract)

Owen Anderson , Computer Science and Engineering, University of Washington
Luis Ceze , Computer Science and Engineering, University of Washington
Susan Eggers , Computer Science and Engineering, University of Washington
Emily Fortuna , Computer Science and Engineering, University of Washington
pp. 419-430

Exploiting criticality to reduce bottlenecks in distributed uniprocessors (Abstract)

Sibi Govindan , Department of Computer Science, The University of Texas at Austin
Stephen W. Keckler , Department of Computer Science, The University of Texas at Austin
Doug Burger , Microsoft Research
Behnam Robatmili , Department of Computer Science, The University of Texas at Austin
pp. 431-442

Storage free confidence estimation for the TAGE branch predictor (Abstract)

Andre Seznec , Centre de Recherche INRIA Rennes Bretagne-Atlantique, Campus de Beaulieu, 35042, Rennes Cedex, France
pp. 443-454

Architectural framework for supporting operating system survivability (Abstract)

Xiaowei Jiang , North Carolina State University
Yan Solihin , North Carolina State University
pp. 456-465

FREE-p: Protecting non-volatile memory against both hard and soft errors (Abstract)

Naveen Muralimanohar , Hewlett-Packard Labs, Intelligent Infrastructure Lab
Mattan Erez , The University of Texas at Austin, Electrical and Computer Engineering Dept
Parthasarathy Ranganathan , Hewlett-Packard Labs, Intelligent Infrastructure Lab
Doe Hyun Yoon , The University of Texas at Austin, Electrical and Computer Engineering Dept
Jichuan Chang , Hewlett-Packard Labs, Intelligent Infrastructure Lab
Norman P. Jouppi , Hewlett-Packard Labs, Intelligent Infrastructure Lab
pp. 466-477

Practical and secure PCM systems by online detection of malicious write streams (Abstract)

Michele M. Franceschini , IBM T. J. Watson Research Center, Yorktown Heights, NY
Moinuddin K. Qureshi , IBM T. J. Watson Research Center, Yorktown Heights, NY
Andre Seznec , IRISA/INRIA, France
Luis A. Lastras , IBM T. J. Watson Research Center, Yorktown Heights, NY
pp. 478-489

Efficient complex operators for irregular codes (Abstract)

Saturnino Garcia , Department of Computer Science & Engineering, University of California at San Diego
Michael Bedford Taylor , Department of Computer Science & Engineering, University of California at San Diego
Nathan Goulding-Hotta , Department of Computer Science & Engineering, University of California at San Diego
Ganesh Venkatesh , Department of Computer Science & Engineering, University of California at San Diego
Steven Swanson , Department of Computer Science & Engineering, University of California at San Diego
Jack Sampson , Department of Computer Science & Engineering, University of California at San Diego
pp. 491-502

Dynamically Specialized Datapaths for energy efficient computing (Abstract)

Venkatraman Govindaraju , Vertical Research Group, University of Wisconsin-Madison
Karthikeyan Sankaralingam , Vertical Research Group, University of Wisconsin-Madison
Chen-Han Ho , Vertical Research Group, University of Wisconsin-Madison
pp. 503-514

Hardware/software techniques for DRAM thermal management (Abstract)

Gokhan Memik , Northwestern University
Brian Leung , Northwestern University
Nikos Hardavellas , Northwestern University
Song Liu , Northwestern University
Alexander Neckar , Stanford University
Seda Ogrenci Memik , Northwestern University
pp. 515-525

ACCESS: Smart scheduling for asymmetric cache CMPs (Abstract)

Asit Mishra , Dept. of Computer Science and Engineering, The Pennsylvania State University
Paul Brett , Intel Labs, Intel Corporation
Sadagopan Srinivasan , Intel Labs, Intel Corporation
Chita R. Das , Dept. of Computer Science and Engineering, The Pennsylvania State University
Zhen Fang , Intel Labs, Intel Corporation
Srihari Makineni , Intel Labs, Intel Corporation
Li Zhao , Intel Labs, Intel Corporation
Xiaowei Jiang , Intel Labs, Intel Corporation
Ravishankar Iyer , Intel Labs, Intel Corporation
pp. 527-538

Archipelago: A polymorphic cache design for enabling robust near-threshold operation (Abstract)

Shuguang Feng , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI 48109
Shantanu Gupta , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI 48109
Scott Mahlke , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI 48109
Amin Ansari , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI 48109
pp. 539-550

Author index (PDF)

pp. 551-553