2016 IEEE International Symposium on High Performance Computer Architecture (HPCA) (2016)
Barcelona, Spain
March 12, 2016 to March 16, 2016
ISSN: 2378-203X
ISBN: 978-1-4673-9211-2
TABLE OF CONTENTS

[Front cover] (PDF)

pp. 1

Organizing committees (PDF)

pp. vii-xiv

Sponsors (PDF)

pp. xv

Table of contents (PDF)

pp. xvi-xxi

Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning (Abstract)

Mahdi Nazm Bojnordi , University of Rochester, Rochester, NY 14627 USA
Engin Ipek , University of Rochester, Rochester, NY 14627 USA
pp. 1-13

TABLA: A unified template-based framework for accelerating statistical machine learning (Abstract)

Divya Mahajan , Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology
Jongse Park , Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology
Emmanuel Amaro , Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology
Hardik Sharma , Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology
Amir Yazdanbakhsh , Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology
Joon Kyung Kim , Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology
Hadi Esmaeilzadeh , Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology
pp. 14-26

Pushing the limits of accelerator efficiency while retaining programmability (Abstract)

Tony Nowatzki , University of Wisconsin-Madison
Vinay Gangadhan , University of Wisconsin-Madison
Karthikeyan Sankaralingam , University of Wisconsin-Madison
Greg Wright , Qualcomm
pp. 27-39

A low power software-defined-radio baseband processor for the Internet of Things (Abstract)

Yajing Chen , University of Michigan, Ann Arbor
Shengshuo Lu , University of Michigan, Ann Arbor
Hun-Seok Kim , University of Michigan, Ann Arbor
David Blaauw , University of Michigan, Ann Arbor
Ronald G. Dreslinski , University of Michigan, Ann Arbor
Trevor Mudge , University of Michigan, Ann Arbor
pp. 40-51

Improving smartphone user experience by balancing performance and energy with probabilistic QoS guarantee (Abstract)

Benjamin Gaudette , School of Computing, Informatics, & Decision Systems Engineering, Arizona State University, Tempe, AZ 85281
Carole-Jean Wu , School of Computing, Informatics, & Decision Systems Engineering, Arizona State University, Tempe, AZ 85281
Sarma Vrudhula , School of Computing, Informatics, & Decision Systems Engineering, Arizona State University, Tempe, AZ 85281
pp. 52-63

Mobile CPU's rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction (Abstract)

Matthew Halpern , The University of Texas at Austin, Department of Electrical and Computer Engineering
Yuhao Zhu , The University of Texas at Austin, Department of Electrical and Computer Engineering
Vijay Janapa Reddi , The University of Texas at Austin, Department of Electrical and Computer Engineering
pp. 64-76

Atomic persistence for SCM with a non-intrusive backend controller (Abstract)

Kshitij Doshi , Intel Corp., Portland, Oregon
Ellis Giles , Rice University, Houston, Texas
Peter Varman , Rice University, Houston, Texas
pp. 77-89

CompEx: Compression-expansion coding for energy, latency, and lifetime improvements in MLC/TLC NVM (Abstract)

Poovaiah M. Palangappa , Department of Electrical and Computer Engineering, University of Pittsburgh, PA
Kartik Mohanram , Department of Electrical and Computer Engineering, University of Pittsburgh, PA
pp. 90-101

A low-power hybrid reconfigurable architecture for resistive random-access memories (Abstract)

Miguel Angel Lastras-Montano , Electrical and Computer Engineering Department, University of California, Santa Barbara
Amirali Ghofrani , Electrical and Computer Engineering Department, University of California, Santa Barbara
Kwang-Ting Cheng , Electrical and Computer Engineering Department, University of California, Santa Barbara
pp. 102-113

A performance analysis framework for optimizing OpenCL applications on FPGAs (Abstract)

Zeke Wang , Nanyang Technological University, Singapore
Bingsheng He , Nanyang Technological University, Singapore
Wei Zhang , HKUST
Shunning Jiang , Cornell University
pp. 114-125

Software transparent dynamic binary translation for coarse-grain reconfigurable architectures (Abstract)

Matthew A. Watkins , Lafayette College, Easton, PA
Tony Nowatzki , Univ. of Wisconsin-Madison, Madison, WI
Anthony Carno , Virginia Tech, Blacksburg, VA
pp. 138-150

Core tunneling: Variation-aware voltage noise mitigation in GPUs (Abstract)

Renji Thomas , Department of Computer Science and Engineering, The Ohio State University
Kristin Barber , Department of Computer Science and Engineering, The Ohio State University
Naser Sedaghati , Department of Computer Science and Engineering, The Ohio State University
Li Zhou , Department of Computer Science and Engineering, The Ohio State University
Radu Teodorescu , Department of Computer Science and Engineering, The Ohio State University
pp. 151-162

Warped-preexecution: A GPU pre-execution approach for improving latency hiding (Abstract)

Keunsoo Kim , School of Electrical and Electronic Engineering, Yonsei University
Sangpil Lee , School of Electrical and Electronic Engineering, Yonsei University
Myung Kuk Yoon , School of Electrical and Electronic Engineering, Yonsei University
Gunjae Koo , Ming Hsieh Department of Electrical Engineering, University of Southern California
Won Woo Ro , School of Electrical and Electronic Engineering, Yonsei University
Murali Annavaram , Ming Hsieh Department of Electrical Engineering, University of Southern California
pp. 163-175

Approximating warps with intra-warp operand value similarity (Abstract)

Daniel Wong , University of California, Riverside
Nam Sung Kim , University of Illinois, Urbana-Champaign
Murali Annavaram , University of Southern California
pp. 176-187

A case for toggle-aware compression for GPU systems (Abstract)

Gennady Pekhimenko , Carnegie Mellon University
Nandita Vijaykumar , Carnegie Mellon University
Onur Mutlu , Carnegie Mellon University
Todd C. Mowry , Carnegie Mellon University
pp. 188-200

Minimal disturbance placement and promotion (Abstract)

Elvira Teran , Texas A&M University
Yingying Tian , Advanced Micro Devices, Inc.
Zhe Wang , Intel Labs
Daniel A. Jimenez , Texas A&M University
pp. 201-211

Revisiting virtual L1 caches: A practical design using dynamic synonym remapping (Abstract)

Hongil Yoon , Computer Sciences Department, University of Wisconsin-Madison
Gurindar S. Sohi , Computer Sciences Department, University of Wisconsin-Madison
pp. 212-224

Modeling cache performance beyond LRU (Abstract)

Nathan Beckmann , Massachusetts Institute of Technology
Daniel Sanchez , Massachusetts Institute of Technology
pp. 225-236

Efficient footprint caching for Tagless DRAM Caches (Abstract)

Hakbeom Jang , Sungkyunkwan University, Suwon, Korea
Yongjun Lee , Sungkyunkwan University, Suwon, Korea
Jongwon Kim , Sungkyunkwan University, Suwon, Korea
Youngsok Kim , POSTECH, Pohang, Korea
Jangwoo Kim , POSTECH, Pohang, Korea
Jinkyu Jeong , Sungkyunkwan University, Suwon, Korea
Jae W. Lee , Sungkyunkwan University, Suwon, Korea
pp. 237-248

SCsafe: Logging sequential consistency violations continuously and precisely (Abstract)

Yuelu Duan , University of Illinois at Urbana-Champaign
David Koufaty , Intel Labs
Josep Torrellas , University of Illinois at Urbana-Champaign
pp. 249-260

LASER: Light, Accurate Sharing dEtection and Repair (Abstract)

Liang Luo , University of Washington
Akshitha Sriraman , University of Michigan
Brooke Fugate , University of Pennsylvania
Shiliang Hu , Intel Corporation
Gilles Pokam , Intel Corporation
Chris J. Newburn , Intel Corporation
Joseph Devietti , University of Pennsylvania
pp. 261-273

Efficient GPU hardware transactional memory through early conflict resolution (Abstract)

Sui Chen , Division of Electrical & Computer Engineering, Louisiana State University
Lu Peng , Division of Electrical & Computer Engineering, Louisiana State University
pp. 274-284

PleaseTM: Enabling transaction conflict management in requester-wins hardware transactional memory (Abstract)

Sunjae Park , Georgia Institute of Technology
Milos Prvulovic , Georgia Institute of Technology
pp. 285-296

Efficient synthetic traffic models for large, complex SoCs (Abstract)

Jieming Yin , Advanced Micro Devices, Inc.
Onur Kayiran , Advanced Micro Devices, Inc.
Matthew Poremba , Advanced Micro Devices, Inc.
Natalie Enright Jerger , Advanced Micro Devices, Inc.
Gabriel H. Loh , Advanced Micro Devices, Inc.
pp. 297-308

DVFS for NoCs in CMPs: A thread voting approach (Abstract)

Yuan Yao , KTH Royal Institute of Technology, Stockholm, Sweden
Zhonghai Lu , KTH Royal Institute of Technology, Stockholm, Sweden
pp. 309-320

SLaC: Stage laser control for a flattened butterfly network (Abstract)

Yigit Demir , Intel Corporation, Portland, OR, USA
Nikos Hardavellas , Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA
pp. 321-332

The runahead network-on-chip (Abstract)

Zimo Li , University of Toronto
Joshua San Miguel , University of Toronto
Natalie Enright Jerger , University of Toronto
pp. 333-344

Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing (Abstract)

Zhenning Wang , Shanghai Jiao Tong University, P. R. China
Jun Yang , Electrical and Computer Engineering Department, Shanghai Jiao Tong University, P. R. China
Rami Melhem , Department of Computer Science, University of Pittsburgh, U. S. A.
Bruce Childers , Department of Computer Science, University of Pittsburgh, U. S. A.
Youtao Zhang , Department of Computer Science, University of Pittsburgh, U. S. A.
Minyi Guo , Shanghai Jiao Tong University, P. R. China
pp. 358-369

iPAWS: Instruction-issue pattern-based adaptive warp scheduling for GPGPUs (Abstract)

Minseok Lee , KAIST, Daejeon, Korea
Gwangsun Kim , KAIST, Daejeon, Korea
John Kim , KAIST, Daejeon, Korea
Woong Seo , Samsung Electronics, Giheung, Korea
Yeongon Cho , Samsung Electronics, Giheung, Korea
Soojung Ryu , Samsung Electronics, Giheung, Korea
pp. 370-381

Lattice priority scheduling: Low-overhead timing-channel protection for a shared memory controller (Abstract)

Andrew Ferraiuolo , Cornell University, Ithaca, NY 14850, USA
Yao Wang , Cornell University, Ithaca, NY 14850, USA
Danfeng Zhang , Penn State University, University Park, PA 16802
Andrew C. Myers , Cornell University, Ithaca, NY 14850, USA
G. Edward Suh , Cornell University, Ithaca, NY 14850, USA
pp. 382-393

A complete key recovery timing attack on a GPU (Abstract)

Zhen Hang Jiang , Electrical & Computer Engineering Department, Northeastern University, Boston, MA 02115 USA
Yunsi Fei , Electrical & Computer Engineering Department, Northeastern University, Boston, MA 02115 USA
David Kaeli , Electrical & Computer Engineering Department, Northeastern University, Boston, MA 02115 USA
pp. 394-405

CATalyst: Defeating last-level cache side channel attacks in cloud computing (Abstract)

Fangfei Liu , Department of Electrical Engineering, Princeton University
Qian Ge , NICTA
Yuval Yarom , NICTA
Frank McKeen , Intel Labs
Carlos Rozas , Intel Labs
Ruby B. Lee , Department of Electrical Engineering, Princeton University
pp. 406-418

Predicting the memory bandwidth and optimal core allocations for multi-threaded applications on large-scale NUMA machines (Abstract)

Wei Wang , Department of Computer Science, University of Virginia
Jack W. Davidson , Department of Computer Science, University of Virginia
Mary Lou Soffa , Department of Computer Science, University of Virginia
pp. 419-431

A market approach for handling power emergencies in multi-tenant data center (Abstract)

Mohammad A. Islam , University of California, Riverside
Xiaoqi Ren , California Institute of Technology
Shaolei Ren , University of California, Riverside
Adam Wierman , California Institute of Technology
Xiaorui Wang , The Ohio State University
pp. 432-443

SizeCap: Efficiently handling power surges in fuel cell powered data centers (Abstract)

Yang Li , Carnegie Mellon University
Di Wang , Microsoft Corporation
Saugata Ghose , Carnegie Mellon University
Jie Liu , Microsoft Corporation
Sriram Govindan , Microsoft Corporation
Sean James , Microsoft Corporation
Eric Peterson , Microsoft Corporation
John Siegler , Microsoft Corporation
Rachata Ausavarungnirun , Carnegie Mellon University
Onur Mutlu , Carnegie Mellon University
pp. 444-456

MaPU: A novel mathematical computing architecture (Abstract)

Donglin Wang , CASIA, Beijing, China
Xueliang Du , CASIA, Beijing, China
Leizu Yin , Spreadtrum Comm, Inc.
Chen Lin , CASIA, Beijing, China
Hong Ma , CASIA, Beijing, China
Weili Ren , CASIA, Beijing, China
Huijuan Wang , CASIA, Beijing, China
Xingang Wang , CASIA, Beijing, China
Shaolin Xie , CASIA, Beijing, China
Lei Wang , CASIA, Beijing, China
Zijun Liu , CASIA, Beijing, China
Tao Wang , Huawei Tech Co, Ltd.
Zhonghua Pu , CASIA, Beijing, China
Guangxin Ding , CASIA, Beijing, China
Mengchen Zhu , CASIA, Beijing, China
Lipeng Yang , CASIA, Beijing, China
Ruoshan Guo , CASIA, Beijing, China
Zhiwei Zhang , CASIA, Beijing, China
Xiao Lin , CASIA, Beijing, China
Jie Hao , CASIA, Beijing, China
Yongyong Yang , Huawei Tech Co, Ltd.
Wenqin Sun , CASIA, Beijing, China
Fabiao Zhou , CASIA, Beijing, China
NuoZhou Xiao , CASIA, Beijing, China
Qian Cui , CASIA, Beijing, China
Xiaoqin Wang , CASIA, Beijing, China
pp. 457-468

Best-offset hardware prefetching (Abstract)

Pierre Michaud , Inria, Campus de Beaulieu, Rennes, France
pp. 469-480

DUANG: Fast and lightweight page migration in asymmetric memory systems (Abstract)

Hao Wang , University of Wisconsin, Madison
Jie Zhang , Yonsei University School of Integrated Technology Yonsei University Convergence Technology
Sharmila Shridhar , University of Wisconsin, Madison
Gieseo Park , University of Texas, Dallas
Myoungsoo Jung , Yonsei University School of Integrated Technology Yonsei University Convergence Technology
Nam Sung Kim , University of Illinois, Urbana-Champaign
pp. 481-493

Selective GPU caches to eliminate CPU-GPU HW cache coherence (Abstract)

Neha Agarwal , University of Michigan
David Nellans , NVIDIA
Thomas F. Wenisch , University of Michigan
John Danskin , NVIDIA
pp. 494-506

Venice: Exploring server architectures for effective resource sharing (Abstract)

Jianbo Dong , SKL Computer Architecture, ICT, CAS
Rui Hou , SKL Computer Architecture, ICT, CAS
Michael Huang , University of Rochester
Tao Jiang , SKL Computer Architecture, ICT, CAS
Boyan Zhao , SKL Computer Architecture, ICT, CAS
Sally A. McKee , Chalmers University of Technology
Haibin Wang , Huawei Technologies Co., Ltd
Xiaosong Cui , Huawei Technologies Co., Ltd
Lixin Zhang , SKL Computer Architecture, ICT, CAS
pp. 507-518

A large-scale study of soft-errors on GPUs in the field (Abstract)

Bin Nie , College of William and Mary
Devesh Tiwari , Oak Ridge National Laboratory
Saurabh Gupta , Oak Ridge National Laboratory
Evgenia Smirni , College of William and Mary
James H. Rogers , Oak Ridge National Laboratory
pp. 519-530

Design and implementation of a mobile storage leveraging the DRAM interface (Abstract)

Sungyong Seo , Memory Business, Samsung Electronics Co., Ltd.
Youngjin Cho , Memory Business, Samsung Electronics Co., Ltd.
Youngkwang Yoo , Memory Business, Samsung Electronics Co., Ltd.
Otae Bae , Memory Business, Samsung Electronics Co., Ltd.
Jaegeun Park , Memory Business, Samsung Electronics Co., Ltd.
Heehyun Nam , Memory Business, Samsung Electronics Co., Ltd.
Sunmi Lee , Memory Business, Samsung Electronics Co., Ltd.
Yongmyung Lee , Memory Business, Samsung Electronics Co., Ltd.
Seungdo Chae , Memory Business, Samsung Electronics Co., Ltd.
Moonsang Kwon , Memory Business, Samsung Electronics Co., Ltd.
Jin-Hyeok Choi , Memory Business, Samsung Electronics Co., Ltd.
Sangyeun Cho , Memory Business, Samsung Electronics Co., Ltd.
Jaeheon Jeong , Memory Business, Samsung Electronics Co., Ltd.
Duckhyun Chang , Memory Business, Samsung Electronics Co., Ltd.
pp. 531-542

Restore truncation for performance improvement in future DRAM systems (Abstract)

Xianwei Zhang , Computer Science Department, University of Pittsburgh, PA, USA
Youtao Zhang , Computer Science Department, University of Pittsburgh, PA, USA
Bruce R. Childers , Computer Science Department, University of Pittsburgh, PA, USA
Jun Yang , Electrical and Computer Engineering Department, University of Pittsburgh, PA, USA
pp. 543-554

Parity Helix: Efficient protection for single-dimensional faults in multi-dimensional memory systems (Abstract)

Xun Jian , University of Illinois at Urbana-Champaign
Vilas Sridharan , RAS Architecture, Advanced Micro Devices, Inc.
Rakesh Kumar , University of Illinois at Urbana-Champaign
pp. 555-567

Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM (Abstract)

Kevin K. Chang , Carnegie Mellon University
Prashant J. Nair , Georgia Institute of Technology
Donghyuk Lee , Carnegie Mellon University
Saugata Ghose , Carnegie Mellon University
Moinuddin K. Qureshi , Georgia Institute of Technology
Onur Mutlu , Carnegie Mellon University
pp. 568-580

ChargeCache: Reducing DRAM latency by exploiting row access locality (Abstract)

Hasan Hassan , Carnegie Mellon University
Gennady Pekhimenko , Carnegie Mellon University
Nandita Vijaykumar , Carnegie Mellon University
Vivek Seshadri , Carnegie Mellon University
Donghyuk Lee , Carnegie Mellon University
Oguz Ergin , TOBB University of Economics & Technology
Onur Mutlu , Carnegie Mellon University
pp. 581-593

Amdahl's law for lifetime reliability scaling in heterogeneous multicore processors (Abstract)

William J. Song , School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332
Saibal Mukhopadhyay , School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332
Sudhakar Yalamanchili , School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332
pp. 594-605

LiveSim: Going live with microarchitecture simulation (Abstract)

Sina Hassani , Dept. of Computer Engineering, University of California, Santa Cruz
Gabriel Southern , Dept. of Computer Engineering, University of California, Santa Cruz
Jose Renau , Dept. of Computer Engineering, University of California, Santa Cruz
pp. 606-617

Energy-efficient address translation (Abstract)

Vasileios Karakostas , Barcelona Supercomputing Center
Jayneel Gandhi , University of Wisconsin - Madison
Adrian Cristal , Barcelona Supercomputing Center
Mark D. Hill , University of Wisconsin - Madison
Kathryn S. McKinley , Microsoft Research
Mario Nemirovsky , ICREA at Barcelona Supercomputing Center
Michael M. Swift , University of Wisconsin - Madison
Osman S. Unsal , Barcelona Supercomputing Center
pp. 631-643

RADAR: Runtime-assisted dead region management for last-level caches (Abstract)

Madhavan Manivannan , Chalmers University of Technology
Vassilis Papaefstathiou , Chalmers University of Technology
Miquel Pericas , Chalmers University of Technology
Per Stenstrom , Chalmers University of Technology
pp. 644-656

Cache QoS: From concept to reality in the Intel® Xeon® processor E5-2600 v3 product family (Abstract)

Andrew Herdrich , Intel Corporation
Edwin Verplanke , Intel Corporation
Priya Autee , Intel Corporation
Ramesh Illikkal , Intel Corporation
Chris Gianos , Intel Corporation
Ronak Singhal , Intel Corporation
Ravi Iyer , Intel Corporation
pp. 657-668

Symbiotic job scheduling on the IBM POWER8 (Abstract)

Josue Feliu , Dept. of Computer Engineering (DISCA), Universitat Politècnica de València, València, Spain
Stijn Eyerman , Dept. of Electronics and Information Systems (ELIS), Ghent University, Ghent, Belgium
Julio Sahuquillo , Dept. of Computer Engineering (DISCA), Universitat Politècnica de València, València, Spain
Salvador Petit , Dept. of Computer Engineering (DISCA), Universitat Politècnica de València, València, Spain
pp. 669-680

ScalCore: Designing a core for voltage scalability (Abstract)

Bhargava Gopireddy , University of Illinois at Urbana-Champaign
Choungki Song , University of Wisconsin-Madison
Josep Torrellas , University of Illinois at Urbana-Champaign
Nam Sung Kim , University of Illinois at Urbana-Champaign
Aditya Agrawal , NVIDIA Corp.
Asit Mishra , Intel Corp.
pp. 681-693

Cost effective physical register sharing (Abstract)

Arthur Perais , IRISA/INRIA
Andre Seznec , IRISA/INRIA
pp. 694-706

Author index (PDF)

pp. 707-723