2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) (2014)
Orlando, FL, USA
Feb. 15, 2014 to Feb. 19, 2014
ISBN: 978-1-4799-3097-5
TABLE OF CONTENTS

Locality-aware data replication in the Last-Level Cache (Abstract)

George Kurian , Massachusetts Inst. of Technol., Cambridge, MA, USA
Srinivas Devadas , Massachusetts Inst. of Technol., Cambridge, MA, USA
Omer Khan , Univ. of Connecticut, Storrs, CT, USA
pp. 1-12

FADE: A programmable filtering accelerator for instruction-grain monitoring (Abstract)

Sotiria Fytraki , EcoCloud, EPFL, Switzerland
Evangelos Vlachos , Oracle Labs, USA
Onur Kocberber , EcoCloud, EPFL, Switzerland
Babak Falsafi , EcoCloud, EPFL, Switzerland
Boris Grot , University of Edinburgh, UK
pp. 108-119

Dynamically detecting and tolerating IF-Condition Data Races (Abstract)

Shanxiang Qi , University of Illinois at Urbana-Champaign, USA
Abdullah A. Muzahid , University of Illinois at Urbana-Champaign, USA
Wonsun Ahn , University of Illinois at Urbana-Champaign, USA
Josep Torrellas , University of Illinois at Urbana-Champaign, USA
pp. 120-131

Exploiting thermal energy storage to reduce data center capital and operating expenses (Abstract)

Wenli Zheng , The Ohio State University, Columbus, 43210, USA
Kai Ma , The Ohio State University, Columbus, 43210, USA
Xiaorui Wang , The Ohio State University, Columbus, 43210, USA
pp. 132-141

Implications of high energy proportional servers on cluster-wide energy proportionality (Abstract)

Daniel Wong , Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, 90089, USA
Murali Annavaram , Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, 90089, USA
pp. 142-153

Strategies for anticipating risk in heterogeneous system design (Abstract)

Marisabel Guevara , Duke University, USA
Benjamin Lubin , Boston University, USA
Benjamin C. Lee , Duke University, USA
pp. 154-164

TSO-CC: Consistency directed cache coherence for TSO (Abstract)

Marco Elver , University of Edinburgh, UK
Vijay Nagarajan , University of Edinburgh, UK
pp. 165-176

Stash directory: A scalable directory for many-core coherence (Abstract)

Socrates Demetriades , Computer Science Department, University of Pittsburgh, USA
Sangyeun Cho , Memory Division, Samsung Electronics Co., Korea
pp. 177-188

QuickRelease: A throughput-oriented approach to release consistency on GPUs (Abstract)

Blake A. Hechtman , Advanced Micro Devices, Inc., USA
Shuai Che , Advanced Micro Devices, Inc., USA
Derek R. Hower , Advanced Micro Devices, Inc., USA
Yingying Tian , Advanced Micro Devices, Inc., USA
Bradford M. Beckmann , Advanced Micro Devices, Inc., USA
Mark D. Hill , University of Wisconsin-Madison, Computer Sciences, USA
Steven K. Reinhardt , Advanced Micro Devices, Inc., USA
David A. Wood , University of Wisconsin-Madison, Computer Sciences, USA
pp. 189-200

A Non-Inclusive Memory Permissions architecture for protection against cross-layer attacks (Abstract)

Jesse Elwell , State University of New York at Binghamton, USA
Ryan Riley , Qatar University, Qatar
Nael Abu-Ghazaleh , State University of New York at Binghamton, USA
Dmitry Ponomarev , State University of New York at Binghamton, USA
pp. 201-212

Suppressing the Oblivious RAM timing channel while making information leakage and program efficiency trade-offs (Abstract)

Christopher W. Fletcher , Massachusetts Institute of Technology, USA
Ling Ren , Massachusetts Institute of Technology, USA
Xiangyao Yu , Massachusetts Institute of Technology, USA
Marten Van Dijk , University of Connecticut, USA
Omer Khan , University of Connecticut, USA
Srinivas Devadas , Massachusetts Institute of Technology, USA
pp. 213-224

Adaptive placement and migration policy for an STT-RAM-based hybrid cache (Abstract)

Zhe Wang , Texas A&M University, USA
Daniel A. Jimenez , Texas A&M University, USA
Cong Xu , Pennsylvania State University, USA
Guangyu Sun , Peking University, China
Yuan Xie , Pennsylvania State University, USA
pp. 13-24

Timing channel protection for a shared memory controller (Abstract)

Yao Wang , Cornell University, Ithaca, NY 14850, USA
Andrew Ferraiuolo , Cornell University, Ithaca, NY 14850, USA
G. Edward Suh , Cornell University, Ithaca, NY 14850, USA
pp. 225-236

STM: Cloning the spatial and temporal memory access behavior (Abstract)

Amro Awad , Dept. of Electrical and Computer Engineering, North Carolina State University, USA
Yan Solihin , Dept. of Electrical and Computer Engineering, North Carolina State University, USA
pp. 237-247

A scalable multi-path microarchitecture for efficient GPU control flow (Abstract)

Ahmed ElTantawy , University of British Columbia, Canada
Jessica Wenjie Ma , University of British Columbia, Canada
Mike O'Connor , NVIDIA Research, USA
Tor M. Aamodt , University of British Columbia, Canada
pp. 248-259

Improving GPGPU resource utilization through alternative thread block scheduling (Abstract)

Minseok Lee , KAIST, Daejeon, Korea
Seokwoo Song , KAIST, Daejeon, Korea
Joosik Moon , KAIST, Daejeon, Korea
John Kim , KAIST, Daejeon, Korea
Woong Seo , Samsung Electronics, Giheung, Korea
Yeongon Cho , Samsung Electronics, Giheung, Korea
Soojung Ryu , Samsung Electronics, Giheung, Korea
pp. 260-271

MRPB: Memory request prioritization for massively parallel processors (Abstract)

Wenhao Jia , Princeton University, USA
Kelly A. Shaw , University of Richmond, USA
Margaret Martonosi , Princeton University, USA
pp. 272-283

Warp-level divergence in GPUs: Characterization, impact, and mitigation (Abstract)

Ping Xiang , Dept. of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
Yi Yang , Dept. of Computing Systems Architecture, NEC Laboratories America, Princeton, NJ, USA
Huiyang Zhou , Dept. of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
pp. 284-295

MP3: Minimizing performance penalty for power-gating of Clos network-on-chip (Abstract)

Lizhong Chen , Ming Hsieh Department of Electrical Engineering, Information Sciences Institute, University of Southern California, Los Angeles, USA
Lihang Zhao , Ming Hsieh Department of Electrical Engineering, Information Sciences Institute, University of Southern California, Los Angeles, USA
Ruisheng Wang , Ming Hsieh Department of Electrical Engineering, Information Sciences Institute, University of Southern California, Los Angeles, USA
Timothy M. Pinkston , Ming Hsieh Department of Electrical Engineering, Information Sciences Institute, University of Southern California, Los Angeles, USA
pp. 296-307

Up by their bootstraps: Online learning in Artificial Neural Networks for CMP uncore power management (Abstract)

Jae-Yeon Won , Texas A&M University, USA
Xi Chen , Texas A&M University, USA
Paul Gratz , Texas A&M University, USA
Jiang Hu , Texas A&M University, USA
Vassos Soteriou , Cyprus University of Technology, Cyprus
pp. 308-319

QORE: A fault tolerant network-on-chip architecture with power-efficient quad-function channel (QFC) buffers (Abstract)

Dominic DiTomaso , Electrical Engineering and Computer Science, Ohio University, Athens, 45701, USA
Avinash Kodi , Electrical Engineering and Computer Science, Ohio University, Athens, 45701, USA
Ahmed Louri , Electrical and Computer Engineering, University of Arizona, Tucson, 85721, USA
pp. 320-331

Transportation-network-inspired network-on-chip (Abstract)

Hanjoon Kim , Dept. of Computer Science, KAIST, Korea
Gwangsun Kim , Dept. of Computer Science, KAIST, Korea
Seungryoul Maeng , Dept. of Computer Science, KAIST, Korea
Hwasoo Yeo , Dept. of Civil & Environmental Engineering, KAIST, Korea
John Kim , Dept. of Computer Science, KAIST, Korea
pp. 332-343

DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture (Abstract)

Junwhan Ahn , Department of Electrical and Computer Engineering, Seoul National University, Republic of Korea
Sungjoo Yoo , Department of Electrical Engineering, POSTECH, Pohang, Republic of Korea
Kiyoung Choi , Department of Electrical and Computer Engineering, Seoul National University, Republic of Korea
pp. 25-36

Improving system throughput and fairness simultaneously in shared memory CMP systems via Dynamic Bank Partitioning (Abstract)

Mingli Xie , Microprocessor Research and Development Center, Peking University, Beijing, China
Dong Tong , Microprocessor Research and Development Center, Peking University, Beijing, China
Kan Huang , Microprocessor Research and Development Center, Peking University, Beijing, China
Xu Cheng , Microprocessor Research and Development Center, Peking University, Beijing, China
pp. 344-355

Improving DRAM performance by parallelizing refreshes with accesses (Abstract)

Kevin Kai-Wei Chang , Carnegie Mellon University, USA
Donghyuk Lee , Carnegie Mellon University, USA
Zeshan Chishti , Intel Labs, USA
Alaa R. Alameldeen , Intel Labs, USA
Chris Wilkerson , Intel Labs, USA
Yoongu Kim , Carnegie Mellon University, USA
Onur Mutlu , Carnegie Mellon University, USA
pp. 356-367

CREAM: A Concurrent-Refresh-Aware DRAM Memory architecture (Abstract)

Tao Zhang , The Department of Computer Science and Engineering, Pennsylvania State University, USA
Matt Poremba , The Department of Computer Science and Engineering, Pennsylvania State University, USA
Cong Xu , The Department of Computer Science and Engineering, Pennsylvania State University, USA
Guangyu Sun , The School of Electronics Engineering and Computer Science, Peking University, China
Yuan Xie , The Department of Computer Science and Engineering, Pennsylvania State University, USA
pp. 368-379

DraMon: Predicting memory bandwidth usage of multi-threaded programs with high accuracy and low overhead (Abstract)

Wei Wang , Department of Computer Science, University of Virginia, USA
Tanima Dey , Department of Computer Science, University of Virginia, USA
Jack W. Davidson , Department of Computer Science, University of Virginia, USA
Mary Lou Soffa , Department of Computer Science, University of Virginia, USA
pp. 380-391

PVCoherence: Designing flat coherence protocols for scalable verification (Abstract)

Meng Zhang , Department of ECE, Duke University, USA
Jesse D. Bingham , Intel Corporation, USA
John Erickson , Intel Corporation, USA
Daniel J. Sorin , Department of ECE, Duke University, USA
pp. 392-403

Atomic SC for simple in-order processors (Abstract)

Dibakar Gope , University of Wisconsin - Madison, USA
Mikko H. Lipasti , University of Wisconsin - Madison, USA
pp. 404-415

Concurrent and consistent virtual machine introspection with hardware transactional memory (Abstract)

Yutao Liu , Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China
Yubin Xia , Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China
Haibing Guan , Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science, Shanghai Jiao Tong University, China
Binyu Zang , Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China
Haibo Chen , Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China
pp. 416-427

Practical data value speculation for future high-end processors (Abstract)

Arthur Perais , IRISA/INRIA, Rennes, France
Andre Seznec , IRISA/INRIA, Rennes, France
pp. 428-439

Tangle: Route-oriented dynamic voltage minimization for variation-afflicted, energy-efficient on-chip networks (Abstract)

Amin Ansari , University of Illinois at Urbana-Champaign, USA
Asit Mishra , Intel Corporation, USA
Jianping Xu , Intel Corporation, USA
Josep Torrellas , University of Illinois at Urbana-Champaign, USA
pp. 440-451

Improving cache performance using read-write partitioning (Abstract)

Samira Khan , Carnegie Mellon University, USA
Alaa R. Alameldeen , Intel Labs, USA
Chris Wilkerson , Intel Labs, USA
Onur Mutlu , Carnegie Mellon University, USA
Daniel A. Jimenez , Texas A&M University, USA
pp. 452-463

A detailed GPU cache model based on reuse distance theory (Abstract)

Cedric Nugteren , Eindhoven University of Technology, The Netherlands
Gert-Jan van den Braak , Eindhoven University of Technology, The Netherlands
Henk Corporaal , Eindhoven University of Technology, The Netherlands
Henri Bal , Vrije Universiteit Amsterdam, The Netherlands
pp. 37-48

NUAT: A non-uniform access time memory controller (Abstract)

Wongyu Shin , Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Korea
Jeongmin Yang , Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Korea
Jungwhan Choi , Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Korea
Lee-Sup Kim , Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Korea
pp. 464-475

Improving in-memory database index performance with Intel® Transactional Synchronization Extensions (Abstract)

Tomas Karnagel , Intel Corporation, Munich, Germany and Hillsboro, USA
Roman Dementiev , Intel Corporation, Munich, Germany and Hillsboro, USA
Ravi Rajwar , Intel Corporation, Munich, Germany and Hillsboro, USA
Konrad Lai , Intel Corporation, Munich, Germany and Hillsboro, USA
Thomas Legler , SAP AG, Database Development, Walldorf, Germany
Benjamin Schlegel , TU Dresden, Database Technology Group, Germany
Wolfgang Lehner , TU Dresden, Database Technology Group, Germany
pp. 476-487

BigDataBench: A big data benchmark suite from internet services (Abstract)

Lei Wang , State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), China
Jianfeng Zhan , State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), China
Chunjie Luo , State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), China
Yuqing Zhu , State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), China
Qiang Yang , State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), China
Yongqiang He , Dropbox, USA
Wanling Gao , State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), China
Zhen Jia , State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), China
Yingjie Shi , State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), China
Shujie Zhang , Huawei, China
Chen Zheng , State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), China
Gang Lu , State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), China
Kent Zhan , Tencent, China
Xiaona Li , Baidu, China
Bizhu Qiu , Yahoo! USA
pp. 488-499

3D stacking of high-performance processors (Abstract)

Philip Emma , IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Alper Buyuktosunoglu , IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Michael Healy , IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Krishnan Kailas , IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Valentin Puente , IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Roy Yu , IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Allan Hartstein , IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Pradip Bose , IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Jaime Moreno , IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 500-511

Reducing the cost of persistence for nonvolatile heaps in end user devices (Abstract)

Sudarsun Kannan , Georgia Institute of Technology, College of Computing, Atlanta, USA
Ada Gavrilovska , Georgia Institute of Technology, College of Computing, Atlanta, USA
Karsten Schwan , Georgia Institute of Technology, College of Computing, Atlanta, USA
pp. 512-523

Sprinkler: Maximizing resource utilization in many-chip solid state disks (Abstract)

Myoungsoo Jung , Department of EE, The University of Texas at Dallas, Computer Architecture and Memory Systems Laboratory, USA
Mahmut T. Kandemir , Department of CSE, The Pennsylvania State University, USA
pp. 524-535

Over-clocked SSD: Safely running beyond flash memory chip I/O clock specs (Abstract)

Kai Zhao , ECSE Department, Rensselaer Polytechnic Institute, USA
Xuebin Zhang , ECSE Department, Rensselaer Polytechnic Institute, USA
Jiangpeng Li , Shanghai Jiaotong University, China
Ning Zheng , ECSE Department, Rensselaer Polytechnic Institute, USA
Tong Zhang , ECSE Department, Rensselaer Polytechnic Institute, USA
pp. 536-545

GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management (Abstract)

Youngsok Kim , Department of Computer Science and Engineering, POSTECH, Korea
Jaewon Lee , Department of Computer Science and Engineering, POSTECH, Korea
Jae-Eon Jo , Department of Computer Science and Engineering, POSTECH, Korea
Jangwoo Kim , Department of Computer Science and Engineering, POSTECH, Korea
pp. 546-557

Increasing TLB reach by exploiting clustering in page translations (Abstract)

Binh Pham , Department of Computer Science, Rutgers University, USA
Abhishek Bhattacharjee , Department of Computer Science, Rutgers University, USA
Yasuko Eckert , AMD Research, Advanced Micro Devices, Inc., USA
Gabriel H. Loh , AMD Research, Advanced Micro Devices, Inc., USA
pp. 558-567

Supporting x86-64 address translation for 100s of GPU lanes (Abstract)

Jason Power , Department of Computer Sciences, University of Wisconsin-Madison, USA
Mark D. Hill , Department of Computer Sciences, University of Wisconsin-Madison, USA
David A. Wood , Department of Computer Sciences, University of Wisconsin-Madison, USA
pp. 568-578

Precision-aware soft error protection for GPUs (Abstract)

David J. Palframan , Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA
Nam Sung Kim , Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA
Mikko H. Lipasti , Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA
pp. 49-59

Scalably verifiable dynamic power management (Abstract)

Opeoluwa Matthews , Department of Electrical and Computer Engineering, Duke University, USA
Meng Zhang , Department of Electrical and Computer Engineering, Duke University, USA
Daniel J. Sorin , Department of Electrical and Computer Engineering, Duke University, USA
pp. 579-590

Revolver: Processor architecture for power efficient loop execution (Abstract)

Mitchell Hayenga , ARM Inc., USA
Vignyan Reddy Kothinti Naresh , University of Wisconsin-Madison, USA
Mikko H. Lipasti , University of Wisconsin-Madison, USA
pp. 591-602

Dynamic management of TurboMode in modern multi-core chips (Abstract)

David Lo , Stanford University, USA
Christos Kozyrakis , Stanford University, USA
pp. 603-613

Spare register aware prefetching for graph algorithms on GPUs (Abstract)

Nagesh B. Lakshminarayana , School of Computer Science, Georgia Institute of Technology, USA
Hyesoon Kim , School of Computer Science, Georgia Institute of Technology, USA
pp. 614-625

Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers (Abstract)

Seth H Pugsley , University of Utah, USA
Zeshan Chishti , Intel Labs, USA
Chris Wilkerson , Intel Labs, USA
Peng-fei Chuang , Intel Software and Services Group, USA
Robert L Scott , Intel Software and Services Group, USA
Aamer Jaleel , Intel Corporation VSSAD, USA
Shih-Lien Lu , Intel Labs, USA
Kingsum Chow , Intel Software and Services Group, USA
Rajeev Balasubramonian , University of Utah, USA
pp. 626-637

MemZip: Exploring unconventional benefits from memory compression (Abstract)

Ali Shafiee , University of Utah, USA
Meysam Taassori , University of Utah, USA
Rajeev Balasubramonian , University of Utah, USA
Al Davis , University of Utah, USA
pp. 638-649

CDTT: Compiler-generated data-triggered threads (Abstract)

Hung-Wei Tseng , Department of Computer Science and Engineering, University of California, San Diego, La Jolla, U.S.A.
Dean M. Tullsen , Department of Computer Science and Engineering, University of California, San Diego, La Jolla, U.S.A.
pp. 650-661

Accelerating decoupled look-ahead via weak dependence removal: A metaheuristic approach (Abstract)

Raj Parihar , Dept. of Electrical & Computer Engineering, University of Rochester, NY 14627, USA
Michael C. Huang , Dept. of Electrical & Computer Engineering, University of Rochester, NY 14627, USA
pp. 662-677

Undersubscribed threading on clustered cache architectures (Abstract)

Wim Heirman , Ghent University, Belgium
Trevor E. Carlson , Ghent University, Belgium
Kenzo Van Craeynest , Ghent University, Belgium
Ibrahim Hur , Intel, ExaScience Lab, Belgium
Aamer Jaleel , Intel, VSSAD, Belgium
Lieven Eeckhout , Ghent University, Belgium
pp. 678-689

Understanding the impact of gate-level physical reliability effects on whole program execution (Abstract)

Raghuraman Balasubramanian , University of Wisconsin-Madison, USA
Karthikeyan Sankaralingam , University of Wisconsin-Madison, USA
pp. 60-71

Accordion: Toward soft Near-Threshold Voltage Computing (Abstract)

Ulya R. Karpuzcu , University of Minnesota, Twin Cities, USA
Ismail Akturk , University of Minnesota, Twin Cities, USA
Nam Sung Kim , University of Wisconsin, Madison, USA
pp. 72-83

Mosaic: Exploiting the spatial locality of process variation to reduce refresh energy in on-chip eDRAM modules (Abstract)

Aditya Agrawal , University of Illinois at Urbana-Champaign, USA
Amin Ansari , University of Illinois at Urbana-Champaign, USA
Josep Torrellas , University of Illinois at Urbana-Champaign, USA
pp. 84-95

Low-overhead and high coverage run-time race detection through selective meta-data management (Abstract)

Ruirui Huang , Intel Corporation, Hillsboro, OR 97124, USA
Erik Halberg , Cornell University, Ithaca, NY 14850, USA
Andrew Ferraiuolo , Cornell University, Ithaca, NY 14850, USA
G. Edward Suh , Cornell University, Ithaca, NY 14850, USA
pp. 96-107

Author index (PDF)

pp. 687-688

[Front cover] (PDF)

pp. c1

Table of contents (PDF)

pp. xiii-xx