2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA) (2015)
Burlingame, CA, USA
Feb. 7, 2015 to Feb. 11, 2015
ISBN: 978-1-4799-8930-0
TABLE OF CONTENTS

Table of contents (PDF)

pp. i-vi

Exploring architectural heterogeneity in intelligent vision systems (Abstract)

Nandhini Chandramoorthy , The Pennsylvania State University
Giuseppe Tagliavini , University of Bologna
Kevin Irick , The Pennsylvania State University
Antonio Pullini , ETH Zurich
Siddharth Advani , The Pennsylvania State University
Sulaiman Al Habsi , The Pennsylvania State University
Matthew Cotter , The Pennsylvania State University
John Sampson , The Pennsylvania State University
Vijaykrishnan Narayanan , The Pennsylvania State University
Luca Benini , University of Bologna
pp. 1-12

BeBoP: A cost effective predictor infrastructure for superscalar value prediction (Abstract)

Arthur Perais , IRISA/INRIA, Campus de Beaulieu 35042 Rennes, France
Andre Seznec , IRISA/INRIA, Campus de Beaulieu 35042 Rennes, France
pp. 13-25

VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors (Abstract)

Timothy Hayes , Barcelona Supercomputing Center
Oscar Palomar , Barcelona Supercomputing Center
Osman Unsal , Barcelona Supercomputing Center
Adrian Cristal , Barcelona Supercomputing Center
Mateo Valero , Barcelona Supercomputing Center
pp. 26-38

Increasing multicore system efficiency through intelligent bandwidth shifting (Abstract)

Victor Jimenez , IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
Alper Buyuktosunoglu , IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
Pradip Bose , IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
Francis P. O'Connell , Apple Inc., Lone Star Design Center, Austin, TX 78746
Francisco Cazorla , Barcelona Supercomputing Center, Barcelona, Spain
Mateo Valero , Barcelona Supercomputing Center, Barcelona, Spain
pp. 39-50

Exploiting compressed block size as an indicator of future reuse (Abstract)

Gennady Pekhimenko , Carnegie Mellon University
Tyler Huberty , Carnegie Mellon University
Rui Cai , Carnegie Mellon University
Onur Mutlu , Carnegie Mellon University
Phillip B. Gibbons , Intel Labs Pittsburgh
Michael A. Kozuch , Intel Labs Pittsburgh
Todd C. Mowry , Carnegie Mellon University
pp. 51-63

Talus: A simple way to remove cliffs in cache performance (Abstract)

Nathan Beckmann , Massachusetts Institute of Technology
Daniel Sanchez , Massachusetts Institute of Technology
pp. 64-75

Coordinated static and dynamic cache bypassing for GPUs (Abstract)

Xiaolong Xie , Center for Energy-Efficient Computing and Applications, School of EECS, Peking University, China
Yun Liang , Center for Energy-Efficient Computing and Applications, School of EECS, Peking University, China
Yu Wang , Tsinghua National Laboratory for Information Science and Technology, Department of EE, Tsinghua University, China
Guangyu Sun , Center for Energy-Efficient Computing and Applications, School of EECS, Peking University, China
Tao Wang , Center for Energy-Efficient Computing and Applications, School of EECS, Peking University, China
pp. 76-88

Priority-based cache allocation in throughput processors (Abstract)

Dong Li , The University of Texas at Austin
Minsoo Rhu , The University of Texas at Austin
Mike O'Connor , The University of Texas at Austin
Mattan Erez , The University of Texas at Austin
Doug Burger , Microsoft
Donald S. Fussell , The University of Texas at Austin
Stephen W. Keckler , The University of Texas at Austin
pp. 89-100

Bamboo ECC: Strong, safe, and flexible codes for reliable computer memory (Abstract)

Jungrae Kim , Department of Electrical and Computer Engineering The University of Texas at Austin
Michael Sullivan , Department of Electrical and Computer Engineering The University of Texas at Austin
Mattan Erez , Department of Electrical and Computer Engineering The University of Texas at Austin
pp. 101-112

XChange: A market-based approach to scalable dynamic multi-resource allocation in multicore architectures (Abstract)

Xiaodong Wang , Computer Systems Laboratory, Cornell University, Ithaca, NY 14853 USA
Jose F. Martinez , Computer Systems Laboratory, Cornell University, Ithaca, NY 14853 USA
pp. 113-125

Heterogeneous memory architectures: A HW/SW approach for mixing die-stacked and off-package memories (Abstract)

Mitesh R. Meswani , AMD Research, Advanced Micro Devices, Inc.
Sergey Blagodurov , AMD Research, Advanced Micro Devices, Inc.
David Roberts , AMD Research, Advanced Micro Devices, Inc.
John Slice , AMD Research, Advanced Micro Devices, Inc.
Mike Ignatowski , AMD Research, Advanced Micro Devices, Inc.
Gabriel H. Loh , AMD Research, Advanced Micro Devices, Inc.
pp. 126-136

Event-based scheduling for energy-efficient QoS (eQoS) in mobile Web applications (Abstract)

Yuhao Zhu , Department of Electrical and Computer Engineering The University of Texas at Austin
Matthew Halpern , Department of Electrical and Computer Engineering The University of Texas at Austin
Vijay Janapa Reddi , Department of Electrical and Computer Engineering The University of Texas at Austin
pp. 137-149

Domain knowledge based energy management in handhelds (Abstract)

Nachiappan Chidambaram Nachiappan , The Pennsylvania State University
Praveen Yedlapalli , The Pennsylvania State University
Anand Sivasubramaniam , The Pennsylvania State University
Mahmut T. Kandemir , The Pennsylvania State University
Ravi Iyer , Intel Corp.
Chita R. Das , The Pennsylvania State University
pp. 150-160

GPU voltage noise: Characterization and hierarchical smoothing of spatial and temporal voltage noise interference in GPU architectures (Abstract)

Jingwen Leng , Department of Electrical and Computer Engineering, The University of Texas at Austin
Yazhou Zu , Department of Electrical and Computer Engineering, The University of Texas at Austin
Vijay Janapa Reddi , Department of Electrical and Computer Engineering, The University of Texas at Austin
pp. 161-173

Mascar: Speeding up GPU warps by reducing memory pitstops (Abstract)

Ankit Sethia , Advanced Computer Architecture Laboratory University of Michigan, Ann Arbor, MI
D. Anoushe Jamshidi , Advanced Computer Architecture Laboratory University of Michigan, Ann Arbor, MI
Scott Mahlke , Advanced Computer Architecture Laboratory University of Michigan, Ann Arbor, MI
pp. 174-185

Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies (Abstract)

Alberto Ros , Department of Computer Engineering Universidad de Murcia, Spain
Mahdad Davari , Department of Information Technology Uppsala University, Sweden
Stefanos Kaxiras , Department of Information Technology Uppsala University, Sweden
pp. 186-197

Flask coherence: A morphable hybrid coherence protocol to balance energy, performance and scalability (Abstract)

Lucia G. Menezo , University of Cantabria, Santander, Spain
Valentin Puente , University of Cantabria, Santander, Spain
Jose-Angel Gregorio , University of Cantabria, Santander, Spain
pp. 198-209

Prediction-based superpage-friendly TLB designs (Abstract)

Misel-Myrto Papadopoulou , Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
Xin Tong , Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
Andre Seznec , IRISA/INRIA, Rennes, France
Andreas Moshovos , Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
pp. 210-222

Supporting superpages in non-contiguous physical memory (Abstract)

Yu Du , Department of Computer Science, University of Pittsburgh
Miao Zhou , Department of Computer Science, University of Pittsburgh
Bruce R. Childers , Department of Computer Science, University of Pittsburgh
Daniel Mosse , Department of Computer Science, University of Pittsburgh
Rami Melhem , Department of Computer Science, University of Pittsburgh
pp. 223-234

Paying to save: Reducing cost of colocation data center via rewards (Abstract)

Mohammad A. Islam , Florida International University
Hasan Mahmud , Florida International University
Shaolei Ren , Florida International University
Xiaorui Wang , The Ohio State University
pp. 235-245

Octopus-Man: QoS-driven task management for heterogeneous multicores in warehouse-scale computers (Abstract)

Vinicius Petrucci , Federal University of Bahia, Salvador, BA, Brazil
Michael A. Laurenzano , Clarity Lab, University of Michigan, Ann Arbor, MI, USA
John Doherty , Clarity Lab, University of Michigan, Ann Arbor, MI, USA
Yunqi Zhang , Clarity Lab, University of Michigan, Ann Arbor, MI, USA
Daniel Mosse , University of Pittsburgh, Pittsburgh, PA, USA
Jason Mars , Clarity Lab, University of Michigan, Ann Arbor, MI, USA
Lingjia Tang , Clarity Lab, University of Michigan, Ann Arbor, MI, USA
pp. 246-258

Understanding the virtualization "Tax" of scale-out pass-through GPUs in GaaS clouds: An empirical study (Abstract)

Ming Liu , Intelligent Design of Efficient Architectures Laboratory (IDEAL), Department of Electrical and Computer Engineering, University of Florida
Tao Li , Intelligent Design of Efficient Architectures Laboratory (IDEAL), Department of Electrical and Computer Engineering, University of Florida
Neo Jia , NVIDIA
Andy Currid , NVIDIA
Vladimir Troy , NVIDIA
pp. 259-270

Adrenaline: Pinpointing and reining in tail queries with quick voltage boosting (Abstract)

Chang-Hong Hsu , Clarity Lab University of Michigan - Ann Arbor, MI
Yunqi Zhang , Clarity Lab University of Michigan - Ann Arbor, MI
Michael A. Laurenzano , Clarity Lab University of Michigan - Ann Arbor, MI
David Meisner , Facebook, Inc, Menlo Park, CA
Thomas Wenisch , Clarity Lab University of Michigan - Ann Arbor, MI
Jason Mars , Clarity Lab University of Michigan - Ann Arbor, MI
Lingjia Tang , Clarity Lab University of Michigan - Ann Arbor, MI
Ronald G. Dreslinski , Clarity Lab University of Michigan - Ann Arbor, MI
pp. 271-282

NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules (Abstract)

Amin Farmahini-Farahani , University of Wisconsin-Madison
Jung Ho Ahn , Seoul National University
Katherine Morrow , University of Wisconsin-Madison
Nam Sung Kim , University of Wisconsin-Madison
pp. 283-295

Alloy: Parallel-serial memory channel architecture for single-chip heterogeneous processor systems (Abstract)

Hao Wang , The University of Wisconsin-Madison
Chang-Jae Park , Samsung Electronics
Gyung-su Byun , Southern Methodist University
Jung Ho Ahn , Seoul National University
Nam Sung Kim , The University of Wisconsin-Madison
pp. 296-308

Reducing read latency of phase change memory via early read and Turbo Read (Abstract)

Prashant J. Nair , School of Electrical and Computer Engineering Georgia Institute of Technology
Chiachen Chou , School of Electrical and Computer Engineering Georgia Institute of Technology
Bipin Rajendran , Department of Electrical Engineering Indian Institute of Technology, Bombay
Moinuddin K. Qureshi , School of Electrical and Computer Engineering Georgia Institute of Technology
pp. 309-319

CAFO: Cost aware flip optimization for asymmetric memories (Abstract)

Rakan Maddah , Computer Science Department, University of Pittsburgh
Seyed Mohammad Seyedzadeh , Computer Science Department, University of Pittsburgh
Rami Melhem , Computer Science Department, University of Pittsburgh
pp. 320-330

Understanding GPU errors on large-scale HPC systems and the implications for system design and operation (Abstract)

Devesh Tiwari , Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
Saurabh Gupta , Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
James Rogers , Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
Don Maxwell , Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
Paolo Rech , Federal University of Rio Grande do Sul
Sudharshan Vazhkudai , Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
Daniel Oliveira , Federal University of Rio Grande do Sul
Dave Londo , Cray Inc.
Nathan DeBardeleben , Los Alamos National Laboratory
Philippe Navaux , Federal University of Rio Grande do Sul
Luigi Carro , Federal University of Rio Grande do Sul
Arthur Bland , Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
pp. 331-342

High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches (Abstract)

Aamer Jaleel , Intel Corporation, Hudson, MA
Joseph Nuzman , Intel Corporation, Hudson, MA
Adrian Moga , Intel Corporation, Hudson, MA
Simon C. Steely , Intel Corporation, Hudson, MA
Joel Emer , Massachusetts Institute of Technology (MIT) Cambridge, MA
pp. 343-353

Unlocking bandwidth for GPUs in CC-NUMA systems (Abstract)

Neha Agarwal , University of Michigan
David Nellans , NVIDIA
Mike O'Connor , NVIDIA
Thomas F. Wenisch , University of Michigan
pp. 354-365

Understanding idle behavior and power gating mechanisms in the context of modern benchmarks on CPU-GPU Integrated systems (Abstract)

Manish Arora , Advanced Micro Devices, Inc.
Srilatha Manne , Advanced Micro Devices, Inc.
Indrani Paul , Advanced Micro Devices, Inc.
Nuwan Jayasena , Advanced Micro Devices, Inc.
Dean M. Tullsen , University of California, San Diego
pp. 366-377

Power punch: Towards non-blocking power-gating of NoC routers (Abstract)

Lizhong Chen , School of Electrical Engineering and Computer Science, Oregon State University, USA
Di Zhu , Department of Electrical Engineering, University of Southern California, USA
Massoud Pedram , Department of Electrical Engineering, University of Southern California, USA
Timothy M. Pinkston , Department of Electrical Engineering, University of Southern California, USA
pp. 378-389

Augmenting low-latency HPC network with free-space optical links (Abstract)

Ikki Fujiwara , National Institute of Informatics/JST, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan 101-8430
Michihiro Koibuchi , National Institute of Informatics/JST, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan 101-8430
Tomoya Ozaki , Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, Japan 223-8522
Hiroki Matsutani , Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, Japan 223-8522
Henri Casanova , University of Hawai'i at Manoa 1680 East-West Road, Honolulu, HI, U.S.A. 96822
pp. 390-401

SCOC: High-radix switches made of bufferless Clos networks (Abstract)

Nikolaos Chrysos , IBM Research - Zurich, Switzerland
Cyriel Minkenberg , IBM Research - Zurich, Switzerland
Mark Rudquist , IBM Systems & Technology Group, Rochester, USA
Claude Basso , IBM Systems & Technology Group, Rochester, USA
Brian Vanderpool , IBM Systems & Technology Group, Rochester, USA
pp. 402-414

Overcoming far-end congestion in large-scale networks (Abstract)

Jongmin Won , KAIST
Gwangsun Kim , KAIST
John Kim , KAIST
Ted Jiang , NVIDIA
Mike Parker , Intel Corp.
Steve Scott , Cray Inc.
pp. 415-427

iPatch: Intelligent fault patching to improve energy efficiency (Abstract)

David J. Palframan , Department of Electrical and Computer Engineering, University of Wisconsin-Madison
Nam Sung Kim , Department of Electrical and Computer Engineering, University of Wisconsin-Madison
Mikko H. Lipasti , Department of Electrical and Computer Engineering, University of Wisconsin-Madison
pp. 428-438

Balancing reliability, cost, and performance tradeoffs with FreeFault (Abstract)

Dong Wan Kim , Electrical and Computer Engineering Department, The University of Texas at Austin
Mattan Erez , Electrical and Computer Engineering Department, The University of Texas at Austin
pp. 439-450

FTXen: Making hypervisor resilient to hardware faults on relaxed cores (Abstract)

Xinxin Jin , University of California San Diego
Soyeon Park , Whova Inc
Tianwei Sheng , Whova Inc
Rishan Chen , University of California San Diego
Zhiyong Shan , University of California San Diego
Yuanyuan Zhou , University of California San Diego
pp. 451-462

Correction prediction: Reducing error correction latency for on-chip memories (Abstract)

Henry Duwe , University of Illinois at Urbana-Champaign
Xun Jian , University of Illinois at Urbana-Champaign
Rakesh Kumar , University of Illinois at Urbana-Champaign
pp. 463-475

Overcoming the challenges of crossbar resistive memory architectures (Abstract)

Cong Xu , Pennsylvania State University
Dimin Niu , Pennsylvania State University
Tao Zhang , Pennsylvania State University
Shimeng Yu , Arizona State University
Yuan Xie , University of California Santa Barbara
pp. 476-488

Adaptive-latency DRAM: Optimizing DRAM timing for the common-case (Abstract)

Donghyuk Lee , Carnegie Mellon University
Yoongu Kim , Carnegie Mellon University
Gennady Pekhimenko , Carnegie Mellon University
Samira Khan , Carnegie Mellon University
Vivek Seshadri , Carnegie Mellon University
Kevin Chang , Carnegie Mellon University
Onur Mutlu , Carnegie Mellon University
pp. 489-501

CiDRA: A cache-inspired DRAM resilience architecture (Abstract)

Young Hoon Son , Seoul National University
Sukhan Lee , Seoul National University
O Seongil , Seoul National University
Sanghyuk Kwon , Seoul National University
Nam Sung Kim , University of Wisconsin-Madison
Jung Ho Ahn , Seoul National University
pp. 502-513

Tag tables (Abstract)

Sean Franey , University of Wisconsin - Madison
Mikko Lipasti , University of Wisconsin - Madison
pp. 514-525

Architecture exploration for ambient energy harvesting nonvolatile processors (Abstract)

Kaisheng Ma , Pennsylvania State University
Yang Zheng , Pennsylvania State University
Shuangchen Li , Pennsylvania State University
Karthik Swaminathan , Pennsylvania State University
Xueqing Li , Pennsylvania State University
Yongpan Liu , Tsinghua University
Jack Sampson , Pennsylvania State University
Yuan Xie , University of California, Santa Barbara
Vijaykrishnan Narayanan , Pennsylvania State University
pp. 526-537

Scaling distributed cache hierarchies through computation and data co-scheduling (Abstract)

Nathan Beckmann , Massachusetts Institute of Technology
Po-An Tsai , Massachusetts Institute of Technology
Daniel Sanchez , Massachusetts Institute of Technology
pp. 538-550

Data retention in MLC NAND flash memory: Characterization, optimization, and recovery (Abstract)

Yu Cai , Carnegie Mellon University
Yixin Luo , Carnegie Mellon University
Erich F. Haratsch , LSI Corporation
Ken Mai , Carnegie Mellon University
Onur Mutlu , Carnegie Mellon University
pp. 551-563

GPGPU performance and power estimation using machine learning (Abstract)

Gene Wu , Electrical and Computer Engineering The University of Texas at Austin
Joseph L. Greathouse , AMD Research Advanced Micro Devices, Inc.
Alexander Lyashevsky , AMD Research Advanced Micro Devices, Inc.
Nuwan Jayasena , AMD Research Advanced Micro Devices, Inc.
Derek Chiou , Electrical and Computer Engineering The University of Texas at Austin
pp. 564-576

Quantifying sources of error in McPAT and potential impacts on architectural studies (Abstract)

Sam Likun Xi , Harvard University, School of Engineering and Applied Sciences
Hans Jacobson , IBM Corporation, T. J. Watson Research Center
Pradip Bose , IBM Corporation, T. J. Watson Research Center
Gu-Yeon Wei , Harvard University, School of Engineering and Applied Sciences
David Brooks , Harvard University, School of Engineering and Applied Sciences
pp. 577-589

Studying the impact of multicore processor scaling on directory techniques via reuse distance analysis (Abstract)

Minshu Zhao , Department of Electrical and Computer Engineering, University of Maryland at College Park
Donald Yeung , Department of Electrical and Computer Engineering, University of Maryland at College Park
pp. 590-602

SNNAP: Approximate computing on programmable SoCs via neural acceleration (Abstract)

Thierry Moreau , University of Washington
Mark Wyse , University of Washington
Jacob Nelson , University of Washington
Adrian Sampson , University of Washington
Hadi Esmaeilzadeh , Georgia Institute of Technology
Luis Ceze , University of Washington
Mark Oskin , University of Washington
pp. 603-614

BRAINIAC: Bringing reliable accuracy into neurally-implemented approximate computing (Abstract)

Beayna Grigorian , Computer Science Department, University of California, Los Angeles (UCLA)
Nazanin Farahpour , Computer Science Department, University of California, Los Angeles (UCLA)
Glenn Reinman , Computer Science Department, University of California, Los Angeles (UCLA)
pp. 615-626

Scalable communication architecture for network-attached accelerators (Abstract)

Sarah Neuwirth , Institute of Computer Engineering, University of Heidelberg, Germany
Dirk Frey , Institute of Computer Engineering, University of Heidelberg, Germany
Mondrian Nuessle , EXTOLL GmbH, Germany
Ulrich Bruening , Institute of Computer Engineering, University of Heidelberg, Germany
pp. 627-638

Understanding contention-based channels and using them for defense (Abstract)

Casen Hunger , The University of Texas at Austin
Mikhail Kazdagli , The University of Texas at Austin
Ankit Rawat , The University of Texas at Austin
Alex Dimakis , The University of Texas at Austin
Sriram Vishwanath , The University of Texas at Austin
Mohit Tiwari , The University of Texas at Austin
pp. 639-650

Malware-aware processors: A framework for efficient online malware detection (Abstract)

Meltem Ozsoy , State University of New York at Binghamton
Caleb Donovick , State University of New York at Binghamton
Iakov Gorelik , State University of New York at Binghamton
Nael Abu-Ghazaleh , University of California, Riverside
Dmitry Ponomarev , State University of New York at Binghamton
pp. 651-661

Run-time monitoring with adjustable overhead using dataflow-guided filtering (Abstract)

Daniel Lo , Cornell University Ithaca, NY 14850, USA
Tao Chen , Cornell University Ithaca, NY 14850, USA
Mohamed Ismail , Cornell University Ithaca, NY 14850, USA
G. Edward Suh , Cornell University Ithaca, NY 14850, USA
pp. 662-674

Author index (PDF)

pp. 1-6