2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2016)
Taipei, Taiwan
Oct. 15, 2016 to Oct. 19, 2016
ISBN: 978-1-5090-3509-0
TABLE OF CONTENTS

Cover page (PDF)

pp. 1

Program (PDF)

pp. 1-4

Sponsors (PDF)

pp. 1

Dictionary sharing: An efficient cache compression scheme for compressed caches (Abstract)

Biswabandan Panda , INRIA, Campus de Beaulieu, Rennes, France
Andre Seznec , INRIA, Campus de Beaulieu, Rennes, France
pp. 1-12

Perceptron learning for reuse prediction (Abstract)

Elvira Teran , Texas A&M University
Zhe Wang , Intel Labs
Daniel A. Jimenez , Texas A&M University
pp. 1-12

pTask: A smart prefetching scheme for OS intensive applications (Abstract)

Prathmesh Kallurkar , Department of Computer Science, Indian Institute of Technology, New Delhi, India
Smruti R. Sarangi , Department of Computer Science, Indian Institute of Technology, New Delhi, India
pp. 1-12

Register sharing for equality prediction (Abstract)

Arthur Perais , INRIA/IRISA
Fernando A. Endo , INRIA/IRISA
Andre Seznec , INRIA/IRISA
pp. 1-12

Data-centric execution of speculative parallel programs (Abstract)

Mark C. Jeffrey , Massachusetts Institute of Technology
Suvinay Subramanian , Massachusetts Institute of Technology
Maleen Abeydeera , Massachusetts Institute of Technology
Joel Emer , NVIDIA/MIT
Daniel Sanchez , Massachusetts Institute of Technology
pp. 1-13

SABRes: Atomic object reads for in-memory rack-scale computing (Abstract)

Alexandros Daglis , EcoCloud, EPFL
Dmitrii Ustiugov , EcoCloud, EPFL
Stanko Novakovic , EcoCloud, EPFL
Edouard Bugnion , EcoCloud, EPFL
Babak Falsafi , EcoCloud, EPFL
Boris Grot , University of Edinburgh
pp. 1-13

A cloud-scale acceleration architecture (Abstract)

Adrian M. Caulfield , Microsoft Corporation
Eric S. Chung , Microsoft Corporation
Andrew Putnam , Microsoft Corporation
Hari Angepat , Microsoft Corporation
Jeremy Fowers , Microsoft Corporation
Michael Haselman , Microsoft Corporation
Stephen Heil , Microsoft Corporation
Matt Humphrey , Microsoft Corporation
Puneet Kaur , Microsoft Corporation
Joo-Young Kim , Microsoft Corporation
Daniel Lo , Microsoft Corporation
Todd Massengill , Microsoft Corporation
Kalin Ovtcharov , Microsoft Corporation
Michael Papamichael , Microsoft Corporation
Lisa Woods , Microsoft Corporation
Sitaram Lanka , Microsoft Corporation
Derek Chiou , Microsoft Corporation
Doug Burger , Microsoft Corporation
pp. 1-13

Towards efficient server architecture for virtualized network function deployment: Implications and implementations (Abstract)

Yang Hu , IDEAL Lab, University of Florida, Gainesville, FL, USA
Tao Li , IDEAL Lab, University of Florida, Gainesville, FL, USA
pp. 1-12

Bridging the I/O performance gap for big data workloads: A new NVDIMM-based approach (Abstract)

Renhai Chen , Embedded Systems and CPS Laboratory, Department of Computing, The Hong Kong Polytechnic University
Zili Shao , Embedded Systems and CPS Laboratory, Department of Computing, The Hong Kong Polytechnic University
Tao Li , Department of Electrical and Computer Engineering, University of Florida
pp. 1-12

NeSC: Self-virtualizing nested storage controller (Abstract)

Yonatan Gottesman , Electrical Engineering, Technion - Israel Institute of Technology
Yoav Etsion , Electrical Engineering and Computer Science, Technion - Israel Institute of Technology
pp. 1-12

MIMD synchronization on SIMT architectures (Abstract)

Ahmed ElTantawy , University of British Columbia
Tor M. Aamodt , University of British Columbia
pp. 1-14

Efficient kernel synthesis for performance portable programming (Abstract)

Li-Wen Chang , University of Illinois at Urbana-Champaign
Izzat El Hajj , University of Illinois at Urbana-Champaign
Christopher Rodrigues , Huawei America Research Lab
Juan Gomez-Luna , Universidad de Cordoba
Wen-mei Hwu , University of Illinois at Urbana-Champaign
pp. 1-13

KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism (Abstract)

Izzat El Hajj , University of Illinois at Urbana-Champaign
Juan Gomez-Luna , Universidad de Cordoba
Cheng Li , University of Illinois at Urbana-Champaign
Li-Wen Chang , University of Illinois at Urbana-Champaign
Dejan Milojicic , Hewlett-Packard Labs
Wen-mei Hwu , University of Illinois at Urbana-Champaign
pp. 1-12

Cache-emulated register file: An integrated on-chip memory architecture for high performance GPGPUs (Abstract)

Naifeng Jing , Advanced Computer Architecture Laboratory, Shanghai Jiao Tong University, Shanghai, China, 200240
Jianfei Wang , Advanced Computer Architecture Laboratory, Shanghai Jiao Tong University, Shanghai, China, 200240
Fengfeng Fan , Advanced Computer Architecture Laboratory, Shanghai Jiao Tong University, Shanghai, China, 200240
Wenkang Yu , Advanced Computer Architecture Laboratory, Shanghai Jiao Tong University, Shanghai, China, 200240
Li Jiang , Advanced Computer Architecture Laboratory, Shanghai Jiao Tong University, Shanghai, China, 200240
Chao Li , Advanced Computer Architecture Laboratory, Shanghai Jiao Tong University, Shanghai, China, 200240
Xiaoyao Liang , Advanced Computer Architecture Laboratory, Shanghai Jiao Tong University, Shanghai, China, 200240
pp. 1-12

Zorua: A holistic approach to resource virtualization in GPUs (Abstract)

Nandita Vijaykumar , Carnegie Mellon University
Kevin Hsieh , Carnegie Mellon University
Gennady Pekhimenko , Microsoft Research
Samira Khan , University of Virginia
Ashish Shrestha , Carnegie Mellon University
Saugata Ghose , Carnegie Mellon University
Adwait Jog , College of William and Mary
Phillip B. Gibbons , Carnegie Mellon University
Onur Mutlu , ETH Zürich
pp. 1-14

GRAPE: Minimizing energy for GPU applications with performance requirements (Abstract)

Muhammad Husni Santriaji , Surya University & University of Chicago
Henry Hoffmann , University of Chicago
pp. 1-13

From high-level deep neural models to FPGAs (Abstract)

Hardik Sharma , Alternative Computing Technologies (ACT) Lab, School of Computer Science, Georgia Institute of Technology
Jongse Park , Alternative Computing Technologies (ACT) Lab, School of Computer Science, Georgia Institute of Technology
Divya Mahajan , Alternative Computing Technologies (ACT) Lab, School of Computer Science, Georgia Institute of Technology
Emmanuel Amaro , Alternative Computing Technologies (ACT) Lab, School of Computer Science, Georgia Institute of Technology
Joon Kyung Kim , Alternative Computing Technologies (ACT) Lab, School of Computer Science, Georgia Institute of Technology
Chenkai Shao , Alternative Computing Technologies (ACT) Lab, School of Computer Science, Georgia Institute of Technology
Asit Mishra , Intel Corporation
Hadi Esmaeilzadeh , Intel Corporation
pp. 1-12

vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design (Abstract)

Minsoo Rhu , NVIDIA, Santa Clara, CA, 95050
Natalia Gimelshein , NVIDIA, Santa Clara, CA, 95050
Jason Clemons , NVIDIA, Santa Clara, CA, 95050
Arslan Zulfiqar , NVIDIA, Santa Clara, CA, 95050
Stephen W. Keckler , NVIDIA, Santa Clara, CA, 95050
pp. 1-13

Stripes: Bit-serial deep neural network computing (Abstract)

Patrick Judd , Department of Electrical and Computer Engineering, University of Toronto
Jorge Albericio , Department of Electrical and Computer Engineering, University of Toronto
Tayler Hetherington , Department of Electrical and Computer Engineering, University of British Columbia
Tor M. Aamodt , Department of Electrical and Computer Engineering, University of British Columbia
Andreas Moshovos , Department of Electrical and Computer Engineering, University of Toronto
pp. 1-12

Cambricon-X: An accelerator for sparse neural networks (Abstract)

Shijin Zhang , SKL of Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
Zidong Du , SKL of Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
Lei Zhang , SKL of Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
Huiying Lan , SKL of Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
Shaoli Liu , SKL of Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
Ling Li , Institute of Automation, CAS, Beijing, China
Qi Guo , SKL of Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
Tianshi Chen , SKL of Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
Yunji Chen , SKL of Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
pp. 1-12

NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints (Abstract)

Yu Ji , Department of Computer Science and Technology, Tsinghua University, P.R. China
YouHui Zhang , Department of Computer Science and Technology, Tsinghua University, P.R. China
ShuangChen Li , Department of Electrical and Computer Engineering, University of California at Santa Barbara, USA
Ping Chi , Department of Electrical and Computer Engineering, University of California at Santa Barbara, USA
CiHang Jiang , Department of Computer Science and Technology, Tsinghua University, P.R. China
Peng Qu , Department of Computer Science and Technology, Tsinghua University, P.R. China
Yuan Xie , Department of Electrical and Computer Engineering, University of California at Santa Barbara, USA
WenGuang Chen , Department of Computer Science and Technology, Tsinghua University, P.R. China
pp. 1-13

Fused-layer CNN accelerators (Abstract)

Manoj Alwani , Stony Brook University
Han Chen , Stony Brook University
Michael Ferdman , Stony Brook University
Peter Milder , Stony Brook University
pp. 1-12

Continuous shape shifting: Enabling loop co-optimization via near-free dynamic code rewriting (Abstract)

Animesh Jain , University of Michigan, Ann Arbor
Michael A. Laurenzano , University of Michigan, Ann Arbor
Lingjia Tang , University of Michigan, Ann Arbor
Jason Mars , University of Michigan, Ann Arbor
pp. 1-12

CrystalBall: Statically analyzing runtime behavior via deep sequence learning (Abstract)

Stephen Zekany , University of Michigan - Ann Arbor, MI
Daniel Rings , University of Michigan - Ann Arbor, MI
Nathan Harada , University of Michigan - Ann Arbor, MI
Michael A. Laurenzano , University of Michigan - Ann Arbor, MI
Lingjia Tang , University of Michigan - Ann Arbor, MI
Jason Mars , University of Michigan - Ann Arbor, MI
pp. 1-12

Low-cost soft error resilience with unified data verification and fine-grained recovery for acoustic sensor based detection (Abstract)

Qingrui Liu , Virginia Tech, Blacksburg, Virginia, USA
Changhee Jung , Virginia Tech, Blacksburg, Virginia, USA
Dongyoon Lee , Virginia Tech, Blacksburg, Virginia, USA
Devesh Tiwari , Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
pp. 1-12

Lazy release consistency for GPUs (Abstract)

Johnathan Alsop , University of Illinois at Urbana-Champaign
Marc S. Orr , University of Wisconsin-Madison
Bradford M. Beckmann , AMD Research
David A. Wood , University of Wisconsin-Madison
pp. 1-14

Improving energy efficiency of DRAM by exploiting half page row access (Abstract)

Heonjae Ha , Stanford University
Ardavan Pedram , Stanford University
Stephen Richardson , Stanford University
Shahar Kvatinsky , Technion - Israel Institute of Technology
Mark Horowitz , Stanford University
pp. 1-12

OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures (Abstract)

Jia Zhan , University of California, Santa Barbara
Onur Kayiran , Advanced Micro Devices, Inc.
Gabriel H. Loh , Advanced Micro Devices, Inc.
Chita R. Das , The Pennsylvania State University
Yuan Xie , University of California, Santa Barbara
pp. 1-13

A unified memory network architecture for in-memory computing in commodity servers (Abstract)

Jia Zhan , University of California, Santa Barbara
Itir Akgun , University of California, Santa Barbara
Jishen Zhao , University of California, Santa Cruz
Al Davis , HP Labs
Yuangang Wang , Huawei
Yuan Xie , University of California, Santa Barbara
pp. 1-14

Dynamic error mitigation in NoCs using intelligent prediction techniques (Abstract)

Dominic DiTomaso , Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701
Travis Boraten , Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701
Avinash Kodi , Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701
Ahmed Louri , Electrical and Computer Engineering, George Washington University, Washington, DC 20052
pp. 1-12

Reducing data movement energy via online data clustering and encoding (Abstract)

Shibo Wang , Department of Computer Science, University of Rochester, Rochester, NY 14627 USA
Engin Ipek , Department of Computer Science, University of Rochester, Rochester, NY 14627 USA
pp. 1-13

Racer: TSO consistency via race detection (Abstract)

Alberto Ros , Department of Computer Engineering, Universidad de Murcia, Spain
Stefanos Kaxiras , Department of Information Technology, Uppsala Universitet, Sweden
pp. 1-13

Exploiting semantic commutativity in hardware speculation (Abstract)

Guowei Zhang , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Virginia Chiu , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Daniel Sanchez , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
pp. 1-12

CANDY: Enabling coherent DRAM caches for multi-node systems (Abstract)

Chiachen Chou , School of Electrical and Computer Engineering, Georgia Institute of Technology
Aamer Jaleel , NVIDIA
Moinuddin K. Qureshi , School of Electrical and Computer Engineering, Georgia Institute of Technology
pp. 1-13

C3D: Mitigating the NUMA bottleneck via coherent DRAM caches (Abstract)

Cheng-Chieh Huang , Institute of Computing Systems Architecture, University of Edinburgh
Rakesh Kumar , Institute of Computing Systems Architecture, University of Edinburgh
Marco Elver , Institute of Computing Systems Architecture, University of Edinburgh
Boris Grot , Institute of Computing Systems Architecture, University of Edinburgh
Vijay Nagarajan , Institute of Computing Systems Architecture, University of Edinburgh
pp. 1-12

PoisonIvy: Safe speculation for secure memory (Abstract)

Tamara Silbergleit Lehman , Electrical and Computer Engineering, Duke University
Andrew D. Hilton , Electrical and Computer Engineering, Duke University
Benjamin C. Lee , Electrical and Computer Engineering, Duke University
pp. 1-13

ReplayConfusion: Detecting cache-based covert channel attacks using record and replay (Abstract)

Mengjia Yan , University of Illinois, Urbana-Champaign
Yasser Shalabi , University of Illinois, Urbana-Champaign
Josep Torrellas , University of Illinois, Urbana-Champaign
pp. 1-14

Jump over ASLR: Attacking branch predictors to bypass ASLR (Abstract)

Dmitry Evtyushkin , Department of Computer Science, State University of New York at Binghamton
Dmitry Ponomarev , Department of Computer Science, State University of New York at Binghamton
Nael Abu-Ghazaleh , Computer Science and Engineering Department, University of California, Riverside
pp. 1-13

Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation (Abstract)

Animesh Jain , University of Michigan
Parker Hill , University of Michigan
Shih-Chieh Lin , University of Michigan
Muneeb Khan , Uppsala University
Md E. Haque , University of Michigan
Michael A. Laurenzano , University of Michigan
Scott Mahlke , University of Michigan
Lingjia Tang , University of Michigan
Jason Mars , University of Michigan
pp. 1-13

Approxilyzer: Towards a systematic framework for instruction-level approximate computing and its application to hardware resiliency (Abstract)

Radha Venkatagiri , University of Illinois at Urbana-Champaign
Abdulrahman Mahmoud , University of Illinois at Urbana-Champaign
Sarita V. Adve , University of Illinois at Urbana-Champaign
pp. 1-14

The Bunker Cache for spatio-value approximation (Abstract)

Joshua San Miguel , University of Toronto
Jorge Albericio , University of Toronto
Natalie Enright Jerger , University of Toronto
Aamer Jaleel , NVIDIA
pp. 1-12

HARE: Hardware accelerator for regular expressions (Abstract)

Vaibhav Gogte , University of Michigan
Aasheesh Kolli , University of Michigan
Michael J. Cafarella , University of Michigan
Loris D'Antoni , University of Wisconsin-Madison
Thomas F. Wenisch , University of Michigan
pp. 1-12

The microarchitecture of a real-time robot motion planning accelerator (Abstract)

Sean Murray , Departments of Computer Science and Electrical & Computer Engineering, Duke University
William Floyd-Jones , Departments of Computer Science and Electrical & Computer Engineering, Duke University
Ying Qi , Departments of Computer Science and Electrical & Computer Engineering, Duke University
George Konidaris , Departments of Computer Science and Electrical & Computer Engineering, Duke University
Daniel J. Sorin , Departments of Computer Science and Electrical & Computer Engineering, Duke University
pp. 1-12

Efficient data supply for hardware accelerators with prefetching and access/execute decoupling (Abstract)

Tao Chen , Cornell University, Ithaca, NY 14850, USA
G. Edward Suh , Cornell University, Ithaca, NY 14850, USA
pp. 1-12

An ultra low-power hardware accelerator for automatic speech recognition (Abstract)

Reza Yazdani , Computer Architecture Department, Universitat Politecnica de Catalunya
Albert Segura , Computer Architecture Department, Universitat Politecnica de Catalunya
Jose-Maria Arnau , Computer Architecture Department, Universitat Politecnica de Catalunya
Antonio Gonzalez , Computer Architecture Department, Universitat Politecnica de Catalunya
pp. 1-12

Co-designing accelerators and SoC interfaces using gem5-Aladdin (Abstract)

Yakun Sophia Shao , NVIDIA Research
Sam Likun Xi , Harvard University
Gu-Yeon Wei , Harvard University
David Brooks , Harvard University
pp. 1-12

Chainsaw: Von-Neumann accelerators to leverage fused instruction chains (Abstract)

Amirali Sharifian , School of Computing Science, Simon Fraser University
Snehasish Kumar , School of Computing Science, Simon Fraser University
Apala Guha , School of Computing Science, Simon Fraser University
Arrvindh Shriraman , School of Computing Science, Simon Fraser University
pp. 1-14

Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems (Abstract)

Hadi Asghari-Moghaddam , University of Illinois at Urbana-Champaign
Young Hoon Son , Seoul National University
Jung Ho Ahn , Seoul National University
Nam Sung Kim , University of Illinois at Urbana-Champaign
pp. 1-13

A patch memory system for image processing and computer vision (Abstract)

Jason Clemons , NVIDIA, Santa Clara, CA
Chih-Chi Cheng , Qualcomm, Santa Clara, CA
Iuri Frosio , NVIDIA, Santa Clara, CA
Daniel Johnson , NVIDIA, Santa Clara, CA
Stephen W. Keckler , NVIDIA, Santa Clara, CA
pp. 1-13

Evaluating programmable architectures for imaging and vision applications (Abstract)

Artem Vasilyev , Stanford University
Nikhil Bhagdikar , Stanford University
Ardavan Pedram , Stanford University
Stephen Richardson , Stanford University
Shahar Kvatinsky , Technion
Mark Horowitz , Stanford University
pp. 1-13

Redefining QoS and customizing the power management policy to satisfy individual mobile users (Abstract)

Kaige Yan , Department of Electrical and Computer Engineering, University of Houston, Houston, Texas, 77004
Xingyao Zhang , Department of Electrical and Computer Engineering, University of Houston, Houston, Texas, 77004
Jingweijia Tan , Department of Electrical and Computer Engineering, University of Houston, Houston, Texas, 77004
Xin Fu , Department of Electrical and Computer Engineering, University of Houston, Houston, Texas, 77004
pp. 1-12

Snatch: Opportunistically reassigning power allocation between processor and memory in 3D stacks (Abstract)

Dimitrios Skarlatos , University of Illinois, Urbana-Champaign
Renji Thomas , Ohio State University
Aditya Agrawal , NVIDIA Corp.
Shibin Qin , University of Illinois, Urbana-Champaign
Robert Pilawa-Podgurski , University of Illinois, Urbana-Champaign
Ulya R. Karpuzcu , University of Minnesota, Twin Cities
Radu Teodorescu , Ohio State University
Nam Sung Kim , University of Illinois, Urbana-Champaign
Josep Torrellas , University of Illinois, Urbana-Champaign
pp. 1-12

Ti-states: Processor power management in the temperature inversion region (Abstract)

Yazhou Zu , The University of Texas at Austin
Wei Huang , Advanced Micro Devices, Inc.
Indrani Paul , Advanced Micro Devices, Inc.
Vijay Janapa Reddi , The University of Texas at Austin
pp. 1-13

Graphicionado: A high-performance and energy-efficient accelerator for graph analytics (Abstract)

Tae Jun Ham , Princeton University
Lisa Wu , University of California, Berkeley
Narayanan Sundaram , Parallel Computing Lab, Intel Corporation
Nadathur Satish , Parallel Computing Lab, Intel Corporation
Margaret Martonosi , Princeton University
pp. 1-13

Improving bank-level parallelism for irregular applications (Abstract)

Xulong Tang , Pennsylvania State University, University Park, PA, USA
Mahmut Kandemir , Pennsylvania State University, University Park, PA, USA
Praveen Yedlapalli , VMware, Inc., Palo Alto, CA, USA
Jagadish Kotra , Pennsylvania State University, University Park, PA, USA
pp. 1-12

Delegated persist ordering (Abstract)

Aasheesh Kolli , University of Michigan
Jeff Rosen , Snowflake Computing
Ali Saidi , ARM
Steven Pelley , Snowflake Computing
Sihang Liu , University of Michigan
Peter M. Chen , University of Michigan
Thomas F. Wenisch , University of Michigan
pp. 1-13

Spectral profiling: Observer-effect-free profiling by monitoring EM emanations (Abstract)

Nader Sehatbakhsh , Georgia Institute of Technology, Atlanta, Georgia
Alireza Nazari , Georgia Institute of Technology, Atlanta, Georgia
Alenka Zajic , Georgia Institute of Technology, Atlanta, Georgia
Milos Prvulovic , Georgia Institute of Technology, Atlanta, Georgia
pp. 1-11

Path confidence based lookahead prefetching (Abstract)

Jinchun Kim , Texas A&M University
Seth H. Pugsley , Intel Labs
Paul V. Gratz , Texas A&M University
A. L. Narasimha Reddy , Texas A&M University
Chris Wilkerson , Intel Labs
Zeshan Chishti , Intel Labs
pp. 1-12

Continuous runahead: Transparent hardware acceleration for memory intensive workloads (Abstract)

Milad Hashemi , The University of Texas at Austin
Onur Mutlu , ETH Zürich
Yale N. Patt , The University of Texas at Austin
pp. 1-12

Author index (PDF)

pp. 1-29