2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2015)
Waikiki, HI, USA
Dec. 5, 2015 to Dec. 9, 2015
ISSN: 2379-3155
ISBN: 978-1-5090-6601-8
TABLE OF CONTENTS

Front matters (Abstract)

pp. i-xviii

Large pages and lightweight memory management in virtualized environments: Can you have it both ways? (Abstract)

Binh Pham , Department of Computer Science, Rutgers University
Jan Vesely , Department of Computer Science, Rutgers University
Gabriel H. Loh , AMD Research, Advanced Micro Devices, Inc.
Abhishek Bhattacharjee , Department of Computer Science, Rutgers University
pp. 1-12

Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems (Abstract)

Guowei Zhang , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Webb Horn , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Daniel Sanchez , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
pp. 13-25

CCICheck: Using μhb graphs to verify the coherence-consistency interface (Abstract)

Yatin A. Manerkar , Princeton University
Daniel Lustig , Princeton University
Margaret Martonosi , Princeton University
pp. 26-37

HyComp: A hybrid cache compression method for selection of data-type-specific compression methods (Abstract)

Angelos Arelakis , Chalmers University of Technology, Göteborg, Sweden
Fredrik Dahlgren , Chalmers University of Technology, Göteborg, Sweden
Per Stenstrom , Chalmers University of Technology, Göteborg, Sweden
pp. 38-49

Doppelgänger: A cache for approximate computing (Abstract)

Joshua San Miguel , University of Toronto
Jorge Albericio , University of Toronto
Andreas Moshovos , University of Toronto
Natalie Enright Jerger , University of Toronto
pp. 50-61

The application slowdown model: Quantifying and controlling the impact of inter-application interference at shared caches and main memory (Abstract)

Lavanya Subramanian , Carnegie Mellon University
Vivek Seshadri , Carnegie Mellon University
Arnab Ghosh , Carnegie Mellon University
Samira Khan , Carnegie Mellon University
Onur Mutlu , Carnegie Mellon University
pp. 62-75

MORC: A manycore-oriented compressed cache (Abstract)

Tri M. Nguyen , Princeton University
David Wentzlaff , Princeton University
pp. 76-88

Avoiding information leakage in the memory controller with fixed service policies (Abstract)

Ali Shafiee , University of Utah, Salt Lake City, UT, USA
Akhila Gundu , University of Utah, Salt Lake City, UT, USA
Manjunath Shevgoor , University of Utah, Salt Lake City, UT, USA
Rajeev Balasubramonian , University of Utah, Salt Lake City, UT, USA
Mohit Tiwari , University of Texas, Austin, TX, USA
pp. 89-101

Fork Path: Improving efficiency of ORAM by removing redundant memory accesses (Abstract)

Xian Zhang , Center for Energy-Efficient Computing and Applications, Peking University, Beijing 100871, China
Guangyu Sun , Center for Energy-Efficient Computing and Applications, Peking University, Beijing 100871, China
Chao Zhang , Center for Energy-Efficient Computing and Applications, Peking University, Beijing 100871, China
Weiqi Zhang , Center for Energy-Efficient Computing and Applications, Peking University, Beijing 100871, China
Yun Liang , Center for Energy-Efficient Computing and Applications, Peking University, Beijing 100871, China
Tao Wang , Center for Energy-Efficient Computing and Applications, Peking University, Beijing 100871, China
Yiran Chen , Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh PA 15261, USA
Jia Di , Computer Science and Computer Engineering Department, University of Arkansas, Fayetteville AR 72701, USA
pp. 102-114

Locking down insecure indirection with hardware-based control-data isolation (Abstract)

William Arthur , University of Michigan, Ann Arbor, Michigan
Sahil Madeka , University of Michigan, Ann Arbor, Michigan
Reetuparna Das , University of Michigan, Ann Arbor, Michigan
Todd Austin , University of Michigan, Ann Arbor, Michigan
pp. 115-127

Authenticache: Harnessing cache ECC for system authentication (Abstract)

Anys Bacha , Computer Science and Engineering, The Ohio State University
Radu Teodorescu , Computer Science and Engineering, The Ohio State University
pp. 128-140

Efficiently prefetching complex address patterns (Abstract)

Manjunath Shevgoor , University of Utah, Salt Lake City, UT, USA
Sahil Koladiya , University of Utah, Salt Lake City, UT, USA
Rajeev Balasubramonian , University of Utah, Salt Lake City, UT, USA
Chris Wilkerson , Intel Labs, Hillsboro, OR, USA
Seth H Pugsley , Intel Labs, Hillsboro, OR, USA
Zeshan Chishti , Intel Labs, Hillsboro, OR, USA
pp. 141-152

Self-contained, accurate precomputation prefetching (Abstract)

Islam Atta , University of Toronto, Toronto, ON, Canada
Xin Tong , University of Toronto, Toronto, ON, Canada
Vijayalakshmi Srinivasan , IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Ioana Baldini , IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Andreas Moshovos , University of Toronto, Toronto, ON, Canada
pp. 153-165

Confluence: Unified instruction supply for scale-out servers (Abstract)

Cansu Kaynak , EcoCloud, EPFL
Boris Grot , University of Edinburgh
Babak Falsafi , EcoCloud, EPFL
pp. 166-177

IMP: Indirect memory prefetcher (Abstract)

Xiangyao Yu , Massachusetts Institute of Technology
Christopher J. Hughes , Parallel Computing Lab, Intel Labs
Nadathur Satish , Parallel Computing Lab, Intel Labs
Srinivas Devadas , Massachusetts Institute of Technology
pp. 178-190

DeSC: Decoupled supply-compute communication management for heterogeneous architectures (Abstract)

Tae Jun Ham , Princeton University
Juan L. Aragon , University of Murcia
Margaret Martonosi , Princeton University
pp. 191-203

Efficient warp execution in presence of divergence with collaborative context collection (Abstract)

Farzad Khorasani , Computer Science and Engineering Department, University of California, Riverside, CA, USA
Rajiv Gupta , Computer Science and Engineering Department, University of California, Riverside, CA, USA
Laxmi N. Bhuyan , Computer Science and Engineering Department, University of California, Riverside, CA, USA
pp. 204-215

Control flow coalescing on a hybrid dataflow/von Neumann GPGPU (Abstract)

Dani Voitsechov , Electrical Engineering, Technion - Israel Institute of Technology
Yoav Etsion , Electrical Engineering and Computer Science, Technion - Israel Institute of Technology
pp. 216-227

A scalable architecture for ordered parallelism (Abstract)

Mark C. Jeffrey , MIT CSAIL
Cong Yan , MIT CSAIL
Joel Emer , NVIDIA / MIT CSAIL
Daniel Sanchez , MIT CSAIL
pp. 228-241

More is less: Improving the energy efficiency of data movement via opportunistic use of sparse codes (Abstract)

Yanwei Song , Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627 USA
Engin Ipek , Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627 USA
pp. 242-254

Improving DRAM latency with dynamic asymmetric subarray (Abstract)

Shih-Lien Lu , Intel Corp. Hillsboro, Oregon 97124 USA
Ying-Chen Lin , National Taiwan University, Taipei, Taiwan ROC
Chia-Lin Yang , National Taiwan University, Taipei, Taiwan ROC
pp. 255-266

Gather-Scatter DRAM: In-DRAM address translation to improve the spatial locality of non-unit strided accesses (Abstract)

Vivek Seshadri , Carnegie Mellon University
Thomas Mullins , Carnegie Mellon University
Amirali Boroumand , Carnegie Mellon University
Onur Mutlu , Carnegie Mellon University
Phillip B Gibbons , Carnegie Mellon University
Michael A. Kozuch , Intel Labs
Todd C Mowry , Carnegie Mellon University
pp. 267-280

The CRISP performance model for dynamic voltage and frequency scaling in a GPGPU (Abstract)

Rajib Nath , University of California, San Diego, 9500 Gilman Drive, La Jolla, CA
Dean Tullsen , University of California, San Diego, 9500 Gilman Drive, La Jolla, CA
pp. 281-293

Safe limits on voltage reduction efficiency in GPUs: A direct measurement approach (Abstract)

Jingwen Leng , The University of Texas at Austin
Alper Buyuktosunoglu , IBM T.J. Watson Research Center
Ramon Bertran , IBM T.J. Watson Research Center
Pradip Bose , IBM T.J. Watson Research Center
Vijay Janapa Reddi , The University of Texas at Austin
pp. 294-307

Adaptive guardband scheduling to improve system-level efficiency of the POWER7+ (Abstract)

Yazhou Zu , The University of Texas at Austin
Jingwen Leng , The University of Texas at Austin
Matthew Halpern , The University of Texas at Austin
Vijay Janapa Reddi , The University of Texas at Austin
pp. 308-321

DynaMOS: Dynamic schedule migration for heterogeneous cores (Abstract)

Shruti Padmanabha , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI
Andrew Lukefahr , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI
Reetuparna Das , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI
Scott Mahlke , Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI
pp. 322-333

Long term parking (LTP): Criticality-aware resource allocation in OOO processors (Abstract)

Andreas Sembrant , Uppsala University
Trevor Carlson , Uppsala University
Erik Hagersten , Uppsala University
David Black-Schaffer , Uppsala University
Arthur Perais , IRISA/INRIA
Andre Seznec , IRISA/INRIA
Pierre Michaud , IRISA/INRIA
pp. 334-346

The inner most loop iteration counter: A new dimension in branch history (Abstract)

Andre Seznec , INRIA
Joshua San Miguel , University of Toronto
Jorge Albericio , University of Toronto
pp. 347-357

Filtered runahead execution with a runahead buffer (Abstract)

Milad Hashemi , The University of Texas at Austin
Yale N. Patt , The University of Texas at Austin
pp. 358-369

Bungee jumps: Accelerating indirect branches through HW/SW co-design (Abstract)

Daniel S. McFarlin , Carnegie Mellon University
Craig Zilles , University of Illinois at Urbana-Champaign
pp. 370-382

SAWS: Synchronization aware GPGPU warp scheduling for multiple independent warp schedulers (Abstract)

Jiwei Liu , University of Pittsburgh, Pittsburgh, PA 15261
Jun Yang , University of Pittsburgh, Pittsburgh, PA 15261
Rami Melhem , University of Pittsburgh, Pittsburgh, PA 15261
pp. 383-394

Enabling coordinated register allocation and thread-level parallelism optimization for GPUs (Abstract)

Xiaolong Xie , Center for Energy-efficient Computing and Applications, School of EECS, Peking University, China
Yun Liang , Center for Energy-efficient Computing and Applications, School of EECS, Peking University, China
Xiuhong Li , Center for Energy-efficient Computing and Applications, School of EECS, Peking University, China
Yudong Wu , Center for Energy-efficient Computing and Applications, School of EECS, Peking University, China
Guangyu Sun , Center for Energy-efficient Computing and Applications, School of EECS, Peking University, China
Tao Wang , Center for Energy-efficient Computing and Applications, School of EECS, Peking University, China
Dongrui Fan , Institute of Computing Technology, Chinese Academy of Sciences
pp. 395-406

Free launch: Optimizing GPU dynamic kernel launches through thread reuse (Abstract)

Guoyang Chen , Computer Science Department, North Carolina State University, 890 Oval Drive, Raleigh, NC, USA, 27695
Xipeng Shen , Computer Science Department, North Carolina State University, 890 Oval Drive, Raleigh, NC, USA, 27695
pp. 407-419

GPU register file virtualization (Abstract)

Hyeran Jeon , San Jose State University
Gokul Subramanian Ravi , University of Wisconsin-Madison
Nam Sung Kim , University of Illinois, Urbana-Champaign
Murali Annavaram , University of Southern California
pp. 420-432

WarpPool: Sharing requests with inter-warp coalescing for throughput processors (Abstract)

John Kloosterman , Computer Engineering Lab, University of Michigan, Ann Arbor, MI
Jonathan Beaumont , Computer Engineering Lab, University of Michigan, Ann Arbor, MI
Mick Wollman , Computer Engineering Lab, University of Michigan, Ann Arbor, MI
Ankit Sethia , Computer Engineering Lab, University of Michigan, Ann Arbor, MI
Ron Dreslinski , Computer Engineering Lab, University of Michigan, Ann Arbor, MI
Trevor Mudge , Computer Engineering Lab, University of Michigan, Ann Arbor, MI
Scott Mahlke , Computer Engineering Lab, University of Michigan, Ann Arbor, MI
pp. 433-444

Ultra-low power render-based collision detection for CPU/GPU systems (Abstract)

Enrique de Lucas , Computer Architecture Department, Universitat Politecnica de Catalunya, Barcelona, Spain
Pedro Marcuello , Broadcom Corporation, Barcelona, Spain
Joan-Manuel Parcerisa , Computer Architecture Department, Universitat Politecnica de Catalunya, Barcelona, Spain
Antonio Gonzalez , Computer Architecture Department, Universitat Politecnica de Catalunya, Barcelona, Spain
pp. 445-456

Execution time prediction for energy-efficient hardware accelerators (Abstract)

Tao Chen , Cornell University, Ithaca, NY 14850, USA
Alexander Rucker , Cornell University, Ithaca, NY 14850, USA
G. Edward Suh , Cornell University, Ithaca, NY 14850, USA
pp. 457-469

Border control: Sandboxing accelerators (Abstract)

Lena E. Olson , University of Wisconsin-Madison, Computer Sciences Department
Jason Power , University of Wisconsin-Madison, Computer Sciences Department
Mark D. Hill , University of Wisconsin-Madison, Computer Sciences Department
David A. Wood , University of Wisconsin-Madison, Computer Sciences Department
pp. 470-481

Neural acceleration for GPU throughput processors (Abstract)

Amir Yazdanbakhsh , Alternative Computing Technologies (ACT) Lab, School of Computer Science, Georgia Institute of Technology
Jongse Park , Alternative Computing Technologies (ACT) Lab, School of Computer Science, Georgia Institute of Technology
Hardik Sharma , Alternative Computing Technologies (ACT) Lab, School of Computer Science, Georgia Institute of Technology
Pejman Lotfi-Kamran , School of Computer Science, Institute for Research in Fundamental Sciences (IPM)
Hadi Esmaeilzadeh , Alternative Computing Technologies (ACT) Lab, School of Computer Science, Georgia Institute of Technology
pp. 482-493

Neuromorphic accelerators: A comparison between neuroscience and machine-learning approaches (Abstract)

Zidong Du , State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), CAS, China
Daniel D Ben-Dayan Rubin , Intel, Israel
Yunji Chen , State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), CAS, China
Liqiang He , College of Computer Science, Inner Mongolia Univ., China
Tianshi Chen , State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), CAS, China
Lei Zhang , State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), CAS, China
Chengyong Wu , State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), CAS, China
Olivier Temam , Inria, France
pp. 494-507

Prediction-guided performance-energy trade-off for interactive applications (Abstract)

Daniel Lo , Cornell University, Ithaca, NY, USA
Taejoon Song , Cornell University, Ithaca, NY, USA
G. Edward Suh , Cornell University, Ithaca, NY, USA
pp. 508-520

Architecture-aware automatic computation offload for native applications (Abstract)

Gwangmu Lee , POSTECH, Pohang, Korea
Hyunjoon Park , POSTECH, Pohang, Korea
Seonyeong Heo , POSTECH, Pohang, Korea
Kyung-Ah Chang , Samsung Electronics, Suwon, Korea
Hyogun Lee , Samsung Electronics, Suwon, Korea
Hanjun Kim , POSTECH, Pohang, Korea
pp. 521-532

Fast support for unstructured data processing: The unified automata processor (Abstract)

Yuanwei Fang , Dept of Computer Science, Univ of Chicago
Tung T. Hoang , Dept of Computer Science, Univ of Chicago
Michela Becchi , Dept of Electrical & Computer Eng., Univ of Missouri
Andrew A. Chien , Computer Science, Univ of Chicago, MCS, Argonne Natl Lab
pp. 533-545

Enabling interposer-based disintegration of multi-core processors (Abstract)

Ajaykumar Kannan , Edward S. Rogers Dept. of Electrical and Computer Engineering, University of Toronto
Natalie Enright Jerger , Edward S. Rogers Dept. of Electrical and Computer Engineering, University of Toronto
Gabriel H. Loh , AMD Research, Advanced Micro Devices, Inc.
pp. 546-558

DCS: A fast and scalable device-centric server architecture (Abstract)

Jaehyung Ahn , Department of Computer Science and Engineering, POSTECH
Dongup Kwon , Department of Computer Science and Engineering, POSTECH
Youngsok Kim , Department of Computer Science and Engineering, POSTECH
Mohammadamin Ajdari , Department of Computer Science and Engineering, POSTECH
Jaewon Lee , Department of Computer Science and Engineering, POSTECH
Jangwoo Kim , Department of Computer Science and Engineering, POSTECH
pp. 559-571

Modeling the implications of DRAM failures and protection techniques on datacenter TCO (Abstract)

Panagiota Nikolaou , University of Cyprus
Yiannakis Sazeides , University of Cyprus
Lorena Ndreu , University of Cyprus
Marios Kleanthous , MAP S. Platis
pp. 572-584

TimeTrader: Exploiting latency tail to save datacenter energy for online search (Abstract)

Balajee Vamanan , Purdue University
Hamza Bin Sohail , Purdue University
Jahangir Hasan , Google Inc.
T. N. Vijaykumar , Purdue University
pp. 585-597

Rubik: Fast analytical power management for latency-critical systems (Abstract)

Harshad Kasture , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Davide B. Bartolini , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Nathan Beckmann , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Daniel Sanchez , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
pp. 598-610

CLEAN-ECC: High reliability ECC for adaptive granularity memory system (Abstract)

Seong-Lyong Gong , ECE, UT Austin
Minsoo Rhu , NVIDIA
Jungrae Kim , ECE, UT Austin
Jinsuk Chung , ECE, UT Austin
Mattan Erez , ECE, UT Austin
pp. 611-622

vCache: Architectural support for transparent and isolated virtual LLCs in virtualized environments (Abstract)

Daehoon Kim , University of Illinois, Urbana-Champaign
Hwanju Kim , University of Cambridge
Nam Sung Kim , University of Illinois, Urbana-Champaign
Jaehyuk Huh , KAIST
pp. 623-634

An integrated concurrency and core-ISA architectural envelope definition, and test oracle, for IBM POWER multiprocessors (Abstract)

Kathryn E. Gray , University of Cambridge
Gabriel Kerneis , University of Cambridge
Dominic Mulligan , University of Cambridge
Christopher Pulte , University of Cambridge
Susmit Sarkar , University of St Andrews
Peter Sewell , University of Cambridge
pp. 635-646

Efficient GPU synchronization without scopes: Saying no to complex consistency models (Abstract)

Matthew D. Sinclair , University of Illinois at Urbana-Champaign
Johnathan Alsop , University of Illinois at Urbana-Champaign
Sarita V. Adve , École Polytechnique Fédérale de Lausanne
pp. 647-659

Efficient persist barriers for multicores (Abstract)

Arpit Joshi , University of Edinburgh
Vijay Nagarajan , University of Edinburgh
Marcelo Cintra , Intel, Germany
Stratis Viglas , University of Edinburgh
pp. 660-671

ThyNVM: Enabling software-transparent crash consistency in persistent memory systems (Abstract)

Jinglei Ren , Tsinghua University
Jishen Zhao , University of California, Santa Cruz
Samira Khan , Carnegie Mellon University
Jongmoo Choi , Dankook University
Yongwei Wu , Tsinghua University
Onur Mutlu , Carnegie Mellon University
pp. 672-685

Coherence domain restriction on large scale systems (Abstract)

Yaosheng Fu , Princeton University, Princeton, NJ
Tri M. Nguyen , Princeton University, Princeton, NJ
David Wentzlaff , Princeton University, Princeton, NJ
pp. 686-698

Efficiently enforcing strong memory ordering in GPUs (Abstract)

Abhayendra Singh , University of Michigan, Ann Arbor
Shaizeen Aga , University of Michigan, Ann Arbor
Satish Narayanasamy , University of Michigan, Ann Arbor
pp. 699-712

Characterizing, modeling, and improving the QoE of mobile devices with low battery level (Abstract)

Kaige Yan , Department of Electrical and Computer Engineering, University of Houston
Xingyao Zhang , Department of Electrical and Computer Engineering, University of Houston
Xin Fu , Department of Electrical and Computer Engineering, University of Houston
pp. 713-724

Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance (Abstract)

Newsha Ardalani , University of Wisconsin-Madison
Clint Lestourgeon , University of Wisconsin-Madison
Karthikeyan Sankaralingam , University of Wisconsin-Madison
Xiaojin Zhu , University of Wisconsin-Madison
pp. 725-737

A fast and accurate analytical technique to compute the AVF of sequential bits in a processor (Abstract)

Steven Raasch , Intel Hudson, MA
Arijit Biswas , Intel Hudson, MA
Jon Stephan , Intel Hudson, MA
Paul Racunas , Nvidia Westford, MA
Joel Emer , Nvidia / MIT, Westford, MA / Cambridge, MA
pp. 738-749

Enabling portable energy efficiency with memory accelerated library (Abstract)

Qi Guo , Carnegie Mellon University
Tze-Meng Low , Carnegie Mellon University
Nikolaos Alachiotis , Carnegie Mellon University
Berkin Akin , Carnegie Mellon University
Larry Pileggi , Carnegie Mellon University
James C. Hoe , Carnegie Mellon University
Franz Franchetti , Carnegie Mellon University
pp. 750-761

Microarchitectural implications of event-driven server-side web applications (Abstract)

Yuhao Zhu , The University of Texas at Austin, Department of Electrical and Computer Engineering
Daniel Richins , The University of Texas at Austin, Department of Electrical and Computer Engineering
Matthew Halpern , The University of Texas at Austin, Department of Electrical and Computer Engineering
Vijay Janapa Reddi , The University of Texas at Austin, Department of Electrical and Computer Engineering
pp. 762-774