2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
Haifa, Israel
Sept. 11, 2016 to Sept. 15, 2016
ISBN: 978-1-5090-5308-7
TABLE OF CONTENTS

Author index (PDF)

pp. 459-460

[Front matter] (PDF)

pp. i-xii

Big data analytics on flash storage with accelerators (Abstract)

Arvind , Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA
pp. 1

Accelerating linked-list traversal through near-data processing (Abstract)

Byungchul Hong , KAIST, Korea
Gwangsun Kim , KAIST, Korea
Jung Ho Ahn , Seoul National University, Korea
Yongkee Kwon , SKHynix, Korea
Hongsik Kim , SKHynix, Korea
John Kim , KAIST, Korea
pp. 113-124

Scalable task parallelism for NUMA: A uniform abstraction for coordinated scheduling and memory management (Abstract)

Andi Drebes , The University of Manchester, School of Computer Science, Oxford Road, M13 9PL, United Kingdom
Antoniu Pop , The University of Manchester, School of Computer Science, Oxford Road, M13 9PL, United Kingdom
Karine Heydemann , Sorbonne Universités, UPMC Univ Paris 06, CNRS, UMR 7606, LIP6, 4, Place Jussieu, F-75252 Cedex 05, France
Albert Cohen , INRIA and École Normale Supérieure, 45 rue d'Ulm, F-75005 Paris, France
Nathalie Drach , Sorbonne Universités, UPMC Univ Paris 06, CNRS, UMR 7606, LIP6, 4, Place Jussieu, F-75252 Cedex 05, France
pp. 125-137

A static cut-off for task parallel programs (Abstract)

Shintaro Iwasaki , Graduate School of Information Science and Technology, The University of Tokyo, Japan
Kenjiro Taura , Graduate School of Information Science and Technology, The University of Tokyo, Japan
pp. 139-150

Greater performance and better efficiency: Predicated execution has shown us the way (Abstract)

Yale N. Patt , The University of Texas at Austin, United States
pp. 151

WearCore: A core for wearable workloads? (Abstract)

Sanyam Mehta , Department of Computer Science, University of Illinois at Urbana-Champaign, USA
Josep Torrellas , Department of Computer Science, University of Illinois at Urbana-Champaign, USA
pp. 153-164

Energy aware persistence: Reducing energy overheads of memory-based persistence in NVMs (Abstract)

Sudarsun Kannan , College of Computing, Georgia Tech, United States
Moinuddin Qureshi , School of ECE, Georgia Tech, United States
Ada Gavrilovska , College of Computing, Georgia Tech, United States
Karsten Schwan , College of Computing, Georgia Tech, United States
pp. 165-177

μC-States: Fine-grained GPU datapath power management (Abstract)

Onur Kayiran , Advanced Micro Devices, Inc., United States
Adwait Jog , College of William and Mary, United States
Ashutosh Pattnaik , Pennsylvania State University, United States
Rachata Ausavarungnirun , Carnegie Mellon University, United States
Xulong Tang , Pennsylvania State University, United States
Mahmut T. Kandemir , Pennsylvania State University, United States
Gabriel H. Loh , Advanced Micro Devices, Inc., United States
Onur Mutlu , ETH Zürich, Switzerland
Chita R. Das , Pennsylvania State University, United States
pp. 17-30

Power tuning HPC jobs on power-constrained systems (Abstract)

Neha Gholkar , North Carolina State University, USA
Frank Mueller , North Carolina State University, USA
Barry Rountree , Lawrence Livermore National Laboratory, USA
pp. 179-190

Online scalability characterization of data-parallel programs on many cores (Abstract)

Younghyun Cho , Department of Computer Science and Engineering, Seoul National University, Korea
Surim Oh , Department of Computer Science and Engineering, Seoul National University, Korea
Bernhard Egger , Department of Computer Science and Engineering, Seoul National University, Korea
pp. 191-205

Speculatively exploiting cross-invocation parallelism (Abstract)

Jialu Huang , Google Inc., United States
Prakash Prabhu , Google Inc., United States
Thomas B. Jablin , UIUC, United States
Soumyadeep Ghosh , Princeton University, United States
Sotiris Apostolakis , Princeton University, United States
Jae W. Lee , Sungkyunkwan University, South Korea
David I. August , Princeton University, United States
pp. 207-219

MicroSpec: Speculation-centric fine-grained parallelization for FSM computations (Abstract)

Junqiao Qiu , Department of Computer Science & Engineering, University of California, Riverside, United States
Zhijia Zhao , Department of Computer Science & Engineering, University of California, Riverside, United States
Bin Ren , Department of Computer Science, The College of William and Mary, United States
pp. 221-233

Hash Map Inlining (Abstract)

Dibakar Gope , Department of Electrical and Computer Engineering, University of Wisconsin - Madison, USA
Mikko H. Lipasti , Department of Electrical and Computer Engineering, University of Wisconsin - Madison, USA
pp. 235-246

Sparso: Context-driven optimizations of sparse linear algebra (Abstract)

Hongbo Rong , Parallel Computing Lab, Intel Corporation, United States
Jongsoo Park , Parallel Computing Lab, Intel Corporation, United States
Lingxiang Xiang , Parallel Computing Lab, Intel Corporation, United States
Todd A. Anderson , Parallel Computing Lab, Intel Corporation, United States
Mikhail Smelyanskiy , Parallel Computing Lab, Intel Corporation, United States
pp. 247-259

Tardis 2.0: Optimized time traveling coherence for relaxed consistency models (Abstract)

Xiangyao Yu , MIT CSAIL, United States
Hongzhe Liu , Algonquin Regional High School, United States
Ethan Zou , Lexington High School, United States
Srinivas Devadas , MIT CSAIL, United States
pp. 261-274

Reducing cache coherence traffic with hierarchical directory cache and NUMA-aware runtime scheduling (Abstract)

Paul Caheny , Barcelona Supercomputing Center, Spain
Marc Casas , Barcelona Supercomputing Center, Spain
Miquel Moreto , Barcelona Supercomputing Center, Spain
Herve Gloaguen , Bull Atos Technologies, Les Clayes-sous-Bois, France
Maxime Saintes , Bull Atos Technologies, Les Clayes-sous-Bois, France
Eduard Ayguade , Barcelona Supercomputing Center, Spain
Jesus Labarta , Barcelona Supercomputing Center, Spain
Mateo Valero , Barcelona Supercomputing Center, Spain
pp. 275-286

Characterizing and optimizing the performance of multithreaded programs under interference (Abstract)

Yong Zhao , The University of Texas at Arlington, United States
Jia Rao , The University of Texas at Arlington, United States
Qing Yi , The University of Colorado Colorado Springs, United States
pp. 287-297

Optimizing indirect memory references with milk (Abstract)

Vladimir Kiriansky , MIT CSAIL, United States
Yunming Zhang , MIT CSAIL, United States
Saman Amarasinghe , MIT CSAIL, United States
pp. 299-312

Combating the reliability challenge of GPU register file at low supply voltage (Abstract)

Jingweijia Tan , ECE Department, University of Houston, TX 77004, United States
Shuaiwen Leon Song , HPC Group, Pacific Northwest National Lab, Richland, WA 99354, United States
Kaige Yan , ECE Department, University of Houston, TX 77004, United States
Xin Fu , ECE Department, University of Houston, TX 77004, United States
Andres Marquez , HPC Group, Pacific Northwest National Lab, Richland, WA 99354, United States
Darren Kerbyson , HPC Group, Pacific Northwest National Lab, Richland, WA 99354, United States
pp. 3-15

Scheduling techniques for GPU architectures with processing-in-memory capabilities (Abstract)

Ashutosh Pattnaik , Pennsylvania State University, United States
Xulong Tang , Pennsylvania State University, United States
Adwait Jog , College of William and Mary, United States
Onur Kayiran , Advanced Micro Devices, Inc., United States
Asit K. Mishra , Intel Labs, United States
Mahmut T. Kandemir , Pennsylvania State University, United States
Onur Mutlu , ETH Zürich, Switzerland
Chita R. Das , Pennsylvania State University, United States
pp. 31-44

Scaling data analytics with Moore's law (Abstract)

Kunle Olukotun , Pervasive Parallelism Laboratory, Stanford University, United States
pp. 313

Bridging the semantic gaps of GPU acceleration for scale-out CNN-based big data processing: Think big, see small (Abstract)

Mingcong Song , Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA
Yang Hu , Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA
Yunlong Xu , School of Electronic and Information Engineering, Xi'an Jiaotong University, China
Chao Li , Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
Huixiang Chen , Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA
Jingling Yuan , Wuhan University of Technology, China
Tao Li , Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA
pp. 315-326

A DSL compiler for accelerating image processing pipelines on FPGAs (Abstract)

Nitin Chugh , International Institute of Information Technology, Hyderabad 500032 India
Vinay Vasista , Dept of CSA, Indian Institute of Science, Bengaluru 560012 India
Suresh Purini , International Institute of Information Technology, Hyderabad 500032 India
Uday Bondhugula , Dept of CSA, Indian Institute of Science, Bengaluru 560012 India
pp. 327-338

CAF: Core to core Communication Acceleration Framework (Abstract)

Yipeng Wang , ECE, North Carolina State University, United States
Ren Wang , Intel Corporation, United States
Andrew Herdrich , Intel Corporation, United States
James Tsai , Intel Corporation, United States
Yan Solihin , ECE, North Carolina State University, United States
pp. 351-362

Vectorization of multibyte floating point data formats (Abstract)

Andrew Anderson , Lero, Trinity College Dublin, Ireland
David Gregg , Lero, Trinity College Dublin, Ireland
pp. 363-372

Rinnegan: Efficient resource use in heterogeneous architectures (Abstract)

Sankaralingam Panneerselvam , University of Wisconsin-Madison, United States
Michael Swift , University of Wisconsin-Madison, United States
pp. 373-386

Auto-tuning Spark big data workloads on POWER8: Prediction-based dynamic SMT threading (Abstract)

Zhen Jia , Institute of Computing Technology, Chinese Academy of Sciences, China
Chao Xue , IBM Research-China
Guancheng Chen , IBM Research-China
Jianfeng Zhan , Institute of Computing Technology, Chinese Academy of Sciences, China
Lixin Zhang , Institute of Computing Technology, Chinese Academy of Sciences, China
Yonghua Lin , IBM Research-China
Peter Hofstee , IBM Research-Austin, United States
pp. 387-400

EXCITE-VM: Extending the virtual memory system to support snapshot isolation transactions (Abstract)

Heiner Litz , Stanford University, 353 Serra Mall, CA, 94305, United States of America
Benjamin Braun , Stanford University, 353 Serra Mall, CA, 94305, United States of America
David Cheriton , Stanford University, 353 Serra Mall, CA, 94305, United States of America
pp. 401-412

POSTER: Fly-Over: A light-weight distributed power-gating mechanism for energy-efficient networks-on-chip (Abstract)

Rahul Boyapati , Department of Computer Science and Engineering, Texas A&M University, United States of America
Jiayi Huang , Department of Computer Science and Engineering, Texas A&M University, United States of America
Ningyuan Wang , Google Inc., United States of America
Kyung Hoon Kim , Department of Computer Science and Engineering, Texas A&M University, United States of America
Ki Hwan Yum , Department of Computer Science and Engineering, Texas A&M University, United States of America
Eun Jung Kim , Department of Computer Science and Engineering, Texas A&M University, United States of America
pp. 413-414

POSTER: Exploiting asymmetric multi-core processors with flexible system software (Abstract)

Kallia Chronaki , Barcelona Supercomputing Center, Spain
Miquel Moreto , Barcelona Supercomputing Center, Spain
Marc Casas , Barcelona Supercomputing Center, Spain
Alejandro Rico , ARM, Austin, Texas, United States of America
Rosa M. Badia , Barcelona Supercomputing Center, Spain
Eduard Ayguade , Barcelona Supercomputing Center, Spain
Jesus Labarta , Barcelona Supercomputing Center, Spain
Mateo Valero , Barcelona Supercomputing Center, Spain
pp. 415-417

POSTER: Easy PRAM-based high-performance parallel programming with ICE (Abstract)

Fady Ghanim , Electrical and Computer Engineering Department, University of Maryland, College Park, 20742, USA
Rajeev Barua , Electrical and Computer Engineering Department, University of Maryland, College Park, 20742, USA
Uzi Vishkin , Electrical and Computer Engineering Department, University of Maryland, College Park, 20742, USA
pp. 419-420

POSTER: Fault-tolerant execution on COTS multi-core processors with hardware transactional memory support (Abstract)

Florian Haas , Department of Computer Science, University of Augsburg, Germany
Sebastian Weis , Department of Computer Science, University of Augsburg, Germany
Theo Ungerer , Department of Computer Science, University of Augsburg, Germany
Gilles Pokam , Intel Corporation, Santa Clara, United States of America
Youfeng Wu , Intel Corporation, Santa Clara, United States of America
pp. 421-422

POSTER: Collective dynamic parallelism for directive based GPU programming languages and compilers (Abstract)

Guray Ozen , Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Spain
Eduard Ayguade , Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Spain
Jesus Labarta , Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Spain
pp. 423-424

POSTER: Firestorm: Operating systems for power-constrained architectures (Abstract)

Sankaralingam Panneerselvam , University of Wisconsin-Madison, United States of America
Michael Swift , University of Wisconsin-Madison, United States of America
pp. 425-427

POSTER: ξ-TAO: A cache-centric execution model and runtime for deep parallel multicore topologies (Abstract)

Miquel Pericas , Chalmers University of Technology, SE-412 96 Göteborg, Sweden
pp. 429-431

POSTER: Efficient self-invalidation/self-downgrade for critical sections with relaxed semantics (Abstract)

Alberto Ros , Universidad de Murcia, Spain
Carl Leonardsson , Uppsala Universitet, Sweden
Christos Sakalis , Uppsala Universitet, Sweden
Stefanos Kaxiras , Uppsala Universitet, Sweden
pp. 433-434

POSTER: SILC-FM: Subblocked interleaved Cache-Like Flat Memory Organization (Abstract)

Jee Ho Ryoo , Department of Electrical and Computer Engineering, The University of Texas at Austin, United States of America
Mitesh R. Meswani , AMD Research, Austin, TX, United States of America
Reena Panda , Department of Electrical and Computer Engineering, The University of Texas at Austin, United States of America
Lizy K. John , Department of Electrical and Computer Engineering, The University of Texas at Austin, United States of America
pp. 435-437

Hybrid data dependence analysis for loop transformations (Abstract)

Diogo Sampaio , Institut National de Recherche en Informatique, France
Alain Ketterlin , Institut National de Recherche en Informatique, France
Louis-Noel Pouchet , The Ohio State University, United States of America
Fabrice Rastello , Institut National de Recherche en Informatique, France
pp. 439-440

POSTER: An optimization of dataflow architectures for scientific applications (Abstract)

Xiaowei Shen , SKL of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xiaochun Ye , SKL of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xu Tan , SKL of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Da Wang , SKL of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Zhimin Zhang , SKL of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Dongrui Fan , SKL of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Zhimin Tang , SKL of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
pp. 441-442

POSTER: hVISC: A portable abstraction for heterogeneous parallel systems (Abstract)

Prakalp Srivastava , University of Illinois at Urbana-Champaign, United States of America
Maria Kotsifakou , University of Illinois at Urbana-Champaign, United States of America
Matthew D. Sinclair , University of Illinois at Urbana-Champaign, United States of America
Rakesh Komuravelli , Qualcomm Technologies Inc., United States of America
Vikram Adve , University of Illinois at Urbana-Champaign, United States of America
Sarita Adve , University of Illinois at Urbana-Champaign, United States of America
pp. 443-445

POSTER: An integrated vector-scalar design on an in-order ARM core (Abstract)

Milan Stanic , Barcelona Supercomputing Center, Spain
Oscar Palomar , University of Manchester, UK
Timothy Hayes , Barcelona Supercomputing Center, Spain
Ivan Ratkovic , Barcelona Supercomputing Center, Spain
Osman Unsal , Barcelona Supercomputing Center, Spain
Adrian Cristal , Barcelona Supercomputing Center, Spain
Mateo Valero , Barcelona Supercomputing Center, Spain
pp. 447-448

POSTER: Pagoda: A runtime system to maximize GPU utilization in data parallel tasks with limited parallelism (Abstract)

Tsung Tai Yeh , Department of Electrical and Computer Engineering, Purdue University, United States of America
Amit Sabne , Department of Electrical and Computer Engineering, Purdue University, United States of America
Putt Sakdhnagool , Department of Electrical and Computer Engineering, Purdue University, United States of America
Rudolf Eigenmann , Department of Electrical and Computer Engineering, Purdue University, United States of America
Timothy G. Rogers , Department of Electrical and Computer Engineering, Purdue University, United States of America
pp. 449-450

OAWS: Memory Occlusion Aware Warp Scheduling (Abstract)

Bin Wang , Auburn University, United States of America
Yue Zhu , Florida State University, United States of America
Weikuan Yu , Florida State University, United States of America
pp. 45-55

Student research poster: Slack-aware shared bandwidth management in GPUs (Abstract)

Saumay Dublish , University of Edinburgh, United Kingdom
pp. 451-452

Student research poster: From processing-in-memory to processing-in-storage (Abstract)

Roman Kaplan , Technion - Israel Institute of Technology Haifa, Israel
pp. 453

Student research poster: A low complexity cache sharing mechanism to address system fairness (Abstract)

Vicent Selfa , Dept. of Computer Engineering, Universitat Politècnica de València, Spain
Julio Sahuquillo , Dept. of Computer Engineering, Universitat Politècnica de València, Spain
Salvador Petit , Dept. of Computer Engineering, Universitat Politècnica de València, Spain
Maria E. Gomez , Dept. of Computer Engineering, Universitat Politècnica de València, Spain
pp. 455

Student research poster: A scalable general purpose system for large-scale graph processing (Abstract)

Jiawen Sun , Queen's University Belfast, University Road, UK BT7 1LR
Hans Vandierendonck , Queen's University Belfast, University Road, UK BT7 1LR
Dimitrios S. Nikolopoulos , Queen's University Belfast, University Road, UK BT7 1LR
pp. 456

Student research poster: Software out-of-order execution for in-order architectures (Abstract)

Kim-Anh Tran , Department of Information Technology, Uppsala University, Sweden
pp. 458

Integrating algorithmic parameters into benchmarking and design space exploration in 3D scene understanding (Abstract)

Bruno Bodin , University of Edinburgh, United Kingdom
Luigi Nardi , Imperial College London, United Kingdom
M. Zeeshan Zia , Imperial College London, United Kingdom
Harry Wagstaff , University of Edinburgh, United Kingdom
Govind Sreekar Shenoy , University of Edinburgh, United Kingdom
Murali Emani , Lawrence Livermore National Laboratory, United States of America
John Mawer , University of Manchester, United Kingdom
Christos Kotselidis , University of Manchester, United Kingdom
Andy Nisbet , University of Manchester, United Kingdom
Mikel Lujan , University of Manchester, United Kingdom
Bjorn Franke , University of Edinburgh, United Kingdom
Paul H. J. Kelly , Imperial College London, United Kingdom
Michael O'Boyle , University of Edinburgh, United Kingdom
pp. 57-69

Fusion of parallel array operations (Abstract)

Mads R. B. Kristensen , Niels Bohr Institute, University of Copenhagen, Denmark
Simon A. F. Lund , Niels Bohr Institute, University of Copenhagen, Denmark
Troels Blum , Niels Bohr Institute, University of Copenhagen, Denmark
James Avery , Dept. of Computer Science, University of Copenhagen, Denmark
pp. 71-85

Reduction drawing: Language constructs and polyhedral compilation for reductions on GPUs (Abstract)

Chandan Reddy , INRIA and École Normale Supérieure, Paris, France
Michael Kruse , INRIA and École Normale Supérieure, Paris, France
Albert Cohen , INRIA and École Normale Supérieure, Paris, France
pp. 87-97

Resource conscious reuse-driven tiling for GPUs (Abstract)

Prashant Singh Rawat , Computer Science and Engineering, The Ohio State University, United States of America
Changwan Hong , Computer Science and Engineering, The Ohio State University, United States of America
Mahesh Ravishankar , Nvidia Corporation, Redmond, Washington, United States of America
Vinod Grover , Nvidia Corporation, Redmond, Washington, United States of America
Louis-Noel Pouchet , Computer Science and Engineering, The Ohio State University, United States of America
Atanas Rountev , Computer Science and Engineering, The Ohio State University, United States of America
P. Sadayappan , Computer Science and Engineering, The Ohio State University, United States of America
pp. 99-111