2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT) (2014)
Edmonton, Canada
Aug. 23, 2014 to Aug. 27, 2014
ISBN: 978-1-5090-6607-0
TABLE OF CONTENTS

Front matters (Abstract)

pp. i-xv

Keynote: Internet of mobile things: Challenges and opportunities (Abstract)

Klara Nahrstedt , University of Illinois at Urbana-Champaign, USA
pp. 1

Virtues and limitations of commodity hardware transactional memory (Abstract)

Nuno Diegues , INESC-ID, Instituto Superior Técnico, Universidade de Lisboa
Paolo Romano , INESC-ID, Instituto Superior Técnico, Universidade de Lisboa
Luis Rodrigues , INESC-ID, Instituto Superior Técnico, Universidade de Lisboa
pp. 3-14

Cooperative cache scrubbing (Abstract)

Jennifer B. Sartor , Ghent University, Belgium
Wim Heirman , Intel ExaScience Lab, Belgium
Stephen M. Blackburn , Australian National University, Australia
Lieven Eeckhout , Ghent University, Belgium
Kathryn S. McKinley , Microsoft Research, Washington, USA
pp. 15-26

KLA: A new algorithmic paradigm for parallel graph computations (Abstract)

Harshvardhan , Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University
Adam Fidel , Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University
Nancy M. Amato , Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University
Lawrence Rauchwerger , Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University
pp. 27-38

Tiling and optimizing time-iterated computations over periodic domains (Abstract)

Uday Bondhugula , Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012 India
Vinayaka Bandishti , Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012 India
Albert Cohen , INRIA and École Normale Supérieure, 45 rue d'Ulm, Paris 75005, France
Guillain Potron , École Normale Supérieure and Indian Institute of Science, 45 rue d'Ulm, Paris 75005, France
Nicolas Vasilache , Reservoir Labs, 632 Broadway, New York, NY 10012, USA
pp. 39-50

ATCache: Reducing DRAM cache latency via a small SRAM tag cache (Abstract)

Cheng-Chieh Huang , Institute of Computing Systems Architecture, University of Edinburgh
Vijay Nagarajan , Institute of Computing Systems Architecture, University of Edinburgh
pp. 51-60

SpongeDirectory: Flexible sparse directories utilizing multi-level memristors (Abstract)

Lunkai Zhang , State Key Laboratory of Computer Architecture, ICT, Chinese Academy of Sciences
Dmitri Strukov , Electrical and Computer Engineering, UC Santa Barbara
Hebatallah Saadeldeen , Department of Computer Science, UC Santa Barbara
Dongrui Fan , State Key Laboratory of Computer Architecture, ICT, Chinese Academy of Sciences
Mingzhe Zhang , Key Lab of Intelligent Information Processing, ICT, Chinese Academy of Sciences
Diana Franklin , Department of Computer Science, UC Santa Barbara
pp. 61-73

EFetch: Optimizing instruction fetch for event-driven web applications (Abstract)

Gaurav Chadha , University of Michigan, Ann Arbor
Scott Mahlke , University of Michigan, Ann Arbor
Satish Narayanasamy , University of Michigan, Ann Arbor
pp. 75-86

XStream: Cross-core spatial streaming based MLC prefetchers for parallel applications in CMPs (Abstract)

Biswabandan Panda , Department of Computer Science & Engineering, Indian Institute of Technology Madras, Chennai, India
Shankar Balachandran , Department of Computer Science & Engineering, Indian Institute of Technology Madras, Chennai, India
pp. 87-98

What is the cost of weak determinism? (Abstract)

Cedomir Segulja , The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto
Tarek S. Abdelrahman , The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto
pp. 99-111

ILP and TLP in shared memory applications: A limit study (Abstract)

Ehsan Fatehi , Electrical and Computer Engineering, Texas A&M University
Paul V. Gratz , Electrical and Computer Engineering, Texas A&M University
pp. 113-125

Versatile and scalable parallel histogram construction (Abstract)

Wookeun Jung , Department of Computer Science and Engineering, Seoul National University, Seoul 151-744, Korea
Jongsoo Park , Parallel Computing Lab, Intel Corporation, 2200 Mission College Blvd., Santa Clara, California 95054, USA
Jaejin Lee , Department of Computer Science and Engineering, Seoul National University, Seoul 151-744, Korea
pp. 127-138

Bitwise data parallelism in regular expression matching (Abstract)

Robert D. Cameron , School of Computing Science, Simon Fraser University, Surrey, British Columbia
Thomas C. Shermer , School of Computing Science, Simon Fraser University, Surrey, British Columbia
Arrvindh Shriraman , School of Computing Science, Simon Fraser University, Surrey, British Columbia
Kenneth S. Herdy , School of Computing Science, Simon Fraser University, Surrey, British Columbia
Dan Lin , School of Computing Science, Simon Fraser University, Surrey, British Columbia
Benjamin R. Hull , School of Computing Science, Simon Fraser University, Surrey, British Columbia
Meng Lin , School of Computing Science, Simon Fraser University, Surrey, British Columbia
pp. 139-150

Adaptive heterogeneous scheduling for integrated GPUs (Abstract)

Rashid Kaleem , Dept. of Computer Science, University of Texas at Austin
Rajkishore Barik , Intel Labs, Santa Clara, CA
Tatiana Shpeisman , Intel Labs, Santa Clara, CA
Chunling Hu , Intel Labs, Santa Clara, CA
Brian T. Lewis , Intel Labs, Santa Clara, CA
Keshav Pingali , Dept. of Computer Science, University of Texas at Austin
pp. 151-162

Warp-aware trace scheduling for GPUs (Abstract)

James A. Jablin , Brown University, Dept. of Computer Science
Thomas B. Jablin , University of Illinois at Urbana-Champaign, Dept. of Electrical and Computer Engineering
Onur Mutlu , Carnegie Mellon University, Dept. of Electrical and Computer Engineering
Maurice Herlihy , Brown University, Dept. of Computer Science
pp. 163-174

CAWS: Criticality-aware warp scheduling for GPGPU workloads (Abstract)

Shin-Ying Lee , Computer Science and Engineering, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281
Carole-Jean Wu , Computer Science and Engineering, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281
pp. 175-186

Invyswell: A hybrid transactional memory for Haswell's restricted transactional memory (Abstract)

Irina Calciu , Brown University, Providence, RI, USA
Justin Gottschlich , Intel Labs, Santa Clara, CA, USA
Tatiana Shpeisman , Intel Labs, Santa Clara, CA, USA
Maurice Herlihy , Brown University, Providence, RI, USA
Gilles Pokam , Intel Labs, Santa Clara, CA, USA
pp. 187-199

Consolidated conflict detection for hardware transactional memory (Abstract)

Lihang Zhao , Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292
Jeffrey Draper , Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292
pp. 201-212

DeSTM: Harnessing determinism in STMs for application development (Abstract)

Santosh Pande , Georgia Institute of Technology, Atlanta, Georgia, USA
Ada Gavrilovska , Georgia Institute of Technology, Atlanta, Georgia, USA
Kaushik Ravichandran , Georgia Institute of Technology, Atlanta, Georgia, USA
pp. 213-224

PATS: Pattern aware scheduling and power gating for GPGPUs (Abstract)

Qiumin Xu , Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA
Murali Annavaram , Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA
pp. 225-236

Heterogeneous microarchitectures trump voltage scaling for low-power cores (Abstract)

Andrew Lukefahr , Advanced Computer Architecture Laboratory, Ann Arbor, MI, USA
Shruti Padmanabha , Advanced Computer Architecture Laboratory, Ann Arbor, MI, USA
Reetuparna Das , Advanced Computer Architecture Laboratory, Ann Arbor, MI, USA
Ronald Dreslinski , Advanced Computer Architecture Laboratory, Ann Arbor, MI, USA
Thomas F. Wenisch , Advanced Computer Architecture Laboratory, Ann Arbor, MI, USA
Scott Mahlke , Advanced Computer Architecture Laboratory, Ann Arbor, MI, USA
pp. 237-249

RCS: Runtime resource and core scaling for power-constrained multi-core processors (Abstract)

Hamid Reza Ghasemi , University of Wisconsin-Madison, U.S.A.
Nam Sung Kim , University of Wisconsin-Madison, U.S.A.
pp. 251-262

Realm: An event-based low-level runtime for distributed memory architectures (Abstract)

Alex Aiken , Stanford University, Stanford, CA
Michael Bauer , Stanford University, Stanford, CA
Sean Treichler , Stanford University, Stanford, CA
pp. 263-275

kMAF: Automatic kernel-level management of thread and data affinity (Abstract)

Matthias Diener , Informatics Institute, UFRGS, Porto Alegre, Brazil
Eduardo H. M. Cruz , Informatics Institute, UFRGS, Porto Alegre, Brazil
Philippe O. A. Navaux , Informatics Institute, UFRGS, Porto Alegre, Brazil
Anselm Busse , Communication and Operating Systems Group, TU Berlin, Berlin, Germany
Hans-Ulrich Heis , Communication and Operating Systems Group, TU Berlin, Berlin, Germany
pp. 277-288

Shuffling: A framework for lock contention aware thread scheduling for multicore multiprocessor systems (Abstract)

Kishore Kumar Pusukuri , Department of Computer Science and Engineering, University of California, Riverside, Riverside, USA 92521
Rajiv Gupta , Department of Computer Science and Engineering, University of California, Riverside, Riverside, USA 92521
Laxmi N. Bhuyan , Department of Computer Science and Engineering, University of California, Riverside, Riverside, USA 92521
pp. 289-300

Keynote: Domain-specific models for innovation in analytics (Abstract)

Bob Blainey , Hardware Acceleration Laboratory, IBM Software Group, Markham, Ontario, Canada
pp. 301

OpenTuner: An extensible framework for program autotuning (Abstract)

Jason Ansel , Massachusetts Institute of Technology, Cambridge, MA
Shoaib Kamil , Massachusetts Institute of Technology, Cambridge, MA
Kalyan Veeramachaneni , Massachusetts Institute of Technology, Cambridge, MA
Jonathan Ragan-Kelley , Massachusetts Institute of Technology, Cambridge, MA
Jeffrey Bosboom , Massachusetts Institute of Technology, Cambridge, MA
Una-May O'Reilly , Massachusetts Institute of Technology, Cambridge, MA
Saman Amarasinghe , Massachusetts Institute of Technology, Cambridge, MA
pp. 303-315

Velociraptor: An embedded compiler toolkit for numerical programs targeting CPUs and GPUs (Abstract)

Rahul Garg , School of Computer Science, McGill University, Montreal, Canada
Laurie Hendren , School of Computer Science, McGill University, Montreal, Canada
pp. 317-329

Memory scheduling towards high-throughput cooperative heterogeneous computing (Abstract)

Hao Wang , The University of Wisconsin-Madison, WI, U.S.A.
Ripudaman Singh , The University of Wisconsin-Madison, WI, U.S.A.
Michael J. Schulte , Advanced Micro Devices, TX, U.S.A.
Nam Sung Kim , The University of Wisconsin-Madison, WI, U.S.A.
pp. 331-341

Bounded memory scheduling of dynamic task graphs (Abstract)

Dragos Sbirlea , Rice University
Zoran Budimlic , Rice University
Vivek Sarkar , Rice University
pp. 343-355

Trading cache hit rate for memory performance (Abstract)

Wei Ding , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA
Mahmut Kandemir , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA
Diana Guttman , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA
Adwait Jog , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA
Chita R. Das , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA
Praveen Yedlapalli , Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA
pp. 357-368

COLORIS: A dynamic cache partitioning system using page coloring (Abstract)

Ying Ye , Computer Science Department, Boston University, Boston, MA, USA
Richard West , Computer Science Department, Boston University, Boston, MA, USA
Zhuoqun Cheng , Computer Science Department, Boston University, Boston, MA, USA
Ye Li , Computer Science Department, Boston University, Boston, MA, USA
pp. 381-392

ArrayTool: A lightweight profiler to guide array regrouping (Abstract)

Xu Liu , Department of Computer Science, Rice University, Houston, TX, USA
Kamal Sharma , Department of Computer Science, Rice University, Houston, TX, USA
John Mellor-Crummey , Department of Computer Science, Rice University, Houston, TX, USA
pp. 405-415

Design for scalability in enterprise SSDs (Abstract)

Arash Tavakkol , HPCAN Lab, Computer Engineering Department, Sharif University of Technology, Tehran, Iran
Mohammad Arjomand , HPCAN Lab, Computer Engineering Department, Sharif University of Technology, Tehran, Iran
Hamid Sarbazi-Azad , HPCAN Lab, Computer Engineering Department, Sharif University of Technology, Tehran, Iran
pp. 417-429

D2MA: Accelerating coarse-grained data transfer for GPUs (Abstract)

D. Anoushe Jamshidi , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, MI
Mehrzad Samadi , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, MI
Scott Mahlke , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, MI
pp. 431-442

VAST: The illusion of a large memory space for GPUs (Abstract)

Janghaeng Lee , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, MI, USA
Mehrzad Samadi , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, MI, USA
Scott Mahlke , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, MI, USA
pp. 443-454

Automatic optimization of thread-coarsening for graphics processors (Abstract)

Alberto Magni , School of Informatics, University of Edinburgh, United Kingdom
Christophe Dubach , School of Informatics, University of Edinburgh, United Kingdom
Michael O'Boyle , School of Informatics, University of Edinburgh, United Kingdom
pp. 455-466

Automatic execution of single-GPU computations across multiple GPUs (Abstract)

Javier Cabezas , Barcelona Supercomputing Center
Lluis Vilanova , Barcelona Supercomputing Center
Isaac Gelado , NVIDIA Corporation
Thomas B. Jablin , University of Illinois
Nacho Navarro , Barcelona Supercomputing Center
Wen-mei Hwu , University of Illinois
pp. 467-468

LCA: A memory link and cache-aware co-scheduling approach for CMPs (Abstract)

Alexandros-Herodotos Haritatos , School of ECE, NTUA
Georgios Goumas , School of ECE, NTUA
Nikos Anastopoulos , School of ECE, NTUA
Konstantinos Nikas , School of ECE, NTUA
Kornilios Kourtis , Dept. of Computer Science, ETH
Nectarios Koziris , School of ECE, NTUA
pp. 469-470

A run-time power manager exploiting software parallelism (Abstract)

Simon Holmbacka , Turku Centre for Computer Science - TUCS, Turku, FIN
Sebastien Lafond , Department of Information Technologies, Åbo Akademi University, Turku, FIN
Johan Lilius , Department of Information Technologies, Åbo Akademi University, Turku, FIN
pp. 471-472

Graph-based performance accounting for chip multiprocessor memory systems (Abstract)

Magnus Jahre , Norwegian University of Science and Technology (NTNU), NO-7491 Trondheim, Norway
pp. 473-474

SQRL: Hardware accelerator for collecting software data structures (Abstract)

Snehasish Kumar , School of Computing Science, Simon Fraser University
Arrvindh Shriraman , School of Computing Science, Simon Fraser University
Dan Lin , School of Computing Science, Simon Fraser University
Jordon Phillips , School of Computing Science, Simon Fraser University
pp. 475-476

Optimizing stencil code via locality of computation (Abstract)

Yulong Luo , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Guangming Tan , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
pp. 477-478

ADHA: Automatic data layout framework for heterogeneous architectures (Abstract)

Deepak Majeti , Rice University
Kuldeep S. Meel , Rice University
Rajkishore Barik , Intel Labs
Vivek Sarkar , Rice University
pp. 479-480

Active learning accelerated automatic heuristic construction for parallel program mapping (Abstract)

William F. Ogilvie , University of Edinburgh
Pavlos Petoumenos , University of Edinburgh
Zheng Wang , Lancaster University
Hugh Leather , University of Edinburgh
pp. 481-482

Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels (Abstract)

Sreepathi Pai , The University of Texas at Austin, Austin, Texas
R. Govindarajan , Indian Institute of Science, Bangalore, India
Matthew J. Thazhuthaveetil , Indian Institute of Science, Bangalore, India
pp. 483-484

Using STT-RAM to enable energy-efficient near-threshold chip multiprocessors (Abstract)

Xiang Pan , Computer Science and Engineering, The Ohio State University
Radu Teodorescu , Computer Science and Engineering, The Ohio State University
pp. 485-486

Protection and utilization in shared cache through rationing (Abstract)

Raj Parihar , Dept. of Electrical & Computer Engineering, University of Rochester, Rochester, NY 14627, USA
Jacob Brock , Dept. of Computer Science, University of Rochester, Rochester, NY 14627, USA
Chen Ding , Dept. of Computer Science, University of Rochester, Rochester, NY 14627, USA
Michael C. Huang , Dept. of Electrical & Computer Engineering, University of Rochester, Rochester, NY 14627, USA
pp. 487-488

Automatic parallelism through macro dataflow in high-level array languages (Abstract)

Pushkar Ratnalikar , School of Informatics and Computing, Indiana University, Bloomington, IN
Arun Chauhan , School of Informatics and Computing, Indiana University, Bloomington, IN
pp. 489-490

A runtime support mechanism for fast mode switching of a self-morphing core for power efficiency (Abstract)

Sudarshan Srinivasan , ECE Department, University of Massachusetts Amherst, MA, USA
Nithesh Kurella , ECE Department, University of Massachusetts Amherst, MA, USA
Israel Koren , ECE Department, University of Massachusetts Amherst, MA, USA
Sandip Kundu , ECE Department, University of Massachusetts Amherst, MA, USA
Rance Rodrigues , NVIDIA Beaverton, Oregon, USA
pp. 491-492

Rollback-free value prediction with approximate loads (Abstract)

Bradley Thwaites , Georgia Institute of Technology
Gennady Pekhimenko , Carnegie Mellon University
Hadi Esmaeilzadeh , Georgia Institute of Technology
Amir Yazdanbakhsh , Georgia Institute of Technology
Jongse Park , Georgia Institute of Technology
Girish Mururu , Georgia Institute of Technology
Onur Mutlu , Carnegie Mellon University
Todd Mowry , Carnegie Mellon University
pp. 493-494

Measuring flexibility in single-ISA heterogeneous processors (Abstract)

Erik Tomusk , University of Edinburgh, UK
Christophe Dubach , University of Edinburgh, UK
Michael O'Boyle , University of Edinburgh, UK
pp. 495-496

SM-centric transformation: Circumventing hardware restrictions for flexible GPU scheduling (Abstract)

Bo Wu , The College of William and Mary, Virginia, USA
Guoyang Chen , The College of William and Mary, Virginia, USA
Dong Li , Oak Ridge National Laboratory, Tennessee, USA
Xipeng Shen , The College of William and Mary, Virginia, USA
Jeffrey S. Vetter , Oak Ridge National Laboratory, Tennessee, USA
pp. 497-498

An event-based language for dynamic binary translation frameworks (Abstract)

Serguei Makarov , University of Toronto
Angela Demke Brown , University of Toronto
Ashvin Goel , University of Toronto
pp. 499-500

Improving performance of streaming applications with filtering and control messages (Abstract)

Peng Li , Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130
Jeremy Buhler , Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130
pp. 501-502

Stratified sampling for even workload partitioning (Abstract)

Jeeva Paudel , University of Alberta, Edmonton, Canada
Jose Nelson Amaral , University of Alberta, Edmonton, Canada
pp. 503-504

Design of a hybrid MPI-CUDA benchmark suite for CPU-GPU clusters (Abstract)

Tejaswi Agarwal , University of Missouri - Columbia
Michela Becchi , University of Missouri - Columbia
pp. 505-506

Data remapping for an energy efficient burst chop in DRAM memory systems (Abstract)

Sudharsan Jagathrakshakan , Indian Institute of Technology, Madras
Venkata Kalyan Tavva , Indian Institute of Technology, Madras
Madhu Mutyam , Indian Institute of Technology, Madras
pp. 507-508

From petascale to the pocket: Adaptively scaling parallel programs for mobile SoCs (Abstract)

Adam Fidel , Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University
Nancy M. Amato , Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University
Lawrence Rauchwerger , Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University
pp. 511-512

Coarrays in GNU Fortran (Abstract)

Alessandro Fanfarillo , University of Rome, Tor Vergata, Italy
Tobias Burnus , Munich, Germany
Valeria Cardellini , University of Rome, Tor Vergata, Italy
Salvatore Filippone , University of Rome, Tor Vergata, Italy
Dan Nagle , NCAR, Boulder, Colorado
Damian Rouson , Sourcery Inc., Oakland, California
pp. 513-514

Processing big data graphs on memory-restricted systems (Abstract)

Harshvardhan , Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University
Nancy M. Amato , Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University
Lawrence Rauchwerger , Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University
pp. 517-518

Author index (Abstract)

pp. 519-520