Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
ISSN: 1089-795X
ISBN: 978-1-4799-1018-2
TABLE OF CONTENTS

[Front cover] (PDF)

pp. i

Copyright (PDF)

pp. ii

General chairs' welcome message (PDF)

Michael O'Boyle , University of Edinburgh, UK
Christian Fensch , University of Edinburgh, UK
pp. iii

INSPIRE: the insieme parallel intermediate representation (PDF)

Herbert Jordan , Institute of Computer Science / University of Innsbruck, Innsbruck, Austria
Simone Pellegrini , Institute of Computer Science / University of Innsbruck, Innsbruck, Austria
Peter Thoman , Institute of Computer Science / University of Innsbruck, Innsbruck, Austria
Klaus Kofler , Institute of Computer Science / University of Innsbruck, Innsbruck, Austria
Thomas Fahringer , Institute of Computer Science / University of Innsbruck, Innsbruck, Austria
pp. iv

Parallel flow-sensitive pointer analysis by graph-rewriting (PDF)

Vaivaswatha Nagaraj , Indian Institute of Science, Bangalore, India
R. Govindarajan , Indian Institute of Science, Bangalore, India
pp. v-viii

Interprocedural strength reduction of critical sections in explicitly-parallel programs (PDF)

Rajkishore Barik , Intel, Santa Clara, CA, USA
Jisheng Zhao , Rice University, Houston, TX, USA
Vivek Sarkar , Rice University, Houston, TX, USA
pp. ix-xi

ThermOS: system support for dynamic thermal management of chip multi-processors (PDF)

Filippo Sironi , Politecnico di Milano, Milano, Italy
Martina Maggio , Lund University, Lund, Sweden
Riccardo Cattaneo , Politecnico di Milano, Milano, Italy
Giovanni Francesco Del Nero , Politecnico di Milano, Milano, Italy
Donatella Sciuto , Politecnico di Milano, Milano, Italy
Marco Domenico Santambrogio , Politecnico di Milano, Milano, Italy
pp. xii

Coordinated power-performance optimization in manycores (Abstract)

David Kuck , Intel Corp., Hillsboro, OR, USA
pp. 1

APOGEE: adaptive prefetching on GPUs for energy efficiency (Abstract)

Per Stenstrom , Dept. of Comput. Sci. & Eng., Chalmers Univ. of Technol., Goteborg, Sweden
pp. 5

Parallel frame rendering: trading responsiveness for energy on a mobile GPU (Abstract)

Herbert Jordan , Inst. of Comput. Sci., Univ. of Innsbruck, Innsbruck, Austria
Simone Pellegrini , Inst. of Comput. Sci., Univ. of Innsbruck, Innsbruck, Austria
Peter Thoman , Inst. of Comput. Sci., Univ. of Innsbruck, Innsbruck, Austria
Klaus Kofler , Inst. of Comput. Sci., Univ. of Innsbruck, Innsbruck, Austria
Thomas Fahringer , Inst. of Comput. Sci., Univ. of Innsbruck, Innsbruck, Austria
pp. 7-17

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design (Abstract)

Vaivaswatha Nagaraj , Indian Inst. of Sci., Bangalore, India
R. Govindarajan , Indian Inst. of Sci., Bangalore, India
pp. 19-28

S-CAVE: effective SSD caching to improve virtual machine storage performance (Abstract)

Rajkishore Barik , Intel Labs., Santa Clara, CA, USA
Jisheng Zhao , Rice Univ., Houston, TX, USA
Vivek Sarkar , Rice Univ., Houston, TX, USA
pp. 29-40

Writeback-aware bandwidth partitioning for multi-core systems with PCM (Abstract)

Filippo Sironi , Politec. di Milano, Milan, Italy
Martina Maggio , Lund Univ., Lund, Sweden
Riccardo Cattaneo , Politec. di Milano, Milan, Italy
Giovanni F. Del Nero , Politec. di Milano, Milan, Italy
Donatella Sciuto , Politec. di Milano, Milan, Italy
Marco D. Santambrogio , Politec. di Milano, Milan, Italy
pp. 41-50

L1-bandwidth aware thread allocation in multicore SMT processors (Abstract)

Hiroshi Sasaki , Kyushu Univ., Fukuoka, Japan
Satoshi Imamura , Kyushu Univ., Fukuoka, Japan
Koji Inoue , Kyushu Univ., Fukuoka, Japan
pp. 51-61

A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors (Abstract)

Arunachalam Annamalai , Dept. of Electr. & Comput. Eng., Univ. of Massachusetts at Amherst, Amherst, MA, USA
Rance Rodrigues , Dept. of Electr. & Comput. Eng., Univ. of Massachusetts at Amherst, Amherst, MA, USA
Israel Koren , Dept. of Electr. & Comput. Eng., Univ. of Massachusetts at Amherst, Amherst, MA, USA
Sandip Kundu , Dept. of Electr. & Comput. Eng., Univ. of Massachusetts at Amherst, Amherst, MA, USA
pp. 63-72

Memory-centric system interconnect design with hybrid memory cubes (Abstract)

Ankit Sethia , Adv. Comput. Archit. Lab., Univ. of Michigan - Ann Arbor, Ann Arbor, MI, USA
Ganesh Dasika , ARM R&D, Austin, TX, USA
Mehrzad Samadi , Adv. Comput. Archit. Lab., Univ. of Michigan - Ann Arbor, Ann Arbor, MI, USA
Scott Mahlke , Adv. Comput. Archit. Lab., Univ. of Michigan - Ann Arbor, Ann Arbor, MI, USA
pp. 73-82

Neither more nor less: optimizing thread-level parallelism for GPGPUs (Abstract)

Jose-Maria Arnau , Comput. Archit. Dept., Univ. Politec. de Catalunya, Barcelona, Spain
Joan-Manuel Parcerisa , Comput. Archit. Dept., Univ. Politec. de Catalunya, Barcelona, Spain
Polychronis Xekalakis , Intel Labs., Intel Corp., Hillsboro, OR, USA
pp. 83-92

SMT-centric power-aware thread placement in chip multiprocessors (Abstract)

Bin Wang , Auburn Univ., Auburn, AL, USA
Bo Wu , Coll. of William & Mary, Williamsburg, VA, USA
Dong Li , Oak Ridge Nat. Lab., Oak Ridge, TN, USA
Xipeng Shen , Coll. of William & Mary, Williamsburg, VA, USA
Weikuan Yu , Auburn Univ., Auburn, AL, USA
Yizheng Jiao , Auburn Univ., Auburn, AL, USA
Jeffrey S. Vetter , Oak Ridge Nat. Lab., Oak Ridge, TN, USA
pp. 93-102

Fairness-aware scheduling on single-ISA heterogeneous multi-cores (Abstract)

Tian Luo , Ohio State Univ., Columbus, OH, USA
Siyuan Ma , Ohio State Univ., Columbus, OH, USA
Rubao Lee , Ohio State Univ., Columbus, OH, USA
Xiaodong Zhang , Ohio State Univ., Columbus, OH, USA
Deng Liu , VMware Inc., Palo Alto, CA, USA
Li Zhou , Facebook Inc., Menlo Park, CA, USA
pp. 103-112

DANBI: dynamic scheduling of irregular stream programs for many-core systems (Abstract)

Miao Zhou , Dept. of Comput. Sci., Univ. of Pittsburgh, Pittsburgh, PA, USA
Yu Du , Dept. of Comput. Sci., Univ. of Pittsburgh, Pittsburgh, PA, USA
Bruce R. Childers , Dept. of Comput. Sci., Univ. of Pittsburgh, Pittsburgh, PA, USA
Rami Melhem , Dept. of Comput. Sci., Univ. of Pittsburgh, Pittsburgh, PA, USA
Daniel Mosse , Dept. of Comput. Sci., Univ. of Pittsburgh, Pittsburgh, PA, USA
pp. 113-122

An empirical model for predicting cross-core performance interference on multicore processors (Abstract)

Josue Feliu , Dept. of Comput. Eng. (DISCA), Univ. Politec. de Valencia, València, Spain
Julio Sahuquillo , Dept. of Comput. Eng. (DISCA), Univ. Politec. de Valencia, València, Spain
Salvador Petit , Dept. of Comput. Eng. (DISCA), Univ. Politec. de Valencia, València, Spain
Jose Duato , Dept. of Comput. Eng. (DISCA), Univ. Politec. de Valencia, València, Spain
pp. 123-132

Jigsaw: scalable software-defined caches (Abstract)

Sandeep Navada , CPU Design Center, Qualcomm, Raleigh, NC, USA
Niket K. Choudhary , CPU Design Center, Qualcomm, Raleigh, NC, USA
Salil V. Wadhavkar , CPU Design Center, Qualcomm, Raleigh, NC, USA
Eric Rotenberg , Dept. of Electr. & Comput. Eng., North Carolina State Univ., Raleigh, NC, USA
pp. 133-144

Managing shared last-level cache in a heterogeneous multicore processor (Abstract)

Gwangsun Kim , KAIST, Daejeon, South Korea
John Kim , KAIST, Daejeon, South Korea
Jung Ho Ahn , Seoul Nat. Univ., Seoul, South Korea
Jaeha Kim , Seoul Nat. Univ., Seoul, South Korea
pp. 145-155

Reshaping cache misses to improve row-buffer locality in multicore systems (Abstract)

Onur Kayiran , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Adwait Jog , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Mahmut T. Kandemir , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Chita R. Das , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
pp. 157-166

Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems (Abstract)

Augusto Vega , IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Alper Buyuktosunoglu , IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Pradip Bose , IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
pp. 167-176

Starchart: hardware and software optimization using recursive partitioning regression trees (Abstract)

Kenzo Van Craeynest , Ghent Univ., Ghent, Belgium
Shoaib Akram , Ghent Univ., Ghent, Belgium
Wim Heirman , Ghent Univ., Ghent, Belgium
Aamer Jaleel , VSSAD, Intel Corp., Hillsboro, OR, USA
Lieven Eeckhout , Ghent Univ., Ghent, Belgium
pp. 177-187

RSVM: a region-based software virtual memory for GPU (Abstract)

Changwoo Min , Sungkyunkwan Univ., Suwon, South Korea
Young Ik Eom , Sungkyunkwan Univ., Suwon, South Korea
pp. 189-200

The case for a scalable coherence protocol for complex on-chip cache hierarchies in many core systems (Abstract)

Jiacheng Zhao , Inst. of Comput. Technol., Beijing, China
Xiaobing Feng , SKL Comput. Archit., Inst. of Comput. Technol., Beijing, China
Huimin Cui , SKL Comput. Archit., Inst. of Comput. Technol., Beijing, China
Youliang Yan , Shannon Lab., Huawei Technol. Co., Ltd., Shenzhen, China
Jingling Xue , Sch. of Comput. Sci. & Eng., Univ. of New South Wales, Sydney, NSW, Australia
Wensen Yang , Shannon Lab., Huawei Technol. Co., Ltd., Shenzhen, China
pp. 201-212

Meeting midway: improving CMP performance with memory-side prefetching (Abstract)

Nathan Beckmann , Massachusetts Inst. of Technol., Cambridge, MA, USA
Daniel Sanchez , Massachusetts Inst. of Technol., Cambridge, MA, USA
pp. 213-224

Building expressive, area-efficient coherence directories (Abstract)

Vineeth Mekkat , Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
Anup Holey , Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
Pen-Chung Yew , Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
Antonia Zhai , Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
pp. 225-234

Traffic steering between a low-latency unswitched TL ring and a high-throughput switched on-chip interconnect (Abstract)

Wei Ding , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Jun Liu , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Mahmut Kandemir , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Mary Jane Irwin , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
pp. 235-244

McRouter: multicast within a router for high performance network-on-chips (Abstract)

Janghaeng Lee , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
Mehrzad Samadi , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
Yongjun Park , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
Scott Mahlke , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
pp. 245-255

Concurrent predicates: a debugging technique for every parallel programmer (Abstract)

Wenhao Jia , Princeton Univ., Princeton, NJ, USA
Kelly A. Shaw , Univ. of Richmond, Richmond, VA, USA
Margaret Martonosi , Princeton Univ., Princeton, NJ, USA
pp. 257-267

Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG (Abstract)

Feng Ji , Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
Heshan Lin , Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA
Xiaosong Ma , Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
pp. 269-278

Vectorization past dependent branches through speculation (Abstract)

Lucia G. Menezo , Univ. of Cantabria, Santander, Spain
Valentin Puente , Univ. of Cantabria, Santander, Spain
Jose Angel Gregorio , Univ. of Cantabria, Santander, Spain
pp. 279-288

Automatic vectorization of tree traversals (Abstract)

Praveen Yedlapalli , Pennsylvania State Univ., University Park, PA, USA
Jagadish Kotra , Pennsylvania State Univ., University Park, PA, USA
Emre Kultursay , Pennsylvania State Univ., University Park, PA, USA
Mahmut Kandemir , Pennsylvania State Univ., University Park, PA, USA
Chita R. Das , Pennsylvania State Univ., University Park, PA, USA
Anand Sivasubramaniam , Pennsylvania State Univ., University Park, PA, USA
pp. 289-298

Generating efficient data movement code for heterogeneous architectures with distributed-memory (Abstract)

Lei Fang , Dept. of ISEE, Zhejiang Univ., Hangzhou, China
Peng Liu , Dept. of ISEE, Zhejiang Univ., Hangzhou, China
Qi Hu , Dept. of ISEE, Zhejiang Univ., Hangzhou, China
Michael C. Huang , Dept. of ECE, Univ. of Rochester, Rochester, NY, USA
Guofan Jiang , IBM China Syst. & Technol. Lab., Shanghai, China
pp. 299-308

Automatic OpenCL work-group size selection for multicore CPUs (Abstract)

Jungju Oh , Sch. of Comput. Sci., Georgia Inst. of Technol., Atlanta, GA, USA
Alenka Zajic , Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
Milos Prvulovic , Sch. of Comput. Sci., Georgia Inst. of Technol., Atlanta, GA, USA
pp. 309-318

TCPT: thread criticality-driven prefetcher throttling (Abstract)

Yuan He , Univ. of Tokyo, Tokyo, Japan
Hiroshi Sasaki , Kyushu Univ., Fukuoka, Japan
Shinobu Miwa , Univ. of Tokyo, Tokyo, Japan
Hiroshi Nakamura , Univ. of Tokyo, Tokyo, Japan
pp. 319-329

Do inputs matter?: using data-dependence profiling to evaluate thread level speculation in BG/Q (Abstract)

Justin Gottschlich , Intel Corp., Hillsboro, OR, USA
Gilles Pokam , Intel Corp., Hillsboro, OR, USA
Cristiano Pereira , Intel Corp., Hillsboro, OR, USA
Youfeng Wu , Intel Corp., Hillsboro, OR, USA
pp. 331-340

Can lock-free and combining techniques co-exist?: a novel approach on concurrent queue (Abstract)

Venkatraman Govindaraju , Dept. of Comput. Sci., Univ. of Wisconsin - Madison, Madison, WI, USA
Tony Nowatzki , Dept. of Comput. Sci., Univ. of Wisconsin - Madison, Madison, WI, USA
Karthikeyan Sankaralingam , Dept. of Comput. Sci., Univ. of Wisconsin - Madison, Madison, WI, USA
pp. 341-351

Task sampling: computer architecture simulation in the many-core era (Abstract)

Majedul Haque Sujon , Dept. of Comput. Sci., Univ. of TX at San Antonio, San Antonio, TX, USA
R. Clint Whaley , Sch. of EE & CS, Louisiana State Univ., Baton Rouge, LA, USA
Qing Yi , Dept. of Comput. Sci., Univ. of Colorado, Colorado Springs, CO, USA
pp. 353-362

PS-cache: an energy-efficient cache design for chip multiprocessors (Abstract)

Youngjoon Jo , Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
Michael Goldfarb , Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
Milind Kulkarni , Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
pp. 363-374

Dynamic memory access monitoring based on tagged memory (Abstract)

Roshan Dathathri , Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India
Chandan Reddy , Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India
Thejas Ramashekar , Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India
Uday Bondhugula , Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India
pp. 375-386

Exposing ILP in custom hardware with a dataflow compiler IR (Abstract)

Sangmin Seo , ManyCoreSoft, Seoul, South Korea
Jun Lee , Dept. of Comput. Sci. & Eng., Seoul Nat. Univ., Seoul, South Korea
Gangwon Jo , Dept. of Comput. Sci. & Eng., Seoul Nat. Univ., Seoul, South Korea
Jaejin Lee , Dept. of Comput. Sci. & Eng., Seoul Nat. Univ., Seoul, South Korea
pp. 387-397

TCPT - Thread criticality-driven prefetcher throttling (Abstract)

Biswabandan Panda , Dept. of CSE, Indian Institute of Technology, Madras, India
Shankar Balachandran , Dept. of CSE, Indian Institute of Technology, Madras, India
pp. 399

Do inputs matter? using data-dependence profiling to evaluate thread level speculation in BG/Q (Abstract)

Arnamoy Bhattacharyya , Department of Computing Science, University of Alberta, Edmonton, Canada
pp. 401

Can lock-free and combining techniques co-exist? A novel approach on concurrent queue (Abstract)

Changwoo Min , Sungkyunkwan University, Korea
Young Ik Eom , Sungkyunkwan University, Korea
pp. 403

Task sampling: Computer architecture simulation in the many-core era (Abstract)

Thomas Grass , Universitat Politècnica de Catalunya (BarcelonaTech) and Barcelona Supercomputing Center, 08034, Spain
pp. 405

PS-cache: An energy-efficient cache design for chip multiprocessors (Abstract)

Joan J. Valls , Department of Computer Engineering, Universitat Politècnica de València (Spain)
Alberto Ros , Dept. de Ingeniería y Tecnología de Computadores, Universidad de Murcia (Spain)
Julio Sahuquillo , Department of Computer Engineering, Universitat Politècnica de València (Spain)
Maria E. Gomez , Department of Computer Engineering, Universitat Politècnica de València (Spain)
pp. 407

Dynamic memory access monitoring based on tagged memory (Abstract)

Mikhail Gorelov , Moscow Institute of Physics and Technology/MCST, Moscow, Russia
Lev Mukhanov , MCST, Moscow, Russia
pp. 409

Exposing ILP in custom hardware with a dataflow compiler IR (Abstract)

Ali Mustafa Zaidi , Computer Laboratory, University of Cambridge, CB3 0FD, UK
pp. 411

Author index (PDF)

pp. 412