2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) (2010)
Vienna, Austria
Sept. 11, 2010 to Sept. 15, 2010
ISBN: 978-1-5090-5032-1
TABLE OF CONTENTS

Front matter (Abstract)

pp. c1

Towards a science of parallel programming (Abstract)

Keshav Pingali , Dept. of Computer Science, The University of Texas at Austin, USA
pp. 3-4

Raising the level of many-core programming with compiler technology - meeting a grand challenge (Abstract)

Wen-mei Hwu , Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA
pp. 5

Power and thermal characterization of POWER6 system (Abstract)

Victor Jimenez , Barcelona Supercomputing Center, Barcelona, Spain
Francisco J. Cazorla , Barcelona Supercomputing Center, Barcelona, Spain
Roberto Gioiosa , Barcelona Supercomputing Center, Barcelona, Spain
Mateo Valero , Barcelona Supercomputing Center, Barcelona, Spain
Carlos Boneti , Schlumberger BRGC, Rio de Janeiro, Brazil
Eren Kursun , IBM T.J. Watson Research Center, Yorktown Heights, USA
Chen-Yong Cher , IBM T.J. Watson Research Center, Yorktown Heights, USA
Canturk Isci , IBM T.J. Watson Research Center, Yorktown Heights, USA
Alper Buyuktosunoglu , IBM T.J. Watson Research Center, Yorktown Heights, USA
Pradip Bose , IBM T.J. Watson Research Center, Yorktown Heights, USA
pp. 7-18

System-level Max POwer (SYMPO) - a systematic approach for escalating system-level power consumption using synthetic benchmarks (Abstract)

Karthik Ganesan , ECE Department, University of Texas at Austin, USA
Jungho Jo , ECE Department, University of Texas at Austin, USA
W. Lloyd Bircher , ECE Department, University of Texas at Austin, USA
Dimitris Kaseridis , ECE Department, University of Texas at Austin, USA
Zhibin Yu , ECE Department, University of Texas at Austin, USA
Lizy K John , ECE Department, University of Texas at Austin, USA
pp. 19-28

Scalable thread scheduling and global power management for heterogeneous many-core architectures (Abstract)

Jonathan A. Winter , Google Inc., Mountain View, CA, USA
David H. Albonesi , Computer Systems Laboratory, Cornell University, Ithaca, NY, USA
Christine A. Shoemaker , CEE, Applied Math, & ORIE, Cornell University, Ithaca, NY, USA
pp. 29-39

Dynamically managed multithreaded reconfigurable architectures for chip multiprocessors (Abstract)

Matthew A. Watkins , Computer Systems Laboratory, Cornell University, Ithaca, NY, USA
David H. Albonesi , Computer Systems Laboratory, Cornell University, Ithaca, NY, USA
pp. 41-52

Accelerating multicore reuse distance analysis with sampling and parallelization (Abstract)

Derek L. Schuff , Purdue University, West Lafayette, IN 47907, USA
Milind Kulkarni , Purdue University, West Lafayette, IN 47907, USA
Vijay S. Pai , Purdue University, West Lafayette, IN 47907, USA
pp. 53-63

Simple and fast biased locks (Abstract)

Nalini Vasudevan , Columbia University, New York, USA
Kedar S. Namjoshi , Bell Laboratories, Murray Hill, NJ, USA
Stephen A. Edwards , Columbia University, New York, USA
pp. 65-73

Avoiding deadlock avoidance (Abstract)

Hari K. Pyla , Center for High-End Computing Systems, Department of Computer Science, Virginia Tech, Blacksburg, United States
Srinidhi Varadarajan , Center for High-End Computing Systems, Department of Computer Science, Virginia Tech, Blacksburg, United States
pp. 75-85

DAFT: Decoupled acyclic fault tolerance (Abstract)

Yun Zhang , Computer Science Department, Princeton University, NJ 08540, USA
Jae W. Lee , Parakinetics Inc., Princeton, NJ 08542, USA
Nick P. Johnson , Computer Science Department, Princeton University, NJ 08540, USA
David I. August , Computer Science Department, Princeton University, NJ 08540, USA
pp. 87-97

WayPoint: Scaling coherence to 1000-core architectures (Abstract)

John H. Kelm , University of Illinois at Urbana-Champaign, 61801, USA
Matthew R. Johnson , University of Illinois at Urbana-Champaign, 61801, USA
Steven S. Lumetta , University of Illinois at Urbana-Champaign, 61801, USA
Sanjay J. Patel , University of Illinois at Urbana-Champaign, 61801, USA
pp. 99-109

Subspace snooping: Filtering snoops with operating system support (Abstract)

Daehoon Kim , Dept. of Computer Science, KAIST, Korea
Jeongseob Ahn , Dept. of Computer Science, KAIST, Korea
Jaehong Kim , Dept. of Computer Science, KAIST, Korea
Jaehyuk Huh , Dept. of Computer Science, KAIST, Korea
pp. 111-122

Proximity coherence for chip multiprocessors (Abstract)

Nick Barrow-Williams , Computer Laboratory, University of Cambridge, CB3 0FD, UK
Christian Fensch , School of Informatics, University of Edinburgh, EH8 9AB, UK
Simon Moore , Computer Laboratory, University of Cambridge, CB3 0FD, UK
pp. 123-134

SPACE: Sharing pattern-based directory coherence for multicore scalability (Abstract)

Hongzhou Zhao , Department of Computer Science, University of Rochester, USA
Arrvindh Shriraman , Department of Computer Science, University of Rochester, USA
Sandhya Dwarkadas , Department of Computer Science, University of Rochester, USA
pp. 135-146

Feedback-directed pipeline parallelism (Abstract)

M. Aater Suleman , The University of Texas at Austin, USA
Moinuddin K. Qureshi , IBM T. J. Watson Research Center, USA
Khubaib , The University of Texas at Austin, USA
Yale N. Patt , The University of Texas at Austin, USA
pp. 147-156

Scalable hardware support for conditional parallelization (Abstract)

Zheng Li , INRIA Saclay, Orsay, France
Jose Duato , Polytechnic University of Valencia, Spain
Olivier Certner , ST Microelectronics & INRIA Saclay, Orsay, France
Olivier Temam , INRIA Saclay, Orsay, France
pp. 157-168

Reducing task creation and termination overhead in explicitly parallel programs (Abstract)

Jisheng Zhao , Dept. of CS, Rice University, 6100 Main St, Houston TX, USA
Jun Shirako , Dept. of CS, Rice University, 6100 Main St, Houston TX, USA
Vivek Sarkar , Dept. of CS, Rice University, 6100 Main St, Houston TX, USA
V. Krishna Nandivada , IBM India Research Laboratory, EGL, Bangalore, 560071, India
pp. 169-180

MEDICS: Ultra-portable processing for medical image reconstruction (Abstract)

Ganesh Dasika , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, USA
Ankit Sethia , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, USA
Vincentius Robby , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, USA
Trevor Mudge , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, USA
Scott Mahlke , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, USA
pp. 181-192

An OpenCL framework for heterogeneous multicores with local memory (Abstract)

Jaejin Lee , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Jungwon Kim , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Sangmin Seo , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Seungkyun Kim , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Jungho Park , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Honggyu Kim , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Thanh Tuan Dao , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Yongjin Cho , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Sung Jong Seo , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
Seung Hak Lee , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
Seung Mo Cho , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
Hyo Jung Song , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
Sang-Bum Suh , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
Jong-Deok Choi , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
pp. 193-204

Twin Peaks: A software platform for heterogeneous computing on general-purpose and graphics processors (Abstract)

Jayanth Gummaraju , Compute Research Group, Advanced Micro Devices Inc., Sunnyvale, California, USA
Ben Sander , Compute Research Group, Advanced Micro Devices Inc., Sunnyvale, California, USA
Laurent Morichetti , Compute Research Group, Advanced Micro Devices Inc., Sunnyvale, California, USA
Benedict R. Gaster , Compute Research Group, Advanced Micro Devices Inc., Sunnyvale, California, USA
Michael Houston , Compute Research Group, Advanced Micro Devices Inc., Sunnyvale, California, USA
Bixia Zheng , Compute Research Group, Advanced Micro Devices Inc., Sunnyvale, California, USA
pp. 205-215

MapCG: Writing parallel program portable between CPU and GPU (Abstract)

Chuntao Hong , Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing China
Dehao Chen , Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing China
Wenguang Chen , Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing China
Weimin Zheng , Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing China
Haibo Lin , China Research Lab of IBM, Beijing, China
pp. 217-226

Adaptive spatiotemporal node selection in dynamic networks (Abstract)

Pradip Hari , Dept. of Computer Science, Rutgers University, USA
John B. P. McCabe , Dept. of Computer Science, Rutgers University, USA
Jonathan Banafato , Dept. of Computer Science, Rutgers University, USA
Marcus Henry , Dept. of Computer Science, Rutgers University, USA
Kevin Ko , Dept. of Electrical Engineering, Princeton University, USA
Emmanouil Koukoumidis , Dept. of Electrical Engineering, Princeton University, USA
Ulrich Kremer , Dept. of Computer Science, Rutgers University, USA
Margaret Martonosi , Dept. of Electrical Engineering, Princeton University, USA
Li-Shiuan Peh , Dept. of Electrical Engineering and Computer Science, MIT, USA
pp. 227-236

On mitigating memory bandwidth contention through bandwidth-aware scheduling (Abstract)

Di Xu , Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Chenggang Wu , Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Pen-Chung Yew , Department of Computer Science and Engineering, University of Minnesota at Twin-Cities, USA
pp. 237-247

AKULA: A toolset for experimenting and developing thread placement algorithms on multicore systems (Abstract)

Sergey Zhuravlev , School of Computing Science, Simon Fraser University, Vancouver, Canada
Sergey Blagodurov , School of Computing Science, Simon Fraser University, Vancouver, Canada
Alexandra Fedorova , School of Computing Science, Simon Fraser University, Vancouver, Canada
pp. 249-259

Criticality-driven superscalar design space exploration (Abstract)

Sandeep Navada , Department of Electrical and Computer Engineering, North Carolina State University, USA
Niket K. Choudhary , Department of Electrical and Computer Engineering, North Carolina State University, USA
Eric Rotenberg , Department of Electrical and Computer Engineering, North Carolina State University, USA
pp. 261-272

A programmable parallel accelerator for learning and classification (Abstract)

Srihari Cadambi , NEC Laboratories America, Inc., 4 Independence Way, Princeton NJ 08540. USA
Abhinandan Majumdar , NEC Laboratories America, Inc., 4 Independence Way, Princeton NJ 08540. USA
Michela Becchi , NEC Laboratories America, Inc., 4 Independence Way, Princeton NJ 08540. USA
Srimat Chakradhar , NEC Laboratories America, Inc., 4 Independence Way, Princeton NJ 08540. USA
Hans Peter Graf , NEC Laboratories America, Inc., 4 Independence Way, Princeton NJ 08540. USA
pp. 273-283

Discovering and understanding performance bottlenecks in transactional applications (Abstract)

Ferad Zyulkyarov , BSC-Microsoft Research Centre, Barcelona, Spain
Srdjan Stipic , BSC-Microsoft Research Centre, Barcelona, Spain
Tim Harris , Microsoft Research, USA
Osman S. Unsal , BSC-Microsoft Research Centre, Barcelona, Spain
Adrian Cristal , BSC-Microsoft Research Centre, Barcelona, Spain
Ibrahim Hur , BSC-Microsoft Research Centre, Barcelona, Spain
Mateo Valero , BSC-Microsoft Research Centre, Barcelona, Spain
pp. 285-294

Efficient sequential consistency using conditional fences (Abstract)

Changhui Lin , CSE Department, University of California, Riverside 92521, USA
Vijay Nagarajan , School of Informatics, University of Edinburgh, United Kingdom
Rajiv Gupta , CSE Department, University of California, Riverside 92521, USA
pp. 295-306

Partitioning streaming parallelism for multi-cores: A machine learning based approach (Abstract)

Zheng Wang , Institute for Computing Systems Architecture, School of Informatics, The University of Edinburgh, UK
Michael F.P. O'Boyle , Institute for Computing Systems Architecture, School of Informatics, The University of Edinburgh, UK
pp. 307-318

Handling the problems and opportunities posed by multiple on-chip memory controllers (Abstract)

Manu Awasthi , School of Computing, University of Utah, USA
David Nellans , School of Computing, University of Utah, USA
Kshitij Sudan , School of Computing, University of Utah, USA
Rajeev Balasubramonian , School of Computing, University of Utah, USA
Al Davis , School of Computing, University of Utah, USA
pp. 319-330

Design and implementation of the PLUG architecture for programmable and efficient network lookups (Abstract)

Amit Kumar , University of Wisconsin-Madison, USA
Lorenzo De Carli , University of Wisconsin-Madison, USA
Sung Jin Kim , University of Wisconsin-Madison, USA
Marc de Kruijf , University of Wisconsin-Madison, USA
Karthikeyan Sankaralingam , University of Wisconsin-Madison, USA
Cristian Estan , NetLogic Microsystems, USA
Somesh Jha , University of Wisconsin-Madison, USA
pp. 331-341

A model for fusion and code motion in an automatic parallelizing compiler (Abstract)

Uday Bondhugula , Advanced Compiler Technologies, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Sanjeeb Dash , Business Analytics and Mathematical Sciences, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Oktay Gunluk , Business Analytics and Mathematical Sciences, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Lakshminarayanan Renganarayanan , Advanced Compiler Technologies, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
pp. 343-352

Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems (Abstract)

Gregory Diamos , School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, 30332-0250, USA
Andrew Kerr , School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, 30332-0250, USA
Sudhakar Yalamanchili , School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, 30332-0250, USA
Nathan Clark , College of Computing, Georgia Institute of Technology, Atlanta, 30332-0250, USA
pp. 353-364

An empirical characterization of stream programs and its implications for language and compiler design (Abstract)

William Thies , Microsoft Research India
Saman Amarasinghe , Massachusetts Institute of Technology, USA
pp. 365-376

Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information (Abstract)

Georgios Tournavitis , Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, Scotland, United Kingdom
Bjorn Franke , Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, Scotland, United Kingdom
pp. 377-388

The Paralax infrastructure: Automatic parallelization with a helping hand (Abstract)

Hans Vandierendonck , Dept. of Electronics and Information Systems, Ghent University, Belgium
Sean Rul , Dept. of Electronics and Information Systems, Ghent University, Belgium
Koen De Bosschere , Dept. of Electronics and Information Systems, Ghent University, Belgium
pp. 389-399

AM++: A generalized active message framework (Abstract)

Jeremiah J. Willcock , Indiana University, 150 S. Woodlawn Ave, Bloomington, IN 47401, USA
Torsten Hoefler , University of Illinois at Urbana-Champaign, 1205 W. Clark St., 61801, USA
Nicholas G. Edmonds , Indiana University, 150 S. Woodlawn Ave, Bloomington, IN 47401, USA
Andrew Lumsdaine , Indiana University, 150 S. Woodlawn Ave, Bloomington, IN 47401, USA
pp. 401-410

Using memory mapping to support cactus stacks in work-stealing runtime systems (Abstract)

I-Ting Angelina Lee , MIT CSAIL, 32 Vassar Street, Cambridge, MA 02139, USA
Silas Boyd-Wickizer , MIT CSAIL, 32 Vassar Street, Cambridge, MA 02139, USA
Zhiyi Huang , MIT CSAIL, 32 Vassar Street, Cambridge, MA 02139, USA
Charles E. Leiserson , MIT CSAIL, 32 Vassar Street, Cambridge, MA 02139, USA
pp. 411-420

Speculative-Aware Execution: A simple and efficient technique for utilizing multi-cores to improve single-thread performance (Abstract)

Rania H. Mameesh , Department of Information Engineering, University of Siena, Italy
Manoj Franklin , Department of Electrical and Computer Engineering, College Park, University of Maryland, USA
pp. 421-430

The potential of using dynamic information flow analysis in data value prediction (Abstract)

Walid J. Ghandour , Electrical and Computer Engineering Department, American University of Beirut, Lebanon 1107 2020
Haitham Akkary , Electrical and Computer Engineering Department, American University of Beirut, Lebanon 1107 2020
Wes Masri , Electrical and Computer Engineering Department, American University of Beirut, Lebanon 1107 2020
pp. 431-442

Efficient Runahead Threads (Abstract)

Tanausu Ramirez , Dept. Arquitectura de Computadores, Universitat Politecnica de Catalunya, Barcelona, Spain
Alex Pajuelo , Dept. Arquitectura de Computadores, Universitat Politecnica de Catalunya, Barcelona, Spain
Oliverio J. Santana , Dept. Informática y Sistemas, University of Las Palmas de Gran Canaria, Las Palmas de GC, Spain
Onur Mutlu , Dept. of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
Mateo Valero , DAC - UPC, BSC - Centro Nacional de Supercomputación, Barcelona, Spain
pp. 443-452

Energy efficient speculative threads: Dynamic thread allocation in same-ISA heterogeneous multicore systems (Abstract)

Yangchun Luo , University of Minnesota, Minneapolis, 55455, USA
Venkatesan Packirisamy , NVIDIA Corporation, Santa Clara, CA 95051, USA
Wei-Chung Hsu , National Chiao Tung University, Hsinchu, Taiwan
Antonia Zhai , University of Minnesota, Minneapolis, 55455, USA
pp. 453-464

SWEL: Hardware cache coherence protocols to map shared data onto shared caches (Abstract)

Seth H. Pugsley , School of Computing, University of Utah, USA
Josef B. Spjut , School of Computing, University of Utah, USA
David W. Nellans , School of Computing, University of Utah, USA
Rajeev Balasubramonian , School of Computing, University of Utah, USA
pp. 465-475

ATAC: A 1000-core cache-coherent processor with on-chip optical network (Abstract)

George Kurian , Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Jason E. Miller , Massachusetts Institute of Technology, Cambridge, MA 02139, USA
James Psota , Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Jonathan Eastep , Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Jifeng Liu , Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Jurgen Michel , Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Lionel C. Kimerling , Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Anant Agarwal , Massachusetts Institute of Technology, Cambridge, MA 02139, USA
pp. 477-488

Using dead blocks as a virtual victim cache (Abstract)

Samira Khan , Univ. of Texas at San Antonio, USA
Doug Burger , Microsoft Research, USA
Daniel A. Jimenez , Univ. of Texas at San Antonio, USA
Babak Falsafi , EPFL, Switzerland
pp. 489-500

Compiler-assisted data distribution for chip multiprocessors (Abstract)

Yong Li , Department of ECE, University of Pittsburgh, Benedum Hall, PA, 15261, USA
Rami Melhem , Department of CS, University of Pittsburgh, Sennott Square, PA, 15260, USA
Ahmed Abousamra , Department of CS, University of Pittsburgh, Sennott Square, PA, 15260, USA
Alex K. Jones , Department of ECE, University of Pittsburgh, Benedum Hall, PA, 15261, USA
pp. 501-512

Data layout transformation exploiting memory-level parallelism in structured grid many-core applications (Abstract)

I-Jui Sung , Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign, USA
John A. Stratton , Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign, USA
Wen-Mei W. Hwu , Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign, USA
pp. 513-522

Tiled-MapReduce: Optimizing resource usages of data-parallel applications on multicore with tiling (Abstract)

Rong Chen , Parallel Processing Institute, Fudan University, China
Haibo Chen , Parallel Processing Institute, Fudan University, China
Binyu Zang , Parallel Processing Institute, Fudan University, China
pp. 523-534

On-chip network design considerations for compute accelerators (Abstract)

Ali Bakhoda , University of British Columbia, Department of Electrical and Computer Engineering, Vancouver, Canada
John Kim , KAIST, Department of Computer Science, Daejeon, Korea
Tor M. Aamodt , University of British Columbia, Department of Electrical and Computer Engineering, Vancouver, Canada
pp. 535-536

Believe it or not! Multi-core CPUs can match GPU performance for a FLOP-intensive application! (Abstract)

Rajesh Bordawekar , IBM Watson Research Center, Hawthorne, NY 10532, USA
Uday Bondhugula , IBM Watson Research Center, Yorktown Heights, NY 10598, USA
Ravi Rao , IBM Watson Research Center, Yorktown Heights, NY 10598, USA
pp. 537-538

Ordered and unordered algorithms for parallel breadth first search (Abstract)

M. Amber Hassaan , Department of Electrical and Computer Engineering, University of Texas at Austin, USA
Martin Burtscher , Institute for Computational Engineering and Sciences, University of Texas at Austin, USA
Keshav Pingali , Department of Computer Science, University of Texas at Austin, USA
pp. 539-540

Moths: Mobile threads for On-Chip Networks (Abstract)

Matthew Misler , Dept. of Electrical and Computer Engineering, University of Toronto, Ontario, Canada M5S 3G4
Natalie Enright Jerger , Dept. of Electrical and Computer Engineering, University of Toronto, Ontario, Canada M5S 3G4
pp. 541-542

Improving speculative loop parallelization via selective squash and speculation reuse (Abstract)

Santhosh Sharma Ananthramu , Indian Institute of Technology, Kanpur, India
Deepak Majeti , Dept. of Comp. Sci., Rice University, USA
Sanjeev Kumar Aggarwal , Indian Institute of Technology, Kanpur, India
Mainak Chaudhuri , Indian Institute of Technology, Kanpur, India
pp. 543-544

Revisiting sorting for GPGPU stream architectures (Abstract)

Duane G. Merrill , Department of Computer Science, University of Virginia, USA
Andrew S. Grimshaw , Department of Computer Science, University of Virginia, USA
pp. 545-546

Analyzing cache performance bottlenecks of STM applications and addressing them with compiler's help (Abstract)

Sandya Mannarswamy , Indian Institute of Science & HP India
R. Govindarajan , Indian Institute of Science, Bangalore, India
pp. 547-548

An intra-tile cache set balancing scheme (Abstract)

Mohammad Hammoud , Department of Computer Science, University of Pittsburgh, PA, USA
Sangyeun Cho , Department of Computer Science, University of Pittsburgh, PA, USA
Rami G. Melhem , Department of Computer Science, University of Pittsburgh, PA, USA
pp. 549-550

StatCC: A statistical cache contention model (Abstract)

David Eklov , Uppsala University, Department of Information Technology, P.O. Box 337, SE-751 05, Sweden
David Black-Schaffer , Uppsala University, Department of Information Technology, P.O. Box 337, SE-751 05, Sweden
Erik Hagersten , Uppsala University, Department of Information Technology, P.O. Box 337, SE-751 05, Sweden
pp. 551-552

An integer programming framework for optimizing shared memory use on GPUs (Abstract)

Wenjing Ma , Dept. of Computer Science and Engineering, The Ohio State University, Columbus, USA
Gagan Agrawal , Dept. of Computer Science and Engineering, The Ohio State University, Columbus, USA
pp. 553-554

Exploiting subtrace-level parallelism in clustered processors (Abstract)

Rafael Ubal , Dept. of Computing Engineering (DISCA), Universidad Politécnica de Valencia, Camino de Vera, s/n, 46021, Spain
Julio Sahuquillo , Dept. of Computing Engineering (DISCA), Universidad Politécnica de Valencia, Camino de Vera, s/n, 46021, Spain
Salvador Petit , Dept. of Computing Engineering (DISCA), Universidad Politécnica de Valencia, Camino de Vera, s/n, 46021, Spain
Pedro Lopez , Dept. of Computing Engineering (DISCA), Universidad Politécnica de Valencia, Camino de Vera, s/n, 46021, Spain
Jose Duato , Dept. of Computing Engineering (DISCA), Universidad Politécnica de Valencia, Camino de Vera, s/n, 46021, Spain
pp. 555-556

A case for NUMA-aware contention management on multicore systems (Abstract)

Sergey Blagodurov , School of Computing Science, Simon Fraser University, Vancouver, BC, Canada
Alexandra Fedorova , School of Computing Science, Simon Fraser University, Vancouver, BC, Canada
Sergey Zhuravlev , School of Computing Science, Simon Fraser University, Vancouver, BC, Canada
Ali Kamali , School of Computing Science, Simon Fraser University, Vancouver, BC, Canada
pp. 557-558

DMATiler: Revisiting loop tiling for direct memory access (Abstract)

Haibo Lin , IBM Research - China, Beijing, China
Tong Chen , IBM Watson Research Center, Yorktown Heights, NY, USA
Tao Liu , IBM Research - China, Beijing, China
Lakshminarayanan Renganarayana , IBM Watson Research Center, Yorktown Heights, NY, USA
Huoding Li , IBM Systems & Technology Group, Beijing, China
John Kevin O'Brien , IBM Watson Research Center, Yorktown Heights, NY, USA
pp. 559-560

Scaling of the PARSEC benchmark inputs (Abstract)

Christian Bienia , Department of Computer Science, Princeton University, USA
Kai Li , Department of Computer Science, Princeton University, USA
pp. 561-562

Online cache modeling for commodity multicore processors (Abstract)

Richard West , Boston University, MA 02215, USA
Carl A. Waldspurger , VMware, Inc., Palo Alto, CA 94304, USA
Puneet Zaroo , VMware, Inc., Palo Alto, CA 94304, USA
Xiao Zhang , University of Rochester, NY 14627-0226, USA
pp. 563-564

NoC-aware cache design for chip multiprocessors (Abstract)

Ahmed K. Abousamra , University of Pittsburgh, Computer Science Department, PA, USA
Rami G. Melhem , University of Pittsburgh, Computer Science Department, PA, USA
Alex K. Jones , University of Pittsburgh, Electrical and Computer Engineering Department, PA, USA
pp. 565-566

A software-SVM-based transactional memory for multicore accelerator architectures with local memory (Abstract)

Jun Lee , School of Computer Science and Engineering, Seoul National University, Korea
Sangmin Seo , School of Computer Science and Engineering, Seoul National University, Korea
Jaejin Lee , School of Computer Science and Engineering, Seoul National University, Korea
pp. 567-568

NUcache: A multicore cache organization based on Next-Use distance (Abstract)

R Manikantan , Indian Institute of Science, Bangalore, India
Kaushik Rajan , Microsoft Research India
R Govindarajan , Indian Institute of Science, Bangalore, India
pp. 569-570

CoreGenesis: Erasing core boundaries for robust and configurable performance (Abstract)

Shantanu Gupta , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, USA
Shuguang Feng , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, USA
Amin Ansari , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, USA
Ganesh Dasika , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, USA
Scott Mahlke , Advanced Computer Architecture Laboratory, University of Michigan - Ann Arbor, USA
pp. 571-572

Automatic vector instruction selection for dynamic compilation (Abstract)

Rajkishore Barik , Computer Science Department, Rice University, 6100 Main Street, Houston, TX 77005, USA
Jisheng Zhao , Computer Science Department, Rice University, 6100 Main Street, Houston, TX 77005, USA
Vivek Sarkar , Computer Science Department, Rice University, 6100 Main Street, Houston, TX 77005, USA
pp. 573-574

Approximating age-based arbitration in on-chip networks (Abstract)

Michael M. Lee , KAIST, Daejeon, Korea
John Kim , KAIST, Daejeon, Korea
Dennis Abts , Google Inc., Madison, WI, USA
Michael Marty , Google Inc., Madison, WI, USA
Jae W. Lee , Parakinetics Inc., Princeton, NJ, USA
pp. 575-576

Author index (Abstract)

pp. 577-578