2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT) (2012)
Minneapolis, MN, USA
Sept. 19, 2012 to Sept. 23, 2012
ISBN: 978-1-5090-6609-4
TABLE OF CONTENTS

[Front matter] (PDF)

pp. c1

The changing role of supercomputing (Abstract)

Peter J. Ungaro , Cray Inc., Seattle, Washington, USA
pp. 1

Power-aware multi-core simulation for early design stage hardware/software co-optimization (Abstract)

Wim Heirman , ELIS Department, Ghent University, Belgium
Souradip Sarkar , ELIS Department, Ghent University, Belgium
Trevor E. Carlson , ELIS Department, Ghent University, Belgium
Ibrahim Hur , Intel, Leuven, Belgium
Lieven Eeckhout , ELIS Department, Ghent University, Belgium
pp. 3-12

PGCapping: Exploiting power gating for power capping and core lifetime balancing in CMPs (Abstract)

Kai Ma , Department of Electrical and Computer Engineering, The Ohio State University, Columbus, 43210, USA
Xiaorui Wang , Department of Electrical and Computer Engineering, The Ohio State University, Columbus, 43210, USA
pp. 13-22

Power-efficient time-sensitive mapping in heterogeneous systems (Abstract)

Cong Liu , University of North Carolina at Chapel Hill, Dept. of Computer Science, USA
Jian Li , IBM Austin Research Laboratory, USA
Wei Huang , IBM Austin Research Laboratory, USA
Juan Rubio , IBM Austin Research Laboratory, USA
Evan Speight , IBM Austin Research Laboratory, USA
Felix Xiaozhu Lin , Rice University, Dept. of Computer Science, USA
pp. 23-32

Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme (Abstract)

Sreepathi Pai , Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, India
R. Govindarajan , Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, India
Matthew J. Thazhuthaveetil , Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, India
pp. 33-42

Riposte: A trace-driven compiler and parallel VM for vector code in R (Abstract)

Justin Talbot , Stanford University, USA
Zachary DeVito , Stanford University, USA
Pat Hanrahan , Stanford University, USA
pp. 43-51

Auto-parallelizing stateful distributed streaming applications (Abstract)

Scott Schneider , IBM Thomas J. Watson Research Center, IBM Research, Hawthorne, New York, 10532, USA
Martin Hirzel , IBM Thomas J. Watson Research Center, IBM Research, Hawthorne, New York, 10532, USA
Bugra Gedik , Department of Computer Engineering, Bilkent University, Ankara, 06800, Turkey
Kun-Lung Wu , IBM Thomas J. Watson Research Center, IBM Research, Hawthorne, New York, 10532, USA
pp. 53-64

PEPON: Performance-aware hierarchical power budgeting for NoC based multicores (Abstract)

Akbar Sharifi , Department of CSE, The Pennsylvania State University, University Park, 16802, USA
Asit K. Mishra , Department of CSE, The Pennsylvania State University, University Park, 16802, USA
Shekhar Srikantaiah , Department of CSE, The Pennsylvania State University, University Park, 16802, USA
Mahmut Kandemir , Department of CSE, The Pennsylvania State University, University Park, 16802, USA
Chita R. Das , Department of CSE, The Pennsylvania State University, University Park, 16802, USA
pp. 65-74

XPoint cache: Scaling existing bus-based coherence protocols for 2D and 3D many-core systems (Abstract)

Ronald G. Dreslinski , Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Thomas Manville , Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Korey Sewell , Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Reetuparna Das , Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Nathaniel Pinckney , Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Sudhir Satpathy , Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
David Blaauw , Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Dennis Sylvester , Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Trevor Mudge , Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
pp. 75-85

APCR: An adaptive physical channel regulator for On-Chip Interconnects (Abstract)

Lei Wang , Department of Computer Science and Engineering, Texas A&M University, College Station, USA
Poornachandran Kumar , Department of Computer Science and Engineering, Texas A&M University, College Station, USA
Ki Hwan Yum , Department of Computer Science and Engineering, Texas A&M University, College Station, USA
Eun Jung Kim , Department of Computer Science and Engineering, Texas A&M University, College Station, USA
pp. 87-96

Pointy: A hybrid pointer prefetcher for managed runtime systems (Abstract)

Ioana Burcea , IBM Research T.J. Watson, USA
Livio Soares , IBM Research T.J. Watson, USA
Andreas Moshovos , University of Toronto, Canada
pp. 97-106

Scalability-based manycore partitioning (Abstract)

Hiroshi Sasaki , Kyushu University, 744 Motooka Nishi-ku, Fukuoka, Japan
Teruo Tanimoto , The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
Koji Inoue , Kyushu University, 744 Motooka Nishi-ku, Fukuoka, Japan
Hiroshi Nakamura , The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
pp. 107-116

Optimizing datacenter power with memory system levers for guaranteed Quality-of-Service (Abstract)

Kshitij Sudan , University of Utah, USA
Sadagopan Srinivasan , Intel Corporation, USA
Rajeev Balasubramonian , University of Utah, USA
Ravi Iyer , Intel Corporation, USA
pp. 117-126

Evaluation of Blue Gene/Q hardware support for transactional memories (Abstract)

Amy Wang , IBM Toronto Software Lab., Markham, ON, Canada
Matthew Gaudet , Dep. of Computing Science, University of Alberta, Edmonton, Canada
Peng Wu , IBM Research, Yorktown, NY, USA
Jose Nelson Amaral , Dep. of Computing Science, University of Alberta, Edmonton, Canada
Martin Ohmacht , IBM Research, Yorktown, NY, USA
Christopher Barton , IBM Toronto Software Lab., Markham, ON, Canada
Raul Silvera , IBM Toronto Software Lab., Markham, ON, Canada
Maged Michael , IBM Research, Yorktown, NY, USA
pp. 127-136

Making data prefetch smarter: Adaptive prefetching on POWER7 (Abstract)

Victor Jimenez , Barcelona Supercomputing Center, Spain
Roberto Gioiosa , Pacific Northwest National Laboratory, Richland, WA, USA
Francisco J. Cazorla , Spanish National Research Council and Barcelona Supercomputing Center, Spain
Alper Buyuktosunoglu , IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Pradip Bose , IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Francis P. O'Connell , IBM Systems and Technology Group, Austin, TX, USA
pp. 137-146

Enhancing performance optimization of multicore chips and multichip nodes with data structure metrics (Abstract)

Ashay Rane , Texas Advanced Computing Center, The University of Texas at Austin, USA
James Browne , Department of Computer Science, The University of Texas at Austin, USA
pp. 147-156

Compiling to avoid communication (Abstract)

Kathy Yelick , University of California at Berkeley, Lawrence Berkeley National Laboratory, USA
pp. 157

Visualizing transactional memory (Abstract)

Justin E. Gottschlich , Intel Corporation, Programming Systems Lab, USA
Maurice P. Herlihy , Brown University, Dept. of Computer Science, USA
Gilles A. Pokam , Intel Corporation, Programming Systems Lab, USA
Jeremy G. Siek , University of Colorado-Boulder, Dept. of Electrical and Computer Engineering, USA
pp. 159-170

Sandboxing transactional memory (Abstract)

Luke Dalessandro , University of Rochester, Department of Computer Science, USA
Michael L. Scott , University of Rochester, Department of Computer Science, USA
pp. 171-179

Transactional prefetching: Narrowing the window of contention in Hardware Transactional Memory (Abstract)

Anurag Negi , Chalmers University of Technology, Sweden
Adria Armejach , Barcelona Supercomputing Center, Spain
Adrian Cristal , Barcelona Supercomputing Center, Spain
Osman S. Unsal , Barcelona Supercomputing Center, Spain
Per Stenstrom , Chalmers University of Technology, Sweden
pp. 181-190

RISE: Improving the streaming processors reliability against soft errors in GPGPUs (Abstract)

Jingweijia Tan , Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, 66045, USA
Xin Fu , Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, 66045, USA
pp. 191-200

Chrysalis analysis: Incorporating synchronization arcs in dataflow-analysis-based parallel monitoring (Abstract)

Michelle L. Goodstein , Carnegie Mellon University, USA
Shimin Chen , HP Labs China, China
Phillip B. Gibbons , Intel Labs Pittsburgh, USA
Michael A. Kozuch , Intel Labs Pittsburgh, USA
Todd C. Mowry , Carnegie Mellon University, USA
pp. 201-212

Probabilistic diagnosis of performance faults in large-scale parallel applications (Abstract)

Ignacio Laguna , Purdue University, School of Electrical and Computer Engineering, West Lafayette, IN 47907, USA
Dong H. Ahn , Lawrence Livermore National Laboratory, Computation Directorate, CA 94550, USA
Bronis R. de Supinski , Lawrence Livermore National Laboratory, Computation Directorate, CA 94550, USA
Saurabh Bagchi , Purdue University, School of Electrical and Computer Engineering, West Lafayette, IN 47907, USA
Todd Gamblin , Lawrence Livermore National Laboratory, Computation Directorate, CA 94550, USA
pp. 213-222

Top500 versus sustained performance - the top problems with the TOP500 list - and what to do about them (Abstract)

William Kramer , National Center for Supercomputing Applications, University of Illinois, 1205 W Clark Street, Urbana, 61821, USA
pp. 223-230

Practically Private: Enabling high performance CMPs through compiler-assisted data classification (Abstract)

Yong Li , Department of ECE, University of Pittsburgh, PA, 15261, USA
Rami Melhem , Department of CS, University of Pittsburgh, PA, 15260, USA
Alex K. Jones , Department of ECE, University of Pittsburgh, PA, 15261, USA
pp. 231-240

Complexity-effective multicore coherence (Abstract)

Alberto Ros , Department of Computer Engineering, University of Murcia, Spain
Stefanos Kaxiras , Department of Information Technology, Uppsala University, Sweden
pp. 241-251

HaLock: Hardware-assisted lock contention detection in multithreaded applications (Abstract)

Yongbing Huang , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Zehan Cui , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Licheng Chen , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Wenli Zhang , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yungang Bao , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Mingyu Chen , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
pp. 253-262

Runtime detection and optimization of collective communication patterns (Abstract)

Torsten Hoefler , Department of Computer Science, ETH Zurich, Switzerland
Timo Schneider , University of Illinois at Urbana-Champaign, USA
pp. 263-272

Coalition Threading: Combining traditional and non-traditional parallelism to maximize scalability (Abstract)

Md Kamruzzaman , Computer Science and Engineering, University of California, San Diego, USA
Steven Swanson , Computer Science and Engineering, University of California, San Diego, USA
Dean M. Tullsen , Computer Science and Engineering, University of California, San Diego, USA
pp. 273-282

Shared memory multiplexing: A novel way to improve GPGPU throughput (Abstract)

Yi Yang , North Carolina State University, Raleigh, USA
Ping Xiang , North Carolina State University, Raleigh, USA
Mike Mantor , Advanced Micro Devices, Orlando, FL, USA
Norm Rubin , Advanced Micro Devices, Boxborough, MA, USA
Huiyang Zhou , North Carolina State University, Raleigh, USA
pp. 283-292

Introducing Hierarchy-awareness in replacement and bypass algorithms for last-level caches (Abstract)

Mainak Chaudhuri , Indian Institute of Technology, Kanpur 208016, India
Jayesh Gaur , Intel Architecture Group, Bangalore 560103, India
Nithiyanandan Bashyam , Intel Architecture Group, Bangalore 560103, India
Sreenivas Subramoney , Intel Architecture Group, Bangalore 560103, India
Joseph Nuzman , Intel Architecture Group, Haifa 31015, Israel
pp. 293-304

Efficient techniques for predicting cache sharing and throughput (Abstract)

Andreas Sandberg , Uppsala University, Sweden
David Black-Schaffer , Uppsala University, Sweden
Erik Hagersten , Uppsala University, Sweden
pp. 305-314

Optimal bypass monitor for high performance last-level caches (Abstract)

Lingda Li , Microprocessor Research and Development Center, Peking University, Beijing, China
Dong Tong , Microprocessor Research and Development Center, Peking University, Beijing, China
Zichao Xie , Microprocessor Research and Development Center, Peking University, Beijing, China
Junlin Lu , Microprocessor Research and Development Center, Peking University, Beijing, China
Xu Cheng , Microprocessor Research and Development Center, Peking University, Beijing, China
pp. 315-324

Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads (Abstract)

Vijay Sathish , The University of Wisconsin-Madison, U.S.A.
Michael J. Schulte , Advanced Micro Devices, TX, U.S.A.
Nam Sung Kim , The University of Wisconsin-Madison, U.S.A.
pp. 325-334

Multi2Sim: A simulation framework for CPU-GPU computing (Abstract)

Rafael Ubal , Electrical and Computer Engineering Dept., Northeastern University, 360 Huntington Ave., Boston, MA 02115, USA
Byunghyun Jang , Computer and Information Science Dept., University of Mississippi, P. O. Box 1848, University, 38677, USA
Perhaad Mistry , Electrical and Computer Engineering Dept., Northeastern University, 360 Huntington Ave., Boston, MA 02115, USA
Dana Schaa , Electrical and Computer Engineering Dept., Northeastern University, 360 Huntington Ave., Boston, MA 02115, USA
David Kaeli , Electrical and Computer Engineering Dept., Northeastern University, 360 Huntington Ave., Boston, MA 02115, USA
pp. 335-344

A yoke of oxen and a thousand chickens for heavy lifting graph processing (Abstract)

Abdullah Gharaibeh , Department of Electrical and Computer Engineering, The University of British Columbia, Canada
Lauro Beltrao Costa , Department of Electrical and Computer Engineering, The University of British Columbia, Canada
Elizeu Santos-Neto , Department of Electrical and Computer Engineering, The University of British Columbia, Canada
Matei Ripeanu , Department of Electrical and Computer Engineering, The University of British Columbia, Canada
pp. 345-354

The evicted-address filter: A unified mechanism to address both cache pollution and thrashing (Abstract)

Vivek Seshadri , Carnegie Mellon University, USA
Onur Mutlu , Carnegie Mellon University, USA
Michael A Kozuch , Intel Labs Pittsburgh, USA
Todd C Mowry , Carnegie Mellon University, USA
pp. 355-366

A software memory partition approach for eliminating bank-level interference in multicore systems (Abstract)

Lei Liu , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Zehan Cui , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Mingjie Xing , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yungang Bao , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Mingyu Chen , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Chengyong Wu , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
pp. 367-375

Base-delta-immediate compression: Practical data compression for on-chip caches (Abstract)

Gennady Pekhimenko , Carnegie Mellon University, USA
Vivek Seshadri , Carnegie Mellon University, USA
Onur Mutlu , Carnegie Mellon University, USA
Michael A. Kozuch , Intel Labs Pittsburgh, USA
Phillip B. Gibbons , Intel Labs Pittsburgh, USA
Todd C. Mowry , Carnegie Mellon University, USA
pp. 377-388

Hardware acceleration in the IBM PowerEN processor: architecture and performance (Abstract)

Anil Krishna , Systems and Technology Group, IBM, USA
Timothy Heil , Microsoft IEB, UK
Nicholas Lindberg , Milwaukee Institute, USA
Farnaz Toussi , Systems and Technology Group, IBM, USA
Steven VanderWiel , Systems and Technology Group, IBM, USA
pp. 389-399

Workload and power budget partitioning for single-chip heterogeneous processors (Abstract)

Hao Wang , The University of Wisconsin-Madison, U.S.A.
Vijay Sathish , The University of Wisconsin-Madison, U.S.A.
Ripudaman Singh , The University of Wisconsin-Madison, U.S.A.
Michael J. Schulte , Advanced Micro Devices, TX, U.S.A.
Nam Sung Kim , The University of Wisconsin-Madison, U.S.A.
pp. 401-410

Database analytics acceleration using FPGAs (Abstract)

Bharat Sukhwani , IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Hong Min , IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Mathew Thoennes , IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Parijat Dube , IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Balakrishna Iyer , IBM Santa Teresa Lab, 555 Bailey Ave, San Jose, CA 95141, USA
Bernard Brezzo , IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Donna Dillenberger , IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Sameh Asaad , IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
pp. 411-420

LumiNOC: A power-efficient, high-performance, photonic network-on-chip for future parallel architectures (Abstract)

Cheng Li , Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843-3128, USA
Mark Browning , Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843-3128, USA
Paul V. Gratz , Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843-3128, USA
Samuel Palermo , Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843-3128, USA
pp. 421-422

Acceleration of bulk memory operations in a heterogeneous multicore architecture (Abstract)

JongHyuk Lee , University of Houston, TX 77004, USA
Ziyi Liu , University of Houston, TX 77004, USA
Xiaonan Tian , University of Houston, TX 77004, USA
Dong Hyuk Woo , Intel Labs, Santa Clara, CA 95054, USA
Weidong Shi , University of Houston, TX 77004, USA
Dainis Boumber , University of Houston, TX 77004, USA
pp. 423-424

Integrating nanophotonics in GPU microarchitecture (Abstract)

Nilanjan Goswami , University of Florida, Gainesville, USA
Zhongqi Li , University of Florida, Gainesville, USA
Ajit Verma , University of Florida, Gainesville, USA
Ramkumar Shankar , University of Florida, Gainesville, USA
Tao Li , University of Florida, Gainesville, USA
pp. 425-426

Branch and data herding: Reducing control and memory divergence for error-tolerant GPU applications (Abstract)

John Sartori , University of Illinois at Urbana-Champaign, USA
Rakesh Kumar , University of Illinois at Urbana-Champaign, USA
pp. 427-428

Layout-oblivious optimization for matrix computations (Abstract)

Huimin Cui , SKL Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
Qing Yi , Dept. of Computer Science, University of Texas at San Antonio, USA
Jingling Xue , School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Xiaobing Feng , SKL Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
pp. 429-430

Boost.SIMD: Generic programming for portable SIMDization (Abstract)

Pierre Esterie , LRI, Université Paris-Sud XI, Orsay, France
Mathias Gaunard , Metascale, Orsay, France
Joel Falcou , LRI, Université Paris-Sud XI, Orsay, France
Jean-Thierry Lapreste , IP (Institut Pascal), Université Blaise Pascal, Clermont-Ferrand, France
Brigitte Rozoy , LRI, Université Paris-Sud XI, Orsay, France
pp. 431-432

Speculative parallelization needs rigor (Abstract)

Zhijia Zhao , Computer Science Department, College of William and Mary, VA, USA
Bo Wu , Computer Science Department, College of William and Mary, VA, USA
Xipeng Shen , Computer Science Department, College of William and Mary, VA, USA
pp. 433-434

Supporting stateful tasks in a dataflow graph (Abstract)

Vladimir Gajinov , Universitat Politecnica de Catalunya, Spain
Srdjan Stipic , Universitat Politecnica de Catalunya, Spain
Osman S. Unsal , Barcelona Supercomputing Center, Spain
Tim Harris , Microsoft Research Cambridge, UK
Eduard Ayguade , Universitat Politecnica de Catalunya, Spain
Adrian Cristal , Artificial Intelligence Research Institute, Spain
pp. 435-436

MaSiF: Machine learning guided auto-tuning of Parallel Skeletons (Abstract)

Alexander Collins , University of Edinburgh, School of Informatics, Scotland
Christian Fensch , University of Edinburgh, School of Informatics, Scotland
Hugh Leather , University of Edinburgh, School of Informatics, Scotland
pp. 437-438

TMNOC: A case of HTM and NoC Co-design for increased energy efficiency and concurrency (Abstract)

Lihang Zhao , Information Sciences Institute, University of Southern California, Marina del Rey, 90242, USA
Woojin Choi , Information Sciences Institute, University of Southern California, Marina del Rey, 90242, USA
Jeff Draper , Information Sciences Institute, University of Southern California, Marina del Rey, 90242, USA
pp. 439-440

Application-aware prefetch prioritization in on-chip networks (Abstract)

Nachiappan Chidambaram Nachiappan , The Pennsylvania State University, USA
Asit K. Mishra , Intel Corp., USA
Mahmut Kandemir , The Pennsylvania State University, USA
Anand Sivasubramaniam , The Pennsylvania State University, USA
Onur Mutlu , Carnegie Mellon University, USA
Chita R. Das , The Pennsylvania State University, USA
pp. 441-442

ReCaP: A Region-Based cure for the common cold cache (Abstract)

Jason Zebchuk , University of Toronto, Canada
Harold W. Cain , IBM T.J. Watson Research Center, USA
Vijayalakshmi Srinivasan , IBM T.J. Watson Research Center, USA
Andreas Moshovos , University of Toronto, Canada
pp. 443-444

Power-efficient computing for compute-intensive GPGPU applications (Abstract)

Syed Zohaib Gilani , University of Wisconsin-Madison, USA
Nam Sung Kim , University of Wisconsin-Madison, USA
Michael Schulte , AMD Research, Austin, USA
pp. 445-446

Off-chip access localization for NoC-based multicores (Abstract)

Wei Ding , The Pennsylvania State University, USA
Mahmut Kandemir , The Pennsylvania State University, USA
Yuanrui Zhang , Intel, USA
Emre Kultursay , The Pennsylvania State University, USA
pp. 447-448

Many-thread aware instruction-level parallelism: Architecting shader cores for GPU computing (Abstract)

Ping Xiang , Dept. of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
Yi Yang , Dept. of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
Mike Mantor , Graphics Products Group, AMD Inc., Orlando, FL, USA
Norm Rubin , Graphics Products Group, AMD Inc., Orlando, FL, USA
Huiyang Zhou , Dept. of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
pp. 449-450

PS-Dir: A scalable two-level directory cache (Abstract)

Joan J. Valls , Department of Computer Engineering, Universitat Politècnica de València (Spain)
Alberto Ros , Dept. de Ingeniería y Tecnología de Computadores, Universidad de Murcia (Spain)
Julio Sahuquillo , Department of Computer Engineering, Universitat Politècnica de València (Spain)
Maria E. Gomez , Department of Computer Engineering, Universitat Politècnica de València (Spain)
Jose Duato , Department of Computer Engineering, Universitat Politècnica de València (Spain)
pp. 451-452

Inference and declaration of independence: Impact on deterministic task parallelism (Abstract)

Foivos S. Zakkak , Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Crete, Greece
Dimitrios Chasapis , Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Crete, Greece
Polyvios Pratikakis , Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Crete, Greece
Angelos Bilas , Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Crete, Greece
Dimitrios S. Nikolopoulos , School of EEECS, Queen's University of Belfast, Northern Ireland, UK
pp. 453-454

Application-to-core mapping policies to reduce memory interference in multi-core systems (Abstract)

Reetuparna Das , University of Michigan, USA
Rachata Ausavarungnirun , Carnegie Mellon University, USA
Onur Mutlu , Carnegie Mellon University, USA
Akhilesh Kumar , Intel Labs, USA
Mani Azimi , Intel Labs, USA
pp. 455-456

Bandwidth Bandit: Quantitative characterization of memory contention (Abstract)

David Eklov , Uppsala University, Department of Information Technology, Sweden
Nikos Nikoleris , Uppsala University, Department of Information Technology, Sweden
David Black-Schaffer , Uppsala University, Department of Information Technology, Sweden
Erik Hagersten , Uppsala University, Department of Information Technology, Sweden
pp. 457-458

Speculative dynamic vectorization for HW/SW codesigned processors (Abstract)

Rakesh Kumar , Dept. of Computer Architecture, Universitat Politècnica de Catalunya, 08034 Barcelona, Spain
Alejandro Martinez , Intel Barcelona Research Center, Intel Labs, 08034 Barcelona, Spain
Antonio Gonzalez , Dept. of Computer Architecture, Universitat Politècnica de Catalunya, 08034 Barcelona, Spain
pp. 459-460

Fine-grained parallel traversals of irregular data structures (Abstract)

Bin Ren , Dept. of Computer Science and Engineering, The Ohio State University, USA
Gagan Agrawal , Dept. of Computer Science and Engineering, The Ohio State University, USA
James R. Larus , Microsoft Research, USA
Todd Mytkowicz , Microsoft Research, USA
Tomi Poutanen , Microsoft Research, USA
Wolfram Schulte , Microsoft Research, USA
pp. 461-462

High-performance analysis of filtered semantic graphs (Abstract)

Aydin Buluc , Lawrence Berkeley National Laboratory, USA
Armando Fox , University of California at Berkeley, USA
John R. Gilbert , University of California at Santa Barbara, USA
Shoaib Kamil , University of California at Berkeley, USA
Adam Lugowski , University of California at Berkeley, USA
Leonid Oliker , Lawrence Berkeley National Laboratory, USA
Samuel Williams , Lawrence Berkeley National Laboratory, USA
pp. 463-464

Energy-efficient cache partitioning for future CMPs (Abstract)

Karthik T. Sundararajan , School of Informatics, University of Edinburgh, Scotland
Timothy M. Jones , Computer Laboratory, University of Cambridge, UK
Nigel P. Topham , School of Informatics, University of Edinburgh, Scotland
pp. 465-466

A low-overhead dynamic optimization framework for multicores (Abstract)

Christopher W. Fletcher , Massachusetts Institute of Technology; Cambridge, USA
Rachael Harding , Massachusetts Institute of Technology; Cambridge, USA
Omer Khan , University of Connecticut; Storrs, USA
Srinivas Devadas , Massachusetts Institute of Technology; Cambridge, USA
pp. 467-468

Making it practical and effective: Fast and precise May-Happen-in-Parallel analysis (Abstract)

Congming Chen , State Key Laboratory of Computer Architecture, CAS, Beijing, China
Wei Huo , State Key Laboratory of Computer Architecture, CAS, Beijing, China
Xiaobing Feng , State Key Laboratory of Computer Architecture, CAS, Beijing, China
pp. 469-470

Mileage-based Contention Management in Transactional Memory (Abstract)

Woojin Choi , Information Sciences Institute / University of Southern California, Marina del Rey, 90242, USA
Lihang Zhao , Information Sciences Institute / University of Southern California, Marina del Rey, 90242, USA
Jeff Draper , Information Sciences Institute / University of Southern California, Marina del Rey, 90242, USA
pp. 471

System-level power-performance efficiency modeling for emergent GPU architectures (Abstract)

Shuaiwen Song , Virginia Tech, CS Department, KWII, Blacksburg, 24060, USA
Kirk W. Cameron , Virginia Tech, CS Department, KWII, Blacksburg, 24060, USA
pp. 473

Transactional event profiling in a best-effort hardware transactional memory system (Abstract)

Matthew Gaudet , Dept. of Computer Science, University of Alberta, Edmonton, Canada
Jose Nelson Amaral , Dept. of Computer Science, University of Alberta, Edmonton, Canada
pp. 475

Transparent runtime deadlock elimination (Abstract)

Hari K. Pyla , Virginia Tech, Blacksburg, Virginia, United States
Srinidhi Varadarajan , Virginia Tech, Blacksburg, Virginia, United States
pp. 477

Design of a storage processing unit (Abstract)

Peng Li , University of Minnesota, Minneapolis, 55455, USA
Kevin Gomez , Seagate Technology, Shakopee, MN, 55379, USA
David J. Lilja , University of Minnesota, Minneapolis, 55455, USA
pp. 479

SkipCache: Miss-rate aware cache management (Abstract)

Raghavendra K , PACE Laboratory, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai-36, India
Tripti S Warrier , PACE Laboratory, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai-36, India
Madhu Mutyam , PACE Laboratory, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai-36, India
pp. 481

Hardware prefetchers for emerging parallel applications (Abstract)

Biswabandan Panda , Computer Architecture and Systems Lab, CSE Department, IIT Madras, India
Shankar Balachandran , Computer Architecture and Systems Lab, CSE Department, IIT Madras, India
pp. 485

Strategies based on green policies to the grid resource allocation (Abstract)

Fabio Coutinho , Federal University of Rio de Janeiro/UFRJ/PESC, PO. Box 68511, 21941-972, Rio de Janeiro/RJ, Brazil
Luis Alfredo V. de Carvalho , Federal University of Rio de Janeiro/UFRJ/PESC, PO. Box 68511, 21941-972, Rio de Janeiro/RJ, Brazil
pp. 487

Energy-efficient workload mapping in heterogeneous systems with multiple types of resources (Abstract)

Cong Liu , University of North Carolina at Chapel Hill, Dept. of Computer Science, USA
pp. 491

Phase-based scheduling and thread migration for heterogeneous multicore processors (Abstract)

Lina Sawalha , School of Electrical and Computer Engineering, The University of Oklahoma, USA
Ronald D. Barnes , School of Electrical and Computer Engineering, The University of Oklahoma, USA
pp. 493

Author index (PDF)

pp. 495-496