2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (2017)
Santa Rosa, CA
April 24, 2017 to April 25, 2017
ISBN: 978-1-5386-3891-0
TABLE OF CONTENTS

Machine learning for performance and power modeling/prediction (PDF)

Lizy Kurian John , Department of Electrical and Computer Engineering, The University of Texas at Austin
pp. 1-2

Sharing the instruction cache among lean cores on an asymmetric CMP for HPC applications (PDF)

Ugljesa Milic , Barcelona Supercomputing Center
Alejandro Rico , ARM Inc.
Paul Carpenter , Barcelona Supercomputing Center
Alex Ramirez , Google
pp. 3-12

PMAL: Enabling lightweight adaptation of legacy file systems on persistent memory systems (PDF)

Hyunsub Song , School of Electrical and Computer Engineering, UNIST (Ulsan National Institute of Science and Technology)
Young Je Moon , School of Electrical and Computer Engineering, UNIST (Ulsan National Institute of Science and Technology)
Se Kwon Lee , School of Electrical and Computer Engineering, UNIST (Ulsan National Institute of Science and Technology)
Sam H. Noh , School of Electrical and Computer Engineering, UNIST (Ulsan National Institute of Science and Technology)
pp. 33-42

Chai: Collaborative heterogeneous applications for integrated-architectures (Abstract)

Juan Gomez-Luna , Universidad de Córdoba
Izzat El Hajj , University of Illinois at Urbana-Champaign
Li-Wen Chang , University of Illinois at Urbana-Champaign
Victor Garcia-Flores , Universitat Politècnica de Catalunya
Simon Garcia de Gonzalo , University of Illinois at Urbana-Champaign
Thomas B. Jablin , University of Illinois at Urbana-Champaign
Antonio J. Pena , Barcelona Supercomputing Center
Wen-mei Hwu , University of Illinois at Urbana-Champaign
pp. 43-54

Performance analysis of CNN frameworks for GPUs (PDF)

Heehoon Kim , Department of Computer Science and Engineering, Seoul National University, Korea
Hyoungwook Nam , College of Liberal Studies, Seoul National University, Korea
Wookeun Jung , Department of Computer Science and Engineering, Seoul National University, Korea
Jaejin Lee , Department of Computer Science and Engineering, Seoul National University, Korea
pp. 55-64

GaaS workload characterization under NUMA architecture for virtualized GPU (PDF)

Huixiang Chen , IDEAL Lab, University of Florida, Gainesville, Florida
Meng Wang , IDEAL Lab, University of Florida, Gainesville, Florida
Yang Hu , IDEAL Lab, University of Florida, Gainesville, Florida
Mingcong Song , IDEAL Lab, University of Florida, Gainesville, Florida
Tao Li , IDEAL Lab, University of Florida, Gainesville, Florida
pp. 65-76

Fast IPC estimation for performance projections using proxy suites and decision trees (PDF)

Kanishka Lahiri , Advanced Micro Devices, Bangalore
Subhash Kunnoth , Advanced Micro Devices, Bangalore
pp. 77-86

Accurate address streams for LLC and beyond (SLAB): A methodology to enable system exploration (PDF)

Reena Panda , The University of Texas at Austin
Xinnian Zheng , The University of Texas at Austin
Lizy Kurian John , The University of Texas at Austin
pp. 87-96

Clone morphing: Creating new workload behavior from existing applications (PDF)

Yipeng Wang , Department of Electrical and Computer Engineering, North Carolina State University
Amro Awad , Department of Electrical and Computer Engineering, North Carolina State University
Yan Solihin , Department of Electrical and Computer Engineering, North Carolina State University
pp. 97-108

Service capacity measurement by redlining with live production traffic (PDF)

Susie Xia , 2029 Stierlin Ct, Mountain View, CA 94043, USA
Zhenyun Zhuang , 2029 Stierlin Ct, Mountain View, CA 94043, USA
Anant Rao , 2029 Stierlin Ct, Mountain View, CA 94043, USA
Haricharan Ramachandra , 2029 Stierlin Ct, Mountain View, CA 94043, USA
Yi Feng , 2029 Stierlin Ct, Mountain View, CA 94043, USA
Ramya Pasumarti , 2029 Stierlin Ct, Mountain View, CA 94043, USA
pp. 123-124

Predicting memory page stability and its application to memory deduplication and live migration (PDF)

Karim Elghamrawy , Computer Science Department, University of California, Santa Barbara, Santa Barbara, CA
Diana Franklin , Computer Science Department, University of Chicago, Chicago, IL
Frederic T. Chong , Computer Science Department, University of Chicago, Chicago, IL
pp. 125-126

Analyzing OpenCL 2.0 workloads using a heterogeneous CPU-GPU simulator (PDF)

Min-Yih Hsu , National Tsing Hua University
Li Wang , National Taiwan University
Shao-Chung Wang , National Tsing Hua University
Kun-Chih Chen , National Sun Yat-Sen University
Po-Han Wang , National Taiwan University
Hsiang-Yun Cheng , Academia Sinica
Yi-Chung Lee , National Taiwan University
Sheng-Jie Shu , National Chiao Tung University
Chun-Chieh Yang , National Tsing Hua University
Ren-Wei Tsai , National Chiao Tung University
Li-Chen Kan , National Tsing Hua University
Chao-Lin Lee , National Tsing Hua University
Tzu-Chieh Yu , National Taiwan University
Rih-Ding Peng , National Taiwan University
Chia-Lin Yang , National Taiwan University
Yuan-Shin Hwang , National Taiwan University of Science and Technology
Jenq-Kuen Lee , National Tsing Hua University
Shiao-Li Tsao , National Chiao Tung University
Ming Ouhyoung , National Taiwan University
pp. 127-128

Microarchitecture level reliability comparison of modern GPU designs: First findings (PDF)

Alessandro Vallero , Politecnico di Torino, Italy
Stefano Di Carlo , Politecnico di Torino, Italy
Sotiris Tselonis , University of Athens, Greece
Dimitris Gizopoulos , University of Athens, Greece
pp. 129-130

DARTS: Performance-counter driven sampling using binary translators (PDF)

Rajesh Kumar , Advanced Micro Devices, Bangalore
Suchita Pati , Advanced Micro Devices, Bangalore
Kanishka Lahiri , Advanced Micro Devices, Bangalore
pp. 131-132

Docker characterization on high performance SSDs (PDF)

Qiumin Xu , University of Southern California
Manu Awasthi , IIT-Gandhinagar
Janki Bhimani , Northeastern University
Jingpei Yang , Samsung
Murali Annavaram , University of Southern California
pp. 133-134

A taxonomy of out-of-order instruction commit (PDF)

Mehdi Alipour , Department of Information Technology, Uppsala University, Sweden
Trevor E. Carlson , Department of Information Technology, Uppsala University, Sweden
Stefanos Kaxiras , Department of Information Technology, Uppsala University, Sweden
pp. 135-136

PTAT: An efficient and precise tool for collecting detailed TLB miss traces (PDF)

Jiutian Zhang , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
Yuhang Liu , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
Xiaojing Zhu , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
Yuan Ruan , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
Mingyu Chen , State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
pp. 137-138

Proxy benchmarks for emerging big-data workloads (PDF)

Reena Panda , The University of Texas at Austin
Lizy Kurian John , The University of Texas at Austin
pp. 139-140

MaxSim: A simulation platform for managed applications (PDF)

Andrey Rodchenko , School of Computer Science, The University of Manchester, UK
Christos Kotselidis , School of Computer Science, The University of Manchester, UK
Andy Nisbet , School of Computer Science, The University of Manchester, UK
Antoniu Pop , School of Computer Science, The University of Manchester, UK
Mikel Lujan , School of Computer Science, The University of Manchester, UK
pp. 141-152

dist-gem5: Distributed simulation of computer clusters (PDF)

Mohammad Alian , University of Illinois, Urbana-Champaign
Umur Darbaz , University of Illinois, Urbana-Champaign
Gabor Dozsa , ARM Ltd., Cambridge, UK
Stephan Diestelhorst , ARM Ltd., Cambridge, UK
Daehoon Kim , University of Illinois, Urbana-Champaign
Nam Sung Kim , University of Illinois, Urbana-Champaign
pp. 153-162

Prefetching for cloud workloads: An analysis based on address patterns (PDF)

Jiajun Wang , The University of Texas at Austin
Reena Panda , The University of Texas at Austin
Lizy Kurian John , The University of Texas at Austin
pp. 163-172

Toolbox for exploration of energy-efficient event processors for human-computer interaction (PDF)

Tayyar Rzayev , Computer Systems Laboratory, Cornell University, Ithaca, NY, USA
David H. Albonesi , Computer Systems Laboratory, Cornell University, Ithaca, NY, USA
Francois Guimbretiere , Information Science Department, Cornell University, Ithaca, NY, USA
Rajit Manohar , Computer Systems Laboratory, Yale University, New Haven, CT, USA
Jaeyeon Kihm , Information Science Department, Cornell University, Ithaca, NY, USA
pp. 173-184

HW/SW co-designed processors: Challenges, design choices and a simulation infrastructure for evaluation (PDF)

Rakesh Kumar , University of Edinburgh, UK
Jose Cano , University of Edinburgh, UK
Demos Pavlou , 11pets
Enric Gibert , Pharmacelera
Antonio Gonzalez , Universitat Politècnica de Catalunya, Spain
pp. 185-194

OpenSMART: Single-cycle multi-hop NoC generator in BSV and Chisel (PDF)

Hyoukjun Kwon , School of Computer Science, Georgia Institute of Technology
Tushar Krishna , School of Electrical and Computer Engineering, Georgia Institute of Technology
pp. 195-204

StressRight: Finding the right stress for accurate in-development system evaluation (PDF)

Jaewon Lee , Department of Computer Science and Engineering, POSTECH
Hanhwi Jang , Department of Computer Science and Engineering, POSTECH
Jae-eon Jo , Department of Computer Science and Engineering, POSTECH
Gyu-hyeon Lee , Electrical and Computer Engineering, Seoul National University
Jangwoo Kim , Electrical and Computer Engineering, Seoul National University
pp. 205-216

SimBench: A portable benchmarking methodology for full-system simulators (PDF)

Harry Wagstaff , University of Edinburgh
Bruno Bodin , University of Edinburgh
Tom Spink , University of Edinburgh
Bjorn Franke , University of Edinburgh
pp. 217-226

Treelogy: A benchmark suite for tree traversals (PDF)

Nikhil Hegde , School of Electrical and Computer Engineering, Purdue University, West Lafayette
Jianqiao Liu , School of Electrical and Computer Engineering, Purdue University, West Lafayette
Kirshanthan Sundararajah , School of Electrical and Computer Engineering, Purdue University, West Lafayette
Milind Kulkarni , School of Electrical and Computer Engineering, Purdue University, West Lafayette
pp. 227-238

Evaluating and mitigating bandwidth bottlenecks across the memory hierarchy in GPUs (PDF)

Saumay Dublish , University of Edinburgh
Vijay Nagarajan , University of Edinburgh
Nigel Topham , University of Edinburgh
pp. 239-248

Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling (PDF)

Andre Lopes , INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
Frederico Pratas , Imagination Technologies Ltd., London, United Kingdom
Leonel Sousa , INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
Aleksandar Ilic , INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
pp. 259-268

Multi2Sim Kepler: A detailed architectural GPU simulator (PDF)

Xun Gong , Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115
Rafael Ubal , Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115
David Kaeli , Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115
pp. 269-278