2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (2015)
Philadelphia, PA, USA
March 29, 2015 to March 31, 2015
ISBN: 978-1-4799-1957-4
TABLE OF CONTENTS

Front cover

pp. c1

Copyright

pp. ii

Table of contents

pp. iii-v

Critical-path candidates: scalable performance modeling for MPI workloads

Jian Chen , Intel Corporation, Hillsboro, OR, USA
Russell M. Clapp , Intel Corporation, Hillsboro, OR, USA
pp. 1-10

DELPHI: a framework for RTL-based architecture design evaluation using DSENT models

Michael K. Papamichael , Carnegie Mellon University
Cagla Cakir , Carnegie Mellon University
Chen Sun , Massachusetts Institute of Technology
Chia-Hsin Owen Chen , Massachusetts Institute of Technology
James C. Hoe , Carnegie Mellon University
Ken Mai , Carnegie Mellon University
Li-Shiuan Peh , Massachusetts Institute of Technology
Vladimir Stojanovic , University of California, Berkeley
pp. 11-20

Micro-architecture independent analytical processor performance and power modeling

Sam Van den Steen , Department of Electronics and Information Systems, Ghent University, Belgium
Sander De Pestel , Department of Electronics and Information Systems, Ghent University, Belgium
Moncef Mechri , Department of Information Technology, Uppsala University, Sweden
Stijn Eyerman , Department of Electronics and Information Systems, Ghent University, Belgium
Trevor Carlson , Department of Information Technology, Uppsala University, Sweden
David Black-Schaffer , Department of Information Technology, Uppsala University, Sweden
Erik Hagersten , Department of Information Technology, Uppsala University, Sweden
Lieven Eeckhout , Department of Electronics and Information Systems, Ghent University, Belgium
pp. 32-41

Graph Processing Platforms at Scale: Practices and Experiences

Seung-Hwan Lim , Computational Sciences and Engineering Division, Oak Ridge National Laboratory
Sangkeun Lee , Computational Sciences and Engineering Division, Oak Ridge National Laboratory
Gautam Ganesh , Computational Sciences and Engineering Division, Oak Ridge National Laboratory
Tyler C. Brown , Computational Sciences and Engineering Division, Oak Ridge National Laboratory
Sreenivas R. Sukumar , Computational Sciences and Engineering Division, Oak Ridge National Laboratory
pp. 42-51

Graph-matching-based simulation-region selection for multiple binaries

Charles Yount , Intel Corporation
Harish Patil , Intel Corporation
Mohammad S. Islam , Univ. of Texas, San Antonio
, Univ. of Texas, Austin
pp. 52-61

A modeling framework for reuse distance-based estimation of cache performance

Xiaoyue Pan , Department of Information Technology, Uppsala University
Bengt Jonsson , Department of Information Technology, Uppsala University
pp. 62-71

Multi-program benchmark definition

Adam N. Jacobvitz , Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, USA
Andrew D. Hilton , Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, USA
Daniel J. Sorin , Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, USA
pp. 72-82

Precise computer comparisons via statistical resampling methods

Bin Li , Louisiana State University
Shaoming Chen , Louisiana State University
Lu Peng , Louisiana State University
pp. 83-92

Pairminer: mining for paired functions in Kernel extensions

Hu-Qiu Liu , Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, China
Jia-Ju Bai , Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, China
Yu-Ping Wang , Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, China
Zhe Bian , Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, China
Shi-Min Hu , Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, China
pp. 93-101

Self-monitoring overhead of the Linux perf_event performance counter interface

Vincent M. Weaver , Electrical and Computer Engineering, University of Maine
pp. 102-111

Hierarchical cycle accounting: a new method for application performance tuning

Andrzej Nowak , CERN openlab and EPFL, Switzerland
David Levinthal , Microsoft, WA, USA
Willy Zwaenepoel , EPFL, Switzerland
pp. 112-123

Revisiting symbiotic job scheduling

Stijn Eyerman , Ghent University, Belgium
Pierre Michaud , INRIA Rennes, France
Wouter Rogiest , Ghent University, Belgium
pp. 124-134

Micro-architecture independent branch behavior characterization

Sander De Pestel , Department of Electronics and Information Systems, Ghent University, Belgium
, Department of Electronics and Information Systems, Ghent University, Belgium
, Department of Electronics and Information Systems, Ghent University, Belgium
pp. 135-144

Non-volatile memory host controller interface performance analysis in high-performance I/O systems

Amro Awad , North Carolina State University
Brett Kettering , Los Alamos National Laboratory
Yan Solihin , North Carolina State University
pp. 145-154

Analyzing graphics processor unit (GPU) instruction set architectures

Kothiya Mayank , Department of Electrical and Computer Engineering, North Carolina State University
Hongwen Dai , Department of Electrical and Computer Engineering, North Carolina State University
Jizeng Wei , School of Computer Science and Technology, Tianjin University
Huiyang Zhou , Department of Electrical and Computer Engineering, North Carolina State University
pp. 155-156

ARACompiler: a prototyping flow and evaluation framework for accelerator-rich architectures

Yu-Ting Chen , Computer Science Department, University of California, Los Angeles, CA, USA
Jason Cong , Computer Science Department, University of California, Los Angeles, CA, USA
Bingjun Xiao , Computer Science Department, University of California, Los Angeles, CA, USA
pp. 157-158

Can RDMA benefit online data processing workloads on memcached and MySQL?

Dipti Shankar , Department of Computer Science and Engineering, The Ohio State University
Xiaoyi Lu , Department of Computer Science and Engineering, The Ohio State University
Jithin Jose , Department of Computer Science and Engineering, The Ohio State University
Md. Wasi-ur-Rahman , Department of Computer Science and Engineering, The Ohio State University
Nusrat Islam , Department of Computer Science and Engineering, The Ohio State University
Dhabaleswar K. Panda , Department of Computer Science and Engineering, The Ohio State University
pp. 159-160

Characterization and cross-platform analysis of high-throughput accelerators

Keitaro Oka , Kyushu University
Wenhao Jia , Princeton University
Margaret Martonosi , Princeton University
Koji Inoue , Kyushu University
pp. 161-162

Eliminating on-chip traffic waste: are we there yet?

Robert Smolinski , University of Illinois at Urbana-Champaign
Rakesh Komuravelli , University of Illinois at Urbana-Champaign
Hyojin Sung , University of Illinois at Urbana-Champaign
Sarita V. Adve , University of Illinois at Urbana-Champaign
pp. 163-164

Estimation-based profiling for code placement optimization in sensor network programs

Lipeng Wan , University of Tennessee, Knoxville, TN, US
Qing Cao , University of Tennessee, Knoxville, TN, US
Wenjun Zhou , University of Tennessee, Knoxville, TN, US
pp. 165-166

Factors affecting scalability of multithreaded Java applications on manycore systems

Junjie Qian , Department of Computer Science & Engineering, University of Nebraska-Lincoln
Du Li , School of Computer Science, Carnegie Mellon University
Witawas Srisa-an , Department of Computer Science & Engineering, University of Nebraska-Lincoln
Hong Jiang , Department of Computer Science & Engineering, University of Nebraska-Lincoln
Sharad Seth , Department of Computer Science & Engineering, University of Nebraska-Lincoln
pp. 167-168

On latency in GPU throughput microarchitectures

Michael Andersch , Technische Universität Berlin
Jan Lucas , Technische Universität Berlin
Mauricio Álvarez-Mesa , Technische Universität Berlin
Ben Juurlink , Technische Universität Berlin
pp. 169-170

An updated performance comparison of virtual machines and Linux containers

Wes Felter , IBM Research, Austin, TX
Alexandre Ferreira , IBM Research, Austin, TX
Ram Rajamony , IBM Research, Austin, TX
Juan Rubio , IBM Research, Austin, TX
pp. 171-172

Nyami: a synthesizable GPU architectural model for general-purpose and graphics-specific workloads

Jeff Bush , San Jose, California
Philip Dexter , Dept. of Computer Science
Timothy N. Miller , Dept. of Computer Science
, Dept. of Electrical & Computer Engineering, Binghamton University
pp. 173-182

DRAW: investigating benefits of adaptive fetch group size on GPU

Myung Kuk Yoon , School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
Seung Hun Kim , School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
Won Woo Ro , School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
Deokho , School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
, School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
, School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
pp. 183-192

DNOC: an accurate and fast virtual channel and deflection routing network-on-chip simulator

Gadi Oxman , School of Electrical Engineering, Tel Aviv University
Shlomo Weiss , School of Electrical Engineering, Tel Aviv University
pp. 193-202

Performance evaluation of a DySER FPGA prototype system spanning the compiler, microarchitecture, and hardware implementation

Chen-Han Ho , Qualcomm, University of Wisconsin-Madison
Venkatraman Govindaraju , Oracle, University of Wisconsin-Madison
Tony Nowatzki , University of Wisconsin-Madison
Ranjini Nagaraju , ARM, University of Wisconsin-Madison
Zachary Marzec , Qualcomm, University of Wisconsin-Madison
Preeti Agarwal , Intel, University of Wisconsin-Madison
Chris Frericks , Samsung, University of Wisconsin-Madison
Ryan Cofell , University of Wisconsin-Madison
Karthikeyan Sankaralingam , University of Wisconsin-Madison
pp. 203-214

Mosaic: cross-platform user-interaction record and replay for the fragmented Android ecosystem

Matthew Halpern , Dept. of Electrical and Computer Engineering, The University of Texas at Austin
Yuhao Zhu , Dept. of Electrical and Computer Engineering, The University of Texas at Austin
Ramesh Peri , Software and Services Group, Intel Corporation
Vijay Janapa Reddi , Dept. of Electrical and Computer Engineering, The University of Texas at Austin
pp. 215-224

A study of mobile device utilization

Cao Gao , University of Michigan, Ann Arbor
Anthony Gutierrez , University of Michigan, Ann Arbor
Madhav Rajan , Arizona State University
Ronald G. Dreslinski , University of Michigan, Ann Arbor
Trevor Mudge , University of Michigan, Ann Arbor
Carole-Jean Wu , Arizona State University
pp. 225-234

A full-system approach to analyze the impact of next-generation mobile flash storage

Rene De Jong , Research & Development, ARM Ltd., Cambridge, United Kingdom
, Research & Development, ARM Ltd., Cambridge, United Kingdom
pp. 235-244

QTrace: a framework for customizable full system instrumentation

Xin Tong , University of Toronto
Andreas Moshovos , University of Toronto
pp. 245-255

Pydgin: generating fast instruction set simulators from simple architecture descriptions with meta-tracing JIT compilers

Derek Lockhart , School of Electrical and Computer Engineering, Cornell University, Ithaca, NY
Berkin Ilbeyi , School of Electrical and Computer Engineering, Cornell University, Ithaca, NY
Christopher Batten , School of Electrical and Computer Engineering, Cornell University, Ithaca, NY
pp. 256-267

Reciprocal abstraction for computer architecture co-simulation

Michael Moeng , Computer Science Department, University of Pittsburgh
Alex Jones , Electrical and Computer Engineering Department, University of Pittsburgh
Rami Melhem , Computer Science Department, University of Pittsburgh
pp. 268-277

SynchroTrace: synchronization-aware architecture-agnostic traces for light-weight multicore simulation

Siddharth Nilakantan , Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Karthik Sangaiah , Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Ankit More , Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Giordano Salvador , Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
Baris Taskin , Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Mark Hempstead , Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
pp. 278-287

Performance and energy evaluation of data prefetching on Intel Xeon Phi

Diana Guttman , Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
Mahmut Taylan Kandemir , Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
, Intel Corporation, Santa Clara, CA, USA
, Intel Corporation, Santa Clara, CA, USA
pp. 288-297

Emulating cache organizations on real hardware using performance cloning

Yipeng Wang , Department of Electrical and Computer Engineering, North Carolina State University
Yan Solihin , Department of Electrical and Computer Engineering, North Carolina State University
pp. 298-307

Prometheus: scalable and accurate emulation of task-based applications on many-core systems

Gokcen Kestor , High Performance Computing, Pacific Northwest National Laboratory, Richland, WA
Roberto Gioiosa , High Performance Computing, Pacific Northwest National Laboratory, Richland, WA
Daniel Chavarría-Miranda , High Performance Computing, Pacific Northwest National Laboratory, Richland, WA
pp. 308-317

Analyzing communication models for distributed thread-collaborative processors in terms of energy and time

Benjamin Klenk , University of Heidelberg, Institute of Computer Engineering, Heidelberg, Germany
Lena Oden , Fraunhofer Institute for Industrial Mathematics, Competence Center High Performance Computing, Kaiserslautern, Germany
Holger Fröning , University of Heidelberg, Institute of Computer Engineering, Heidelberg, Germany
pp. 318-327

Characterization and analysis of a web search benchmark

Zacharias Hadjilambrou , Computer Science, University of Cyprus, Nicosia, Cyprus
Marios Kleanthous , Computer Science, University of Cyprus, Nicosia, Cyprus
Yanos Sazeides , Computer Science, University of Cyprus, Nicosia, Cyprus
pp. 328-337

Author index

pp. 339-340