2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA) (2013)
Shenzhen, China
Feb. 23, 2013 to Feb. 27, 2013
ISSN: 1530-0897
ISBN: 978-1-4673-5585-8
TABLE OF CONTENTS

Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures (Abstract)

E. Blem , Univ. of Wisconsin - Madison, Madison, WI, USA
J. Menon , Univ. of Wisconsin - Madison, Madison, WI, USA
K. Sankaralingam , Univ. of Wisconsin - Madison, Madison, WI, USA
pp. 1-12

High-performance and energy-efficient mobile web browsing on big/little systems (Abstract)

Yuhao Zhu , Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
V. J. Reddi , Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
pp. 13-24

Skinflint DRAM system: Minimizing DRAM chip writes for low power (Abstract)

Yebin Lee , Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol. (KAIST), Daejeon, South Korea
Soontae Kim , Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol. (KAIST), Daejeon, South Korea
Seokin Hong , Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol. (KAIST), Daejeon, South Korea
Jongmin Lee , Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol. (KAIST), Daejeon, South Korea
pp. 25-34

Enabling distributed generation powered sustainable high-performance data center (Abstract)

Chao Li , Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
Ruijin Zhou , Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
Tao Li , Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
pp. 35-46

A group-commit mechanism for ROB-based processors implementing the X86 ISA (Abstract)

F. Afram , Dept. of Comput. Sci., State Univ. of New York at Binghamton, Binghamton, NY, USA
Hui Zeng , Dept. of Comput. Sci., State Univ. of New York at Binghamton, Binghamton, NY, USA
K. Ghose , Dept. of Comput. Sci., State Univ. of New York at Binghamton, Binghamton, NY, USA
pp. 47-58

Store-Load-Branch (SLB) predictor: A compiler assisted branch prediction for data dependent branches (Abstract)

M. U. Farooq , Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
K. Khubaib , Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
L. K. John , Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
pp. 59-70

Two level bulk preload branch prediction (Abstract)

J. Bonanno , Syst. & Technol. Group, IBM, Yorktown Heights, NY, USA
A. Collura , Syst. & Technol. Group, IBM, Yorktown Heights, NY, USA
D. Lipetz , Syst. & Technol. Group, IBM, Yorktown Heights, NY, USA
U. Mayer , Syst. & Technol. Group, IBM, Yorktown Heights, NY, USA
B. Prasky , Syst. & Technol. Group, IBM, Yorktown Heights, NY, USA
A. Saporito , Syst. & Technol. Group, IBM, Yorktown Heights, NY, USA
pp. 71-82

RECAP: A region-based cure for the common cold (cache) (Abstract)

J. Zebchuk , Dept. of Electr. & Comput. Eng., Univ. of Toronto, Toronto, ON, Canada
H. W. Cain , T.J. Watson Res. Center, IBM, Yorktown Heights, NY, USA
Xin Tong , Dept. of Electr. & Comput. Eng., Univ. of Toronto, Toronto, ON, Canada
V. Srinivasan , T.J. Watson Res. Center, IBM, Yorktown Heights, NY, USA
A. Moshovos , Dept. of Electr. & Comput. Eng., Univ. of Toronto, Toronto, ON, Canada
pp. 83-94

Application-to-core mapping policies to reduce memory system interference in multi-core systems (Abstract)

R. Das , Univ. of Michigan, Ann Arbor, MI, USA
R. Ausavarungnirun , Carnegie Mellon Univ., Pittsburgh, PA, USA
O. Mutlu , Carnegie Mellon Univ., Pittsburgh, PA, USA
A. Kumar , Intel Labs., Hillsboro, OR, USA
M. Azimi , Intel Labs., Hillsboro, OR, USA
pp. 107-118

ECM: Effective Capacity Maximizer for high-performance compressed caching (Abstract)

Seungcheol Baek , Georgia Inst. of Technol., Atlanta, GA, USA
Hyung Gyu Lee , Daegu Univ., Gyeongsan, South Korea
C. Nicopoulos , Univ. of Cyprus, Nicosia, Cyprus
Junghee Lee , Georgia Inst. of Technol., Atlanta, GA, USA
Jongman Kim , Georgia Inst. of Technol., Atlanta, GA, USA
pp. 131-142

Modeling performance variation due to cache sharing (Abstract)

A. Sandberg , Dept. of Inf. Technol., Uppsala Univ., Uppsala, Sweden
A. Sembrant , Dept. of Inf. Technol., Uppsala Univ., Uppsala, Sweden
E. Hagersten , Dept. of Inf. Technol., Uppsala Univ., Uppsala, Sweden
D. Black-Schaffer , Dept. of Inf. Technol., Uppsala Univ., Uppsala, Sweden
pp. 155-166

Cost effective data center servers (Abstract)

Rui Hou , Inst. of Comput. Technol., Beijing, China
Tao Jiang , Inst. of Comput. Technol., Beijing, China
Liuhang Zhang , Inst. of Comput. Technol., Beijing, China
Pengfei Qi , Inst. of Comput. Technol., Beijing, China
Jianbo Dong , Inst. of Comput. Technol., Beijing, China
Haibin Wang , Shannon Lab., Huawei Technol. Co., Ltd., Xi'an, China
Xiongli Gu , Shannon Lab., Huawei Technol. Co., Ltd., Xi'an, China
Shujie Zhang , Shannon Lab., Huawei Technol. Co., Ltd., Xi'an, China
pp. 179-187

Optimizing Google's warehouse scale computers: The NUMA experience (Abstract)

Lingjia Tang , Univ. of California, San Diego, La Jolla, CA, USA
J. Mars , Univ. of California, San Diego, La Jolla, CA, USA
pp. 188-197

Runnemede: An architecture for Ubiquitous High-Performance Computing (Abstract)

N. P. Carter , Intel Labs., Hillsboro, OR, USA
A. Agrawal , Intel Labs., Hillsboro, OR, USA
S. Borkar , Intel Labs., Hillsboro, OR, USA
R. Cledat , Intel Labs., Hillsboro, OR, USA
H. David , Intel Labs., Hillsboro, OR, USA
D. Dunning , Intel Labs., Hillsboro, OR, USA
J. Fryman , Intel Labs., Hillsboro, OR, USA
I. Ganev , Intel Labs., Hillsboro, OR, USA
R. A. Golliver , Intel Labs., Hillsboro, OR, USA
R. Knauerhase , Intel Labs., Hillsboro, OR, USA
R. Lethin , Reservoir Labs., New York, NY, USA
B. Meister , Reservoir Labs., New York, NY, USA
A. K. Mishra , Intel Labs., Hillsboro, OR, USA
W. R. Pinfold , Intel Labs., Hillsboro, OR, USA
J. Teller , Intel Labs., Hillsboro, OR, USA
J. Torrellas , Univ. of Illinois at Urbana-Champaign, Champaign, IL, USA
N. Vasilache , Reservoir Labs., New York, NY, USA
G. Venkatesh , Intel Labs., Hillsboro, OR, USA
J. Xu , Intel Labs., Hillsboro, OR, USA
pp. 198-209

Exploring high-performance and energy proportional interface for phase change memory systems (Abstract)

Zhongqi Li , Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
Ruijin Zhou , Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
Tao Li , Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
pp. 210-221

Architecture support for guest-transparent VM protection from untrusted hypervisor and physical attacks (Abstract)

Yubin Xia , Inst. of Parallel & Distrib. Syst., Shanghai Jiao Tong Univ., Shanghai, China
Yutao Liu , Inst. of Parallel & Distrib. Syst., Shanghai Jiao Tong Univ., Shanghai, China
Haibo Chen , Inst. of Parallel & Distrib. Syst., Shanghai Jiao Tong Univ., Shanghai, China
pp. 246-257

SCRAP: Architecture for signature-based protection from Code Reuse Attacks (Abstract)

M. Kayaalp , Comput. Sci. Dept., Binghamton Univ., Binghamton, NY, USA
T. Schmitt , Comput. Sci. Dept., Binghamton Univ., Binghamton, NY, USA
J. Nomani , Comput. Sci. Dept., Binghamton Univ., Binghamton, NY, USA
D. Ponomarev , Comput. Sci. Dept., Binghamton Univ., Binghamton, NY, USA
N. Abu-Ghazaleh , Comput. Sci. Dept., Binghamton Univ., Binghamton, NY, USA
pp. 258-269

Adaptive Reliability Chipkill Correct (ARCC) (Abstract)

Xun Jian , Univ. of Illinois at Urbana-Champaign, Champaign, IL, USA
R. Kumar , Univ. of Illinois at Urbana-Champaign, Champaign, IL, USA
pp. 270-281

Accelerating write by exploiting PCM asymmetries (Abstract)

Jianhui Yue , Electr. & Comput. Eng., Univ. of Maine, Orono, ME, USA
Yifeng Zhu , Electr. & Comput. Eng., Univ. of Maine, Orono, ME, USA
pp. 282-293

Sonic Millip3De: A massively parallel 3D-stacked accelerator for 3D ultrasound (Abstract)

R. Sampson , Dept. of EECS, Univ. of Michigan, Ann Arbor, MI, USA
Ming Yang , Sch. of ECEE, Arizona State Univ., Tempe, AZ, USA
Siyuan Wei , Sch. of ECEE, Arizona State Univ., Tempe, AZ, USA
C. Chakrabarti , Sch. of ECEE, Arizona State Univ., Tempe, AZ, USA
T. F. Wenisch , Dept. of EECS, Univ. of Michigan, Ann Arbor, MI, USA
pp. 318-329

Power-efficient computing for compute-intensive GPGPU applications (Abstract)

S. Z. Gilani , Univ. of Wisconsin-Madison, Madison, WI, USA
Nam Sung Kim , Univ. of Wisconsin-Madison, Madison, WI, USA
M. J. Schulte , Adv. Micro Devices, TX, USA
pp. 330-341

Power-performance co-optimization of throughput core architecture using resistive memory (Abstract)

N. Goswami , Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
Bingyi Cao , Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
Tao Li , Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
pp. 342-353

Breaking the on-chip latency barrier using SMART (Abstract)

T. Krishna , Comput. Sci. & Artificial Intell. Lab. (CSAIL), Massachusetts Inst. of Technol., Cambridge, MA, USA
Chia-Hsin Owen Chen , Comput. Sci. & Artificial Intell. Lab. (CSAIL), Massachusetts Inst. of Technol., Cambridge, MA, USA
Woo Cheol Kwon , Comput. Sci. & Artificial Intell. Lab. (CSAIL), Massachusetts Inst. of Technol., Cambridge, MA, USA
Li-Shiuan Peh , Comput. Sci. & Artificial Intell. Lab. (CSAIL), Massachusetts Inst. of Technol., Cambridge, MA, USA
pp. 378-389

TS-Router: On maximizing the Quality-of-Allocation in the On-Chip Network (Abstract)

Yuan-Ying Chang , Dept. of Comput. Sci., Nat. Tsing Hua Univ., Hsinchu, Taiwan
Yoshi Shih-Chieh Huang , Dept. of Comput. Sci., Nat. Tsing Hua Univ., Hsinchu, Taiwan
M. Poremba , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
V. Narayanan , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Yuan Xie , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
C. King , Dept. of Comput. Sci., Nat. Tsing Hua Univ., Hsinchu, Taiwan
pp. 390-399

Refrint: Intelligent refresh to minimize power in on-chip multiprocessor cache hierarchies (Abstract)

A. Agrawal , Univ. of Illinois at Urbana-Champaign, Champaign, IL, USA
P. Jain , Univ. of Illinois at Urbana-Champaign, Champaign, IL, USA
A. Ansari , Univ. of Illinois at Urbana-Champaign, Champaign, IL, USA
J. Torrellas , Univ. of Illinois at Urbana-Champaign, Champaign, IL, USA
pp. 400-411

Warped register file: A power efficient register file for GPGPUs (Abstract)

M. Abdel-Majeed , Electr. Eng. Dept., Univ. of Southern California, Los Angeles, CA, USA
M. Annavaram , Electr. Eng. Dept., Univ. of Southern California, Los Angeles, CA, USA
pp. 412-423

Disintegrated control for energy-efficient and heterogeneous memory systems (Abstract)

Tae Jun Ham , Duke Univ., Durham, NC, USA
B. K. Chelepalli , Duke Univ., Durham, NC, USA
Neng Xue , Duke Univ., Durham, NC, USA
B. C. Lee , Duke Univ., Durham, NC, USA
pp. 424-435

ESESC: A fast multicore simulator using Time-Based Sampling (Abstract)

E. K. Ardestani , Dept. of Comput. Eng., Univ. of California Santa Cruz, Santa Cruz, CA, USA
J. Renau , Dept. of Comput. Eng., Univ. of California Santa Cruz, Santa Cruz, CA, USA
pp. 448-459

How to implement effective prediction and forwarding for fusable dynamic multicore architectures (Abstract)

B. Robatmili , Qualcomm Res. Silicon Valley, CA, USA
Dong Li , Univ. of Texas at Austin, Austin, TX, USA
H. Esmaeilzadeh , Univ. of Washington, Seattle, WA, USA
S. W. Keckler , Univ. of Texas at Austin, Austin, TX, USA
pp. 460-471

Bridging the semantic gap: Emulating biological neuronal behaviors with simple digital neurons (Abstract)

A. Nere , Univ. of Wisconsin-Madison, Madison, WI, USA
A. Hashmi , Univ. of Wisconsin-Madison, Madison, WI, USA
M. Lipasti , Univ. of Wisconsin-Madison, Madison, WI, USA
G. Tononi , Univ. of Wisconsin-Madison, Madison, WI, USA
pp. 472-483

Layout-conscious random topologies for HPC off-chip interconnects (Abstract)

M. Koibuchi , Nat. Inst. of Inf., Tokyo, Japan
I. Fujiwara , Nat. Inst. of Inf., Tokyo, Japan
H. Matsutani , Keio Univ., Yokohama, Japan
H. Casanova , Univ. of Hawai'i at Manoa, Honolulu, HI, USA
pp. 484-495

Scaling towards kilo-core processors with asymmetric high-radix topologies (Abstract)

N. Abeyratne , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
R. Das , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
Qingkun Li , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
K. Sewell , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
B. Giridhar , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
R. G. Dreslinski , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
D. Blaauw , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
T. Mudge , Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
pp. 496-507

Energy-efficient interconnect via Router Parking (Abstract)

A. Samih , Intel Archit. Group, Intel Corp., Austin, TX, USA
Ren Wang , Intel Labs., Hillsboro, OR, USA
A. Krishna , CPU Res., Qualcomm Inc., Raleigh, NC, USA
C. Maciocco , Intel Labs., Hillsboro, OR, USA
C. Tai , Intel Labs., Hillsboro, OR, USA
Y. Solihin , North Carolina State Univ., Raleigh, NC, USA
pp. 508-519

In-network traffic regulation for Transactional Memory (Abstract)

Lihang Zhao , Inf. Sci. Inst., Univ. of Southern California, Los Angeles, CA, USA
Woojin Choi , Inf. Sci. Inst., Univ. of Southern California, Los Angeles, CA, USA
Lizhong Chen , Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
J. Draper , Inf. Sci. Inst., Univ. of Southern California, Los Angeles, CA, USA
pp. 520-531

Macho: A failure model-oriented adaptive cache architecture to enable near-threshold voltage scaling (Abstract)

T. Mahmood , Dept. of Inf. & Commun. Eng, Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea
Soontae Kim , Dept. of Inf. & Commun. Eng, Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea
Seokin Hong , Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea
pp. 532-541

EnergySmart: Toward energy-efficient manycores for Near-Threshold Computing (Abstract)

U. R. Karpuzcu , Univ. of Illinois Urbana-Champaign, Urbana, IL, USA
A. Sinkar , Univ. of Wisconsin-Madison, Madison, WI, USA
Nam Sung Kim , Univ. of Wisconsin-Madison, Madison, WI, USA
J. Torrellas , Univ. of Illinois Urbana-Champaign, Urbana, IL, USA
pp. 542-553

Rainbow: Efficient memory dependence recording with high replay parallelism for relaxed memory model (Abstract)

Xuehai Qian , Univ. of Illinois Urbana-Champaign, Urbana, IL, USA
He Huang , AMD Products (China) Co., Ltd., China
B. Sahelices , Univ. de Valladolid, Valladolid, Spain
Depei Qian , Beihang Univ., Beijing, China
pp. 554-565

High-speed formal verification of heterogeneous coherence hierarchies (Abstract)

J. G. Beu , Georgia Inst. of Technol., Atlanta, GA, USA
J. A. Poovey , Georgia Inst. of Technol., Atlanta, GA, USA
E. R. Hein , Georgia Inst. of Technol., Atlanta, GA, USA
T. M. Conte , Georgia Inst. of Technol., Atlanta, GA, USA
pp. 566-577

Cache coherence for GPU architectures (Abstract)

I. Singh , Univ. of British Columbia, Vancouver, BC, Canada
A. Shriraman , Simon Fraser Univ., Burnaby, BC, Canada
W. W. L. Fung , Univ. of British Columbia, Vancouver, BC, Canada
M. O'Connor , Adv. Micro Devices, Inc. (AMD), USA
T. M. Aamodt , Stanford Univ., Stanford, CA, USA
pp. 578-590

The dual-path execution model for efficient GPU control flow (Abstract)

Minsoo Rhu , Electr. & Comput. Eng. Dept., Univ. of Texas at Austin, Austin, TX, USA
M. Erez , Electr. & Comput. Eng. Dept., Univ. of Texas at Austin, Austin, TX, USA
pp. 591-602

A multiple SIMD, multiple data (MSMD) architecture: Parallel execution of dynamic and static SIMD fragments (Abstract)

Yaohua Wang , Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Shuming Chen , Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Jianghua Wan , Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Jiayuan Meng , Dept. of Comput. Sci., Univ. of Virginia, Charlottesville, VA, USA
Kai Zhang , Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Wei Liu , Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Xi Ning , Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
pp. 603-614

Tiered-latency DRAM: A low latency and low cost DRAM architecture (Abstract)

Donghyuk Lee , Carnegie Mellon Univ., Pittsburgh, PA, USA
Yoongu Kim , Carnegie Mellon Univ., Pittsburgh, PA, USA
V. Seshadri , Carnegie Mellon Univ., Pittsburgh, PA, USA
Jamie Liu , Carnegie Mellon Univ., Pittsburgh, PA, USA
L. Subramanian , Carnegie Mellon Univ., Pittsburgh, PA, USA
O. Mutlu , Carnegie Mellon Univ., Pittsburgh, PA, USA
pp. 615-626

A case for Refresh Pausing in DRAM memory systems (Abstract)

P. Nair , Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
Chia-Chen Chou , Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
M. K. Qureshi , Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
pp. 627-638

MISE: Providing performance predictability and improving fairness in shared main memory systems (Abstract)

L. Subramanian , Carnegie Mellon Univ., Pittsburgh, PA, USA
V. Seshadri , Carnegie Mellon Univ., Pittsburgh, PA, USA
Yoongu Kim , Carnegie Mellon Univ., Pittsburgh, PA, USA
B. Jaiyen , Carnegie Mellon Univ., Pittsburgh, PA, USA
O. Mutlu , Carnegie Mellon Univ., Pittsburgh, PA, USA
pp. 639-650