Displaying 1-18 out of 18 total
A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops
Found in: IEEE Software
By Chi-Keung Luk, Ryan Newton, William Hasenplaugh, Mark Hampton, Geoff Lowney
Issue Date:January 2011
pp. 39-50
In the era of multicores, many applications that require substantial computing power and data crunching can now run on desktop PCs. However, to achieve the best possible performance, developers must write applications in a way that exploits both parallelis...
 
SD3: A Scalable Approach to Dynamic Data-Dependence Profiling
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Minjang Kim, Hyesoon Kim, Chi-Keung Luk
Issue Date:December 2010
pp. 535-546
As multicore processors are deployed in mainstream computing, the need for software tools to help parallelize programs is increasing dramatically. Data-dependence profiling is an important technique to exploit parallelism in programs. More specifically, ma...
 
Analyzing Parallel Programs with Pin
Found in: Computer
By Moshe (Maury) Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi-Keung Luk, Gail Lyons, Harish Patil, Ady Tal
Issue Date:March 2010
pp. 34-41
No summary available.
 
Asim: A Performance Model Framework
Found in: Computer
By Joel Emer, Pritpal Ahuja, Eric Borch, Artur Klauser, Chi-Keung Luk, Srilatha Manne, Shubhendu S. Mukherjee, Harish Patil, Steven Wallace, Nathan Binkert, Roger Espasa, Toni Juan
Issue Date:February 2002
pp. 68-76
The longevity and usefulness of a microprocessor performance model has historically depended on the model writer's skills and discipline. However, at Compaq the models became extremely complex and unmanageable because designers lacked a structured wa...
 
Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors
Found in: Computer Architecture, International Symposium on
By Chi-Keung Luk
Issue Date:July 2001
pp. 40
Hardly predictable data addresses in many irregular applications have rendered prefetching ineffective. In many cases, the only accurate way to predict these addresses is to directly execute the code that generates them. As multithreaded architec...
 
SD3: An Efficient Dynamic Data-Dependence Profiling Mechanism
Found in: IEEE Transactions on Computers
By Minjang Kim, Nagesh B. Lakshminarayana, Hyesoon Kim, Chi-Keung Luk
Issue Date:December 2013
pp. 2516-2530
As multicore processors are deployed in mainstream computing, the need for software tools to help parallelize programs is increasing dramatically. Data-dependence profiling is an important program analysis technique to exploit parallelism in serial program...
 
Ispike: A Post-link Optimizer for the Intel® Itanium® Architecture
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Chi-Keung Luk, Robert Muth, Harish Patil, Robert Cohn, Geoff Lowney
Issue Date:March 2004
pp. 15
Ispike is a post-link optimizer developed for the Intel® Itanium Processor Family (IPF) processors. The IPF architecture poses both opportunities and challenges to post-link optimizations. IPF offers a rich set of performance counters to collect detailed prof...
 
Memory Forwarding: Enabling Aggressive Layout Optimizations by Guaranteeing the Safety of Data Relocation
Found in: Computer Architecture, International Symposium on
By Chi-Keung Luk, Todd C. Mowry
Issue Date:May 1999
pp. 88
By optimizing data layout at run-time, we can potentially enhance the performance of caches by actively creating spatial locality, facilitating prefetching, and avoiding cache conflicts and false sharing. Unfortunately, it is extremely difficult to guarant...
 
Automatic Compiler-Inserted Prefetching for Pointer-Based Applications
Found in: IEEE Transactions on Computers
By Chi-Keung Luk, Todd C. Mowry
Issue Date:February 1999
pp. 134-141
As the disparity between processor and memory speeds continues to grow, memory latency is becoming an increasingly important performance bottleneck. While software-controlled prefetching is an attractive technique for t...
 
Understanding Why Correlation Profiling Improves the Predictability of Data Cache Misses in Nonnumeric Applications
Found in: IEEE Transactions on Computers
By Todd C. Mowry, Chi-Keung Luk
Issue Date:April 2000
pp. 369-384
Latency-tolerance techniques offer the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. However, to fully exploit the benefit of these technique...
 
Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Todd C. Mowry, Chi-Keung Luk
Issue Date:December 1997
pp. 314
To maximize the benefit and minimize the overhead of software-based latency tolerance techniques, we would like to apply them precisely to the set of dynamic references that suffer cache misses. Unfortunately, the information provided by the state-of-the-a...
 
The pochoir stencil compiler
Found in: Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures (SPAA '11)
By Bradley C. Kuszmaul, Charles E. Leiserson, Chi-Keung Luk, Rezaul Alam Chowdhury, Yuan Tang
Issue Date:June 2011
pp. 117-128
A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal decompositions" are known, but most programmers find them difficul...
     
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Found in: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (Micro-42)
By Chi-Keung Luk, Hyesoon Kim, Sunpyo Hong
Issue Date:December 2009
pp. 45-55
Heterogeneous multiprocessors are increasingly important in the multi-core era due to their potential for high performance and energy efficiency. In order for software to fully realize this potential, the step that maps computations to processing elements ...
     
PinOS: a programmable framework for whole-system dynamic instrumentation
Found in: Proceedings of the 3rd international conference on Virtual execution environments (VEE '07)
By Chi-Keung Luk, Prashanth P. Bungale
Issue Date:June 2007
pp. 137-147
PinOS is an extension of the Pin dynamic instrumentation framework for whole-system instrumentation, i.e., to instrument both kernel and user-level code. It achieves this by interposing between the subject system and hardware using virtualization technique...
     
Pin: building customized program analysis tools with dynamic instrumentation
Found in: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation (PLDI '05)
By Artur Klauser, Chi-Keung Luk, Geoff Lowney, Harish Patil, Kim Hazelwood, Robert Cohn, Robert Muth, Steven Wallace, Vijay Janapa Reddi
Issue Date:June 2005
pp. 280-280
Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide...
     
Profile-guided post-link stride prefetching
Found in: Proceedings of the 16th international conference on Supercomputing (ICS '02)
By Chi-Keung Luk, Harish Patil, P. Geoffrey Lowney, Richard Weiss, Robert Cohn, Robert Muth
Issue Date:June 2002
pp. 167-178
Data prefetching is an effective approach to addressing the memory latency problem. While a few processors have implemented hardware-based data prefetching, the majority of modern processors support data-prefetch instructions and rely on compilers to autom...
     
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors
Found in: Proceedings of the 28th annual international symposium on Computer architecture (ISCA '01)
By Chi-Keung Luk
Issue Date:June 2001
pp. 125-131
Hardly predictable data addresses in many irregular applications have rendered prefetching ineffective. In many cases, the only accurate way to predict these addresses is to directly execute the code that generates them. As multithreaded architectures beco...
     
Compiler-based prefetching for recursive data structures
Found in: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems (ASPLOS-VII)
By Chi-Keung Luk, Todd C. Mowry
Issue Date:October 1996
pp. 205-209
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. While prefetching has enjoyed considerable success in array-based numeric codes, its ...
     