Bottleneck identification and scheduling in multithreaded applications
Found in: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '12)
By Onur Mutlu, Yale N. Patt, Jose A. Joao, M. Aater Suleman
Issue Date: March 2012
pp. 223-234
Performance of multithreaded applications is limited by a variety of bottlenecks, e.g. critical sections, barriers and slow pipeline stages. These bottlenecks serialize execution, waste valuable execution cycles, and limit scalability of applications. This...
Parallel application memory scheduling
Found in: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44 '11)
By Eiman Ebrahimi, Onur Mutlu, Yale N. Patt, Chang Joo Lee, Chris Fallin, Jose A. Joao, Rustam Miftakhutdinov
Issue Date: December 2011
pp. 362-373
A primary use of chip-multiprocessor (CMP) systems is to speed up a single application by exploiting thread-level parallelism. In such systems, threads may slow each other down by issuing memory requests that interfere in the shared memory subsystem. This ...
Data Marshaling for Multicore Systems
Found in: IEEE Micro
By M. Aater Suleman, Onur Mutlu, Jose A. Joao, Khubaib Khubaib, Yale N. Patt
Issue Date: January 2011
pp. 56-64
Dividing a program into segments and executing each segment at the core best suited to run it can improve performance and save power. When consecutive segments run on different cores, accesses to intersegment data incur cache misses. Data Marshali...
Virtual Program Counter (VPC) Prediction: Very Low Cost Indirect Branch Prediction Using Conditional Branch Prediction Hardware
Found in: IEEE Transactions on Computers
By Hyesoon Kim, José A. Joao, Onur Mutlu, Chang Joo Lee, Yale N. Patt, Robert Cohn
Issue Date: September 2009
pp. 1153-1170
Indirect branches have become increasingly common in modular programs written in modern object-oriented languages and virtual-machine-based runtime systems. Unfortunately, the prediction accuracy of indirect branches has not improved as much as that of con...
Improving the performance of object-oriented languages with dynamic predication of indirect jumps
Found in: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS XIII)
By Hyesoon Kim, Jose A. Joao, Onur Mutlu, Rishi Agarwal, Yale N. Patt
Issue Date: March 2008
pp. 1-1
Indirect jump instructions are used to implement increasingly-common programming constructs such as virtual function calls, switch-case statements, jump tables, and interface calls. The performance impact of indirect jumps is likely to increase because ind...
Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Hyesoon Kim, José A. Joao, Onur Mutlu, Yale N. Patt
Issue Date: March 2007
pp. 367-378
Dynamic predication has been proposed to reduce the branch misprediction penalty due to hard-to-predict branch instructions. A recently proposed dynamic predication architecture, the diverge-merge processor (DMP), provides large performance improv...
Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication
Found in: IEEE Micro
By Hyesoon Kim, José A. Joao, Onur Mutlu, Yale N. Patt
Issue Date: January 2007
pp. 94-104
The branch misprediction penalty is a major performance limiter and a major cause of wasted energy in high-performance processors. The diverge-merge processor reduces this penalty by dynamically predicating a wide range of hard-to-predict branches at runti...