Scalable Shared-Memory Multiprocessor Architectures
Found in: Computer
By Shreekant Thakkar, Michel Dubois, Anthony T. Laundrie, Gurindar S. Sohi, David V. James, Stein Gjessing, Manu Thapar, Bruce Delagi, Michael Carlton, Alvin Despain
Issue Date:June 1990
pp. 71-83
Directory-based and bus-based cache coherence schemes are defined and described. Directory-based schemes can be classified as centralized or distributed. Both categories support local caches to improve processor performance and reduce traffic in t...
 
Read-After-Read Memory Dependence Prediction
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Andreas Moshovos, Gurindar S. Sohi
Issue Date:November 1999
pp. 177
We identify that typical programs exhibit highly regular read-after-read (RAR) memory dependence streams. We exploit this regularity by introducing read-after-read (RAR) memory dependence prediction. We also present two RAR memory dependence prediction-bas...
 
Parallelism in the Front-End
Found in: Computer Architecture, International Symposium on
By Paramjit S. Oberoi, Gurindar S. Sohi
Issue Date:June 2003
pp. 230
As processor back-ends get more aggressive, front-ends will have to scale as well. Although the back-ends of superscalar processors have continued to become more parallel, the front-ends remain sequential. This paper describes techniques for fetching and r...
 
Supporting Overcommitted Virtual Machines through Hardware Spin Detection
Found in: IEEE Transactions on Parallel and Distributed Systems
By Koushik Chakraborty, Philip M. Wells, Gurindar S. Sohi
Issue Date:February 2012
pp. 353-366
Multiprocessor operating systems (OSs) pose several unique and conflicting challenges to System Virtual Machines (System VMs). For example, most existing system VMs resort to gang scheduling a guest OS's virtual processors (VCPUs) to avoid OS synchronizati...
 
A Quantitative Framework for Automated Pre-Execution Thread Selection
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Amir Roth, Gurindar S. Sohi
Issue Date:November 2002
pp. 430
Pre-execution attacks cache misses for which address prediction driven prefetching fails. In pre-execution, copies of cache miss computations are isolated from the main program and launched as separate threads called p-threads whenever the processor antici...
 
The Use of Multithreading for Exception Handling
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Craig B. Zilles, Gurindar S. Sohi, Joel S. Emer
Issue Date:November 1999
pp. 219
Common hardware exceptions, when implemented by trapping, unnecessarily serialize program execution in dynamically scheduled superscalar processors. To avoid the consequences of trapping the main program thread, multithreaded CPUs can exploit control and d...
 
Adapting to Intermittent Faults in Future Multicore Systems
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Philip M. Wells, Koushik Chakraborty, Gurindar S. Sohi
Issue Date:September 2007
pp. 431
As technology continues to scale, future multicore processors become more susceptible to a variety of hardware failures. In particular, intermittent faults are expected to become especially problematic [1, 2]. A circuit is susceptible to intermittent faul...
   
Cooperative Caching for Chip Multiprocessors
Found in: Computer Architecture, International Symposium on
By Jichuan Chang, Gurindar S. Sohi
Issue Date:June 2006
pp. 264-276
This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's aggregate on-chip cache resources. Cooperative caching combines the strengths of private and shared cache organizations by forming an aggregate...
 
Program Demultiplexing: Data-flow based Speculative Parallelization of Methods in Sequential Programs
Found in: Computer Architecture, International Symposium on
By Saisanthosh Balakrishnan, Gurindar S. Sohi
Issue Date:June 2006
pp. 302-313
We present Program Demultiplexing (PD), an execution paradigm that creates concurrency in sequential programs by...
 
Speculative Multithreaded Processors
Found in: Computer
By Gurindar S. Sohi, Amir Roth
Issue Date:April 2001
pp. 66-73
Although novel functionality in the 1990s played a dominant role in processor design, the authors predict that implementation will dominate over functionality. Designing, debugging, and verifying monolithic designs that use hundreds of millions of...
 
Speculative Incoherent Cache Protocols
Found in: IEEE Micro
By Jaehyuk Huh, Doug Burger, Jichuan Chang, Gurindar S. Sohi
Issue Date:November 2004
pp. 104-109
Coherence decoupling is a microarchitectural mechanism that implements separate protocols for speculative use and for the eventual verification of values. The technique reduces the effect of long communication latencies while mitigating the burdens on the ...
 
Use-Based Register Caching with Decoupled Indexing
Found in: Computer Architecture, International Symposium on
By J. Adam Butts, Gurindar S. Sohi
Issue Date:June 2004
pp. 302
Wide, deep pipelines need many physical registers to hold the results of in-flight instructions. Simultaneously, high clock frequencies prohibit using large register files and bypass networks without a significant performance penalty. Previously proposed t...
 
Characterization of Problem Stores
Found in: IEEE Computer Architecture Letters
By Allison L. Holloway, Gurindar S. Sohi
Issue Date:January 2004
pp. N/A
This paper introduces the concept of problem stores: static stores whose dependent loads often miss in the cache. Accurately identifying problem stores allows the early determination of addresses likely to cause later misses, potentially allowing for the d...
 
Exploiting Value Locality in Physical Register Files
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Saisanthosh Balakrishnan, Gurindar S. Sohi
Issue Date:December 2003
pp. 265
The physical register file is an important component of a dynamically-scheduled processor. Increasing the amount of parallelism places increasing demands on the physical register file, calling for alternative file organization and management strategies. Th...
 
Characterizing and Predicting Value Degree of Use
Found in: Microarchitecture, IEEE/ACM International Symposium on
By J. Adam Butts, Gurindar S. Sohi
Issue Date:November 2002
pp. 15
A value's degree of use -- the number of dynamic uses of that value -- provides the most essential information needed to optimize its communication. We present simulation results demonstrating the properties of degree of use of values, including their pred...
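The degree-of-use definition above can be illustrated in software. This is a minimal sketch, not the paper's predictor or simulator: it assumes a hypothetical toy trace format in which each instruction is a (destination register, source registers) pair, identifies each value by the index of the instruction that produced it, and counts how many later instructions read that value.

```python
from collections import Counter

def degree_of_use(trace):
    """Return {producer_index: number_of_dynamic_uses} for every produced value.

    Hypothetical trace format: a list of (dest_register or None, [source_registers]).
    """
    last_writer = {}   # register name -> index of the instruction that last wrote it
    uses = Counter()
    for i, (dest, srcs) in enumerate(trace):
        # Read sources before updating dest, so "r1 = r1 + 1" reads the old value.
        for reg in srcs:
            if reg in last_writer:  # reads of never-written (live-in) registers are ignored
                uses[last_writer[reg]] += 1
        if dest is not None:
            last_writer[dest] = i
            uses.setdefault(i, 0)   # a value that is never read has degree of use 0
    return dict(uses)

trace = [
    ("r1", []),            # 0: r1 = constant
    ("r2", ["r1"]),        # 1: r2 = f(r1)
    ("r3", ["r1", "r2"]),  # 2: r3 = g(r1, r2)
    ("r1", ["r1"]),        # 3: r1 = r1 + 1 (last use of value 0, produces value 3)
    (None, ["r3"]),        # 4: store r3
]
print(degree_of_use(trace))  # {0: 3, 1: 1, 2: 1, 3: 0}
```

Value 3 is produced but never read, which is the kind of value whose communication an optimizer could skip entirely.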
 
A Programmable Co-processor for Profiling
Found in: High-Performance Computer Architecture, International Symposium on
By Craig B. Zilles, Gurindar S. Sohi
Issue Date:January 2001
pp. 241
Aggressive program optimization requires accurate profile information, but such accuracy requires many samples to be collected. We explore a novel profiling architecture that reduces the overhead of collecting each sample by including a programma...
 
Speculative Data-Driven Multithreading
Found in: High-Performance Computer Architecture, International Symposium on
By Amir Roth, Gurindar S. Sohi
Issue Date:January 2001
pp. 37
Mispredicted branches and loads that miss in the cache cause the majority of retirement stalls experienced by sequential processors; we call these critical instructions. Despite their importance, a sequential processor has difficulty prioritizing ...
 
Understanding the Backward Slices of Performance Degrading Instructions
Found in: Computer Architecture, International Symposium on
By Gurindar S. Sohi, Craig B. Zilles
Issue Date:June 2000
pp. 172
For many applications, branch mispredictions and cache misses limit a processor's performance to a level well below its peak instruction throughput. A small fraction of static instructions, whose behavior cannot be anticipated using current branch predicto...
 
Memory Dependence Speculation Tradeoffs in Centralized, Continuous-Window Superscalar Processors
Found in: High-Performance Computer Architecture, International Symposium on
By Andreas Moshovos, Gurindar S. Sohi
Issue Date:January 2000
pp. 301
We consider a variety of dynamic, hardware-based methods for exploiting load/store parallelism, including mechanisms that use memory dependence speculation. While previous work has also investigated such methods, this has been done primarily for split, dis...
 
Effective Jump-Pointer Prefetching for Linked Data Structures
Found in: Computer Architecture, International Symposium on
By Amir Roth, Gurindar S. Sohi
Issue Date:May 1999
pp. 111
Current techniques for prefetching linked data structures (LDS) exploit the work available in one loop iteration or recursive call to overlap pointer chasing latency. Jump-pointers, which provide direct access to non-adjacent nodes, can be used for prefetc...
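The jump-pointer idea can be sketched in software as follows. This is a minimal illustration under assumptions, not the paper's implementation: the `jump` field, the `install_jump_pointers` helper, the fixed `distance`, and the `prefetch` callback are hypothetical names, and the callback merely stands in for a real prefetch instruction that would start fetching a non-adjacent node while the current one is processed.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.jump = None  # hypothetical jump pointer: the node `distance` links ahead

def build_list(values):
    head = None
    for v in reversed(values):
        node = Node(v)
        node.next = head
        head = node
    return head

def install_jump_pointers(head, distance):
    """Point each node's jump field at the node `distance` (>= 1) links ahead."""
    lead = head
    for _ in range(distance):
        if lead is None:
            return  # list shorter than the jump distance: no jump pointers installed
        lead = lead.next
    trail = head
    while lead is not None:
        trail.jump = lead
        trail = trail.next
        lead = lead.next

def traverse_with_prefetch(head, prefetch):
    """Visit every node, issuing a prefetch for its jump target before visiting it."""
    visited = []
    node = head
    while node is not None:
        if node.jump is not None:
            prefetch(node.jump)  # direct access to a non-adjacent node, no pointer chasing
        visited.append(node.value)
        node = node.next
    return visited
```

For example, with six nodes and a distance of 2, visiting node 0 prefetches node 2, visiting node 1 prefetches node 3, and so on; the last two nodes have no jump target. The distance trades prefetch timeliness against wasted prefetches near the tail.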
 
New Methods for Exploiting Program Structure and Behavior in Computer Architecture
Found in: Innovative Architecture for Future Generation High-Performance Processors and Systems, International Workshop on
By Amir Roth, Gurindar S. Sohi
Issue Date:October 1998
pp. 71
Micro-architectural techniques of the next decade will have to be more efficient and scalable in order to handle growing workloads and longer communication and memory latencies. We believe that information about program structure, the data and control rela...
 
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
Found in: IEEE Transactions on Computers
By Manoj Franklin, Gurindar S. Sohi
Issue Date:May 1996
pp. 552-571
To exploit instruction level parallelism, it is important not only to execute multiple memory references per cycle, but also to reorder memory references (especially to execute loads before stores that precede them in th...
 
Multiscalar Processors
Found in: Computer Architecture, International Symposium on
By Scott E. Breach, T. N. Vijaykumar, Gurindar S. Sohi
Issue Date:June 1995
pp. 414
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of sof...
 
Streamlining Data Cache Access with Fast Address Calculation
Found in: Computer Architecture, International Symposium on
By Gurindar S. Sohi, Dionisios N. Pnevmatikatos, Todd M. Austin
Issue Date:June 1995
pp. 369
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the design and evaluation of a fast address generation mechanism capable of eliminating...
 
A Static Power Model for Architects
Found in: Microarchitecture, IEEE/ACM International Symposium on
By J. Adam Butts, Gurindar S. Sohi
Issue Date:December 2000
pp. 191
Static power dissipation due to transistor leakage constitutes an increasing fraction of the total power in modern semiconductor technologies. Current technology trends indicate that the contribution will increase rapidly, reaching one half of total power ...
 
Dynamic Speculation and Synchronization of Data Dependences
Found in: Computer Architecture, International Symposium on
By Gurindar S. Sohi, Scott E. Breach, T. N. Vijaykumar, Andreas Moshovos
Issue Date:June 1997
pp. 181
Data dependence speculation is used in instruction-level parallel (ILP) processors to allow early execution of an instruction before a logically preceding instruction on which it may be data dependent. If the instruction is independent, data dependence spe...
 
Dynamic Instruction Reuse
Found in: Computer Architecture, International Symposium on
By Gurindar S. Sohi, Avinash Sodani
Issue Date:June 1997
pp. 194
This paper introduces the concept of dynamic instruction reuse. Empirical observations suggest that many instructions, and groups of instructions, having the same inputs, are executed dynamically. Such instructions do not have to be executed repeatedly ---...
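The reuse concept can be sketched as a lookup table keyed by the static instruction and its input values. This is a hypothetical software model, not the paper's hardware design: `ReuseBuffer` and its unbounded dictionary are assumptions, whereas a real reuse buffer would have finite capacity and a replacement policy.

```python
import operator

class ReuseBuffer:
    """Hypothetical reuse buffer: maps (pc, operand values) to a previously computed result."""

    def __init__(self):
        self.entries = {}
        self.hits = 0        # dynamic instances whose result was reused
        self.executions = 0  # dynamic instances that actually had to execute

    def execute(self, pc, op, *operands):
        key = (pc, operands)
        if key in self.entries:  # same static instruction, same inputs: reuse the result
            self.hits += 1
            return self.entries[key]
        self.executions += 1     # otherwise compute the result and remember it
        result = op(*operands)
        self.entries[key] = result
        return result

rb = ReuseBuffer()
for i in range(4):
    rb.execute(0x40, operator.mul, 3, 7)  # loop-invariant computation: reusable
    rb.execute(0x44, operator.add, i, 1)  # inputs change every iteration: not reusable
print(rb.hits, rb.executions)  # 3 5
```

Only the loop-invariant multiply hits after its first execution, which mirrors the observation that instructions repeated with the same inputs need not be executed again.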
 
High-Bandwidth Address Translation for Multiple-Issue Processors
Found in: Computer Architecture, International Symposium on
By Gurindar S. Sohi, Todd M. Austin
Issue Date:May 1996
pp. 158
In an effort to push the envelope of system performance, microprocessor designs are continually exploiting higher levels of instruction-level parallelism, resulting in increasing bandwidth demands on the address translation mechanism. Most current micropro...
 
Holistic run-time parallelism management for time and energy efficiency
Found in: Proceedings of the 27th international ACM conference on International conference on supercomputing (ICS '13)
By Gagan Gupta, Gurindar S. Sohi, Srinath Sridharan
Issue Date:June 2013
pp. 337-348
The ubiquity of parallel machines will necessitate time- and energy-efficient parallel execution of a program in a wide range of hardware and software environments. Prevalent parallel execution models can fail to be efficient. Unable to account for dynamic...
     
Dataflow execution of sequential imperative programs on multicore architectures
Found in: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44 '11)
By Gagan Gupta, Gurindar S. Sohi
Issue Date:December 2011
pp. 59-70
As multicore processors become the default, researchers are aggressively looking for program execution models that make it easier to use the available resources. Multithreaded programming models that rely on statically-parallel programs have gained prevale...
     
Mixed-mode multicore reliability
Found in: Proceeding of the 14th international conference on Architectural support for programming languages and operating systems (ASPLOS '09)
By Gurindar S. Sohi, Koushik Chakraborty, Philip M. Wells
Issue Date:March 2009
pp. 23-27
Future processors are expected to observe increasing rates of hardware faults. Using Dual-Modular Redundancy (DMR), two cores of a multicore can be loosely coupled to redundantly execute a single software thread, providing very high coverage from many diff...
     
Serialization sets: a dynamic dependence-based parallel execution model
Found in: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '09)
By Gurindar S. Sohi, Matthew D. Allen, Srinath Sridharan
Issue Date:February 2009
pp. 283-284
This paper proposes a new parallel execution model where programmers augment a sequential program with pieces of code called serializers that dynamically map computational operations into serialization sets of dependent operations. A runtime system execute...
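A toy model of the execution model described above: operations mapped to the same serialization set run in program order, while distinct sets may run concurrently. This is a sketch under assumptions, not the paper's runtime; `run_with_serialization_sets`, the `serializer` callback, and the bank-account example are hypothetical, and a thread pool stands in for the runtime system.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_with_serialization_sets(operations, serializer):
    """Group operations by serializer(op); run each group in program order,
    with different groups executing concurrently on a thread pool."""
    sets = defaultdict(list)
    for op in operations:  # appending in program order preserves order within each set
        sets[serializer(op)].append(op)

    def run_set(ops):
        return [op() for op in ops]

    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(run_set, ops) for key, ops in sets.items()}
        return {key: f.result() for key, f in futures.items()}

# Hypothetical example: deposits to the same account are dependent, so they share a set.
balances = {"alice": 0, "bob": 0}

def deposit(account, amount):
    def op():
        balances[account] += amount
        return balances[account]
    op.account = account  # metadata the serializer inspects
    return op

ops = [deposit("alice", 10), deposit("bob", 5), deposit("alice", -3), deposit("bob", 2)]
result = run_with_serialization_sets(ops, serializer=lambda op: op.account)
print(result)  # {'alice': [10, 7], 'bob': [5, 7]}
```

Because each account is touched by exactly one set, the two sets never race, and the per-set results match the sequential program's.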
     
Adapting to intermittent faults in multicore systems
Found in: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS XIII)
By Gurindar S. Sohi, Koushik Chakraborty, Philip M. Wells
Issue Date:March 2008
pp. 1-1
Future multicore processors will be more susceptible to a variety of hardware failures. In particular, intermittent faults, caused in part by manufacturing, thermal, and voltage variations, can cause bursts of frequent faults that last from several cycles ...
     
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly
Found in: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems (ASPLOS-XII)
By Gurindar S. Sohi, Koushik Chakraborty, Philip M. Wells
Issue Date:October 2006
pp. 109-es
In canonical parallel processing, the operating system (OS) assigns a processing core to a single thread from a multithreaded server application. Since different threads from the same application often carry out similar computation, albeit at different tim...
     
Hardware support for spin management in overcommitted virtual machines
Found in: Proceedings of the 15th international conference on Parallel architectures and compilation techniques (PACT '06)
By Gurindar S. Sohi, Koushik Chakraborty, Philip M. Wells
Issue Date:September 2006
pp. 124-133
Multiprocessor operating systems (OSs) pose several unique and conflicting challenges to System Virtual Machines (System VMs). For example, most existing system VMs resort to gang scheduling a guest OS's virtual processors (VCPUs) to avoid OS synchronizati...
     
Coherence decoupling: making use of incoherence
Found in: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems (ASPLOS-XI)
By Doug Burger, Gurindar S. Sohi, Jaehyuk Huh, Jichuan Chang
Issue Date:October 2004
pp. 97-105
This paper explores a new technique called coherence decoupling, which breaks a traditional cache coherence protocol into two protocols: a Speculative Cache Lookup (SCL) protocol and a safe, backing coherence protocol. The SCL protocol produces a speculati...
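The split between a speculative lookup and a safe backing protocol can be modeled in a few lines. This is a minimal sketch, assuming verification by comparing the speculative value with the coherent one; `coherence_decoupled_load` and its callbacks are hypothetical names, not the paper's interface.

```python
def coherence_decoupled_load(local_copy, fetch_coherent, compute):
    """Return (result, squashed) for a load whose value may come from an incoherent copy."""
    speculative = local_copy        # speculative lookup: use the possibly stale local value now
    result = compute(speculative)   # dependent work proceeds without waiting
    actual = fetch_coherent()       # backing protocol eventually supplies the coherent value
    if actual == speculative:
        return result, False        # speculation verified: keep the work already done
    return compute(actual), True    # misspeculation: squash and redo with the correct value

print(coherence_decoupled_load(5, lambda: 5, lambda v: v * 2))  # (10, False)
print(coherence_decoupled_load(5, lambda: 6, lambda v: v * 2))  # (12, True)
```

When the stale copy happens to hold the correct value, the communication latency of the backing protocol is hidden; otherwise only the dependent work is redone.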
     
Register integration: a simple and efficient implementation of squash reuse
Found in: Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture (MICRO 33)
By Amir Roth, Gurindar S. Sohi
Issue Date:December 2000
pp. 223-234
     
A static power model for architects
Found in: Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture (MICRO 33)
By Gurindar S. Sohi, J. Adam Butts
Issue Date:December 2000
pp. 191-201
Static power dissipation due to transistor leakage constitutes an increasing fraction of the total power in modern semiconductor technologies. Current technology trends indicate that the contribution will increase rapidly, reaching one half of total power ...
     
Understanding the backward slices of performance degrading instructions
Found in: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
By Craig B. Zilles, Gurindar S. Sohi
Issue Date:June 2000
pp. 172
For many applications, branch mispredictions and cache misses limit a processor's performance to a level well below its peak instruction throughput. A small fraction of static instructions, whose behavior cannot be anticipated using current branch predicto...
     
Improving virtual function call target prediction via dependence-based pre-computation
Found in: Proceedings of the 13th international conference on Supercomputing (ICS '99)
By Amir Roth, Andreas Moshovos, Gurindar S. Sohi
Issue Date:June 1999
pp. 356-364
     
Dependence based prefetching for linked data structures
Found in: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems (ASPLOS-VIII)
By Amir Roth, Andreas Moshovos, Gurindar S. Sohi
Issue Date:October 1998
pp. 205-209
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy. Our technique exploits the dependence relationships that exist between loads that produce addresses and...
     
An empirical analysis of instruction repetition
Found in: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems (ASPLOS-VIII)
By Avinash Sodani, Gurindar S. Sohi
Issue Date:October 1998
pp. 205-209
We study the phenomenon of instruction repetition, where the inputs and outputs of multiple dynamic instances of a static instruction are repeated. We observe that over 80% of the dynamic instructions executed in several programs are repeated and most of t...
     
Multiscalar processors
Found in: 25 years of the international symposia on Computer architecture (selected papers) (ISCA '98)
By Gurindar S. Sohi, Scott E. Breach, T. N. Vijaykumar
Issue Date:June 1998
pp. 521-532
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of sof...
     
Instruction issue logic for high-performance, interruptable pipelined processors
Found in: 25 years of the international symposia on Computer architecture (selected papers) (ISCA '98)
By Gurindar S. Sohi, Sriram Vajapeyam
Issue Date:June 1998
pp. 329-336
     
Retrospective: instruction issue logic for high-performance, interruptable pipelined processors
Found in: 25 years of the international symposia on Computer architecture (selected papers) (ISCA '98)
By Gurindar S. Sohi
Issue Date:June 1998
pp. 51-53
     
Dynamic instruction reuse
Found in: Proceedings of the 24th annual international symposium on Computer architecture (ISCA '97)
By Avinash Sodani, Gurindar S. Sohi
Issue Date:June 1997
pp. 194
This paper introduces the concept of dynamic instruction reuse. Empirical observations suggest that many instructions, and groups of instructions, having the same inputs, are executed dynamically. Such instructions do not have to be executed repeatedly ---...
     
Dynamic speculation and synchronization of data dependences
Found in: Proceedings of the 24th annual international symposium on Computer architecture (ISCA '97)
By Andreas Moshovos, Gurindar S. Sohi, Scott E. Breach, T. N. Vijaykumar
Issue Date:June 1997
pp. 181
Data dependence speculation is used in instruction-level parallel (ILP) processors to allow early execution of an instruction before a logically preceding instruction on which it may be data dependent. If the instruction is independent, data dependence spe...
     
High-bandwidth address translation for multiple-issue processors
Found in: Proceedings of the 23rd annual international symposium on Computer architecture (ISCA '96)
By Gurindar S. Sohi, Todd M. Austin
Issue Date:May 1996
pp. 158
In an effort to push the envelope of system performance, microprocessor designs are continually exploiting higher levels of instruction-level parallelism, resulting in increasing bandwidth demands on the address translation mechanism. Most current micropro...
     
Multiscalar processors
Found in: Proceedings of the 22nd annual international symposium on Computer architecture (ISCA '95)
By Gurindar S. Sohi, Scott E. Breach, T. N. Vijaykumar
Issue Date:June 1995
pp. 414
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of sof...
     
Streamlining data cache access with fast address calculation
Found in: Proceedings of the 22nd annual international symposium on Computer architecture (ISCA '95)
By Dionisios N. Pnevmatikatos, Gurindar S. Sohi, Todd M. Austin
Issue Date:June 1995
pp. 369
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the design and evaluation of a fast address generation mechanism capable of eliminating...
     