Multicore Model from Abstract Single Core Inputs
Found in: IEEE Computer Architecture Letters
By Emily Blem, Hadi Esmaeilzadeh, Renee St. Amant, Karthikeyan Sankaralingam, Doug Burger
Issue Date:July 2013
pp. 59-62
This paper describes a first order multicore model to project a tighter upper bound on performance than previous Amdahl's Law based approaches. The speedup over a known baseline is a function of the core performance, microarchitectural features, applicatio...
 
Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures
Found in: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
By Emily Blem, Jaikrishnan Menon, Karthikeyan Sankaralingam
Issue Date:February 2013
pp. 1-12
RISC vs. CISC wars raged in the 1980s when chip area and processor design complexity were the primary constraints and desktops and servers exclusively dominated the computing landscape. Today, energy and power are the primary design constraints and the com...
 
Idempotent code generation: Implementation, analysis, and evaluation
Found in: 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
By Marc de Kruijf, Karthikeyan Sankaralingam
Issue Date:February 2013
pp. 1-12
Leveraging idempotence for efficient recovery is of emerging interest in compiler design. In particular, identifying semantically idempotent code and then compiling such code to preserve the semantic idempotence property enables recovery with substantially...
 
DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing
Found in: IEEE Micro
By Venkatraman Govindaraju, Chen-Han Ho, Tony Nowatzki, Jatin Chhugani, Nadathur Satish, Karthikeyan Sankaralingam, Changkyu Kim
Issue Date:September 2012
pp. 38-51
The DySER (Dynamically Specializing Execution Resources) architecture supports both functionality specialization and parallelism specialization. By dynamically specializing frequently executing regions and applying parallelism mechanisms, DySER provides ef...
 
Mechanisms and Evaluation of Cross-Layer Fault-Tolerance for Supercomputing
Found in: 2012 41st International Conference on Parallel Processing (ICPP)
By Chen-Han Ho, Marc de Kruijf, Karthikeyan Sankaralingam, Barry Rountree, Martin Schulz, Bronis R. de Supinski
Issue Date:September 2012
pp. 510-519
Reliability is emerging as an important constraint for future microprocessors. Cooperative hardware and software approaches for error tolerance can solve this hardware reliability challenge. Cross-layer fault tolerance frameworks expose hardware failures t...
 
Dark Silicon and the End of Multicore Scaling
Found in: IEEE Micro
By Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, Doug Burger
Issue Date:May 2012
pp. 122-134
A key question for the microprocessor research and design community is whether scaling multicores will provide the performance and value needed to scale down many more technology generations. To provide a quantitative answer to this question, a comprehensi...
 
Design, integration and implementation of the DySER hardware accelerator into OpenSPARC
Found in: High-Performance Computer Architecture, International Symposium on
By Jesse Benson, Ryan Cofell, Chris Frericks, Chen-Han Ho, Venkatraman Govindaraju, Tony Nowatzki, Karthikeyan Sankaralingam
Issue Date:February 2012
pp. 1-12
Accelerators and specialization in various forms are emerging as a way to increase processor performance. Examples include Navigo, Conservation-Cores, BERET, and DySER. While each of these employ different primitives and principles to achieve specializatio...
 
Experiences in Co-designing a Packet Classification Algorithm and a Flexible Hardware Platform
Found in: Symposium On Architecture For Networking And Communications Systems
By Nilay Vaish, Thawan Kooburat, Lorenzo De Carli, Karthikeyan Sankaralingam, Cristian Estan
Issue Date:October 2011
pp. 189-199
Algorithmic solutions to the packet classification problem in network equipment have long been a subject of study in academia and industry and with increases in network speeds they are becoming even more important. Since general purpose processors cannot m...
 
Dynamically Specialized Datapaths for energy efficient computing
Found in: High-Performance Computer Architecture, International Symposium on
By Venkatraman Govindaraju, Chen-Han Ho, Karthikeyan Sankaralingam
Issue Date:February 2011
pp. 503-514
Due to limits in technology scaling, energy efficiency of logic devices is decreasing in successive generations. To provide continued performance improvements without increasing power, regardless of the sequential or parallel nature of the application, mic...
 
A unified model for timing speculation: Evaluating the impact of technology scaling, CMOS design style, and fault recovery mechanism
Found in: Dependable Systems and Networks, International Conference on
By Marc de Kruijf, Shuou Nomura, Karthikeyan Sankaralingam
Issue Date:July 2010
pp. 487-496
Due to fundamental device properties, energy efficiency from CMOS scaling is showing diminishing improvements. To overcome the energy efficiency challenges, timing speculation has been proposed to optimize for common-case timing conditions, with errors occ...
 
Toward a multicore architecture for real-time ray-tracing
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Venkatraman Govindaraju, Peter Djeu, Karthikeyan Sankaralingam, Mary Vernon, William R. Mark
Issue Date:November 2008
pp. 176-187
Significant improvement to visual quality for real-time 3D graphics requires modeling of complex illumination effects like soft-shadows, reflections, and diffuse lighting interactions. The conventional Z-buffer algorithm driven GPU model does not provide s...
 
Implementing Signatures for Transactional Memory
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Daniel Sanchez, Luke Yen, Mark D. Hill, Karthikeyan Sankaralingam
Issue Date:December 2007
pp. 123-133
Transactional Memory (TM) systems must track the read and write sets--items read and written during a transaction--to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardwar...
 
On-Chip Interconnection Networks of the TRIPS Chip
Found in: IEEE Micro
By Paul Gratz, Changkyu Kim, Karthikeyan Sankaralingam, Heather Hanson, Premkishore Shivakumar, Stephen W. Keckler, Doug Burger
Issue Date:September 2007
pp. 41-50
The TRIPS chip prototypes two networks on chip to demonstrate the viability of a routed interconnection fabric for memory and operand traffic. In a 170-million-transistor custom ASIC chip, these NoCs provide system performance within 28 percent of ideal no...
 
Implementation and Evaluation of a Dynamically Routed Processor Operand Network
Found in: Networks-on-Chip, International Symposium on
By Paul Gratz, Karthikeyan Sankaralingam, Heather Hanson, Premkishore Shivakumar, Robert McDonald, Stephen W. Keckler, Doug Burger
Issue Date:May 2007
pp. 7-17
Microarchitecturally integrated on-chip networks, or micronets, are candidates to replace busses for processor component interconnect in future processor designs. For micronets, tight coupling between processor microarchitecture and network architecture is...
 
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Karthikeyan Sankaralingam, Ramadass Nagarajan, Robert McDonald, Rajagopalan Desikan, Saurabh Drolia, M.S. Govindan, Paul Gratz, Divya Gulati, Heather Hanson, Changkyu Kim, Haiming Liu, Nitya Ranganathan, Simha Sethumadhavan, Sadia Sharif, Premkishore Shiva
Issue Date:December 2006
pp. 480-491
Growing on-chip wire delays will cause many future microarchitectures to be distributed, in which hardware resources within a single processor become nodes on one or more switched micronetworks. Since large processor cores will require multiple clock cycle...
 
Dataflow Predication
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Aaron Smith, Ramadass Nagarajan, Karthikeyan Sankaralingam, Robert McDonald, Doug Burger, Stephen W. Keckler, Kathryn S. McKinley
Issue Date:December 2006
pp. 89-102
Predication facilitates high-bandwidth fetch and large static scheduling regions, but has typically been too complex to implement comprehensively in out-of-order microarchitectures. This paper describes dataflow predication, which provides per-instruction p...
 
Universal Mechanisms for Data-Parallel Architectures
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Karthikeyan Sankaralingam, Stephen W. Keckler, William R. Mark, Doug Burger
Issue Date:December 2003
pp. 303
Data-parallel programs are both growing in importance and increasing in diversity, resulting in specialized processors targeted at specific classes of these programs. This paper presents a classification scheme for data-parallel program attributes, and pro...
 
Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture
Found in: IEEE Micro
By Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, Charles Moore
Issue Date:November 2003
pp. 46-51
The TRIPS architecture seeks to deliver system-level configurability to applications and runtime systems. It does so by employing the concept of polymorphism, which permits the runtime system to configure the hardware execution resources to match ...
 
Routed Inter-ALU Networks for ILP Scalability and Performance
Found in: Computer Design, International Conference on
By Karthikeyan Sankaralingam, Vincent Ajay Singh, Stephen W. Keckler, Doug Burger
Issue Date:October 2003
pp. 170
Modern processors rely heavily on broadcast networks to bypass instruction results to dependent instructions in the pipeline. However, as clock rates increase, architectures get wider, and pipelines get deeper, broadcasting becomes more complex, slower, an...
 
Distributed Pagerank for P2P Systems
Found in: High-Performance Distributed Computing, International Symposium on
By Karthikeyan Sankaralingam, Simha Sethumadhavan, James C. Browne
Issue Date:June 2003
pp. 58
This paper defines and describes a fully distributed implementation of Google's highly effective Pagerank algorithm, for...
 
Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture
Found in: Computer Architecture, International Symposium on
By Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, Charles R. Moore
Issue Date:June 2003
pp. 422
This paper describes the polymorphous TRIPS architecture which can be configured for different granularities and types of parallelism. TRIPS contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in...
 
A Design Space Evaluation of Grid Processor Architectures
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Ramadass Nagarajan, Karthikeyan Sankaralingam, Doug Burger, Stephen W. Keckler
Issue Date:December 2001
pp. 40
In this paper, we survey the design space of a new class of architectures called Grid Processor Architectures (GPAs). These architectures are designed to scale with technology, allowing faster clock rates than conventional architectures while providing sup...
 
Understanding the impact of gate-level physical reliability effects on whole program execution
Found in: 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)
By Raghuraman Balasubramanian, Karthikeyan Sankaralingam
Issue Date:February 2014
pp. 60-71
This paper introduces a novel end-to-end platform called PERSim that allows FPGA accelerated full-system simulation of complete programs on prototype hardware with detailed fault injection that can capture gate delays and digital logic behavior of arbitrar...
   
SWSL: SoftWare Synthesis for network Lookup
Found in: 2013 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)
By Sung Jin Kim, Lorenzo De Carli, Karthikeyan Sankaralingam, Cristian Estan
Issue Date:October 2013
pp. 191-201
Data structure lookups are among the most expensive operations on routers' critical path in terms of latency and power. Therefore, efficient lookup engines are crucial. Several approaches have been proposed, based on either custom ASICs, general-purpose pr...
   
Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG
Found in: 2013 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)
By Venkatraman Govindaraju, Tony Nowatzki, Karthikeyan Sankaralingam
Issue Date:September 2013
pp. 341-351
Modern microprocessors exploit data level parallelism through in-core data-parallel accelerators in the form of short vector ISA extensions such as SSE/AVX and NEON. Although these ISA extensions have existed for decades, compilers do not generate good qua...
   
Hands-on introduction to computer science at the freshman level
Found in: Proceedings of the 45th ACM technical symposium on Computer science education (SIGCSE '14)
By Aritra Biswas, Karthikeyan Sankaralingam, Matthew Doran, Raghuraman Balasubramanian, Timur Girgin, Zachary York
Issue Date:March 2014
pp. 235-240
This paper details the creation of a hands-on introduction course that reflects the dramatic growth and diversity in computer science. Our aim was to enable students to get an end-to-end perspective on computer system design by building one. We report on a...
     
Virtually-aged sampling DMR: unifying circuit failure prediction and circuit failure detection
Found in: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46)
By Karthikeyan Sankaralingam, Raghuraman Balasubramanian
Issue Date:December 2013
pp. 123-135
Hardware failure due to wearout is a growing concern. Circuit failure prediction is an approach that is effective if it meets the following requirements: low design complexity, low overheads, generality (supporting various types of wearout including soft a...
     
A general constraint-centric scheduling framework for spatial architectures
Found in: Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation (PLDI '13)
By Behnam Robatmili, Cristian Estan, Karthikeyan Sankaralingam, Lorenzo De Carli, Michael Sartin-Tarm, Tony Nowatzki
Issue Date:June 2013
pp. 495-506
Specialized execution using spatial architectures provides energy efficient computation, but requires effective algorithms for spatially scheduling the computation. Generally, this has been solved with architecture-specific heuristics, an approach which su...
     
ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution
Found in: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '13)
By Ang Li, Karthikeyan Sankaralingam, Marc de Kruijf, Shan Lu, Wei Zhang
Issue Date:March 2013
pp. 113-126
Many concurrency bugs are hidden in deployed software and cause severe failures for end-users. When they finally manifest and become known by developers, they are difficult to fix correctly. To support end-users, we need techniques that help software survi...
     
Power challenges may end the multicore era
Found in: Communications of the ACM
By Doug Burger, Emily Blem, Hadi Esmaeilzadeh, Karthikeyan Sankaralingam, Renée St. Amant
Issue Date:February 2013
pp. 93-102
Starting in 2004, the microprocessor industry has shifted to multicore scaling---increasing the number of cores per die each generation---as its principal strategy for continuing performance growth. Many in the research community believe that this exponent...
     
LEAP: latency- energy- and area-optimized lookup pipeline
Found in: Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems (ANCS '12)
By Cristian Estan, Eric N. Harris, Karthikeyan Sankaralingam, Lorenzo De Carli, Samuel L. Wasmundt
Issue Date:October 2012
pp. 175-186
Table lookups and other types of packet processing require so much memory bandwidth that the networking industry has long been a major consumer of specialized memories like TCAMs. Extensive research in algorithms for longest prefix matching and packet clas...
     
Power Limitations and Dark Silicon Challenge the Future of Multicore
Found in: ACM Transactions on Computer Systems (TOCS)
By Doug Burger, Emily Blem, Hadi Esmaeilzadeh, Karthikeyan Sankaralingam, Renée St. Amant
Issue Date:August 2012
pp. 1-27
Since 2004, processor designers have increased core counts to exploit Moore’s Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit ...
     
iGPU: exception support and speculative execution on GPUs
Found in: Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA '12)
By Jaikrishnan Menon, Karthikeyan Sankaralingam, Marc De Kruijf
Issue Date:June 2012
pp. 72-83
Since the introduction of fully programmable vertex shader hardware, GPU computing has made tremendous advances. Exception support and speculative execution are the next steps to expand the scope and improve the usability of GPUs. However, traditional mech...
     
Idempotent processor architecture
Found in: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44 '11)
By Karthikeyan Sankaralingam, Marc de Kruijf
Issue Date:December 2011
pp. 140-151
Improving architectural energy efficiency is important to address diminishing energy efficiency gains from technology scaling. At the same time, limiting hardware complexity is also important. This paper presents a new processor architecture, the idempoten...
     
Dark silicon and the end of multicore scaling
Found in: Proceedings of the 38th annual international symposium on Computer architecture (ISCA '11)
By Doug Burger, Emily Blem, Hadi Esmaeilzadeh, Karthikeyan Sankaralingam, Renee St. Amant
Issue Date:June 2011
pp. 365-376
Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multic...
     
Sampling + DMR: practical and low-overhead permanent fault detection
Found in: Proceedings of the 38th annual international symposium on Computer architecture (ISCA '11)
By Chen-Han Ho, Karthikeyan Sankaralingam, Marc de Kruijf, Matthew D. Sinclair, Shuou Nomura, Venkatraman Govindaraju
Issue Date:June 2011
pp. 201-212
With technology scaling, manufacture-time and in-field permanent faults are becoming a fundamental problem. Multi-core architectures with spares can tolerate them by detecting and isolating faulty cores, but the required fault detection coverage becomes ef...
     
Design and implementation of the PLUG architecture for programmable and efficient network lookups
Found in: Proceedings of the 19th international conference on Parallel architectures and compilation techniques (PACT '10)
By Amit Kumar, Cristian Estan, Karthikeyan Sankaralingam, Lorenzo De Carli, Marc de Kruijf, Somesh Jha, Sung Jin Kim
Issue Date:September 2010
pp. 331-342
This paper proposes a new architecture called Pipelined LookUp Grid (PLUG) that can perform data structure lookups in network processing. PLUGs are programmable and through simplicity achieve power efficiency. We draw upon one key insight: data structure ...
     
Relax: an architectural framework for software recovery of hardware faults
Found in: Proceedings of the 37th annual international symposium on Computer architecture (ISCA '10)
By Karthikeyan Sankaralingam, Marc de Kruijf, Shuou Nomura
Issue Date:June 2010
pp. 72-ff
As technology scales ever further, device unreliability is creating excessive complexity for hardware to maintain the illusion of perfect operation. In this paper, we consider whether exposing hardware fault information to software and allowing software to...
     
PLUG: flexible lookup modules for rapid deployment of new protocols in high-speed routers
Found in: Proceedings of the ACM SIGCOMM 2009 conference on Data communication (SIGCOMM '09)
By Amit Kumar, Cristian Estan, Karthikeyan Sankaralingam, Lorenzo De Carli, Yi Pan
Issue Date:August 2009
pp. 101-104
New protocols for the data link and network layer are being proposed to address limitations of current protocols in terms of scalability, security, and manageability. High-speed routers and switches that implement these protocols traditionally perform pack...
     
General parallel computations on desktop grid and P2P systems
Found in: Proceedings of the 7th workshop on languages, compilers, and run-time support for scalable systems (LCR '04)
By James C. Browne, Karthikeyan Sankaralingam, Kevin Kane, Madulika Yalamanchi
Issue Date:October 2004
pp. 1-8
This paper defines the requirements for effective execution of iterative computations requiring communication on a desktop grid. It then proposes a combination of a p2p communication model, an algorithmic approach (asynchronous iterations) and a programmin...
     