The M5 Simulator: Modeling Networked Systems
Found in: IEEE Micro
By Nathan L. Binkert, Ronald G. Dreslinski, Lisa R. Hsu, Kevin T. Lim, Ali G. Saidi, Steven K. Reinhardt
Issue Date:July 2006
pp. 52-60
Developed specifically to enable research in TCP/IP networking, the M5 simulator provides features necessary for simulating networked hosts, including full-system capability, a detailed I/O subsystem, and the ability to simulate multiple networked systems ...
 
Performance Analysis of System Overheads in TCP/IP Workloads
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Nathan L. Binkert, Lisa R. Hsu, Ali G. Saidi, Ronald G. Dreslinski, Andrew L. Schultz, Steven K. Reinhardt
Issue Date:September 2005
pp. 218-230
Current high-performance computer systems are unable to saturate the latest available high-bandwidth networks such as 10 Gigabit Ethernet. A key obstacle in achieving 10 gigabits per second is the high overhead of communication between the CPU and...
 
Scaling towards kilo-core processors with asymmetric high-radix topologies
Found in: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
By Nilmini Abeyratne, Reetuparna Das, Qingkun Li, Korey Sewell, Bharan Giridhar, Ronald G. Dreslinski, David Blaauw, Trevor Mudge
Issue Date:February 2013
pp. 496-507
In this paper, we explore the challenges in scaling on-chip networks towards kilo-core processors. Current low-radix topologies optimize for fast local communication, but do not scale well to kilo-core systems because of the large number of routers require...
 
Limits of Parallelism and Boosting in Dim Silicon
Found in: IEEE Micro
By Nathaniel Pinckney, Ronald G. Dreslinski, Korey Sewell, David Fick, Trevor Mudge, Dennis Sylvester, David Blaauw
Issue Date:September 2013
pp. 30-37
Supply-voltage scaling has stagnated in recent technology nodes, leading to so-called dark silicon. To increase overall chip multiprocessor (CMP) performance, it is necessary to improve the energy efficiency of individual tasks so that more tasks can be ex...
 
Centip3De: A 64-Core, 3D Stacked Near-Threshold System
Found in: IEEE Micro
By Ronald G. Dreslinski, David Fick, Bharan Giridhar, Gyouho Kim, Sangwon Seo, Matthew Fojtik, Sudhir Satpathy, Yoonmyung Lee, Daeyeon Kim, Nurrachman Liu, Michael Wieckowski, Gregory Chen, Dennis Sylvester, David Blaauw, Trevor Mudge
Issue Date:March 2013
pp. 8-16
Centip3De uses the synergy between 3D integration and near-threshold computing to create a reconfigurable system that provides both energy-efficient operation and techniques to address single-thread performance bottlenecks. The original Centip3De design is...
 
Embedded way prediction for last-level caches
Found in: 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
By Faissal M. Sleiman, Ronald G. Dreslinski, Thomas F. Wenisch
Issue Date:September 2012
pp. 167-174
This paper investigates Embedded Way Prediction for large last-level caches (LLCs): an architecture and circuit design to provide the latency of parallel tag-data access at substantial energy savings. Existing way prediction approaches for L1 caches are co...
 
Full-system analysis and characterization of interactive smartphone applications
Found in: IEEE Workload Characterization Symposium
By Anthony Gutierrez, Ronald G. Dreslinski, Thomas F. Wenisch, Trevor Mudge, Ali Saidi, Chris Emmons, Nigel Paver
Issue Date:November 2011
pp. 81-90
Smartphones have recently overtaken PCs as the primary consumer computing device in terms of annual unit shipments. Given this rapid market growth, it is important that mobile system designers and computer architects analyze the characteristics of the inte...
 
Bloom Filter Guided Transaction Scheduling
Found in: High-Performance Computer Architecture, International Symposium on
By Geoffrey Blake, Ronald G. Dreslinski, Trevor Mudge
Issue Date:February 2011
pp. 75-86
Contention management is an important design component to a transactional memory system. Without effective contention management to ensure forward progress, a transactional memory system can experience live-lock, which is difficult to debug in parallel pro...
 
Reconfigurable energy efficient near threshold cache architectures
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Ronald G. Dreslinski, Gregory K. Chen, Trevor Mudge, David Blaauw, Dennis Sylvester, Krisztian Flautner
Issue Date:November 2008
pp. 459-470
Battery life is an important concern for modern embedded processors. Supply voltage scaling techniques can provide an order of magnitude reduction in energy. Current commercial memory technologies have been limited in the degree of supply voltage scaling t...
 
Energy efficient near-threshold chip multi-processing
Found in: Low Power Electronics and Design, International Symposium on
By Bo Zhai, Ronald G. Dreslinski, David Blaauw, Trevor Mudge, Dennis Sylvester
Issue Date:August 2007
pp. 32-37
Subthreshold circuit design has become a popular approach for building energy efficient digital circuits. One drawback is performance degradation due to the exponentially reduced driving current. This had limited subthreshold circuits to relatively low per...
 
A study of Thread Level Parallelism on mobile devices
Found in: 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
By Cao Gao, Anthony Gutierrez, Ronald G. Dreslinski, Trevor Mudge, Krisztian Flautner, Geoffrey Blake
Issue Date:March 2014
pp. 126-127
Mobile devices continue to increase the number of cores in an attempt to meet the needs of performance-demanding applications. However, the increasing number of cores does not necessarily translate into performance gain and/or power reduction. In this pape...
   
Sources of error in full-system simulation
Found in: 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
By Anthony Gutierrez, Joseph Pusdesris, Ronald G. Dreslinski, Trevor Mudge, Chander Sudanthi, Christopher D. Emmons, Mitchell Hayenga, Nigel Paver
Issue Date:March 2014
pp. 13-22
In this work we investigate the sources of error in gem5—a state-of-the-art computer simulator—by validating it against a real hardware platform: the ARM Versatile Express TC2 development board. We design a custom gem5 configuration and make several change...
   
Analysis of hardware prefetching across virtual page boundaries
Found in: Proceedings of the 4th international conference on Computing frontiers (CF '07)
By Ali G. Saidi, Ronald G. Dreslinski, Steven K. Reinhardt, Trevor Mudge
Issue Date:May 2007
pp. 13-22
Data cache prefetching in the L2 is at the forefront of pre-fetching research. In this paper we analyze the impact of virtual page boundaries on these prefetchers. Conservative measurements on real hardware show that 30-50% of consecutive virtual pages are...
     
Integrated 3D-stacked server designs for increasing physical density of key-value stores
Found in: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (ASPLOS '14)
By Anthony Gutierrez, Bharan Giridhar, Luis Ceze, Michael Cieslak, Ronald G. Dreslinski, Trevor Mudge
Issue Date:March 2014
pp. 485-498
Key-value stores, such as Memcached, have been used to scale web services since the beginning of the Web 2.0 era. Data center real estate is expensive, and several industry experts we have spoken to have suggested that a significant portion of their data c...
     
Centip3De: a many-core prototype exploring 3D integration and near-threshold computing
Found in: Communications of the ACM
By Bharan Giridhar, David Fick, Dennis Sylvester, Gregory Chen, Matthew Fojtik, Ronald G. Dreslinski, Daeyeon Kim, David Blaauw, Gyouho Kim, Michael Wieckowski, Nurrachman Liu, Sangwon Seo, Sudhir Satpathy, Trevor Mudge, Yoonmyung Lee
Issue Date:November 2013
pp. 97-104
Process scaling has resulted in an exponential increase of the number of transistors available to designers. Meanwhile, global interconnect has not scaled nearly as well, because global wires scale only in one dimension instead of two, resulting in fewer, ...
     
Catnap: energy proportional multiple network-on-chip
Found in: Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13)
By Reetuparna Das, Ronald G. Dreslinski, Satish Narayanasamy, Sudhir K. Satpathy
Issue Date:June 2013
pp. 320-331
Multiple networks have been used in several processor implementations to scale bandwidth and ensure protocol-level deadlock freedom for different message classes. In this paper, we observe that a multiple-network design is also attractive from a power pers...
     
XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems
Found in: Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12)
By David Blaauw, Dennis Sylvester, Korey Sewell, Nathaniel Pinckney, Reetuparna Das, Ronald G. Dreslinski, Sudhir Satpathy, Thomas Manville, Trevor Mudge
Issue Date:September 2012
pp. 75-86
With multi-core processors now mainstream, the shift to many-core processors poses a new set of design challenges. In particular, the scalability of coherence protocols remains a significant challenge. While complex Network-on-Chip interconnect fabrics hav...
     
Diet SODA: a power-efficient processor for digital cameras
Found in: Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design (ISLPED '10)
By Chaitali Chakrabarti, Mark Woh, Ronald G. Dreslinski, Sangwon Seo, Scott Mahlke, Trevor Mudge
Issue Date:August 2010
pp. 79-84
Power has become the most critical design constraint for embedded handheld devices. This paper proposes a power-efficient SIMD architecture, referred to as Diet SODA, for DSP applications. The key design idea is to apply near-threshold operation on a singl...
     
Proactive transaction scheduling for contention management
Found in: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (Micro-42)
By Geoffrey Blake, Ronald G. Dreslinski, Trevor Mudge
Issue Date:December 2009
pp. 156-167
Hardware Transactional Memory offers a promising high performance and easier to program alternative to lock-based synchronization for creating parallel programs. This is particularly important as hardware manufacturers continue to put more cores on die. Bu...
     
Energy efficient near-threshold chip multi-processing
Found in: Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)
By Bo Zhai, David Blaauw, Dennis Sylvester, Ronald G. Dreslinski, Trevor Mudge
Issue Date:August 2007
pp. 32-37
Subthreshold circuit design has become a popular approach for building energy efficient digital circuits. One drawback is performance degradation due to the exponentially reduced driving current. This had limited subthreshold circuits to relatively low per...
     