Utilizing Dark Silicon to Save Energy with Computational Sprinting
Found in: IEEE Micro
By Arun Raghavan, Laurel Emurian, Lei Shao, Marios Papaefthymiou, Kevin P. Pipe, Thomas F. Wenisch, Milo M. K. Martin
Issue Date:September 2013
pp. 20-28
Computational sprinting activates dark silicon to improve responsiveness by briefly but intensely exceeding a system's sustainable power limit. This article focuses on the energy implications of sprinting. The authors observe that sprinting can save energy...
 
Computational sprinting
Found in: High-Performance Computer Architecture, International Symposium on
By Arun Raghavan, Yixin Luo, Anuj Chandawalla, Marios Papaefthymiou, Kevin P. Pipe, Thomas F. Wenisch, Milo M. K. Martin
Issue Date:February 2012
pp. 1-12
Although transistor density continues to increase, voltage scaling has stalled and thus power density is increasing each technology generation. Particularly in mobile devices, which have limited cooling options, these trends lead to a utilization wall in w...
 
Spatial Memory Streaming
Found in: Computer Architecture, International Symposium on
By Stephen Somogyi, Thomas F. Wenisch, Anastassia Ailamaki, Babak Falsafi, Andreas Moshovos
Issue Date:June 2006
pp. 252-263
Prior research indicates that there is much spatial variation in applications' memory access patterns. Modern memory systems, however, use small fixed-size cache blocks and as such cannot exploit the variation. Increasing the block size would not ...
 
CoScale: Coordinating CPU and Memory System DVFS in Server Systems
Found in: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
By Qingyuan Deng, David Meisner, Abhishek Bhattacharjee, Thomas F. Wenisch, Ricardo Bianchini
Issue Date:December 2012
pp. 143-154
Recent work has introduced memory system dynamic voltage and frequency scaling (DVFS), and has suggested that balanced scaling of both CPU and the memory system is the most promising approach for conserving energy in server systems. In this paper, we first...
 
Energy-Aware Computing
Found in: IEEE Micro
By Thomas F. Wenisch, Alper Buyuktosunoglu
Issue Date:September 2012
pp. 6-8
The introduction to the special issue discusses efforts in the area of energy-aware computing.
   
Temporal instruction fetch streaming
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Michael Ferdman, Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
Issue Date:November 2008
pp. 1-10
L1 instruction-cache misses pose a critical performance bottleneck in commercial server workloads. Cache access latency constraints preclude L1 instruction caches large enough to capture the application, library, and OS instruction working sets of these wo...
 
Thinking outside the box: power management at the system level & beyond
Found in: Low Power Electronics and Design, International Symposium on
By Thomas F. Wenisch
Issue Date:August 2009
pp. 151-152
Architects and circuit designers have made enormous strides in managing the energy efficiency and peak power demands of processors and other silicon systems. Sophisticated power management features and modes are now myriad across system components, from DR...
 
SimFlex: Statistical Sampling of Computer System Simulation
Found in: IEEE Micro
By Thomas F. Wenisch, Roland E. Wunderlich, Michael Ferdman, Anastassia Ailamaki, Babak Falsafi, James C. Hoe
Issue Date:July 2006
pp. 18-31
Timing-accurate full-system multiprocessor simulations can take years because of architecture and application complexity. Statistical sampling makes simulation-based studies feasible by providing ten-thousand-fold reductions in simulation runtime and enabl...
 
BigHouse: A simulation infrastructure for data center systems
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By David Meisner, Junjie Wu, Thomas F. Wenisch
Issue Date:April 2012
pp. 35-45
Recently, there has been an explosive growth in Internet services, greatly increasing the importance of data center systems. Applications served from
 
Minimizing Remote Accesses in MapReduce Clusters
Found in: 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
By Prateek Tandon, Michael J. Cafarella, Thomas F. Wenisch
Issue Date:May 2013
pp. 1928-1936
MapReduce, in particular Hadoop, is a popular framework for the distributed processing of large datasets on clusters of relatively inexpensive servers. Although Hadoop clusters are highly scalable and ensure data availability in the face of server failures...
 
Sonic Millip3De: A massively parallel 3D-stacked accelerator for 3D ultrasound
Found in: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
By Richard Sampson, Ming Yang, Siyuan Wei, Chaitali Chakrabarti, Thomas F. Wenisch
Issue Date:February 2013
pp. 318-329
Three-dimensional (3D) ultrasound is becoming common for non-invasive medical imaging because of its high accuracy, safety, and ease of use. Unlike other modalities, ultrasound transducers require little power, which makes hand-held imaging platforms possi...
 
Designing for Responsiveness with Computational Sprinting
Found in: IEEE Micro
By Arun Raghavan, Yixin Luo, Anuj Chandawalla, Marios Papaefthymiou, Kevin P. Pipe, Thomas F. Wenisch, Milo M.K. Martin
Issue Date:May 2013
pp. 8-15
The tight thermal constraints of mobile devices, which limit sustainable performance, and the bursty nature of interactive mobile applications call for a new design focus: enhancing user responsiveness rather than sustained throughput. To that end, this ar...
 
Composite Cores: Pushing Heterogeneity Into a Core
Found in: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
By Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Faissal M. Sleiman, Ronald Dreslinski, Thomas F. Wenisch, Scott Mahlke
Issue Date:December 2012
pp. 317-328
Heterogeneous multicore systems -- comprised of multiple cores with varying capabilities, performance, and energy characteristics -- have emerged as a promising approach to increasing energy efficiency. Such systems reduce energy consumption by identifying...
 
Embedded way prediction for last-level caches
Found in: 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
By Faissal M. Sleiman, Ronald G. Dreslinski, Thomas F. Wenisch
Issue Date:September 2012
pp. 167-174
This paper investigates Embedded Way Prediction for large last-level caches (LLCs): an architecture and circuit design to provide the latency of parallel tag-data access at substantial energy savings. Existing way prediction approaches for L1 caches are co...
 
Full-system analysis and characterization of interactive smartphone applications
Found in: IEEE Workload Characterization Symposium
By Anthony Gutierrez, Ronald G. Dreslinski, Thomas F. Wenisch, Trevor Mudge, Ali Saidi, Chris Emmons, Nigel Paver
Issue Date:November 2011
pp. 81-90
Smartphones have recently overtaken PCs as the primary consumer computing device in terms of annual unit shipments. Given this rapid market growth, it is important that mobile system designers and computer architects analyze the characteristics of the inte...
 
Towards a scalable data center-level evaluation methodology
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By David Meisner, Junjie Wu, Thomas F. Wenisch
Issue Date:April 2011
pp. 121-122
As the popularity of Internet services continues to rise, the need to understand the design of the data center systems hosting these workloads becomes increasingly important. Unfortunately, research in this area has been stifled, primarily due to a lack of...
 
Making Address-Correlated Prefetching Practical
Found in: IEEE Micro
By Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
Issue Date:January 2010
pp. 50-59
Despite a decade of research demonstrating its efficacy, address-correlated prefetching has never been implemented in a shipping processor because it requires megabytes of metadata, too large to store practically on chip. New storage-, l...
 
Store-Ordered Streaming of Shared Memory
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Thomas F. Wenisch, Stephen Somogyi, Nikolaos Hardavellas, Jangwoo Kim, Chris Gniady, Anastassia Ailamaki, Babak Falsafi
Issue Date:September 2005
pp. 75-86
Coherence misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. Memory streaming provides a promising solution to the coherence miss bottleneck because i...
 
Temporal Streaming of Shared Memory
Found in: Computer Architecture, International Symposium on
By Thomas F. Wenisch, Stephen Somogyi, Nikolaos Hardavellas, Jangwoo Kim, Anastassia Ailamaki, Babak Falsafi
Issue Date:June 2005
pp. 222-233
Coherent read misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. We propose Temporal Streaming, to eliminate coherent read misses by streaming data to...
 
SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling
Found in: Computer Architecture, International Symposium on
By Roland E. Wunderlich, Thomas F. Wenisch, Babak Falsafi, James C. Hoe
Issue Date:June 2003
pp. 84
Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often ina...
 
Active Low-Power Modes for Main Memory with MemScale
Found in: IEEE Micro
By Qingyuan Deng, Luiz Ramos, Ricardo Bianchini, David Meisner, Thomas F. Wenisch
Issue Date:May 2012
pp. 60-69
Main memory accounts for a growing fraction of server energy usage. Investigating active low-power modes for managing main memory, with a system called MemScale, the authors offer a solution for performance-aware energy management. By creating a set of low...
 
Memory persistency
Found in: 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
By Steven Pelley, Peter M. Chen, Thomas F. Wenisch
Issue Date:June 2014
pp. 265-276
Emerging nonvolatile memory technologies (NVRAM) promise the performance of DRAM with the persistence of disk. However, constraining NVRAM write order, necessary to ensure recovery correctness, limits NVRAM write concurrency and degrades throughput. We req...
   
Sonic Millip3De: An Architecture for Handheld 3D Ultrasound
Found in: IEEE Micro
By Richard Sampson, Ming Yang, Siyuan Wei, Chaitali Chakrabarti, Thomas F. Wenisch
Issue Date:May 2014
pp. 100-108
3D ultrasound is becoming common for noninvasive medical imaging because of its high accuracy, safety, and ease of use. Unlike other modalities, ultrasound transducers require little power, which makes handheld imaging platforms possible, and several low-r...
 
System-level implications of disaggregated memory
Found in: High-Performance Computer Architecture, International Symposium on
By Kevin Lim, Yoshio Turner, Jose Renato Santos, Alvin AuYoung, Jichuan Chang, Parthasarathy Ranganathan, Thomas F. Wenisch
Issue Date:February 2012
pp. 1-12
Recent research on memory disaggregation introduces a new architectural building block -- the memory blade -- as a cost-effective approach for memory capacity expansion and sharing for an ensemble of blade servers. Memory blades augment blade servers' loca...
 
RDIP: return-address-stack directed instruction prefetching
Found in: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46)
By Aasheesh Kolli, Ali Saidi, Thomas F. Wenisch
Issue Date:December 2013
pp. 260-271
L1 instruction fetch misses remain a critical performance bottleneck, accounting for up to 40% slowdowns in server applications. Whereas instruction footprints typically fit within last-level caches, they overwhelm L1 caches, whose capacity is limited by l...
     
Thin servers with smart pipes: designing SoC accelerators for memcached
Found in: Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13)
By Ali G. Saidi, David Meisner, Kevin Lim, Parthasarathy Ranganathan, Thomas F. Wenisch
Issue Date:June 2013
pp. 36-47
Distributed in-memory key-value stores, such as memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, given the cost-sensitivity of internet services and the rece...
     
Computational sprinting on a hardware/software testbed
Found in: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '13)
By Arun Raghavan, Kevin P. Pipe, Laurel Emurian, Lei Shao, Marios Papaefthymiou, Milo M.K. Martin, Thomas F. Wenisch
Issue Date:March 2013
pp. 155-166
CMOS scaling trends have led to an inflection point where thermal constraints (especially in mobile devices that employ only passive cooling) preclude sustained operation of all transistors on a chip --- a phenomenon called "dark silicon." Recent research ...
     
MultiScale: memory system DVFS with multiple memory controllers
Found in: Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design (ISLPED '12)
By Abhishek Bhattacharjee, David Meisner, Qingyuan Deng, Ricardo Bianchini, Thomas F. Wenisch
Issue Date:July 2012
pp. 297-302
The fraction of server energy consumed by the memory system has been increasing rapidly and is now on par with that consumed by processors. Recent work demonstrates that substantial memory energy can be saved with only a small, tightly-controlled performan...
     
DreamWeaver: architectural support for deep sleep
Found in: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '12)
By David Meisner, Thomas F. Wenisch
Issue Date:March 2012
pp. 313-324
Numerous data center services exhibit low average utilization leading to poor energy efficiency. Although CPU voltage and frequency scaling historically has been an effective means to scale down power with utilization, transistor scaling trends are limitin...
     
Power management of online data-intensive services
Found in: Proceeding of the 38th annual international symposium on Computer architecture (ISCA '11)
By Christopher M. Sadler, David Meisner, Luiz Andre Barroso, Thomas F. Wenisch, Wolf-Dietrich Weber
Issue Date:June 2011
pp. 319-330
Much of the success of the Internet services model can be attributed to the popularity of a class of workloads that we call Online Data-Intensive (OLDI) services. These workloads perform significant computing over massive data sets per user request but, un...
     
Peak power modeling for data center servers with switched-mode power supplies
Found in: Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design (ISLPED '10)
By David Meisner, Thomas F. Wenisch
Issue Date:August 2010
pp. 319-324
Accurately modeling server power consumption is critical in designing data center power provisioning infrastructure. However, to date, most research proposals have used average CPU utilization to infer the power consumption of clusters, typically averaging...
     
Power routing: dynamic power provisioning in the data center
Found in: Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10)
By David Meisner, Jack Underwood, Pooya Zandevakili, Steven Pelley, Thomas F. Wenisch
Issue Date:March 2010
pp. 222-230
Data center power infrastructure incurs massive capital costs, which typically exceed energy costs over the life of the facility. To squeeze maximum value from the infrastructure, researchers have proposed over-subscribing power circuits, relying on the ob...
     
Thinking outside the box: power management at the system level & beyond
Found in: Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design (ISLPED '09)
By Thomas F. Wenisch
Issue Date:August 2009
pp. 1-2
Architects and circuit designers have made enormous strides in managing the energy efficiency and peak power demands of processors and other silicon systems. Sophisticated power management features and modes are now myriad across system components, from DR...
     
Disaggregated memory for expansion and sharing in blade servers
Found in: Proceedings of the 36th annual international symposium on Computer architecture (ISCA '09)
By Jichuan Chang, Kevin Lim, Parthasarathy Ranganathan, Steven K. Reinhardt, Thomas F. Wenisch, Trevor Mudge
Issue Date:June 2009
pp. 70-73
Analysis of technology and application trends reveals a growing imbalance in the peak compute-to-memory-capacity ratio for future servers. At the same time, the fraction contributed by memory systems to total datacenter costs and power consumption during t...
     
InvisiFence: performance-transparent memory ordering in conventional multiprocessors
Found in: Proceedings of the 36th annual international symposium on Computer architecture (ISCA '09)
By Colin Blundell, Milo M.K. Martin, Thomas F. Wenisch
Issue Date:June 2009
pp. 70-73
A multiprocessor's memory consistency model imposes ordering constraints among loads, stores, atomic operations, and memory fences. Even for consistency models that relax ordering among loads and stores, ordering constraints still induce significant perfor...
     
Spatio-temporal memory streaming
Found in: Proceedings of the 36th annual international symposium on Computer architecture (ISCA '09)
By Anastasia Ailamaki, Babak Falsafi, Stephen Somogyi, Thomas F. Wenisch
Issue Date:June 2009
pp. 70-73
Recent research advocates memory streaming techniques to alleviate the performance bottleneck caused by the high latencies of off-chip memory accesses. Temporal memory streaming replays previously observed miss sequences to eliminate long chains of depende...
     
PowerNap: eliminating server idle power
Found in: Proceeding of the 14th international conference on Architectural support for programming languages and operating systems (ASPLOS '09)
By Brian T. Gold, David Meisner, Thomas F. Wenisch
Issue Date:March 2009
pp. 23-27
Data center power consumption is growing to unprecedented levels: the EPA estimates U.S. data centers will consume 100 billion kilowatt hours annually by 2011. Much of this energy is wasted in idle systems: in typical deployments, server utilization is bel...
     
Mechanisms for store-wait-free multiprocessors
Found in: Proceedings of the 34th annual international symposium on Computer architecture (ISCA '07)
By Anastasia Ailamaki, Andreas Moshovos, Babak Falsafi, Thomas F. Wenisch
Issue Date:June 2007
pp. 266-277
Store misses cause significant delays in shared-memory multiprocessors because of limited store buffering and ordering constraints required for proper synchronization. Today, programmers must choose from a spectrum of memory consistency models that reduce ...
     
Statistical sampling of microarchitecture simulation
Found in: ACM Transactions on Modeling and Computer Simulation (TOMACS)
By Babak Falsafi, James C. Hoe, Roland E. Wunderlich, Thomas F. Wenisch
Issue Date:July 2006
pp. 197-224
Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often ina...
     
TurboSMARTS: accurate microarchitecture simulation sampling in minutes
Found in: Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems (SIGMETRICS '05)
By Babak Falsafi, James C. Hoe, Roland E. Wunderlich, Thomas F. Wenisch
Issue Date:June 2005
pp. 408-409
Recent research proposes accelerating processor microarchitecture simulation through statistical sampling. Prior simulation sampling approaches construct accurate model state for each measurement by continuously warming large microarchitectural structures ...
     
Memory coherence activity prediction in commercial workloads
Found in: Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture (WMPI '04)
By Anastassia Ailamaki, Babak Falsafi, Jangwoo Kim, Nikolaos Hardavellas, Stephen Somogyi, Thomas F. Wenisch
Issue Date:June 2004
pp. 37-45
Recent research indicates that prediction-based coherence optimizations offer substantial performance improvements for scientific applications in distributed shared memory multiprocessors. Important commercial applications also show sensitivity to coherenc...
     