An Energy-Oriented Evaluation of Buffer Cache Algorithms Using Parallel I/O Workloads
November 2008 (vol. 19 no. 11)
pp. 1565-1578
Jianhui Yue, University of Maine, Orono
Yifeng Zhu, University of Maine, Orono
Zhao Cai, University of Maine, Orono
Power consumption is an important issue for cluster supercomputers as it directly affects running cost and cooling requirements. This paper investigates the memory energy efficiency of high-end data servers used for supercomputers. Emerging memory technologies allow memory devices to dynamically adjust their power states and enable free rides by overlapping multiple DMA transfers from different I/O buses to the same memory device. To achieve maximum energy saving, the memory management on data servers needs to judiciously utilize these energy-aware devices. As we explore different management schemes under five real-world parallel I/O workloads, we find that the memory energy behavior is determined by a complex interaction among four important factors: (1) cache hit rates that may directly translate performance gain into energy saving, (2) cache populating schemes that perform buffer allocation and affect access locality at the chip level, (3) request clustering that aims to temporally align memory transfers from different buses into the same memory chips, and (4) access patterns in workloads that affect the first three factors.
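
The "free ride" effect and request clustering described above can be illustrated with a minimal sketch. It is not the authors' simulator or energy model. It assumes a chip draws a fixed active power while any DMA transfer touches it and a fixed low power otherwise, and it ignores power-state transition costs. The function names, power values, and transfer intervals are all hypothetical.

```python
def active_time(transfers):
    """Time a chip must stay active: the length of the union of the
    (start, end) transfer intervals. Overlapping transfers share time."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(transfers):
        if cur_end is None or start > cur_end:
            # A gap: close the previous merged interval, open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap: this transfer rides along with the current interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def chip_energy(transfers, window, active_power, low_power):
    """Energy over an observation window under the two-state assumption."""
    busy = active_time(transfers)
    return busy * active_power + (window - busy) * low_power


# Hypothetical example: two equal-length transfers from different I/O buses
# to the same chip, either separated in time or temporally aligned.
unaligned = [(0, 4), (10, 14)]
aligned = [(0, 4), (0, 4)]

energy_unaligned = chip_energy(unaligned, window=20, active_power=300, low_power=3)
energy_aligned = chip_energy(aligned, window=20, active_power=300, low_power=3)
```

Under these assumptions the aligned case keeps the chip active for half as long as the unaligned case, so the second transfer costs almost no extra energy. This is the sense in which temporally aligning transfers to the same chip can save energy.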

Index Terms:
Main memory, Storage Management, Operating Systems, Software/Software Engineering, Storage hierarchies, Parallel I/O, Interconnections (Subsystems), I/O and Data Communications Hardware
Citation:
Jianhui Yue, Yifeng Zhu, Zhao Cai, "An Energy-Oriented Evaluation of Buffer Cache Algorithms Using Parallel I/O Workloads," IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 11, pp. 1565-1578, Nov. 2008, doi:10.1109/TPDS.2008.109