Issue No.01 - Jan. (2014 vol.25)
pp: 156-166
Yongwei Wu, Tsinghua University, Beijing
Feng Ye, Tsinghua University, Beijing
Kang Chen, Tsinghua University, Beijing
Weimin Zheng, Tsinghua University, Beijing
ABSTRACT
Cloud computing has received significant attention recently, and delivering quality-guaranteed services in clouds is highly desirable. Distributed file systems (DFSs) are a key component of any cloud-scale data processing middleware, so evaluating their performance is very important. To avoid the cost of late-life-cycle performance fixes and architectural redesign, it is also particularly important to analyze performance before a DFS is deployed. In this paper, we propose a systematic and practical performance analysis framework, driven by architecture and design models that define the structure and behavior of typical master/slave DFSs. We put forward a configuration guideline for specifying the configuration alternatives of such DFSs, and a practical approach for systematic qualitative and quantitative performance analysis of DFSs under various configuration settings. What distinguishes our approach from others is that 1) most existing work relies on performance measurements under a variety of workloads/strategies, on comparisons with other DFSs, or on running application programs, whereas our approach is based on architecture- and design-level models and systematically derived performance models; 2) our approach can evaluate the performance of DFSs both qualitatively and quantitatively; and 3) our approach can evaluate not only the overall performance of a DFS but also that of its components and individual steps. We demonstrate the effectiveness of our approach by evaluating the Hadoop Distributed File System (HDFS). A series of real-world experiments on EC2 (Amazon Elastic Compute Cloud) and on the Tansuo and Inspur clusters was conducted to qualitatively evaluate the effectiveness of our approach. We also performed a set of HDFS experiments on EC2 to quantitatively analyze the performance and limitations of the metadata server of DFSs. The results show that our approach provides sufficient performance analysis.
Similarly, the proposed approach could also be applied to evaluate other DFSs such as MooseFS, GFS, and zFS.
INDEX TERMS
Unified modeling language, Performance analysis, Data models, Analytical models, Computer architecture, Time factors, Software, HDFS, Distributed file system, architecture model, practical performance analysis
CITATION
Yongwei Wu, Feng Ye, Kang Chen, Weimin Zheng, "Modeling of Distributed File Systems for Practical Performance Analysis", IEEE Transactions on Parallel & Distributed Systems, vol.25, no. 1, pp. 156-166, Jan. 2014, doi:10.1109/TPDS.2013.19