2011 IEEE/ACM 12th International Conference on Grid Computing
Using the Gfarm File System as a POSIX Compatible Storage Platform for Hadoop MapReduce Applications
Lyon, France
September 21-September 23
ISBN: 978-0-7695-4572-1
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/Grid.2011.31
MapReduce is a promising parallel programming model for processing large data sets. Hadoop is an up-and-coming open-source implementation of MapReduce. It uses the Hadoop Distributed File System (HDFS) to store input and output data. Because HDFS lacks POSIX compatibility, it is difficult for existing software to directly access data stored in it, so existing software and MapReduce applications cannot share storage. For external applications to process data using MapReduce, we must first import the data into HDFS, process it, and then export the output data into a POSIX compatible file system. This results in a large number of redundant file operations. To solve this problem, we propose using the Gfarm file system instead of HDFS. Gfarm is a POSIX compatible distributed file system with an architecture similar to that of HDFS. We design and implement a Hadoop-Gfarm plug-in that enables Hadoop MapReduce to access files on Gfarm efficiently. We compare the MapReduce workload performance of HDFS, Gfarm, PVFS and Gluster FS, which are open-source distributed file systems. Our evaluations show that Gfarm performed just as well as Hadoop's native HDFS. In most evaluations, Gfarm performed more than twice as well as PVFS and Gluster FS.
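The redundant file operations described in the abstract can be illustrated with a toy sketch. It is not the authors' plug-in and does not use Hadoop, HDFS, or Gfarm: two temporary local directories stand in for the POSIX file system and the separate MapReduce store, and a word count stands in for the MapReduce job. All function and file names here are hypothetical.

```python
import shutil
import tempfile
from collections import Counter
from pathlib import Path


def word_count(src: Path, dst: Path) -> None:
    """Stand-in for a MapReduce job: count words in src, write totals to dst."""
    counts = Counter(src.read_text().split())
    dst.write_text("".join(f"{w}\t{n}\n" for w, n in sorted(counts.items())))


def staged_workflow(posix_dir: Path, staging_dir: Path) -> int:
    """Import -> process -> export. Returns bytes copied between the two stores."""
    copied = 0
    staged_in = staging_dir / "input.txt"
    shutil.copy(posix_dir / "input.txt", staged_in)      # import into the job's store
    copied += staged_in.stat().st_size
    staged_out = staging_dir / "output.txt"
    word_count(staged_in, staged_out)                    # process
    shutil.copy(staged_out, posix_dir / "output.txt")    # export back to POSIX storage
    copied += staged_out.stat().st_size
    return copied


def shared_workflow(posix_dir: Path) -> int:
    """Job reads and writes the shared store directly, so nothing is copied."""
    word_count(posix_dir / "input.txt", posix_dir / "output.txt")
    return 0


def demo() -> tuple[int, int]:
    """Run both workflows on the same small input and report bytes copied by each."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        posix_dir, staging_dir = Path(a), Path(b)
        (posix_dir / "input.txt").write_text("gfarm hdfs gfarm\n")
        return staged_workflow(posix_dir, staging_dir), shared_workflow(posix_dir)
```

Calling `demo()` returns the bytes copied by the staged workflow and by the shared one. In this model the staged path always pays for copying the input and the output, while the shared path copies nothing, which is the saving the abstract attributes to a POSIX compatible store.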
Index Terms:
MapReduce, Hadoop, Gfarm, Distributed file system
Citation:
Shunsuke Mikami, Kazuki Ohta, Osamu Tatebe, "Using the Gfarm File System as a POSIX Compatible Storage Platform for Hadoop MapReduce Applications," grid, pp.181-189, 2011 IEEE/ACM 12th International Conference on Grid Computing, 2011
