Block-Based Concurrent and Storage-Aware Data Streaming for Grid Applications with Lots of Small Files
Cluster Computing and the Grid, IEEE International Symposium on (2009)
May 18, 2009 to May 21, 2009
Data streaming management and scheduling are required by many grid computing applications, especially when the volume of data to be processed is extremely high while available storage is relatively limited. The bulk of data from scientific experiments is usually partitioned into lots of small files (LOSF), which poses challenges for data streaming support. Block-based data transfer is proposed in this work and implemented using GridFTP, where the number of blocks, or equivalently the size of each block, must be carefully scheduled with both makespan and available storage taken into account. To increase processing efficiency, data streaming and processing have to be performed concurrently, and data streaming scheduling must be storage-aware to avoid data overflow. Experimental results show that the proposed optimization method for block-based, concurrent, and storage-aware data streaming handles the LOSF problem efficiently, with relatively good performance in terms of makespan and storage usage.
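The core idea of concurrent, storage-aware streaming can be sketched as a producer-consumer pipeline in which a bounded buffer stands in for the limited local storage: the transfer side blocks whenever admitting another block would overflow the storage budget, while processing runs concurrently and frees storage as it consumes blocks. The sketch below is a minimal illustration only; the paper's actual GridFTP-based implementation and its block-size optimization are not reproduced here, and the `STORAGE_BUDGET` and `NUM_BLOCKS` values are illustrative assumptions.

```python
import threading
import queue

STORAGE_BUDGET = 3   # max blocks held in local storage (assumed for illustration)
NUM_BLOCKS = 10      # total blocks to stream (assumed for illustration)

def stream_blocks(buf, peak):
    # Producer: "transfers" blocks from the remote source. The bounded queue
    # makes the transfer storage-aware: put() blocks when storage is full,
    # so local storage can never overflow.
    for i in range(NUM_BLOCKS):
        buf.put(f"block-{i}")
        peak[0] = max(peak[0], buf.qsize())  # track peak storage usage

def process_blocks(buf, done):
    # Consumer: processes blocks concurrently with transfer and frees
    # their storage as soon as each block is consumed.
    for _ in range(NUM_BLOCKS):
        block = buf.get()
        done.append(block)
        buf.task_done()

buf = queue.Queue(maxsize=STORAGE_BUDGET)
peak, done = [0], []
t1 = threading.Thread(target=stream_blocks, args=(buf, peak))
t2 = threading.Thread(target=process_blocks, args=(buf, done))
t1.start()
t2.start()
t1.join()
t2.join()
```

Because transfer and processing overlap, the makespan approaches the maximum of the two phases rather than their sum, while the queue bound keeps storage usage within budget.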
Grid Computing, Data Streaming, Lots of Small Files
L. Liu, Y. Zhong, W. Zhang, J. Cao and C. Wu, "Block-Based Concurrent and Storage-Aware Data Streaming for Grid Applications with Lots of Small Files," Cluster Computing and the Grid, IEEE International Symposium on (CCGRID), Shanghai, China, 2009, pp. 538-543.