Prediction of Optimal Parallelism Level in Wide Area Data Transfers
December 2011 (vol. 22 no. 12)
pp. 2033-2045
Esma Yildirim, The State University of New York at Buffalo, Buffalo
Dengpan Yin, Louisiana State University, Baton Rouge
Tevfik Kosar, The State University of New York at Buffalo, Buffalo
Wide area data transfer may be a major bottleneck for the end-to-end performance of distributed applications. A practical way of increasing wide area throughput at the application layer is to use multiple parallel streams. Although an increased number of parallel streams may yield much better performance than a single stream, overwhelming the network by opening too many streams may have an adverse effect: the congestion created by the excess streams may cause a drop in the achieved throughput. Hence, it is important to decide on the optimal number of streams without congesting the network. Predicting this "optimal" number is not straightforward, since it depends on many parameters specific to each individual transfer. Generic models that try to predict this number either rely too heavily on historical information or fail to achieve accurate predictions. In this paper, we present a set of new models that aim to approximate the optimal number with the least historical information and the lowest prediction overhead. We introduce an algorithm that selects the best combination of historical information for the prediction, both for evaluation purposes and to optimize the prediction by reducing its error rate. We measure the feasibility and accuracy of the proposed prediction models by comparing them against actual GridFTP data transfers. Using little historical information, we were able to predict the throughput of parallel streams accurately and find a very close approximation of the optimal stream number.
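
The idea of predicting an optimal stream count from a handful of measurements can be illustrated with a minimal sketch. The abstract does not state the form of the proposed models, so the curve used here, Th(n) = n / sqrt(a*n^2 + b*n + c), is an assumption chosen for illustration only, and the function names (`fit_curve`, `predict`, `optimal_streams`) and the `max_streams` cap are hypothetical. The sketch fits the curve through three sampled (stream count, throughput) pairs and then searches for the stream count with the highest predicted throughput.

```python
import math


def fit_curve(samples):
    """Fit Th(n) = n / sqrt(a*n^2 + b*n + c) through three (n, throughput) samples.

    The assumed model is linear in (a, b, c) once rewritten as
    n^2 / Th^2 = a*n^2 + b*n + c, so three samples give a 3x3 linear system.
    """
    # Augmented matrix: one row [n^2, n, 1 | n^2 / Th^2] per sample.
    rows = [[n * n, n, 1.0, (n * n) / (th * th)] for n, th in samples]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return tuple(rows[i][3] / rows[i][i] for i in range(3))


def predict(n, a, b, c):
    """Predicted throughput for n parallel streams under the assumed curve."""
    return n / math.sqrt(a * n * n + b * n + c)


def optimal_streams(a, b, c, max_streams=64):
    """Integer stream count in [1, max_streams] with the highest predicted throughput.

    When b < 0 and c > 0 the continuous curve peaks at n = -2c / b; the
    exhaustive search below also covers fits that have no such interior peak.
    """
    best_n, best_th = 1, float("-inf")
    for n in range(1, max_streams + 1):
        if a * n * n + b * n + c <= 0:
            continue  # curve is undefined at this stream count
        th = predict(n, a, b, c)
        if th > best_th:
            best_n, best_th = n, th
    return best_n


if __name__ == "__main__":
    # Hypothetical measurements: throughput sampled at 1, 4, and 16 streams.
    measured = [(1, 3.3), (4, 8.9), (16, 10.4)]
    a, b, c = fit_curve(measured)
    print("suggested stream count:", optimal_streams(a, b, c))
```

Sampling at exponentially spaced stream counts keeps the number of probe transfers small while still spanning the region where throughput might level off or decline; which sample points give the lowest prediction error is exactly the kind of question the selection algorithm described in the abstract addresses.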

Index Terms:
Distributed applications, modeling and prediction, parallelism and concurrency, network protocols.
Citation:
Esma Yildirim, Dengpan Yin, Tevfik Kosar, "Prediction of Optimal Parallelism Level in Wide Area Data Transfers," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 12, pp. 2033-2045, Dec. 2011, doi:10.1109/TPDS.2011.228