Issue No.12 - December (2011 vol.22)
pp: 2033-2045
Esma Yildirim, The State University of New York at Buffalo, Buffalo
Dengpan Yin, Louisiana State University, Baton Rouge
Tevfik Kosar, The State University of New York at Buffalo, Buffalo
ABSTRACT
Wide area data transfer may be a major bottleneck for the end-to-end performance of distributed applications. A practical way of increasing wide area throughput at the application layer is to use multiple parallel streams. Although an increased number of parallel streams may yield much better performance than a single stream, overwhelming the network by opening too many streams may have the opposite effect: the congestion created by an excess number of streams may cause the achieved throughput to drop. Hence, it is important to determine the optimal number of streams without congesting the network. Predicting this "optimum" number is not straightforward, since it depends on many parameters specific to each individual transfer. Generic models that try to predict this number either rely too much on historical information or fail to achieve accurate predictions. In this paper, we present a set of new models that aim to approximate the optimal number with the least history information and the lowest prediction overhead. An algorithm is introduced to select the best combination of historical information for the prediction, both for evaluation purposes and to optimize the prediction by reducing the error rate. We measure the feasibility and accuracy of the proposed prediction models by comparing them to actual GridFTP data transfers. Using little historical information, we were able to predict the throughput of parallel streams accurately and find a very close approximation of the optimal stream number.
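To make the idea of predicting the optimal stream count from little history more concrete, the following is a minimal Python sketch, not the models presented in the paper. It assumes, purely for illustration, that throughput with n streams follows Th(n) = n / sqrt(a*n^2 + b*n + c), a form that rises and then falls as n grows. The function names (`fit_curve`, `predict`, `best_stream_count`) and the sample values in the usage note are hypothetical.

```python
import math


def fit_curve(samples):
    """Fit Th(n) = n / sqrt(a*n^2 + b*n + c) through three (n, throughput) samples.

    ASSUMPTION: this curve form is chosen for illustration only.
    Squaring and rearranging gives (n / Th)^2 = a*n^2 + b*n + c,
    which is linear in a, b, c, so three samples determine them.
    """
    if len(samples) != 3:
        raise ValueError("exactly three samples are required")
    # Augmented 3x4 matrix of the linear system.
    m = [[n * n, float(n), 1.0, (n / th) ** 2] for n, th in samples]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def predict(n, coeffs):
    """Predicted throughput for n streams under the assumed curve."""
    a, b, c = coeffs
    radicand = a * n * n + b * n + c
    if radicand <= 0:
        return 0.0  # curve is undefined here; treat as unusable
    return n / math.sqrt(radicand)


def best_stream_count(coeffs, max_streams=64):
    """Stream count in 1..max_streams with the highest predicted throughput."""
    return max(range(1, max_streams + 1), key=lambda n: predict(n, coeffs))
```

A caller would measure throughput at three stream counts, for example `fit_curve([(1, 0.5), (8, 4.3), (20, 4.0)])` with made-up measurements, and pass the result to `best_stream_count`. Scanning integer stream counts keeps the sketch simple and avoids relying on a closed-form optimum that only exists for some coefficient signs.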
INDEX TERMS
Distributed applications, modeling and prediction, parallelism and concurrency, network protocols.
CITATION
Esma Yildirim, Dengpan Yin, Tevfik Kosar, "Prediction of Optimal Parallelism Level in Wide Area Data Transfers", IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 12, pp. 2033-2045, December 2011, doi:10.1109/TPDS.2011.228