Issue No. 11, November 2011 (vol. 22)
pp. 1896-1903
Bahman Javadi , University of Melbourne, Melbourne
Derrick Kondo , INRIA, Monbonnot Saint Martin
Jean-Marc Vincent , University of Joseph Fourier, Grenoble
David P. Anderson , U.C. Berkeley Space Sciences Laboratory, Berkeley
ABSTRACT
In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability often exhibits different statistical properties (for example, stationary versus nonstationary behavior) and fits different models (for example, exponential, Weibull, or Pareto probability distributions). In this paper, we describe an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modeled with similar probability distributions. We apply this method to about 230,000 host availability traces obtained from a real Internet-distributed system, namely SETI@home. We find that about 21 percent of hosts exhibit availability that is a truly random process, and that these hosts can often be modeled accurately with a few distinct distributions from different families. We show that our models are useful and accurate in the context of a scheduling problem that deals with resource brokering. We believe that these methods and models are critical for the design of stochastic scheduling algorithms across large systems where host availability is uncertain.
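The abstract names two steps at the core of the method, deciding whether a host's availability is a truly random (i.i.d.) process and then modeling it with an exponential, Weibull, or Pareto distribution, but it does not name the specific tests. The Python sketch below illustrates those two steps under stated assumptions: each host's trace is reduced to a sequence of availability interval lengths, randomness is checked with a Wald-Wolfowitz runs test about the median, and the family is chosen by maximum-likelihood fitting plus the Akaike information criterion (AIC). These choices, and the helper names `runs_test_is_random`, `fit_exponential`, `fit_weibull`, `fit_pareto`, `best_model`, and `classify_host`, are hypothetical illustrations, not the authors' procedure.

```python
import math
import random
import statistics
from typing import Dict, Sequence, Tuple

# Hypothetical sketch: the randomness test and model-selection rule are
# illustrative assumptions, not the procedure used in the SETI@home study.


def runs_test_is_random(durations: Sequence[float], z_crit: float = 1.96) -> bool:
    """Wald-Wolfowitz runs test about the median (normal approximation).

    Returns True when randomness is NOT rejected at about the 5% level.
    """
    med = statistics.median(durations)
    signs = [x > med for x in durations if x != med]  # drop ties with the median
    n1 = sum(signs)            # values above the median
    n2 = len(signs) - n1       # values below the median
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        return False           # degenerate sequence, cannot be tested
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    if var <= 0:
        return False           # too few samples for the approximation
    return abs(runs - mean) / math.sqrt(var) < z_crit


def fit_exponential(x: Sequence[float]) -> Tuple[Dict[str, float], float]:
    rate = len(x) / sum(x)     # MLE of the rate is 1 / sample mean
    ll = len(x) * math.log(rate) - rate * sum(x)
    return {"rate": rate}, ll


def fit_pareto(x: Sequence[float]) -> Tuple[Dict[str, float], float]:
    n, xm = len(x), min(x)     # MLE of the scale is the sample minimum
    alpha = n / sum(math.log(v / xm) for v in x)
    ll = (n * math.log(alpha) + n * alpha * math.log(xm)
          - (alpha + 1.0) * sum(math.log(v) for v in x))
    return {"xm": xm, "alpha": alpha}, ll


def fit_weibull(x: Sequence[float], iters: int = 100) -> Tuple[Dict[str, float], float]:
    logs = [math.log(v) for v in x]
    n, mean_log, log_max = len(x), sum(logs) / len(x), max(logs)

    def score(k: float) -> float:
        # Shape equation of the Weibull MLE; it is increasing in k.
        # Weights (x / max x)^k stay in (0, 1], which avoids overflow.
        w = [math.exp(k * (lv - log_max)) for lv in logs]
        return sum(wi * lv for wi, lv in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-6, 1.0
    while score(hi) < 0:       # expand the bracket until the root is inside
        hi *= 2.0
    for _ in range(iters):     # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) < 0 else (lo, mid)
    k = 0.5 * (lo + hi)
    mean_wk = sum(math.exp(k * (lv - log_max)) for lv in logs) / n
    log_lam = log_max + math.log(mean_wk) / k  # scale = (mean of x^k)^(1/k)
    ll = sum(math.log(k) - log_lam + (k - 1.0) * (lv - log_lam)
             - math.exp(k * (lv - log_lam)) for lv in logs)
    return {"shape": k, "scale": math.exp(log_lam)}, ll


def best_model(x: Sequence[float]) -> Tuple[str, Dict[str, float]]:
    """Pick the family with the lowest AIC = 2 * params - 2 * log-likelihood."""
    if len(x) < 2 or min(x) <= 0 or min(x) == max(x):
        raise ValueError("need at least two distinct positive durations")
    candidates = {
        "exponential": (fit_exponential(x), 1),
        "weibull": (fit_weibull(x), 2),
        "pareto": (fit_pareto(x), 2),
    }
    aic = {name: 2 * p - 2 * ll for name, ((_, ll), p) in candidates.items()}
    name = min(aic, key=aic.get)
    return name, candidates[name][0][0]


def classify_host(durations: Sequence[float]) -> str:
    """Label a host 'non-random' or by its best-fitting distribution family."""
    if not runs_test_is_random(durations):
        return "non-random"
    return best_model(durations)[0]


if __name__ == "__main__":
    random.seed(0)
    # Synthetic i.i.d. availability intervals (hours) from a Weibull law.
    sample = [random.weibullvariate(10.0, 0.6) for _ in range(1000)]
    print(classify_host(sample))
```

Grouping hosts by the label that `classify_host` returns gives only a coarse partition into subsets; the study's own method for discovering such subsets may differ. Note also that at a 5% test level some truly random hosts will be labeled non-random by chance.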
INDEX TERMS
Statistical availability models, reliability, resource failures, stochastic scheduling.
CITATION
Bahman Javadi, Derrick Kondo, Jean-Marc Vincent, David P. Anderson, "Discovering Statistical Models of Availability in Large Distributed Systems: An Empirical Study of SETI@home", IEEE Transactions on Parallel & Distributed Systems, vol.22, no. 11, pp. 1896-1903, November 2011, doi:10.1109/TPDS.2011.50