AVMON: Optimal and Scalable Discovery of Consistent Availability Monitoring Overlays for Distributed Systems
April 2009 (vol. 20 no. 4)
pp. 446-459
Ramsés Morales, University of Illinois at Urbana-Champaign, Urbana
Indranil Gupta, University of Illinois at Urbana-Champaign, Urbana
This paper proposes building overlays that help monitor the long-term availability histories of hosts, with a focus on large-scale distributed settings where hosts may be selfish or colluding. Concretely, we target the problems of selecting and discovering an availability monitoring overlay. We motivate six goals. The first three are consistency, verifiability, and randomness in selecting the availability monitors of nodes, so that selection is probabilistically resilient to selfish and colluding nodes. The next three are discoverability, load balancing, and scalability in finding these monitors. We present AVMON, the first availability monitoring overlay to satisfy these six requirements. Our core algorithmic contribution is a range of protocols for discovering the availability monitoring overlay scalably and efficiently, given any monitor selection scheme that is consistent and verifiable. We mathematically analyze the performance of AVMON's discovery protocols with respect to scalability and the discovery time of monitors. Most interestingly, we derive optimal (and practical) variants of AVMON that minimize different combinations of memory, bandwidth, computation, and monitor discovery time. Finally, our extensive experimental evaluations using three types of availability traces (synthetic, from PlanetLab, and from the Overnet p2p system) demonstrate AVMON's practicality in a variety of distributed systems.
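The abstract does not spell out a particular monitor selection scheme, so the following is only an illustrative sketch of what a consistent, verifiable, and random selection rule could look like. It assumes a hash-based predicate: a node monitors a target when the hash of the ordered pair falls below a threshold of k/n, which gives each target roughly k monitors in expectation. The function names, the SHA-256 choice, and the parameters k and n are assumptions made for this example and are not taken from the paper.

```python
import hashlib

HASH_SPACE = 2 ** 64  # size of the integer range the truncated hash maps into


def is_monitor(monitor_id: str, target_id: str, k: int, n: int) -> bool:
    """Hypothetical selection rule: does monitor_id monitor target_id?

    The ordered pair is hashed and accepted when the hash, viewed as a
    fraction of HASH_SPACE, is below k / n. Any third party can recompute
    the hash (verifiable), the answer never changes for a given pair
    (consistent), and neither node can choose the outcome (random).
    """
    digest = hashlib.sha256(f"{monitor_id}|{target_id}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    # Integer comparison equivalent to value / HASH_SPACE < k / n,
    # avoiding floating-point rounding at the boundaries.
    return value * n < k * HASH_SPACE


def monitors_of(target_id: str, nodes: list[str], k: int) -> list[str]:
    """Return every node in `nodes` that the rule selects as a monitor."""
    n = len(nodes)
    return [m for m in nodes if m != target_id and is_monitor(m, target_id, k, n)]


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1000)]
    selected = monitors_of("node0", nodes, k=8)
    print(len(selected), "monitors selected for node0")
```

Because the rule depends only on the two identifiers and public parameters, a node cannot pick friendly monitors for itself, and a claimed monitoring relationship can be checked by recomputing one hash. Discovering which nodes satisfy the predicate without scanning the whole system is the separate problem that the paper's discovery protocols address.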

Index Terms:
Distributed Systems, Churn, Availability, Monitoring, Overlay, Consistency, Scalability, Optimality
Citation:
Ramsés Morales, Indranil Gupta, "AVMON: Optimal and Scalable Discovery of Consistent Availability Monitoring Overlays for Distributed Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 4, pp. 446-459, April 2009, doi:10.1109/TPDS.2008.84