2011 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops (2011)
Newport Beach, California, USA
Mar. 28, 2011 to Mar. 31, 2011
Next-generation processor and memory technologies will provide greatly increased computing and memory capacities for application scaling. However, this comes at a price: due to the growing number of transistors and shrinking structure sizes, the overall reliability of future server systems is expected to suffer significantly. This makes purely reactive fault tolerance schemes less appropriate for server applications under reliability and timeliness constraints. We propose an architectural blueprint for managing server system dependability in a proactive fashion, in order to keep service-level promises for response times and availability even with increasing hardware failure rates. We introduce the concept of anticipatory virtual machine migration, which proactively moves computation away from faulty or suspicious machines. The migration decision is based on health indicators at various system levels that are combined into a global probabilistic reliability measure. Based on this measure, live migration techniques can be triggered in order to move computation to healthy machines before a failure brings the system down.
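As an illustration of the decision step described in the abstract, the following minimal Python sketch combines per-level health indicators into one failure probability per host and picks a migration target once the source host looks risky. It is not the method of the paper: the noisy-OR combination (which assumes independent indicators), the fixed threshold, and all names (`combined_failure_probability`, `choose_migration_target`, the host and level labels) are assumptions made for this example only.

```python
from math import prod


def combined_failure_probability(indicators):
    """Combine per-level failure probabilities with a noisy-OR rule.

    `indicators` maps a system level (e.g. "hardware", "os") to an
    estimated failure probability in [0, 1]. Assumption: the levels
    are independent, so the host survives only if every level does.
    """
    for level, p in indicators.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {level!r} out of range: {p}")
    return 1.0 - prod(1.0 - p for p in indicators.values())


def choose_migration_target(hosts, source, threshold=0.5):
    """Return the healthiest other host if `source` is at risk, else None.

    `hosts` maps host names to their indicator dictionaries. The
    threshold is a hypothetical tuning parameter.
    """
    risk = {name: combined_failure_probability(ind) for name, ind in hosts.items()}
    if risk[source] < threshold:
        return None  # source still looks healthy, no migration needed
    # Only consider hosts that are themselves below the risk threshold.
    candidates = {n: r for n, r in risk.items() if n != source and r < threshold}
    if not candidates:
        return None  # nowhere safe to move the virtual machine
    return min(candidates, key=candidates.get)


hosts = {
    "host-a": {"hardware": 0.6, "os": 0.2},
    "host-b": {"hardware": 0.05},
    "host-c": {"hardware": 0.1},
}
target = choose_migration_target(hosts, "host-a")
```

In a real deployment the returned target would be handed to the hypervisor's live migration facility. The sketch stops at the decision so that it stays independent of any particular virtualization product.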
meta-learning, virtualization, failure prediction, live migration, monitoring
F. Salfner, P. Tröger and A. Polze, "Timely Virtual Machine Migration for Pro-active Fault Tolerance," 2011 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops (ISORCW), Newport Beach, California, USA, 2011, pp. 234-243.