2015 IEEE 21st Pacific Rim International Symposium on Dependable Computing (PRDC) (2015)
Nov. 18, 2015 to Nov. 20, 2015
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PRDC.2015.22
Many organizations today still manage medium or large in-house data centers that require very expensive maintenance efforts, including fault detection. Common monitoring frameworks used to quickly detect faults are complex to deploy and maintain, expensive, and intrusive, as they require the installation of probes on the monitored hardware and software to collect raw data. Such intrusiveness can be problematic because it imposes installation and management overhead and may interfere with security and privacy policies. In this paper we introduce NIRVANA, a novel monitoring system for fault detection that works at rack level and is (i) non-intrusive, i.e., it does not require the installation of software probes on the hosts to be monitored, and (ii) black-box, i.e., agnostic with respect to the monitored applications. At the core of our solution lies the observation that aggregated features that can be monitored at rack level in a non-intrusive, black-box way show predictable behaviors in both fault-free and faulty states. It is therefore possible to detect and identify faults by monitoring and analyzing perturbations to these behaviors. An extensive experimental evaluation shows that non-intrusiveness does not significantly hamper the fault detection capabilities of the monitoring system, thus validating our approach.
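The idea of detecting faults from perturbations in an aggregated rack-level feature can be illustrated with a minimal sketch. This is not NIRVANA's actual detection method, which the abstract does not specify. It assumes a single feature (rack power draw, in watts), a baseline learned from fault-free samples, and a simple z-score threshold. The function names, the sample values, and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def fit_baseline(samples):
    """Summarize a rack-level feature observed during fault-free operation.

    Returns the (mean, sample standard deviation) of the readings.
    """
    return mean(samples), stdev(samples)


def detect_perturbations(baseline, stream, threshold=3.0):
    """Return indices of readings that deviate strongly from the baseline.

    A reading is flagged when its absolute z-score against the fault-free
    baseline exceeds `threshold` (an illustrative value, not from the paper).
    """
    mu, sigma = baseline
    if sigma == 0:
        raise ValueError("baseline has zero variance; cannot compute z-scores")
    return [i for i, x in enumerate(stream) if abs(x - mu) / sigma > threshold]


# Hypothetical rack power readings in watts, collected without host probes.
fault_free = [1500.0, 1510.0, 1495.0, 1505.0, 1490.0, 1500.0]
baseline = fit_baseline(fault_free)

# Hypothetical readings observed later; the third one is a clear outlier.
observed = [1502.0, 1498.0, 1720.0, 1501.0]
anomalies = detect_perturbations(baseline, observed)
print(anomalies)  # [2]
```

A threshold on one feature only signals that something changed. Identifying which fault occurred, as the abstract describes, would need several features and a model of their behavior in faulty states.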
Monitoring, Probes, Fault detection, Software, Power demand, Computer architecture, Organizations
C. Ciccotelli, L. Aniello, F. Lombardi, L. Montanari, L. Querzoni and R. Baldoni, "NIRVANA: A Non-intrusive Black-Box Monitoring Framework for Rack-Level Fault Detection," 2015 IEEE 21st Pacific Rim International Symposium on Dependable Computing (PRDC), Zhangjiajie, China, 2015, pp. 11-20.