Issue No. 02, February 2002 (vol. 51)
DOI: http://doi.ieeecomputersociety.org/10.1109/12.980002
<p>We introduce the ROC-1 hardware platform, a large-scale cluster system designed to provide high availability for Internet service applications. The ROC-1 prototype embodies our philosophy of Recovery-Oriented Computing (ROC). Rather than simply trying to avoid failures, it emphasizes detecting and recovering from the failures that inevitably occur in Internet service environments. ROC-1 promises greater availability than existing server systems by incorporating four techniques, applied from the ground up to both hardware and software: redundancy and isolation, online self-testing and verification, support for problem diagnosis, and concern for human interaction with the system.</p>
Index Terms: availability, fault tolerance, fault diagnosis, Internet, network servers, computer network management.
D. Oppenheimer, A. Brown, J. Beck, D. Hettena, J. Kuroda, N. Treuhaft, D.A. Patterson, K. Yelick, "ROC-1: Hardware Support for Recovery-Oriented Computing," IEEE Transactions on Computers, vol. 51, no. 2, pp. 100-107, February 2002, doi:10.1109/12.980002