Issue No. 02 - February (2002 vol. 51)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/12.980002
We introduce the ROC-1 hardware platform, a large-scale cluster system designed to provide high availability for Internet service applications. The ROC-1 prototype embodies our philosophy of Recovery-Oriented Computing (ROC) by emphasizing detection of, and recovery from, the failures that inevitably occur in Internet service environments, rather than simple avoidance of such failures. ROC-1 promises greater availability than existing server systems by incorporating four techniques applied from the ground up to both hardware and software: redundancy and isolation, online self-testing and verification, support for problem diagnosis, and concern for human interaction with the system.
Availability, fault tolerance, fault diagnosis, Internet, network servers, computer network management.
J. Beck, A. Brown, J. Kuroda, D. Oppenheimer, D. Hettena, K. Yelick, N. Treuhaft, D.A. Patterson, "ROC-1: Hardware Support for Recovery-Oriented Computing", IEEE Transactions on Computers, vol. 51, no. 2, pp. 100-107, February 2002, doi:10.1109/12.980002