The hardware redundancy costs required for self-healing systems increase with system complexity, raising concerns about the cost-feasibility of large-scale autonomic systems. Redundancy costs can be minimized without reducing reliability by performing fault recovery at the finest possible granularity, but modern redundancy techniques are not effective for sub-component-level recovery.
This paper introduces a design technique for low-cost, fine-grained self-healing hardware. Small amounts of reconfigurable logic are integrated into circuits built primarily from fixed logic, providing heterogeneous redundancy that enables efficient, fine-grained, self-contained fault detection, diagnosis, and recovery. When combined with traditional coarse-grained recovery techniques for larger failures, this technique enables large-scale self-healing systems with reduced redundancy costs.