Reliability. Reducing distributed system components' power and energy consumption increases reliability. A growing number of scientific simulations require hundreds of hours of continuous execution [4]. Current devices executing these applications fail at an annual rate of 2 to 3 percent. A hypothetical Pflops system of about 12,000 nodes would experience a hardware failure every 24 hours.
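The failure-interval figure can be checked with a short calculation. The sketch below assumes that node failures are independent and spread evenly over the year; the function name `hours_between_failures` is illustrative and does not come from the article.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def hours_between_failures(nodes: int, annual_failure_rate: float) -> float:
    """Expected hours between hardware failures across the whole system.

    Assumes independent node failures spread uniformly over the year.
    """
    failures_per_year = nodes * annual_failure_rate
    return HOURS_PER_YEAR / failures_per_year


# The article's figures: about 12,000 nodes failing at 2 to 3 percent per year.
for rate in (0.02, 0.03):
    interval = hours_between_failures(12_000, rate)
    print(f"{rate:.0%} annual rate: one failure every {interval:.1f} hours")
```

Under these assumptions the interval is about 36.5 hours at a 2 percent rate and about 24.3 hours at 3 percent, so the "every 24 hours" estimate corresponds to the upper end of the stated failure range.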
The Arrhenius life-stress model implies that raising an electronic component's operating temperature reduces its life expectancy and thus its mean time to failure. Every 10°C (18°F) temperature increase halves component life expectancy; conversely, lowering a component's operating temperature by the same amount doubles it. Decreasing high-performance systems' energy consumption, and with it their operating temperature, will therefore reduce the loss of time, effort, and experimental data due to aborted program executions or incorrect results from component failure.
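The 10°C rule of thumb can be written as a simple scaling factor. The sketch below encodes only that rule (life expectancy halves for every 10°C rise), not the full Arrhenius equation, which depends on a component-specific activation energy; the function and parameter names are illustrative.

```python
def life_expectancy_factor(delta_t_celsius: float, halving_step: float = 10.0) -> float:
    """Relative life expectancy after a temperature change of delta_t_celsius.

    Rule-of-thumb approximation: life halves for every `halving_step` degrees
    Celsius of temperature increase and doubles for every equal decrease.
    """
    return 2.0 ** (-delta_t_celsius / halving_step)


# A positive delta means the component runs hotter, a negative one cooler.
for delta in (10.0, 20.0, -10.0):
    factor = life_expectancy_factor(delta)
    print(f"{delta:+.0f} degrees C: life expectancy x {factor:g}")
```

A factor of 0.5 means half the original life expectancy and a factor of 2 means double, matching the two cases described above.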
Cost. Reducing distributed system components' power and energy consumption also decreases cost. Assuming a rate of $100 per megawatt-hour, a Pflops machine consuming 100 megawatts of power at peak operation would cost $10,000 per hour and could surpass $85 million in a year. More conservative rule-of-thumb predictions of 20 percent peak power consumption imply a lower, but not necessarily manageable, $17 million annual operational cost ($2,000 per hour). These rough estimates do not include air-cooling requirements, which commonly amount to 40 percent of system operating costs. Even small reductions in overall energy consumption would significantly reduce Pflops systems' operational costs.
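The cost estimates follow from straightforward arithmetic, sketched below. The sketch reads the electricity rate as $100 per megawatt-hour, which is the interpretation consistent with the $10,000-per-hour figure, and assumes year-round continuous operation; the function and parameter names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours of continuous operation


def operating_cost(peak_power_mw: float,
                   price_per_mwh: float = 100.0,
                   fraction_of_peak: float = 1.0) -> tuple[float, float]:
    """Return (hourly, annual) electricity cost in dollars.

    Assumes a flat price per megawatt-hour and a constant power draw equal to
    `fraction_of_peak` times the peak power. Cooling costs are not included.
    """
    hourly = peak_power_mw * fraction_of_peak * price_per_mwh
    return hourly, hourly * HOURS_PER_YEAR


# Peak draw and the more conservative 20-percent-of-peak rule of thumb.
for fraction in (1.0, 0.2):
    hourly, annual = operating_cost(100.0, fraction_of_peak=fraction)
    print(f"{fraction:.0%} of peak: ${hourly:,.0f}/hour, ${annual / 1e6:.1f} million/year")
```

At full peak draw this gives $10,000 per hour and about $87.6 million per year; at 20 percent of peak it gives $2,000 per hour and about $17.5 million per year, in line with the rounded figures in the text.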