Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture (1995)
Raleigh, North Carolina
Jan. 22, 1995 to Jan. 25, 1995
R. Libeskind-Hadas, Dept. of Comput. Sci., Harvey Mudd Coll., Claremont, CA, USA
E. Brandt, Dept. of Comput. Sci., Harvey Mudd Coll., Claremont, CA, USA
The ability to tolerate faults is critical in multicomputers employing large numbers of processors. This paper describes a class of fault-tolerant routing algorithms for n-dimensional meshes that can tolerate large numbers of faults without using virtual channels. We show that these routing algorithms prevent livelock and deadlock while remaining highly adaptive.
distributed memory systems; multiprocessor interconnection networks; fault tolerant computing; reliability; concurrency control; message passing; origin-based fault-tolerant routing; fault-tolerant routing algorithms; n-dimensional meshes; virtual channels; livelock; deadlock
E. Brandt and R. Libeskind-Hadas, "Origin-based fault-tolerant routing in the mesh," Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture (HPCA), Raleigh, North Carolina, 1995, pp. 102.