From the September 2014 issue
FTH-B&B: A Fault-Tolerant HierarchicalBranch and Bound for Large ScaleUnreliable Environments
By Ahcène Bendjoudi, Nouredine Melab, and El-Ghazali Talbi
Solving to optimality large instances of combinatorial optimization problems using Brand and Bound (B&B) algorithms requires a huge amount of computing resources. In this paper, we investigate the design and implementation of such algorithms on computational grids. Most of existing grid-based B&B algorithms are based on the Master-Worker paradigm, their scalability is therefore limited. In addition, even if the volatility of resources is a major issue in grids fault tolerance is rarely addressed in these works. We thereby propose FTH-B&B, a fault tolerant hierarchical B&B. FTH-B&B is based on different new mechanisms enabling to efficiently build and maintain balanced the hierarchy, and to store and recover work units (sub-problems). FTH-B&B has been implemented on top of the ProActive grid middleware and programming environment and applied to the Flow-Shop scheduling problem. Very often, the validation of existing grid-based B&B works is performed either through simulation or a very small real grid. In this paper, we experimented FTH-B&B on the Grid'5000 real French nation-wide computational grid using up to 1,900 processor cores distributed over six sites. The reported results show that the overhead induced by the proposed mechanisms is very low and an efficiency close to 100 percent can be achieved on some Taillards benchmarks of the Flow-Shop problem. In addition, the results demonstrate the robustness of the proposed mechanisms even in extreme failure situations.
Editorials and Announcements
- Dr. Paolo Montuschi Announced as New Editor-in-Chief of the IEEE Transactions on Computers
- Get Your Journals as eBooks for Free
- eBooks of issues of TC can now be downloaded from the Computer Society Digital Library
- IEEE Transactions on Computers EIC Albert Zomaya receives two IEEE awards.
New Essential Set
- "Cloud Computing" available at computer.org/store
- "Industrial Implementations of Floating-Point Units" available at computer.org/store
- State of the Journal by Albert Y. Zomaya (Jan 2012)
- State of the Journal by Albert Y. Zomaya (May 2011)
- Special Section on Computer Arithmetic (August 2014)
- Special Issue on Network-on-Chip (March 2014)
- Special Issue on Cloud of Clouds (January 2014)
- Special Section on Concurrent On-Line Testing and Error/Fault Resilience of Digital Systems (September 2011)
Access Recently Published TC Articles
Subscribe to the RSS feed of latest TC content added to the digital library
Sign up for the Transactions Connection newsletter.
Importance of Coherence Protocols with Network Applications on Multi-Core Processors
Automated Generation of Performance and Dependability Models for the Assessment of Wireless Sensor Networks