Issue No. 10, October 2005 (vol. 16)
Hyo Jung Song, IEEE
Andrew A. Chien, IEEE Computer Society
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2005.122
<p><b>Abstract</b>: Many cluster computing applications require Quality of Service (QoS) guarantees. Because performance predictability is essential to QoS, the underlying systems must provide predictable performance. One way to obtain such guarantees from the network subsystem is to generate a global schedule from the applications' network requests and to execute the local portion of that schedule at each network interface. Executing the schedule accurately requires that the local clocks at the network interfaces maintain a global time base. Providing this single time base is the synchronization problem, which this paper addresses for system area networks. To solve it, FM-QoS proposed a simple mechanism called FBS (Feedback-Based Synchronization), which uses built-in flow control signals. This paper extends the basic notion of FM-QoS into a theoretical framework and generalizes it in two ways: 1) it identifies a set of built-in network flow control signals usable for synchrony and formalizes it as a synchronizing schedule, and 2) it analyzes the synchronization precision of FBS in terms of flow control parameters. Based on this generalization, two application classes are studied, for a single-switch network and a multiple-switch network. For each class, a synchronizing schedule is proposed and its bounded skew is analyzed. Unlike in FM-QoS, the synchronizing schedule for a single-switch network is proven to minimize the bounded skew. To relate the analysis to practical networks, skew values are computed with the flow control parameters of Myrinet-2000. The maximum bounded skew of FBS is 5.79 μs or less over all our experiments. Based on this result, we conclude that FBS is a feasible synchronization mechanism for system area networks.</p>
Index Terms: Synchronization, link-level flow control, system area networks, cluster computing.
Hyo Jung Song, Andrew A. Chien, "Feedback-Based Synchronization in System Area Networks for Cluster Computing", IEEE Transactions on Parallel & Distributed Systems, vol. 16, no. 10, pp. 908-920, October 2005, doi:10.1109/TPDS.2005.122