Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (1997)
San Francisco, CA
Nov. 11, 1997 to Nov. 15, 1997
ISSN: 1089-795X
ISBN: 0-8186-8090-3
pp: 189
Thomas Fahringer , University of Vienna
Eduard Mehofer , University of Vienna
This paper presents a novel approach to reducing the communication costs of programs for distributed memory machines. Our techniques are based on uni-directional bit-vector data flow analysis that enables vectorizing and coalescing communication, overlapping communication with computation, eliminating redundant messages, and reducing the amount of data transferred, both within and across loop nests. Our data flow analysis differs from previous techniques in that it does not require explicit modeling of balanced communication placement or of loops, and it does not employ interval analysis. Our techniques are based on simple yet highly effective data flow equations, which are solved iteratively for arbitrary control flow graphs. Moving communication earlier to hide latency has been shown to dramatically increase communication buffer sizes and can even cause runtime errors. We use P3T, a state-of-the-art performance estimator, to create a buffer-safe program. By accurately estimating both the required communication buffer sizes and the implied communication time of every single communication in a program, we can selectively choose the communications that must be delayed to ensure a correct communication placement while maximizing communication latency hiding. Experimental results are presented to demonstrate the efficacy of our communication optimization strategy.
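As a rough illustration of what iteratively solving uni-directional bit-vector data flow equations over an arbitrary control flow graph can look like, the following minimal sketch computes a generic backward "must" problem. It is not the paper's own equation system, which the abstract does not give. The names `anticipated_comm`, `succ`, `use`, and `kill`, the bit assignments, and the four-node example graph are all hypothetical. Each bit stands for one non-local array section; a set bit in `ins[n]` means the section is needed on every path from node `n` before being overwritten, so its communication could in principle be hoisted to `n`.

```python
def anticipated_comm(succ, use, kill, num_bits):
    """Backward 'must' bit-vector analysis, solved by plain iteration.

    succ[n]: list of successor nodes of n (any CFG shape, loops allowed)
    use[n]:  bits for non-local sections read in n (assumed encoding)
    kill[n]: bits for sections written in n, which invalidate an earlier fetch
    Returns (ins, outs), each mapping node -> bit vector stored as an int.
    """
    full = (1 << num_bits) - 1
    # Optimistic start: a 'must' problem converges downward from all-ones.
    ins = {n: full for n in succ}
    outs = {n: full for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if succ[n]:
                # Meet operator: intersection over all successors.
                out = full
                for s in succ[n]:
                    out &= ins[s]
            else:
                out = 0  # nothing is needed after the exit node
            new_in = use[n] | (out & ~kill[n])
            if out != outs[n] or new_in != ins[n]:
                outs[n], ins[n] = out, new_in
                changed = True
    return ins, outs


# Hypothetical example: entry -> head -> body -> head (a loop), head -> exit.
A, B = 0b01, 0b10  # bit 0: section of array A, bit 1: section of array B
succ = {"entry": ["head"], "head": ["body", "exit"], "body": ["head"], "exit": []}
use = {"entry": 0, "head": 0, "body": A, "exit": B}
kill = {"entry": B, "head": 0, "body": 0, "exit": 0}

ins, outs = anticipated_comm(succ, use, kill, num_bits=2)
for node in succ:
    print(f"{node:5s} in={ins[node]:02b} out={outs[node]:02b}")
```

In this toy graph, B is needed on every path once `entry` has written it, so its bit is set at the exit of `entry` and the fetch could be issued there, ahead of the loop. A is read only inside the loop body, so it is not anticipated at the loop header. A buffer-safety check of the kind the abstract describes would be a separate step on top of such a result.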
Communication, Communication Optimization, Data Flow Analysis, Performance Prediction
Thomas Fahringer, Eduard Mehofer, "Buffer-Safe Communication Optimization based on Data Flow Analysis and Performance Prediction", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 189, 1997, doi:10.1109/PACT.1997.644015