<p><b>Abstract</b>: In this paper, the Scheduled Dataflow (SDF) architecture, a decoupled memory/execution, multithreaded architecture using nonblocking threads, is presented in detail and evaluated against a superscalar architecture. Recent work on new processor architectures has focused mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs. This trend allows for better performance, but at the expense of increased hardware complexity and, possibly, higher power expenditures resulting from dynamic instruction scheduling. Our research deviates from this trend by exploring a simpler, yet powerful execution paradigm that is based on dataflow and multithreading. A program is partitioned into nonblocking execution threads. In addition, all memory accesses are decoupled from the thread's execution: data is preloaded into the thread's context (registers), and all results are poststored after the thread completes execution. While multithreading and decoupling are possible with control-flow architectures, SDF makes it easier to coordinate the memory accesses and execution of a thread, as well as to eliminate unnecessary dependencies among instructions. We have compared the execution cycles required for programs on SDF with those required by programs on SimpleScalar (a superscalar simulator), considering the essential aspects of these architectures in order to have a fair comparison. The results show that the SDF architecture can outperform the superscalar. SDF performance scales better with the number of functional units and allows for a good exploitation of Thread Level Parallelism (TLP) and available chip area.</p>
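The three-phase thread model described in the abstract (preload, execute, poststore) can be illustrated with a small toy sketch. This is not the paper's simulator or instruction set: the `Thread` class, the `run_thread` function, the register names, and the flat list used as memory are all hypothetical, chosen only to show that the execute phase touches registers alone while every memory access happens before or after it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Thread:
    """Hypothetical nonblocking thread with decoupled memory access."""
    inputs: Dict[str, int]                   # register name -> address to preload from
    body: Callable[[Dict[str, int]], None]   # register-only computation
    outputs: Dict[str, int]                  # register name -> address to poststore to
    registers: Dict[str, int] = field(default_factory=dict)


def run_thread(thread: Thread, memory: List[int]) -> None:
    # Phase 1 (preload): every load completes before execution starts.
    for reg, addr in thread.inputs.items():
        thread.registers[reg] = memory[addr]
    # Phase 2 (execute): the body sees only the thread's registers,
    # so it never waits on memory once it has started.
    thread.body(thread.registers)
    # Phase 3 (poststore): every store happens after execution completes.
    for reg, addr in thread.outputs.items():
        memory[addr] = thread.registers[reg]


def add_body(regs: Dict[str, int]) -> None:
    # Illustrative computation: r2 = r0 + r1.
    regs["r2"] = regs["r0"] + regs["r1"]


memory = [3, 4, 0]
t = Thread(inputs={"r0": 0, "r1": 1}, body=add_body, outputs={"r2": 2})
run_thread(t, memory)
print(memory)  # [3, 4, 7]
```

Because the body receives only the register dictionary, the memory phases and the execute phase could in principle be handled by separate units. That separation is the kind of decoupling the abstract refers to, although the real architecture's scheduling is not modeled here.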
Multithreaded architectures, dataflow architectures, superscalar, decoupled architectures, Thread Level Parallelism.
J. Arul, R. Giorgi, K.M. Kavi, "Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation", IEEE Transactions on Computers, vol. 50, pp. 834-846, August 2001, doi:10.1109/12.947003