Data-Driven Multithreading Using Conventional Microprocessors
October 2006 (vol. 17 no. 10)
pp. 1176-1188

Abstract: This paper describes the Data-Driven Multithreading (DDM) model and how it may be implemented using off-the-shelf microprocessors. Data-Driven Multithreading is a nonblocking multithreading execution model that tolerates internode latency by scheduling threads for execution based on data availability. Scheduling based on data availability can also be used to exploit cache management policies that significantly reduce cache misses. Such policies include firing a thread for execution only if its data is already placed in the cache. We call this cache management policy the CacheFlow policy. The core of the DDM implementation presented is a memory-mapped hardware module that is attached directly to the processor's bus. This module is responsible for thread scheduling and is known as the Thread Synchronization Unit (TSU). DDM was evaluated by simulating the Data-Driven Network of Workstations (D²NOW), a DDM implementation built out of regular workstations augmented with the TSU. The simulation was performed for nine scientific applications, seven of which belong to the SPLASH-2 suite. The results show that DDM tolerates both communication and synchronization latency well. Overall, for 16- and 32-node D²NOW machines, the speedup observed was 14.4 and 26.0, respectively.
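The scheduling idea in the abstract can be illustrated with a small software sketch. This is not the paper's TSU hardware design: the class `ToyTSU`, the `ready_count` field, the two queues, and the instantaneous "prefetch" step are all hypothetical names and simplifications chosen for illustration. The sketch only shows the two conditions the abstract describes: a thread becomes a candidate once all of its producers have finished (data availability), and it is fired only after its data has been marked as present in the cache (a CacheFlow-style condition).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Thread:
    """A nonblocking thread: runs to completion once all its inputs exist."""
    name: str
    ready_count: int                                # producers not yet finished (assumed field)
    consumers: list = field(default_factory=list)   # names of dependent threads
    data_in_cache: bool = False                     # set by the toy prefetch step


class ToyTSU:
    """Toy software stand-in for a thread scheduler (illustrative only)."""

    def __init__(self, threads):
        self.threads = {t.name: t for t in threads}
        self.waiting_prefetch = deque()   # inputs produced, data not yet cached
        self.firing_queue = deque()       # inputs produced and data cached
        for t in threads:
            if t.ready_count == 0:
                self.waiting_prefetch.append(t.name)

    def prefetch(self):
        """Pretend to bring each pending thread's data into the cache."""
        while self.waiting_prefetch:
            name = self.waiting_prefetch.popleft()
            self.threads[name].data_in_cache = True
            self.firing_queue.append(name)

    def complete(self, name):
        """After a thread finishes, decrement the ready count of each consumer."""
        for consumer in self.threads[name].consumers:
            t = self.threads[consumer]
            t.ready_count -= 1
            if t.ready_count == 0:
                self.waiting_prefetch.append(consumer)

    def run(self):
        """Fire threads until none are pending; return the firing order."""
        order = []
        while self.waiting_prefetch or self.firing_queue:
            self.prefetch()
            name = self.firing_queue.popleft()
            # Fire only threads whose data is already marked as cached.
            assert self.threads[name].data_in_cache
            order.append(name)            # "execute" the thread
            self.complete(name)
        return order


def demo():
    """Small dependency graph: two loads feed an add, which feeds a store."""
    threads = [
        Thread("load_a", 0, ["add"]),
        Thread("load_b", 0, ["add"]),
        Thread("add", 2, ["store"]),
        Thread("store", 1),
    ]
    return ToyTSU(threads).run()
```

Calling `demo()` returns the firing order for the four-thread graph. The `add` thread cannot fire until both loads have completed, and `store` cannot fire until `add` has. A real implementation would overlap prefetching with execution instead of performing it instantly, which is where the latency tolerance described in the abstract would come from.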

Index Terms:
Dataflow, multithreading, nonblocking threads, cache prefetching, multiprocessors, network of workstations, high performance computing.
Costas Kyriacou, Paraskevas Evripidou, Pedro Trancoso, "Data-Driven Multithreading Using Conventional Microprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 10, pp. 1176-1188, Oct. 2006, doi:10.1109/TPDS.2006.136