loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05)
Pinot: Speculative Multi-threading Processor Architecture Exploiting Parallelism over a Wide Range of Granularities
Barcelona, Spain
November 12-November 16
ISBN: 0-7695-2440-0
Taku Ohsawa, NEC Corporation
Masamichi Takagi, NEC Corporation
Shoji Kawahara, NEC Corporation
Satoshi Matsushita, NEC Corporation

We propose a speculative multi-threading processor architecture called Pinot. Pinot exploits parallelism over a wide range of granularities without modifying program sources. Since exploitation of fine-grain parallelism suffers from limits of parallelism and overhead incurred by parallelization, it is better to extract coarse-grain parallelism. Coarse-grain parallelism is biased in some programs (mainly, numerical ones) and some program portions. Therefore, exploiting both coarse- and fine-grain parallelism is a key to the performance of speculative multithreading.

The features of Pinot are as follows: (1) A parallelizing tool extracts parallelism at any level of granularity (e.g. even ten thousand instructions) from any program sub-structures (e.g. loops, calls, or basic blocks). The tool utilizes formulation in which the parallelization process is reduced to a combinatorial optimization problem. (2) A parallel execution model with extension of thread control instructions is designed in order to minimize the increase of the dynamic instruction count. The model employs implicit thread termination and cancellation, as well as register value transfer without synchronization. (3) A versioning cache called Version Resolution Cache (VRC) accomplishes both coarse- and fine-grained speculative multithreading. VRC operates as a large buffer for coarsegrained multi-threading. In addition, it provides low latency inter-thread communication with an update-based protocol for fine-grained multi-threading.

We performed cycle-accurate simulations with 38 programs from the SPEC and MiBench benchmarks. The speedup with 4-processor-element- Pinot is up to 3.7 times, and 2.2 times on geometric mean against a conventional processor. The speedup in a program (susan) drops from 3.7 to 1.6 when the speculative buffer size is limited to 256 bytes. It confirms that exploiting coarse-grain parallelism is essential to the improved performance. FPGA implementation shows 32% overhead of area and 12% increase of

Citation:
Taku Ohsawa, Masamichi Takagi, Shoji Kawahara, Satoshi Matsushita, "Pinot: Speculative Multi-threading Processor Architecture Exploiting Parallelism over a Wide Range of Granularities," micro, pp.81-92, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05), 2005
Usage of this product signifies your acceptance of the Terms of Use.