19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14
Extracting Speedup From C-Code With Poor Instruction-Level Parallelism
Denver, Colorado
April 04-April 08
ISBN: 0-7695-2312-9
Dara Kusic, University of Pittsburgh, Pennsylvania
Raymond Hoare, University of Pittsburgh, Pennsylvania
Alex K. Jones, University of Pittsburgh, Pennsylvania
Joshua Fazekas, University of Pittsburgh, Pennsylvania
John Foster, University of Pittsburgh, Pennsylvania
Scientific computing and multimedia applications frequently call loop-intensive functions that dominate execution time. Applying homogeneous, parallel processors (e.g., single-instruction, multiple-data (SIMD) and very-long instruction word (VLIW)) is a common approach to minimizing execution time. However, many benchmark applications offer disappointing degrees of instruction-level parallelism (ILP) that cause these architectures to fall short of expected performance gains.
This paper presents findings on the execution-time speedup achieved by heterogeneous massively parallel processors: standard reduced instruction-set computing (RISC) CPUs tightly coupled with arrays of super-complex instruction-set computing (SuperCISC) datapaths on the same chip. SuperCISC datapaths are created by mapping frequently called functions into reconfigurable hardware. Encouraging performance results from the RISC/SuperCISC architecture point to the efficiency of reconfigurable devices in supporting large numbers of parallel computational accelerators. Calls to SuperCISC functions can greatly reduce execution time when applied to CPUs that support extensible instruction sets.
In this paper we show how SuperCISC functions can accelerate an application up to 25x over a 4-way VLIW. SuperCISC functions show superlinear speedup, a performance gain significantly greater than the software's ILP. SuperCISC functions also benefit from cycle compression, a reduction of the idle cycle time that an operation incurs when executing within a traditional CPU. Implementing software control structures, such as if-then-else statements, as hardware multiplexers within a SuperCISC function further improves performance.
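As a rough software analogy for the last point (an illustrative sketch only, not the authors' code or toolchain output), the C fragment below writes the same clamp operation twice: once with an if-then-else branch, and once in a multiplexer-like form where both candidate values are available and a select bit picks one. The function names `clamp_branch`, `mux2`, and `clamp_mux` are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Software form: control flow decides which assignment executes. */
static int32_t clamp_branch(int32_t x, int32_t limit)
{
    int32_t y;
    if (x > limit) {
        y = limit;
    } else {
        y = x;
    }
    return y;
}

/* Multiplexer-like form: both inputs are present at once and a
 * select signal chooses between them, with no branch in the source.
 * Returns a when sel is nonzero, b when sel is zero. */
static int32_t mux2(int32_t sel, int32_t a, int32_t b)
{
    /* mask is all ones when sel != 0, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(sel != 0);
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}

/* Same clamp, expressed as a comparison feeding a 2-to-1 select. */
static int32_t clamp_mux(int32_t x, int32_t limit)
{
    int32_t sel = (x > limit);   /* comparator output */
    return mux2(sel, limit, x);  /* mux picks limit or x */
}
```

In hardware, the comparison and both data inputs can be evaluated concurrently, with the multiplexer selecting the result, which is the intuition behind replacing control flow with a datapath. The sketch only mirrors that structure in C; it makes no claim about how the SuperCISC hardware is actually generated.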
Citation:
Dara Kusic, Raymond Hoare, Alex K. Jones, Joshua Fazekas, John Foster, "Extracting Speedup From C-Code With Poor Instruction-Level Parallelism," ipdps, vol. 15, pp.264b, 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14, 2005