2004 International Conference on Parallel Processing (ICPP'04)
16-Bit FP Sub-Word Parallelism to Facilitate Compiler Vectorization and Improve Performance of Image and Media Processing
Montreal, Quebec, Canada
August 15-August 18
ISBN: 0-7695-2197-5
We consider the implementation of 16-bit floating point instructions on a Pentium 4 and a PowerPC G5 for image and media processing. By measuring the execution time of benchmarks with these new simulated instructions, we show that a significant speed-up is obtained compared to 32-bit FP versions. For image processing, the speed-up comes both from doubling the number of operations per SIMD instruction and from the better cache behavior with byte storage. For data stream processing with arrays of structures, the speed-up mainly comes from the wider SIMD instructions.
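The two sources of speed-up named in the abstract can be illustrated with simple arithmetic. The sketch below is not taken from the paper: it assumes 128-bit SIMD registers (the width of SSE2 on the Pentium 4 and AltiVec on the PowerPC G5), and the 512x512 image size and the helper names are hypothetical, chosen only for illustration.

```python
# Illustrative arithmetic only; not the authors' benchmark code.
# Assumption: 128-bit SIMD registers (SSE2 on Pentium 4, AltiVec on PowerPC G5).
SIMD_REGISTER_BITS = 128


def lanes_per_register(element_bits: int, register_bits: int = SIMD_REGISTER_BITS) -> int:
    """Number of elements of the given width that fit in one SIMD register."""
    return register_bits // element_bits


def image_footprint_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Memory occupied by a single-channel image with the given pixel size."""
    return width * height * bytes_per_pixel


# Operations per SIMD instruction: 32-bit FP versus 16-bit FP.
lanes_f32 = lanes_per_register(32)
lanes_f16 = lanes_per_register(16)

# Cache pressure: a hypothetical 512x512 image stored as 32-bit floats versus bytes.
footprint_f32 = image_footprint_bytes(512, 512, 4)
footprint_byte = image_footprint_bytes(512, 512, 1)

print(f"32-bit FP lanes per register: {lanes_f32}")
print(f"16-bit FP lanes per register: {lanes_f16}")
print(f"Footprint ratio (float32 / byte storage): {footprint_f32 // footprint_byte}")
```

Under these assumptions, a 16-bit format fits twice as many elements in each register as a 32-bit format, and byte storage occupies a quarter of the memory of 32-bit float storage, which is consistent with the cache argument in the abstract.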
Citation:
Daniel Etiemble, Lionel Lacassagne, "16-Bit FP Sub-Word Parallelism to Facilitate Compiler Vectorization and Improve Performance of Image and Media Processing," 2004 International Conference on Parallel Processing (ICPP'04), pp. 540-547, 2004.