Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing
Reducing 3D Wavelet Transform Execution Time through the Streaming SIMD Extensions
Genova, Italy
February 5-7, 2003
ISBN: 0-7695-1875-3
This paper focuses on reducing the execution time of video compression algorithms based on the 3D wavelet transform. We present several optimizations that could not be applied by the compiler due to the characteristics of the algorithm. First, we use the Streaming SIMD Extensions (SSE) for some of the dimensions of the sequence (y and time) in order to reduce the number of floating-point instructions by exploiting data-level parallelism. Then, we apply loop unrolling and data prefetching to critical parts of the code. Finally, the algorithm is vectorized by columns, allowing the use of SIMD instructions for the y dimension. Results show improvements of up to 1.54 over a version compiled with the maximum optimizations of the Intel C/C++ compiler. Our experiments also show that allowing the compiler to perform some of these optimizations (i.e., automatic code vectorization) causes a performance slowdown, which demonstrates the effectiveness of our optimizations.
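As an illustration of what vectorizing by columns for the y dimension can look like, the following is a minimal sketch, not the authors' code. It assumes a row-major frame of single-precision samples and uses a Haar-style average/difference pair as a stand-in for the wavelet filter, which the abstract does not specify. The function name haar_y_sse and its parameters are hypothetical. Because four horizontally adjacent samples are contiguous in memory, one SSE register holds the same row position of four neighboring columns, so a single instruction filters four columns at once.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics (x86 only) */

/* One Haar-style analysis step along y for a row-major frame.
 * Haar is a stand-in filter chosen for brevity (assumption).
 * Assumes height is even; low and high each hold
 * width * (height / 2) floats. */
void haar_y_sse(const float *src, float *low, float *high,
                size_t width, size_t height)
{
    const __m128 half = _mm_set1_ps(0.5f);

    for (size_t y = 0; y + 1 < height; y += 2) {
        const float *r0 = src + y * width;   /* even row */
        const float *r1 = r0 + width;        /* odd row  */
        float *lo = low  + (y / 2) * width;
        float *hi = high + (y / 2) * width;

        size_t x = 0;
        /* Four adjacent columns per iteration. */
        for (; x + 4 <= width; x += 4) {
            __m128 a = _mm_loadu_ps(r0 + x);
            __m128 b = _mm_loadu_ps(r1 + x);
            _mm_storeu_ps(lo + x, _mm_mul_ps(_mm_add_ps(a, b), half));
            _mm_storeu_ps(hi + x, _mm_mul_ps(_mm_sub_ps(a, b), half));
        }
        /* Scalar tail for widths that are not a multiple of 4. */
        for (; x < width; ++x) {
            lo[x] = (r0[x] + r1[x]) * 0.5f;
            hi[x] = (r0[x] - r1[x]) * 0.5f;
        }
    }
}
```

The unaligned load and store intrinsics are used so the sketch works for any buffer alignment. With 16-byte-aligned rows, the aligned variants (_mm_load_ps, _mm_store_ps) could be substituted.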
Citation:
Gregorio Bernabé, José M. García, José González, "Reducing 3D Wavelet Transform Execution Time through the Streaming SIMD Extensions," PDP, pp. 49, Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2003