Issue No. 03 - September (1996 vol. 2)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/2945.537305
<p><b>Abstract</b>: This paper presents a parallel volume rendering algorithm that can render a 256 × 256 × 225 voxel medical data set at over 15 Hz and a 512 × 512 × 334 voxel data set at over 7 Hz on a 32-processor Silicon Graphics Challenge. The algorithm achieves these results by minimizing each of the three components of execution time: computation time, synchronization time, and data communication time. Computation time is low because the parallel algorithm is based on the recently reported shear-warp serial volume rendering algorithm, which is over five times faster than previous serial algorithms. The algorithm uses run-length encoding to exploit coherence and an efficient volume traversal to reduce overhead. Synchronization time is minimized by using dynamic load balancing and a task partition that minimizes synchronization events. Data communication costs are low because the algorithm is implemented for shared-memory multiprocessors, a class of machines with hardware support for low-latency fine-grain communication and hardware caching to hide latency.</p><p>We draw two conclusions from our implementation. First, we find that on shared-memory architectures, data redistribution and communication costs do not dominate rendering time. Second, we find that cache locality requirements impose a limit on parallelism in volume rendering algorithms. Specifically, our results indicate that shared-memory machines with hundreds of processors would be useful only for rendering very large data sets.</p>
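The abstract says the algorithm uses run-length encoding to exploit coherence. As a rough illustration of that idea only, the sketch below encodes one voxel scanline into alternating transparent and opaque runs so that a renderer can skip each transparent run in a single step. The function names, the opacity threshold, and the list-based layout are assumptions made for this example; the abstract does not describe the paper's actual data structures.

```python
from typing import Iterator, List, Tuple

# Hypothetical opacity cutoff: voxels at or below it are treated as transparent.
OPACITY_THRESHOLD = 0.05


def run_length_encode(opacities: List[float],
                      threshold: float = OPACITY_THRESHOLD) -> List[Tuple[bool, int]]:
    """Encode a scanline as (is_opaque, run_length) pairs."""
    runs: List[Tuple[bool, int]] = []
    for alpha in opacities:
        opaque = alpha > threshold
        if runs and runs[-1][0] == opaque:
            # Same classification as the previous voxel: extend the current run.
            runs[-1] = (opaque, runs[-1][1] + 1)
        else:
            # Classification changed (or first voxel): start a new run.
            runs.append((opaque, 1))
    return runs


def opaque_indices(runs: List[Tuple[bool, int]]) -> Iterator[int]:
    """Yield indices of voxels in opaque runs, skipping transparent runs whole."""
    position = 0
    for opaque, length in runs:
        if opaque:
            yield from range(position, position + length)
        # A transparent run costs one addition regardless of its length.
        position += length


if __name__ == "__main__":
    scanline = [0.0, 0.0, 0.0, 0.6, 0.8, 0.0, 0.0, 0.3]
    runs = run_length_encode(scanline)
    print(runs)
    print(list(opaque_indices(runs)))
```

In this toy scanline only three of the eight voxels fall in opaque runs, so a traversal driven by the runs touches three voxels instead of eight; the benefit grows with the fraction of transparent voxels in the data.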
Index Terms: Volume rendering, parallel algorithms for shared memory multiprocessors, shear-warp factorization, coherence optimizations, image partition, multiprocessor performance analysis.
P. Lacroute, "Analysis of a Parallel Volume Rendering System Based on the Shear-Warp Factorization," in IEEE Transactions on Visualization & Computer Graphics, vol. 2, no. 3, pp. 218-231, 1996.