2014 IEEE International Conference On Cluster Computing (CLUSTER) (2014)
Madrid, Spain
Sept. 22, 2014 to Sept. 26, 2014
ISBN: 978-1-4799-5548-0
pp: 331-338
Estefania Serrano , University Carlos III, Avda. Universidad 30, Leganes, Spain 28911
Guzman Bermejo , University Carlos III, Avda. Universidad 30, Leganes, Spain 28911
Javier Garcia Blas , University Carlos III, Avda. Universidad 30, Leganes, Spain 28911
Jesus Carretero , University Carlos III, Avda. Universidad 30, Leganes, Spain 28911
ABSTRACT
Many medical image processing applications need high processing speed to achieve near real-time image reconstruction. For this reason, massively parallel architectures based on accelerators, especially GPGPUs, have become very popular in the area. In this paper we present Mangoose++, an application that performs X-ray Computed Tomography (CT) reconstruction from medical images, based on a new implementation of the FDK algorithm. Mangoose++ has been designed and implemented to exploit the parallelism available on several hardware accelerator platforms, such as GPGPUs and Intel Xeon Phi accelerators. We describe the design and implementation of the application on three types of platforms (multi-core CPU, GPGPU, and Intel Xeon Phi) and the evaluation carried out to test the performance, resource utilization, and scalability of each platform. Moreover, to avoid hardware dependencies, we have also implemented the application using the OpenACC runtime to check portability and the overhead incurred when using runtimes. The evaluation results show that our solution is faster than recent related works and that, in terms of computation, the Intel Xeon Phi and CUDA-based GPU versions obtain similar results as the problem size increases. The evaluation also shows that OpenACC enhances programmability, because there is a single version of the source code. However, using OpenACC heavily affects the performance of Mangoose++, which is reduced by 50% when compared with the many-core versions, although the reduction is less drastic when compared with the CPU version.
INDEX TERMS
Programming, Parallel processing, Graphics processing units, Instruction sets, Image reconstruction, Computed tomography, Optimization
CITATION

E. Serrano, G. Bermejo, J. G. Blas and J. Carretero, "High-performance X-ray tomography reconstruction algorithm based on heterogeneous accelerated computing systems," 2014 IEEE International Conference On Cluster Computing (CLUSTER), Madrid, Spain, 2014, pp. 331-338.
doi:10.1109/CLUSTER.2014.6968781