International Parallel and Distributed Processing Symposium (IPDPS 2012)
Shanghai, China
May 21, 2012 to May 25, 2012
ISSN: 1530-2075
ISBN: 978-1-4673-0975-2
pp: 1105-1116
Traditional radio telescopes use large steel dishes to observe radio sources. The largest radio telescope in the world, LOFAR, instead uses tens of thousands of fixed, omnidirectional antennas, a novel design that promises ground-breaking research in astronomy. Where traditional telescopes use custom-built hardware, LOFAR uses software to do signal processing in real time, which makes the instrument inherently more flexible. However, the enormous data rates and processing requirements (tens to hundreds of teraflops) make this extremely challenging. The next-generation telescope, the SKA, will require exaflops. Unlike traditional instruments, LOFAR and the SKA can observe in hundreds of directions simultaneously, using beam forming. This is useful, for example, to search the sky for pulsars (rapidly rotating, highly magnetized neutron stars). Beam forming is an important technique in signal processing: it is also used in Wi-Fi and 4G cellular networks, radar systems, and health-care microwave imaging instruments. We propose the use of many-core architectures, such as 48-core CPU systems and Graphics Processing Units (GPUs), to accelerate beam forming. We use two different frameworks for GPUs, CUDA and OpenCL, and present results for hardware from different vendors (AMD and NVIDIA). Additionally, we implement the LOFAR beam former on multi-core CPUs, using OpenMP with SSE vector instructions. We use auto-tuning to support different architectures and implementation frameworks, achieving both platform and performance portability. Finally, we compare our results with the production implementation, written in assembly and running on an IBM Blue Gene/P supercomputer. We compare both computational and power efficiency, since power usage is one of the fundamental challenges modern radio telescopes face. Compared to the production implementation, our auto-tuned beam former is 45 to 50 times faster on GPUs, and 2 to 8 times more power efficient. Our experimental results lead to the conclusion that GPUs are an attractive solution to accelerate beam forming.
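To make the beam-forming step more concrete, here is a minimal sketch of narrowband (phase-shift) beam forming for a single frequency channel. It assumes point-like stations, plane waves from far-field sources, and one particular sign convention for the phase. It is not the authors' CUDA, OpenCL, or OpenMP implementation; the function names (`geometric_delay`, `beam_weights`, `beam_form`), the 150 MHz channel, and the 1 m station spacing are hypothetical. Each beam applies one complex weight per station that compensates the geometric delay toward the beam's direction, then sums coherently over stations.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def geometric_delay(position, direction):
    """Arrival-time offset (seconds) of a plane wave from `direction` at an
    antenna at `position`, relative to the array origin.

    position:  (x, y, z) in metres (hypothetical local coordinates)
    direction: unit vector (l, m, n) pointing towards the source
    """
    return sum(p * d for p, d in zip(position, direction)) / SPEED_OF_LIGHT


def beam_weights(positions, directions, freq_hz):
    """One complex weight per (beam, station) that undoes the phase
    exp(+2*pi*i*f*tau) assumed for a source in that beam's direction."""
    return [
        [cmath.exp(-2j * math.pi * freq_hz * geometric_delay(p, d)) for p in positions]
        for d in directions
    ]


def beam_form(weights, samples):
    """Coherent, normalized sum over stations for one frequency channel.

    weights[b][s]: complex weight of station s for beam b
    samples[s][t]: complex voltage of station s at time t
    returns beams[b][t] = (1 / S) * sum_s weights[b][s] * samples[s][t]
    """
    n_stations = len(samples)
    n_times = len(samples[0])
    return [
        [sum(w[s] * samples[s][t] for s in range(n_stations)) / n_stations
         for t in range(n_times)]
        for w in weights
    ]


if __name__ == "__main__":
    freq = 150e6                                         # assumed channel frequency (Hz)
    stations = [(float(x), 0.0, 0.0) for x in range(4)]  # hypothetical 1 m spacing
    theta = 0.5                                          # source angle from zenith (rad)
    source = (math.sin(theta), 0.0, math.cos(theta))
    zenith = (0.0, 0.0, 1.0)

    # Simulate a constant-amplitude source: each station sees it with a phase
    # proportional to its geometric delay.
    data = [[cmath.exp(2j * math.pi * freq * geometric_delay(p, source))] * 3
            for p in stations]

    beams = beam_form(beam_weights(stations, [source, zenith], freq), data)
    print("on-source beam |B|:", [round(abs(v), 3) for v in beams[0]])
    print("zenith beam    |B|:", [round(abs(v), 3) for v in beams[1]])
```

The beam pointed at the simulated source adds the station signals in phase (magnitude 1 after normalization), while the beam pointed at the zenith adds them with mismatched phases and is strongly attenuated. Because each output sample is a weighted sum over stations, forming many beams amounts to a complex matrix product of a (beams × stations) weight matrix with a (stations × samples) data matrix, which is regular, data-parallel work of the kind GPUs and SIMD units handle well.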
Telescopes, Graphics processing unit, Radio astronomy, Algorithm design and analysis, Instruction sets, Hardware, auto-tuning, many-core architectures, beam forming

J. D. Mol, A. L. Varbanescu, A. Sclocco and R. V. van Nieuwpoort, "Radio Astronomy Beam Forming on Many-Core Architectures," International Parallel and Distributed Processing Symposium (IPDPS), Shanghai, China, 2012, pp. 1105-1116.