Utility and Cloud Computing, IEEE International Conference on (2011)
Melbourne, Victoria, Australia
Dec. 5, 2011 to Dec. 8, 2011
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/UCC.2011.16
In astronomy, "Cross-Match" is a common operation that mines useful information by joining different star catalogues. Star catalogues obtained from astronomical telescopes are now far larger than ever before, which motivates implementing Cross-Match in a distributed computing environment. Although computer hardware is now cheap and resizable compute capacity is available from some cloud web services, we conduct our experiments in a restricted environment to conserve resources as much as possible. We first use Hive from Facebook, but find it less efficient than expected when joining two big catalogues. We then analyze Hive's join process and carry out some optimization, but the result is still unsatisfactory. Finally, we design our own Cross-Match program, which is based on the directed join algorithm in MapReduce, takes advantage of the characteristics of astronomical data, and runs on top of Hadoop. Our program improves performance by 86% compared with the common join in Hive when cross-matching USNOA and 2MASS.
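To make the idea concrete, here is a minimal single-process sketch of a cross-match organised like a directed join, taken here in the general sense of a map-side join in which both inputs are pre-partitioned by the same key and each task reads only one matching pair of partitions. Everything specific in it is an assumption for illustration and is not taken from the paper: the declination-zone partitioning key, the 1-degree zone height, the `(id, ra, dec)` record layout, and all function names are hypothetical. No Hadoop or Hive API is used; the "map task" is an ordinary Python function.

```python
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 1.0  # hypothetical zone height, not a value from the paper


def zone_of(dec_deg, zone_height=ZONE_HEIGHT_DEG):
    """Map a declination in degrees to an integer zone index."""
    return int(math.floor((dec_deg + 90.0) / zone_height))


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def partition(catalogue, pad_deg=0.0, zone_height=ZONE_HEIGHT_DEG):
    """Pre-partition (id, ra, dec) records by declination zone.

    With pad_deg > 0, a record is copied into every zone that its
    declination +/- pad_deg touches, so matches across a zone boundary
    are still found inside a single partition pair.
    """
    parts = defaultdict(list)
    for obj in catalogue:
        dec = obj[2]
        lo = zone_of(dec - pad_deg, zone_height)
        hi = zone_of(dec + pad_deg, zone_height)
        for z in range(lo, hi + 1):
            parts[z].append(obj)
    return parts


def map_task(left_part, right_part, radius_deg):
    """Join one pair of matching partitions; no shuffle or reduce step."""
    for lid, lra, ldec in left_part:
        for rid, rra, rdec in right_part:
            if abs(ldec - rdec) > radius_deg:
                continue  # cheap declination filter before the trig
            if angular_separation_deg(lra, ldec, rra, rdec) <= radius_deg:
                yield (lid, rid)


def cross_match(left, right, radius_deg):
    """Run one map task per zone over the two pre-partitioned inputs."""
    left_parts = partition(left)                       # one zone per record
    right_parts = partition(right, pad_deg=radius_deg)  # padded at borders
    pairs = []
    for z, left_part in left_parts.items():
        pairs.extend(map_task(left_part, right_parts.get(z, []), radius_deg))
    return sorted(pairs)
```

As a usage example, `cross_match(left, right, radius_deg=2.0 / 3600.0)` returns the sorted list of `(left_id, right_id)` pairs whose separation is at most 2 arcseconds. Only the right-hand catalogue is padded across zone borders, so each left-hand record lives in exactly one zone and no pair is reported twice.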
Cross-Match, Hive, Directed Join, MapReduce, Big star catalogues
Q. Chen, C. Mi and T. Liu, "An Efficient Cross-Match Implementation Based on Directed Join Algorithm in MapReduce," 2011 IEEE 4th International Conference on Utility and Cloud Computing (UCC 2011), Melbourne, Victoria, Australia, 2011, pp. 41-48.