Computer and Information Technology, International Conference on (2010)
Bradford, West Yorkshire, UK
June 29, 2010 to July 1, 2010
ISBN: 978-0-7695-4108-2
pp: 1235-1240
ABSTRACT
This paper compares parallel and distributed implementations of an iterative machine learning algorithm based on Gibbs sampling. Distributed implementations run under Hadoop on facility computing clouds. The probabilistic model under study is the infinite HMM, whose parameters are learnt using an instance of blocked Gibbs sampling, one step of which is a dynamic program. We apply this model to learn part-of-speech tags from newswire text in an unsupervised fashion. However, our focus here is on runtime performance rather than on NLP-relevant scores, as measured by iteration duration and by ease of development, deployment and debugging.
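To illustrate what a dynamic-programming step inside a blocked Gibbs sampler can look like, the sketch below runs forward filtering followed by backward sampling on a finite HMM with a fixed number of states. This is an assumption made for illustration, not the paper's implementation: the infinite HMM needs additional machinery to handle an unbounded state space, and the function name `ffbs` and its parameters (`init`, `trans`, `emit`) are hypothetical.

```python
import random


def ffbs(obs, init, trans, emit, rng):
    """Draw one hidden state sequence from p(states | obs) for a finite HMM.

    Illustrative assumption: a fixed, finite state space, with
    init[k], trans[j][k] and emit[k][o] given as plain probability tables.
    """
    if not obs:
        return []
    n_states = len(init)

    # Forward pass (the dynamic program): alpha[t][k] is proportional to
    # p(state_t = k | obs[0..t]).
    alpha = []
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[k] * emit[k][o] for k in range(n_states)]
        else:
            prev = alpha[-1]
            row = [
                emit[k][o] * sum(prev[j] * trans[j][k] for j in range(n_states))
                for k in range(n_states)
            ]
        total = sum(row)
        alpha.append([v / total for v in row])  # normalise to avoid underflow

    # Backward pass: sample the last state, then each earlier state
    # conditioned on the state already sampled for the following position.
    states = [0] * len(obs)
    states[-1] = rng.choices(range(n_states), weights=alpha[-1])[0]
    for t in range(len(obs) - 2, -1, -1):
        nxt = states[t + 1]
        weights = [alpha[t][j] * trans[j][nxt] for j in range(n_states)]
        states[t] = rng.choices(range(n_states), weights=weights)[0]
    return states
```

In a sketch like this, each sentence's state sequence is resampled as one block given the current parameters, so sentences can be processed independently within an iteration. That independence is what makes such a step a natural candidate for parallel or map-style distributed execution.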
CITATION
Sébastien Bratières, Andreas Vlachos, Zoubin Ghahramani, Jurgen van Gael, "Scaling the iHMM: Parallelization versus Hadoop", Computer and Information Technology, International Conference on, pp. 1235-1240, 2010, doi:10.1109/CIT.2010.223