2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA) (2017)
Taipei, Taiwan
March 27, 2017 to March 29, 2017
ISSN: 1550-445X
ISBN: 978-1-5090-6029-0
pp: 1019-1026
ABSTRACT
Big Data has become commonplace in most Internet-based applications, which, by delivering services to planetary-scale numbers of users, generate very large data sets. Such data sets are considered a valuable source of analytics information and knowledge for many purposes and domains. It is increasingly claimed that Big Data and machine learning, especially data mining, are the basis for developing advanced analytics platforms for turning data into valuable assets, gaining competitive advantage and making better decisions. At the same time, however, Big Data applications are proving to be killer applications for state-of-the-art machine learning and data mining algorithms. Indeed, traditional data mining frameworks such as WEKA and R, and those from big companies such as IBM SPSS Modeler, SAS Enterprise Miner and Oracle Data Mining, face the challenges of mining large data sets 1) within short times and 2) under high rates of data generation. The envisaged way to deal effectively with such challenges is to move to Cloud-based versions of such frameworks and to develop new frameworks implemented on Cloud platforms. In either case, data mining and machine learning algorithms are being fully implemented on Cloud platforms under the new efficiency and performance requirements of Big Data. Among the newly developed frameworks is Apache Mahout, whose goal is "to build an environment for quickly creating scalable performant machine learning applications". In this paper we analyse the performance of some clustering algorithms of Apache Mahout using a Twitter streaming dataset under a Hadoop MapReduce cluster infrastructure according to various evaluation criteria.
INDEX TERMS
Big data, Clustering algorithms, Text mining, Heuristic algorithms, Algorithm design and analysis, Twitter
CITATION

F. Xhafa, A. Bogza and S. Caballe, "Performance Evaluation of Mahout Clustering Algorithms Using a Twitter Streaming Dataset," 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), Taipei, Taiwan, 2017, pp. 1019-1026.
doi:10.1109/AINA.2017.50