2014 IEEE/ACM International Symposium on Big Data Computing (BDC)
London, United Kingdom
Dec. 8, 2014 to Dec. 11, 2014
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/BDC.2014.17
With the ever-increasing production of data from heterogeneous sources in modern information societies, the need for scalable data-intensive processing is growing. MapReduce quickly became the de facto framework for large-scale data analysis, owing to its simple and abstract programming model and its efficient underlying execution system. However, this simplicity comes at a price: its unidirectional communication model and lack of support for iterations make repeated querying of datasets difficult and impose limitations in many fields, including machine learning. In this paper we describe the implementation of a classification rule induction algorithm based on MapReduce, with the aim of building a classification model in as few iterations as possible. After a thorough description of the algorithm, we evaluate it from three perspectives: its accuracy, its parallel performance, and its communication costs. The evaluations indicate that the approach is scalable and, since it produces a comprehensive, human-readable model, that it can prove valuable for a wide range of applications.
Training, Machine learning algorithms, Big data, Accuracy, Radiation detectors, Production, Clustering algorithms
V. Kolias, I. Anagnostopoulos and E. Kayafas, "A Covering Classification Rule Induction Approach for Big Datasets," 2014 IEEE/ACM International Symposium on Big Data Computing (BDC), London, United Kingdom, 2014, pp. 45-53.