2014 IEEE International Conference on Data Mining Workshop (ICDMW) (2014)
Dec. 14, 2014
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2014.91
Data sets are now far larger than before, and distributed computing systems are a prevalent way to handle them. As one powerful tool, the MapReduce framework provides a cheap and efficient way to write parallel programs that run on distributed computing systems. Chance discovery (CD) is an extension of data mining, where a chance refers to a rare but important event or situation. Idea Graph is an efficient algorithm proposed to detect chances. However, the traditional implementation of Idea Graph is sequential, and its performance hits bottlenecks when dealing with big data. In this paper, we propose a parallel implementation of Idea Graph using MapReduce to better meet the challenge of big data. First, we introduce the MapReduce framework and then briefly introduce Idea Graph. After that, we present the details of how we design the parallel Idea Graph implementation. At the end of the paper, several experiments are conducted to evaluate the proposed implementation. The experimental results demonstrate the validity of the proposed implementation and its better performance compared with the sequential Idea Graph implementation when handling big data.
Distributed databases, Nickel, Data mining, Big data, Algorithm design and analysis, Clustering algorithms
Q. Wang, H. Wang, C. Zhang, W. Wang, Z. Chen and F. Xu, "A Parallel Implementation of Idea Graph to Extract Rare Chances from Big Data," 2014 IEEE International Conference on Data Mining Workshop (ICDMW), Shenzhen, China, 2014, pp. 503-510.