2011 IEEE Third International Conference on Cloud Computing Technology and Science (2011)
Nov. 29, 2011 to Dec. 1, 2011
Distributed and parallel algorithms have attracted a great deal of interest and research in recent decades as a way to handle large-scale data sets in real-world applications. In this paper, we focus on a parallel implementation of a KD-Tree based outlier detection method for large-scale data sets. The KD-Tree based method is one of the state-of-the-art outlier detection methods and has proven to be effective. However, its serial implementation cannot process large-scale data sets efficiently. Building on MapReduce, a current and powerful parallel programming framework, we propose a parallel implementation of the KD-Tree based outlier detection algorithm (PKDTree for short). Experimental results demonstrate the efficiency of PKDTree according to the evaluation criteria of scaleup, speedup, and sizeup.
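The three evaluation criteria named above can be sketched as simple ratios of measured running times. This is a minimal illustration that assumes the definitions commonly used when evaluating parallel data mining algorithms; the abstract does not state the exact formulas used in the paper, and the function and parameter names are hypothetical.

```python
def _check_positive(*times: float) -> None:
    """Reject non-positive running times, which would make the ratios meaningless."""
    for t in times:
        if t <= 0:
            raise ValueError("running times must be positive")


def speedup(t_one_node: float, t_m_nodes: float) -> float:
    """Same data set, 1 node vs. m nodes: T(1) / T(m).

    Assumed definition; the ideal value is m (linear speedup).
    """
    _check_positive(t_one_node, t_m_nodes)
    return t_one_node / t_m_nodes


def scaleup(t_base: float, t_scaled: float) -> float:
    """Data and nodes grown together: T(1 node, D) / T(m nodes, m * D).

    Assumed definition; the ideal value is 1.0 (constant running time).
    """
    _check_positive(t_base, t_scaled)
    return t_base / t_scaled


def sizeup(t_base_data: float, t_larger_data: float) -> float:
    """Fixed node count, data grown k times: T(k * D) / T(D).

    Assumed definition; a value close to k means time grows linearly with data size.
    """
    _check_positive(t_base_data, t_larger_data)
    return t_larger_data / t_base_data
```

Each function takes wall-clock times from two runs of the same job, so the caller decides which cluster sizes and data sizes are being compared.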
Data mining, Parallel Outlier Detection, KD-Tree, MapReduce
Q. Wang, Z. Shi, Q. He, F. Zhuang and Y. Ma, "Parallel Outlier Detection Using KD-Tree Based on MapReduce," 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CLOUDCOM), Athens, Greece, 2011, pp. 75-80.