Issue No.01 - Jan. (2014 vol.26)

pp: 194-207

Alessia Albanese, Dept. of Appl. Sci., Univ. of Naples Parthenope, Naples, Italy

Sankar K. Pal, Indian Stat. Inst., Kolkata, India

Alfredo Petrosino, Dept. of Appl. Sci., Univ. of Naples Parthenope, Naples, Italy

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2012.234

ABSTRACT

The high availability of data gathered from wireless sensor networks and telecommunication systems has drawn the attention of researchers to the problem of extracting knowledge from spatiotemporal data. Detecting outliers that are grossly different from or inconsistent with the rest of a spatiotemporal data set is a major challenge in real-world knowledge discovery and data mining applications. In this paper, we deal with the outlier detection problem in spatiotemporal data and describe a rough set approach that finds the top outliers in an unlabeled spatiotemporal data set. The proposed method, called Rough Outlier Set Extraction (ROSE), relies on a rough set theoretic representation of the outlier set through its lower and upper approximations. We also introduce a new set, named the Kernel Set, a subset of the original data set that is able to describe the original data set both in terms of data structure and of obtained results. Experimental results on real-world data sets demonstrate the superiority of ROSE, both in terms of some quantitative indices and of the outliers detected, over various rough fuzzy clustering algorithms and state-of-the-art outlier detection methods. It is also demonstrated that the kernel set is able to detect the same outlier set in less computational time.
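
The abstract describes the outlier set through rough set lower and upper approximations. As a rough illustration of that general idea only, the sketch below computes the classical Pawlak approximations of a candidate set over a partition into granules. It is not the ROSE algorithm, and the function name `approximations`, the toy readings, the `cell` granulation, and the `suspected` set are all assumptions made for this example.

```python
from collections import defaultdict


def approximations(universe, key, target):
    """Return the (lower, upper) rough approximations of `target`.

    universe: iterable of hashable objects
    key:      function mapping an object to its granule label
              (objects with the same label are indiscernible)
    target:   set of objects to approximate
    """
    # Group the universe into granules (equivalence classes).
    granules = defaultdict(set)
    for obj in universe:
        granules[key(obj)].add(obj)

    lower, upper = set(), set()
    for granule in granules.values():
        if granule <= target:   # granule lies entirely inside the target
            lower |= granule
        if granule & target:    # granule overlaps the target at all
            upper |= granule
    return lower, upper


# Hypothetical (x, y, t) readings; the values are invented for illustration.
readings = [(0, 0, 1), (0, 1, 1), (1, 0, 2), (5, 5, 2), (5, 6, 3), (9, 9, 3)]


def cell(reading):
    """Assumed granulation: coarse 4x4 spatial cells, ignoring time."""
    return (reading[0] // 4, reading[1] // 4)


suspected = {(5, 5, 2), (9, 9, 3)}  # hypothetical candidate outliers
lower, upper = approximations(readings, cell, suspected)
boundary = upper - lower            # objects that are only possibly in the set
```

In this toy case the lower approximation holds the objects that certainly belong to the candidate set, the upper approximation holds those that possibly belong, and the boundary region is their difference.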

INDEX TERMS

Approximation methods, Set theory, Kernel, Knowledge engineering, Data engineering, Data mining, Uncertainty, Rough set and granular computing, Spatiotemporal data, Outlier detection, Spatiotemporal uncertainty management

CITATION

Alessia Albanese, Sankar K. Pal, Alfredo Petrosino, "Rough Sets, Kernel Set, and Spatiotemporal Outlier Detection",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 26, no. 1, pp. 194-207, Jan. 2014, doi:10.1109/TKDE.2012.234