Issue No.12 - December (2011 vol.23)

pp: 1857-1871

Lei Shi, University of Maryland, Baltimore County, Baltimore

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.212

ABSTRACT

This paper focuses on discovering anomalous windows in linear intersecting paths. Anomalous windows are contiguous groupings of data points. A linear path is a path represented by a line, with a single-dimensional spatial coordinate marking each observation point. We propose an approach for discovering anomalous windows using a class of algorithms based on scan statistics, specifically 1) an order-invariant algorithm using Scan Statistics for Linear Intersecting Paths (SSLIP), 2) Brute Force-SSLIP (BF-SSLIP), and 3) Central Brute Force-SSLIP (CBF-SSLIP). We further present two efficient variants of SSLIP: $\mathrm{SSLIP}^\ast$, which employs an upper bound on the scan window size, and SSLIP-Acc, which adopts an accelerator function to speed up the scan process. The proposed approach for discovering anomalous windows along linear paths comprises the following distinct steps: 1) Cross Path Discovery, where we identify a subset of intersecting paths to be considered; 2) Anomalous Window Discovery, where we outline the various algorithms for traversing the cross paths to identify directional windows of varying size along the paths. To identify an anomalous window, we compute an unusualness metric in the form of a likelihood ratio, which indicates the degree of unusualness of the window with respect to the rest of the data, and we take the window with the highest likelihood ratio as our anomalous window; and 3) Monte Carlo Simulations, where, to ascertain whether this window is truly anomalous and not merely a random occurrence, we perform hypothesis testing by computing a p-value. We present extensive experimental results on real-world accident data sets for various highways with known issues (code and data available from [32], [27]). We also compare against current approaches [18], [34] to show the efficacy of our approach.
Our results show that our approach is effective in identifying anomalous traffic accident windows along multiple intersecting highways.
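The window-scoring and significance-testing steps described in the abstract can be illustrated with a minimal Python sketch on a single linear path. This is not the SSLIP algorithm itself: the abstract does not give the exact form of the likelihood ratio, so the sketch assumes the Poisson likelihood ratio of Kulldorff's spatial scan statistic [18], and it uses a plain brute-force scan with an upper bound on window length. The function names (`poisson_llr`, `best_window`, `monte_carlo_p_value`), the per-point `baseline` weights, and the toy counts are all hypothetical.

```python
import math
import random


def poisson_llr(c, e, total):
    """Log likelihood ratio for a window with c observed and e expected
    events out of `total` events (assumed Kulldorff-style Poisson form).
    Windows that are not elevated (c <= e) score 0."""
    if e <= 0 or e >= total or c <= e:
        return 0.0
    rest = total - c
    inside = c * math.log(c / e)
    outside = rest * math.log(rest / (total - e)) if rest > 0 else 0.0
    return inside + outside


def best_window(counts, baseline, max_len):
    """Brute-force scan of every contiguous window of at most max_len
    observation points; returns (score, start, end) of the top window."""
    total = sum(counts)
    total_base = sum(baseline)
    best = (0.0, 0, 0)
    for start in range(len(counts)):
        c = b = 0
        for end in range(start, min(start + max_len, len(counts))):
            c += counts[end]
            b += baseline[end]
            e = total * b / total_base  # expected events under the null
            score = poisson_llr(c, e, total)
            if score > best[0]:
                best = (score, start, end)
    return best


def monte_carlo_p_value(counts, baseline, max_len, n_sim=199, seed=0):
    """Rank the observed top score among top scores of data sets simulated
    under the null (events placed in proportion to the baseline)."""
    rng = random.Random(seed)
    observed, start, end = best_window(counts, baseline, max_len)
    total = sum(counts)
    positions = range(len(counts))
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * len(counts)
        for idx in rng.choices(positions, weights=baseline, k=total):
            sim[idx] += 1
        if best_window(sim, baseline, max_len)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1), (start, end), observed


# Toy path: eight observation points with a cluster at indices 3 and 4.
counts = [1, 1, 1, 9, 8, 1, 1, 1]
baseline = [1] * len(counts)
p_value, window, score = monte_carlo_p_value(counts, baseline, max_len=3)
print(window, round(score, 2), p_value)
```

Capping the window length keeps the scan quadratic at worst and loosely mirrors the bounded-window idea behind the $\mathrm{SSLIP}^\ast$ variant. Handling several intersecting paths would additionally require the cross-path discovery step, which this sketch leaves out.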

INDEX TERMS

Spatial scan statistics, spatial scan window, linear scan statistic, anomaly detection.

CITATION

Lei Shi, "Anomalous Window Discovery for Linear Intersecting Paths",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 23, no. 12, pp. 1857-1871, December 2011, doi:10.1109/TKDE.2010.212

REFERENCES

- [1] 2001 State Data for Fatalities Relating to Roadway, Pedestrian and Large Trucks, 2001.
- [2] V. Barnett and T. Lewis, Outliers in Statistical Data, third ed. John Wiley and Sons, 1994.
- [3] J. Besag and J. Newell, "The Detection of Clusters in Rare Diseases," J. Royal Statistical Soc., vol. 154, pp. 143-155, 1991.
- [4] L. Duczmal and A. Renato, "A Simulated Annealing Strategy for the Detection of Arbitrarily Shaped Spatial Clusters," Computational Statistics and Data Analysis, vol. 45, no. 2, pp. 269-286, 2004.
- [5] E. Keogh, J. Lin, and A. Fu, "Hot Sax: Efficiently Finding the Most Unusual Time Series Subsequence," Proc. IEEE Fifth Int'l Conf. Data Mining, 2005.
- [6] M. Ester, H.P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases," Proc. Second Int'l Conf. Knowledge Discovery and Data Mining, pp. 44-49, 1996.
- [7] A. Getis, "Reflections on Spatial Autocorrelation," Regional Science and Urban Economics, vol. 37, no. 4, pp. 491-496, 2007.
- [8] J. Glaz and N. Balakrishnan, Scan Statistics and Applications. Birkhauser, 1999.
- [9] J. Glaz, J. Naus, and S. Wallenstein, Scan Statistics. Springer, 2001.
- [10] J. Glaz and Z. Zhang, "Multiple Window Discrete Scan Statistics," J. Applied Statistics, vol. 31, pp. 967-980, 2004.
- [11] D. Griffith, Spatial Autocorrelation: A Primer. Assoc. of Am. Geographers, 1987.
- [12] V. Guralnik and J. Srivastava, "Event Detection from Time Series Data," KDD '99: Proc. Fifth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 33-42, 1999.
- [13] R. Haining, Spatial Data Analysis: Theory and Practice. Cambridge Univ. Press, 2003.
- [14] Highway Research & Technology, the Need for Greater Investment, 1999.
- [15] V.S. Iyengar, "On Detecting Space-Time Clusters," Proc. Tenth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '04), pp. 587-592, 2004.
- [16] V. Janeja and V. Atluri, "$LS^3$: A Linear Semantic Scan Statistic Technique for Detecting Anomalous Windows," Proc. ACM Symp. Applied Computing, 2005.
- [17] V. Janeja and V. Atluri, "Random Walks to Identify Anomalous Free-Form Spatial Scan Windows," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 10, pp. 1378-1392, Oct. 2008.
- [18] M. Kulldorff, "A Spatial Scan Statistic," Comm. in Statistics: Theory and Methods, vol. 26, no. 6, pp. 1481-1496, 1997.
- [19] M. Kulldorff, W. Athas, E. Feuer, B. Miller, and C. Key, "Evaluating Cluster Alarms: A Space-Time Scan Statistic and Brain Cancer in Los Alamos," Am. J. Public Health, vol. 88, no. 9, pp. 1377-1380, 1998.
- [20] J.-G. Lee, J. Han, and X. Li, "Trajectory Outlier Detection: A Partition-and-Detect Framework," Proc. IEEE 24th Int'l Conf. Data Eng. (ICDE '08), 2008.
- [21] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, "A Symbolic Representation of Time Series, with Implications for Streaming Algorithms," DMKD '03: Proc. Eighth ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery, pp. 2-11, 2003.
- [22] H.J. Miller, "Tobler's First Law and Spatial Analysis," Annals of the Assoc. of Am. Geographers, vol. 94, no. 2, pp. 284-89, 2004.
- [23] S. Mohammadi, V. Janeja, and A. Gangopadhyay, "Discretized Spatio-Temporal Scan Window," Proc. Ninth SIAM Int'l Conf. Data Mining, 2009.
- [24] J. Naus, "The Distribution of the Size of the Maximum Cluster of Points on the Line," J. Am. Statistical Assoc., vol. 60, pp. 532-538, 1965.
- [25] J.I. Naus and S. Wallenstein, "Multiple Window and Cluster Size Scan Procedures," Methodology and Computing in Applied Probability, vol. 6, pp. 389-400, 2004.
- [26] D. Neill, A. Moore, F. Pereira, and T. Mitchell, "Detecting Significant Multidimensional Spatial Clusters," Advances in Neural Information Processing Systems 17, pp. 969-976, MIT Press, 2005.
- [27] New Jersey Accident Data for State Routes, http://www.state.nj.us/transportation/refdata accident/, 1999.
- [28] New Jersey Safe Corridor Program, 2007.
- [29] NJ State Routes Straight Line Diagrams, 2009.
- [30] S. Openshaw, "A Mark 1 Geographical Analysis Machine for the Automated Analysis of Point Data Sets," Int'l J. Geographical Information Science, vol. 1, no. 4, pp. 335-358, 1987.
- [31] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules in Large Databases," Proc. 20th Int'l Conf. Very Large Data Bases, pp. 487-499, Sept. 1994.
- [32] SSLIP: Code, Datasets and Known Window Reports, http://www.umbc.edu/peopleleishi1, 2009.
- [33] O. Sugiura, "Testing Change-Points with Linear Trend," 1994.
- [34] T. Tango and K. Takahashi, "A Flexibly Shaped Spatial Scan Statistic for Detecting Clusters," Int'l J. Health Geographics, vol. 4, no. 1, p. 11, 2005.
- [35] O. Tim, L. Firoiu, and P. Cohen, "Clustering Time Series with Hidden Markov Models and Dynamic Time Warping," Proc. Int'l Joint Conf. Artificial Intelligence (IJCAI '99) Workshop Sequence Learning, 1999.
- [36] W. Tobler, "A Computer Model Simulation of Urban Growth in the Detroit Region," Economic Geography, vol. 46, no. 2, pp. 234-240, 1970.
- [37] NJDOT, Top 100 Intersection Crash Locations in NJ, 1998/1999.
- [38] L. Wei, E. Keogh, and X. Xi, "Saxually Explicit Images: Finding Unusual Shapes," ICDM '06: Proc. IEEE Sixth Int'l Conf. Data Mining, pp. 711-720, 2006.
- [39] K. Yamanishi and J.-I. Takeuchi, "A Unifying Framework for Detecting Outliers and Change Points from Non-Stationary Time Series Data," KDD '02: Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 676-681, 2002.
- [40] D. Yankov, E. Keogh, and U. Rebbapragada, "Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets," Proc. Seventh IEEE Int'l Conf. Data Mining, 2007.