Random Walks to Identify Anomalous Free-Form Spatial Scan Windows
October 2008 (vol. 20 no. 10)
pp. 1378-1392
Vandana P. Janeja, University of Maryland Baltimore County, Baltimore
Vijayalakshmi Atluri, Rutgers University, Newark
It is often necessary to identify anomalous windows that reflect an unusual rate of occurrence of a specific event of interest. The spatial scan statistic approach moves a scan window over the region, computes a statistic for one or more parameters of interest, and identifies anomalous windows. While this approach has been employed successfully, earlier proposals suffer from two limitations: (i) In general, the scan window has a regular shape (e.g., circle, rectangle), so only anomalous windows of fixed shapes are identified, although the region of anomaly is not necessarily regular in shape. Recent proposals for irregularly shaped windows either identify windows larger than the true anomalies or penalize large windows. (ii) These techniques account for autocorrelation among spatial data but not for spatial heterogeneity, which often results in inaccurate anomalous windows. We propose a random-walk-based Free-Form Spatial Scan Statistic (FS3). We construct a Weighted Delaunay Nearest Neighbor (WDNN) graph to capture spatial autocorrelation and heterogeneity. Using random walks, we identify natural free-form scan windows that are not restricted to a predefined shape, and we prove that they are not random. Applied to real datasets, FS3 identifies more refined anomalous windows, with a better likelihood ratio of being an anomaly, than earlier spatial scan statistic approaches.
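The two ingredients named in the abstract, a random walk over a weighted neighbor graph and a likelihood ratio for the resulting window, can be illustrated with a minimal Python sketch. This is not the authors' FS3 implementation. The toy graph, its edge weights, the fixed step count, and the function names `random_walk_window` and `poisson_log_lr` are all assumptions made for illustration. The graph is written by hand rather than derived from a Delaunay triangulation, and the score is the standard Poisson log-likelihood ratio used in spatial scan statistics, which may differ from the statistic used in the paper.

```python
import math
import random


def random_walk_window(graph, start, steps, rng):
    """Collect the nodes visited by a weighted random walk.

    `graph` maps node -> {neighbor: weight}; the next node is drawn with
    probability proportional to the edge weight (hypothetical weighting).
    """
    window = {start}
    node = start
    for _ in range(steps):
        neighbors = list(graph[node])
        weights = [graph[node][n] for n in neighbors]
        node = rng.choices(neighbors, weights=weights, k=1)[0]
        window.add(node)
    return window


def poisson_log_lr(cases_in, pop_in, cases_total, pop_total):
    """Poisson log-likelihood ratio of a window against the null of a
    uniform event rate. Returns 0.0 when the window has no excess."""
    if pop_in == 0:
        return 0.0
    expected = cases_total * pop_in / pop_total
    if cases_in <= expected:
        return 0.0
    cases_out = cases_total - cases_in
    expected_out = cases_total - expected
    llr = cases_in * math.log(cases_in / expected)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr


# Toy data (assumed): heavier weights link the similar nodes a, b, c.
GRAPH = {
    "a": {"b": 5.0, "c": 5.0, "d": 1.0},
    "b": {"a": 5.0, "c": 5.0},
    "c": {"a": 5.0, "b": 5.0, "e": 1.0},
    "d": {"a": 1.0, "e": 1.0},
    "e": {"c": 1.0, "d": 1.0},
}
CASES = {"a": 9, "b": 8, "c": 7, "d": 1, "e": 1}
POPULATION = {"a": 100, "b": 100, "c": 100, "d": 100, "e": 100}


def score_window(window):
    """Score a set of nodes with the Poisson log-likelihood ratio."""
    return poisson_log_lr(
        sum(CASES[n] for n in window),
        sum(POPULATION[n] for n in window),
        sum(CASES.values()),
        sum(POPULATION.values()),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    window = random_walk_window(GRAPH, "a", steps=4, rng=rng)
    print(sorted(window), score_window(window))
```

Because transitions follow the edge weights, the walk tends to stay among strongly connected nodes, so the visited set has no predefined shape. A fuller treatment would repeat the walk from many start nodes and assess significance, for example by Monte Carlo replication.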

Index Terms:
Spatial databases, Spatial databases and GIS, anomaly detection, scan statistics
Vandana P. Janeja, Vijayalakshmi Atluri, "Random Walks to Identify Anomalous Free-Form Spatial Scan Windows," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 10, pp. 1378-1392, Oct. 2008, doi:10.1109/TKDE.2008.96