Issue No.04 - April (2013 vol.25)
pp: 790-804
Lizhen Wang , Yunnan University, Kunming
Pinping Wu , Yunnan University, Kunming
Hongmei Chen , Yunnan University, Kunming
ABSTRACT
A spatial colocation pattern is a group of spatial features whose instances are frequently located together in geographic space. Discovering colocations has many useful applications. For example, colocated plant species discovered from plant distribution data sets can contribute to the analysis of plant geography, phytosociology studies, and plant protection recommendations. In this paper, we study the colocation mining problem in the context of uncertain data, since data generated by a wide range of sources are inherently uncertain. One straightforward method to mine the prevalent colocations in a spatially uncertain data set is simply to compute the expected participation index of a candidate and decide whether it exceeds a minimum prevalence threshold. Although this definition has been widely adopted, it misses important information about the confidence that can be associated with the participation index of a colocation. We propose another definition, probabilistic prevalent colocations, which aims to find all the colocations that are likely to be prevalent in a randomly generated possible world. Finding probabilistic prevalent colocations (PPCs) turns out to be difficult. First, we propose pruning strategies for candidates to reduce the amount of computation of the probabilistic participation index values. Next, we design an improved dynamic programming algorithm for identifying candidates. This algorithm is suitable for both parallel and approximate computation. Finally, the effectiveness and efficiency of the proposed methods, as well as the pruning strategies and the optimization techniques, are verified by extensive experiments with real and synthetic spatially uncertain data sets.
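To make the difference between the two prevalence definitions concrete, the following is a minimal brute-force sketch in Python, not the paper's algorithm. It uses the usual colocation measures: the participation ratio of feature f in colocation C is the fraction of f's instances that appear in at least one row instance of C (one instance per feature, all pairwise neighbors), and the participation index is the minimum ratio over C's features. Several assumptions are ours, not necessarily the paper's: each instance exists independently with a given probability, the neighbor relation is precomputed, and the index is 0 in any world where a feature of C has no instances. All data and names (`INSTANCES`, `NEIGHBORS`, `expected_pi`, `prevalence_probability`) are hypothetical. The sketch enumerates all 2^n possible worlds, so it only illustrates the semantics; the pruning strategies and dynamic programming algorithm mentioned above exist precisely to avoid this enumeration.

```python
from itertools import combinations, product

# Hypothetical toy data: instance id -> (feature, existence probability).
INSTANCES = {
    "A1": ("A", 0.9), "A2": ("A", 0.6),
    "B1": ("B", 0.8), "B2": ("B", 0.5),
    "C1": ("C", 0.7),
}
# Hypothetical precomputed neighbor relation (pairs within a distance threshold).
NEIGHBORS = {frozenset(p) for p in [("A1", "B1"), ("A2", "B2"), ("A1", "C1"), ("B1", "C1")]}


def participation_index(colocation, world):
    """Deterministic participation index of `colocation` in one possible world."""
    by_feature = {f: [i for i in world if INSTANCES[i][0] == f] for f in colocation}
    if any(not ids for ids in by_feature.values()):
        return 0.0  # assumption: a feature with no instances in this world gives PI = 0
    participating = {f: set() for f in colocation}
    # A row instance takes one instance per feature, all pairwise neighbors.
    for row in product(*(by_feature[f] for f in colocation)):
        if all(frozenset(pair) in NEIGHBORS for pair in combinations(row, 2)):
            for f, inst in zip(colocation, row):
                participating[f].add(inst)
    # Participation ratio per feature; the index is the minimum ratio.
    return min(len(participating[f]) / len(by_feature[f]) for f in colocation)


def possible_worlds():
    """Yield (world, probability) for every subset of instances, assuming independence."""
    ids = list(INSTANCES)
    for mask in product([False, True], repeat=len(ids)):
        prob, world = 1.0, []
        for inst, present in zip(ids, mask):
            p = INSTANCES[inst][1]
            prob *= p if present else 1.0 - p
            if present:
                world.append(inst)
        yield world, prob


def expected_pi(colocation):
    """Expected participation index over all possible worlds."""
    return sum(prob * participation_index(colocation, w) for w, prob in possible_worlds())


def prevalence_probability(colocation, min_prev):
    """Probability that the participation index reaches min_prev in a random world."""
    return sum(prob for w, prob in possible_worlds()
               if participation_index(colocation, w) >= min_prev)


if __name__ == "__main__":
    for c in [("A", "B"), ("A", "C"), ("A", "B", "C")]:
        print(c, round(expected_pi(c), 4), round(prevalence_probability(c, 0.6), 4))
```

For the colocation `("A", "C")` in this toy data, the expected participation index is 0.441, yet the probability that the index reaches 0.6 in a random world is only 0.252; a single expected value does not say how likely the threshold is to be met. Because the participation index of a superset never exceeds that of its subsets in any world, the prevalence probability is also anti-monotone, which is what makes Apriori-style candidate pruning possible.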
INDEX TERMS
Indexes, Probabilistic logic, Dynamic programming, Data mining, Heuristic algorithms, Approximation algorithms, Data models, approximate algorithms, Spatial colocations, spatially uncertain data set, possible worlds, probabilistic prevalent colocations (PPCs)
CITATION
Lizhen Wang, Pinping Wu, Hongmei Chen, "Finding Probabilistic Prevalent Colocations in Spatially Uncertain Data Sets", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 4, pp. 790-804, April 2013, doi:10.1109/TKDE.2011.256