Issue No. 08 - August 2008 (vol. 20)
pp. 1053-1066
ABSTRACT
Impact-targeted activities are rare but can have a significant impact on society; for example, isolated terrorist activities may lead to a disastrous event that threatens national security. Similar issues arise in many other areas. It is therefore important to identify such activities before they have a significant impact on the world. However, mining impact-targeted activity patterns is challenging because of the imbalanced structure of the activity data. This paper develops techniques for discovering such activity patterns. First, the complexities of mining imbalanced impact-targeted activities are analyzed. We then discuss strategies for constructing impact-targeted activity sequences. Algorithms are developed to mine frequent positive-impact ($P \rightarrow T$) and negative-impact ($P \rightarrow \bar{T}$) oriented activity patterns, sequential impact-contrasted activity patterns ($P$ is frequently associated with both $P \rightarrow T$ and $P \rightarrow \bar{T}$ in separate data sets), and sequential impact-reversed activity patterns (both $P \rightarrow T$ and $PQ \rightarrow \bar{T}$ are frequent). Activity impact modelling is also studied to quantify the impact of patterns on business outcomes. Social security debt-related activity data are used to test the proposed approaches. The outcomes show that the approaches are promising for intelligence and security informatics (ISI) applications that need to identify impact-targeted activity patterns in imbalanced data.
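The pattern classes named in the abstract can be illustrated with a small brute-force sketch. It is not the authors' algorithm, only a hedged reading of the definitions under stated assumptions: a pattern $P$ is an ordered (possibly gapped) subsequence of activity codes; "frequent" means a local support at or above one threshold `min_sup`, computed separately within the $T$ data set and the $\bar{T}$ data set so the rare target class is not swamped by the majority; and $PQ$ means $P$ followed by a non-empty continuation $Q$. The toy records, the activity codes A to E, and the names `contains`, `candidate_patterns`, `local_support`, and `mine` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: (activity sequence, impact) pairs.
# impact=True marks the rare target outcome T (e.g. a debt is raised);
# impact=False marks the non-target outcome T-bar.
RECORDS = [
    (("A", "B", "C"), True),
    (("A", "B"), True),
    (("A", "B", "E"), False),
    (("A", "E"), False),
    (("B", "E"), False),
    (("C", "E"), False),
    (("D",), False),
]


def contains(seq, pattern):
    """True if `pattern` occurs in `seq` in order, gaps allowed."""
    it = iter(seq)
    # `item in it` consumes the iterator up to the match, which enforces order.
    return all(item in it for item in pattern)


def candidate_patterns(seqs, max_len):
    """Every distinct ordered subsequence of length 1..max_len in the data."""
    found = set()
    for seq in seqs:
        for k in range(1, min(max_len, len(seq)) + 1):
            for idx in combinations(range(len(seq)), k):
                found.add(tuple(seq[i] for i in idx))
    return found


def local_support(pattern, seqs):
    """Fraction of the sequences in one separated data set containing `pattern`."""
    return sum(contains(s, pattern) for s in seqs) / len(seqs) if seqs else 0.0


def mine(records, min_sup, max_len=3):
    # Separate the data sets by outcome (assumed reading of "separated data sets").
    pos = [s for s, impact in records if impact]
    neg = [s for s, impact in records if not impact]
    cands = candidate_patterns([s for s, _ in records], max_len)

    to_t = {p for p in cands if local_support(p, pos) >= min_sup}      # P -> T
    to_not_t = {p for p in cands if local_support(p, neg) >= min_sup}  # P -> T-bar

    # Impact-contrasted: the same P is frequent with both outcomes.
    contrasted = to_t & to_not_t
    # Impact-reversed: P -> T is frequent and an extension PQ -> T-bar is too.
    reversed_pairs = {
        (p, pq[len(p):])  # (P, Q)
        for p in to_t
        for pq in to_not_t
        if len(pq) > len(p) and pq[: len(p)] == p
    }
    return to_t, to_not_t, contrasted, reversed_pairs


if __name__ == "__main__":
    to_t, to_not_t, contrasted, reversed_pairs = mine(RECORDS, min_sup=0.4)
    print(sorted(contrasted))      # [('A',), ('B',)]
    print(sorted(reversed_pairs))  # [(('A',), ('E',)), (('B',), ('E',))]
```

In this toy run, A alone is frequent before both outcomes (impact-contrasted), while A followed by E flips the frequent outcome to $\bar{T}$ (impact-reversed). Exhaustive candidate enumeration is only workable for tiny inputs; a practical miner would prune candidates Apriori-style or use pattern growth, and the paper's own definitions may add interestingness measures beyond the frequency conditions checked here.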
INDEX TERMS
Clustering, classification, and association rules; data mining
CITATION
Longbing Cao, Yanchang Zhao, Chengqi Zhang, "Mining Impact-Targeted Activity Patterns in Imbalanced Data", IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 8, pp. 1053-1066, August 2008, doi:10.1109/TKDE.2007.190635