Issue No.10 - Oct. (2013 vol.25)
pp: 2231-2244
Morten Middelfart, TARGIT, US and Denmark
Torben Bach Pedersen, Aalborg University, Aalborg
Jan Krogsgaard, TARGIT, US and Denmark
ABSTRACT
This paper proposes a highly efficient bitmap-based approach for the discovery of so-called sentinels. Sentinels represent schema-level relationships between changes over time in certain measures in a multidimensional data cube. Sentinels are actionable and notify users based on previous observations, for example, that revenue might drop within two months if an increase in customer problems combined with a decrease in website traffic is observed. We significantly extend prior art by expressing the sentinel mining problem as bitmap operations, using a bitmapped encoding of so-called indication streams. We present a very efficient algorithm, SentBit, that is 2-3 orders of magnitude faster than the state of the art and that utilizes CPU-specific instructions and the multicore architectures available on modern processors. The SentBit algorithm scales efficiently to very large data sets, which is verified by extensive experiments on both real and synthetic data.
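To make the bitmap idea concrete, the following is a minimal sketch and not the SentBit algorithm itself. It assumes that an "indication" is a relative change of at least a threshold `alpha` between two consecutive periods, that each measure gets one bitmap for increases and one for decreases, and that bit i describes the change from period i to period i+1. The function names, the threshold, and the sample figures are all hypothetical. Under these assumptions, counting how often a source indication is followed by a target indication reduces to a shift, a bitwise AND, and a population count, an operation that many modern CPUs offer as a single instruction.

```python
def indication_bitmaps(values, alpha):
    """Encode changes in a measure as two bitmaps (Python ints).

    Bit i of `pos` is set if the measure rose by at least `alpha`
    (relative change) from period i to period i+1; bit i of `neg`
    is set if it fell by at least `alpha`. This encoding is an
    assumption made for illustration.
    """
    pos = neg = 0
    for i in range(len(values) - 1):
        prev, cur = values[i], values[i + 1]
        if prev == 0:
            continue  # relative change is undefined; record no indication
        change = (cur - prev) / abs(prev)
        if change >= alpha:
            pos |= 1 << i
        elif change <= -alpha:
            neg |= 1 << i
    return pos, neg


def popcount(bits):
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")


def co_occurrences(source_bits, target_bits, offset):
    """Count source indications followed by a target indication
    exactly `offset` periods later."""
    return popcount((source_bits << offset) & target_bits)


if __name__ == "__main__":
    # Hypothetical monthly figures, invented for this example.
    problems = [100, 120, 121, 150, 149, 190]
    revenue = [500, 505, 440, 445, 380, 378]

    problems_up, _ = indication_bitmaps(problems, alpha=0.1)
    _, revenue_down = indication_bitmaps(revenue, alpha=0.1)

    # How often was a rise in problems followed by a revenue drop one period later?
    print(co_occurrences(problems_up, revenue_down, offset=1))
```

Because each comparison touches whole machine words rather than individual rows, this style of encoding lends itself to the word-level and multicore parallelism that the abstract refers to. How SentBit actually scores and selects sentinels is not described on this page.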
INDEX TERMS
Data mining, Time measurement, Bidirectional control, Art, Encoding, Organizations, Databases, cube-based data mining, sentinels, Pattern mining, predictive data mining
CITATION
Morten Middelfart, Torben Bach Pedersen, Jan Krogsgaard, "Efficient Sentinel Mining Using Bitmaps on Modern Processors", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 10, pp. 2231-2244, Oct. 2013, doi:10.1109/TKDE.2012.198