Toward Multidatabase Mining: Identifying Relevant Databases
July/August 2001 (vol. 13 no. 4)
pp. 541-553

Abstract—Various tools and systems for knowledge discovery and data mining have been developed and are available for applications. However, when we are immersed in heaps of databases, an immediate question is where we should start mining. More databases are not necessarily better for data mining; they help only when the databases involved are relevant to the task at hand. In this paper, breaking away from the conventional data mining assumption that many databases must be joined into one, we argue that the first step in multidatabase mining is to identify the databases most likely to be relevant to an application; without this step, the mining process can be lengthy, aimless, and ineffective. We therefore propose a measure of relevance for mining tasks whose objective is to find patterns or regularities about certain attributes, and we describe an efficient algorithm for identifying relevant databases. Experiments are conducted to verify the measure's performance and to exemplify its application.
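To make the idea of screening databases for relevance before mining more concrete, here is a minimal Python sketch. It assumes each database is a list of records (dicts) and that a mining task is expressed as a selection predicate on an attribute. As a stand-in score it uses a simple lift, P(Q | D_i) / P(Q), where P(Q) is pooled over all databases. This is a hypothetical measure chosen for illustration, not the relevance measure or the algorithm defined in the paper, and the names `rank_databases`, `selection_rate`, and `min_lift` are likewise illustrative.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


def selection_rate(records: List[Record], pred: Predicate) -> float:
    """Fraction of records in one database that satisfy the selection predicate."""
    if not records:
        return 0.0
    return sum(1 for r in records if pred(r)) / len(records)


def rank_databases(
    databases: Dict[str, List[Record]],
    pred: Predicate,
    min_lift: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rank databases by a hypothetical lift score and keep those above min_lift.

    lift(D_i) = P(pred | D_i) / P(pred), with P(pred) pooled over all databases.
    Illustrative only: this is NOT the relevance measure defined in the paper.
    """
    total = sum(len(recs) for recs in databases.values())
    hits = sum(sum(1 for r in recs if pred(r)) for recs in databases.values())
    if total == 0 or hits == 0:
        return []  # predicate never holds, so no database is informative about it
    pooled = hits / total
    scored = [(name, selection_rate(recs, pred) / pooled)
              for name, recs in databases.items()]
    # Keep only databases where the predicate is over-represented, best first.
    relevant = [(name, s) for name, s in scored if s > min_lift]
    return sorted(relevant, key=lambda item: item[1], reverse=True)


# Usage with three hypothetical branch databases and the task "income == high".
dbs = {
    "branch_a": [{"income": "high"}] * 8 + [{"income": "low"}] * 2,
    "branch_b": [{"income": "high"}] * 3 + [{"income": "low"}] * 7,
    "branch_c": [{"income": "low"}] * 10,
}
ranked = rank_databases(dbs, lambda r: r["income"] == "high")
```

Here only `branch_a` is kept (lift 24/11, about 2.18): `branch_b` falls below the pooled rate and `branch_c` never satisfies the predicate. A score of this kind is cheap to compute per database, which is the point of screening before any expensive mining or joining.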


Index Terms:
Multiple databases, data mining, query, relevance measure.
Citation:
Huan Liu, Hongjun Lu, Jun Yao, "Toward Multidatabase Mining: Identifying Relevant Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 4, pp. 541-553, July-Aug. 2001, doi:10.1109/69.940731