Issue No. 04 - July/August (2001 vol. 13)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/69.940731
<p><b>Abstract</b>—Various tools and systems for knowledge discovery and data mining are developed and available for applications. However, when we are immersed in heaps of databases, an immediate question is where we should start mining. It is not true that the more databases, the better for data mining. It is only true when the databases involved are relevant to a task at hand. In this paper, breaking away from the conventional data mining assumption that many databases be joined into one, we argue that the first step for multidatabase mining is to identify databases that are most likely relevant to an application; without doing so, the mining process can be lengthy, aimless, and ineffective. A measure of relevance is thus proposed for mining tasks with an objective of finding patterns or regularities about certain attributes. An efficient algorithm for identifying relevant databases is described. Experiments are conducted to verify the measure's performance and to exemplify its application.</p>
Index Terms: Multiple databases, data mining, query, relevance measure.
Hongjun Lu, Huan Liu, Jun Yao, "Toward Multidatabase Mining: Identifying Relevant Databases", IEEE Transactions on Knowledge & Data Engineering, vol. 13, no. 4, pp. 541-553, July/August 2001, doi:10.1109/69.940731