July/August 2002 (vol. 4 no. 4)
pp. 31-43
Naren Ramakrishnan, Virginia Tech

Data mining has traditionally focused on the task of drawing inferences from large data sets. However, many scientific and engineering domains, such as fluid dynamics and aircraft design, are characterized by scarce data, due to the expense and complexity of associated experiments and simulations. In such data-scarce domains, it is advantageous to focus the data collection effort on only those regions deemed most important to support a particular data mining objective. This article describes a mechanism that interleaves bottom-up data mining, to uncover multilevel structures in spatial data, with top-down sampling, to clarify difficult decisions in the mining process. The mechanism exploits relevant physical properties, such as continuity, correspondence, and locality, in a unified framework. This leads to effective mining and sampling decisions that are explainable in terms of domain knowledge and data characteristics. This approach is demonstrated in two diverse applications: mining pockets in spatial data, and qualitative determination of Jordan forms of matrices.
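
The interleaving of bottom-up mining and top-down sampling can be illustrated with a deliberately simplified one-dimensional sketch. It is not the article's mechanism: the function names (`find_pockets`, `focused_sampling`), the gap-based refinement rule, and the quadratic test function are all assumptions made for illustration. A "pocket" is taken here to mean a sample lower than both of its neighbours, and each new evaluation of the (presumed expensive) function is placed in the wider gap next to a pocket, where its location is least certain.

```python
import bisect
from typing import Callable, List, Tuple


def find_pockets(ys: List[float]) -> List[int]:
    """Bottom-up step (illustrative): indices of interior samples that are
    strictly lower than both neighbours, i.e. candidate pockets."""
    return [i for i in range(1, len(ys) - 1)
            if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]]


def focused_sampling(f: Callable[[float], float], lo: float, hi: float,
                     n_initial: int = 5,
                     budget: int = 20) -> Tuple[List[float], List[float]]:
    """Alternate pocket detection with targeted sampling until the
    evaluation budget is spent. All parameters are hypothetical."""
    # Coarse initial design: evenly spaced samples over [lo, hi].
    xs = [lo + (hi - lo) * k / (n_initial - 1) for k in range(n_initial)]
    ys = [f(x) for x in xs]

    while len(xs) < budget:
        pockets = find_pockets(ys)
        if not pockets:
            break  # nothing to refine with this simple rule

        # Top-down step: for each pocket, propose the midpoint of the wider
        # adjacent gap, where the pocket's position is least constrained.
        proposals = []
        for i in pockets:
            left_gap = xs[i] - xs[i - 1]
            right_gap = xs[i + 1] - xs[i]
            if left_gap >= right_gap:
                proposals.append((xs[i - 1] + xs[i]) / 2)
            else:
                proposals.append((xs[i] + xs[i + 1]) / 2)

        # Evaluate only as many proposals as the budget still allows,
        # keeping the samples sorted by x.
        for x in proposals[: budget - len(xs)]:
            idx = bisect.bisect_left(xs, x)
            xs.insert(idx, x)
            ys.insert(idx, f(x))

    return xs, ys


if __name__ == "__main__":
    # Stand-in for an expensive simulation with a single pocket near x = 0.3.
    xs, ys = focused_sampling(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
    best_y, best_x = min(zip(ys, xs))
    print(f"{len(xs)} evaluations, lowest sample at x = {best_x:.4f}")
```

In this toy setting, the samples cluster around the pocket instead of being spread uniformly, which is the data-scarce motivation in miniature. The properties the abstract names (continuity, correspondence, locality) enter only implicitly, through the neighbour comparison.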

Index Terms:
data mining, sampling, experiment design, qualitative reasoning, experimental algorithms.
Citation:
Naren Ramakrishnan, Chris Bailey-Kellogg, "Sampling Strategies for Mining in Data-Scarce Domains," Computing in Science and Engineering, vol. 4, no. 4, pp. 31-43, July-Aug. 2002, doi:10.1109/MCISE.2002.1014978