First IEEE International Conference on Data Mining (ICDM'01)
Significance Tests for Patterns in Continuous Data
San Jose, California
November 29-December 2, 2001
ISBN: 0-7695-1119-8
In this paper, we consider the uncertainty of patterns detected in data mining. In particular, we develop statistical tests for patterns found in continuous data; these tests express the significance of a pattern as the probability that it occurred by chance. We examine the performance of these tests on patterns detected in several large data sets, including one describing the locations of earthquakes in California and another describing flow cytometry measurements on phytoplankton.
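
To make the idea of "probability of occurring by chance" concrete, the following is a minimal sketch and not the test developed in the paper. It assumes a deliberately simple null model in which points fall uniformly over the data domain, so that the number of points landing in a candidate region follows a binomial distribution. The function name `binomial_tail_p_value` and its parameters are hypothetical, chosen only for illustration.

```python
from math import comb


def binomial_tail_p_value(n_points, n_in_region, region_fraction):
    """Return P(X >= n_in_region) for X ~ Binomial(n_points, region_fraction).

    Assumed null model (an illustration, not the paper's): each of the
    n_points falls independently and uniformly over the domain, and the
    candidate region covers `region_fraction` of the domain's volume.
    """
    if not 0.0 <= region_fraction <= 1.0:
        raise ValueError("region_fraction must lie in [0, 1]")
    if not 0 <= n_in_region <= n_points:
        raise ValueError("n_in_region must lie in [0, n_points]")
    q = 1.0 - region_fraction
    # Sum the binomial probabilities of seeing at least n_in_region points.
    return sum(
        comb(n_points, k) * region_fraction**k * q ** (n_points - k)
        for k in range(n_in_region, n_points + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers: 1000 points, a region covering 1% of the
    # domain, and 25 points observed inside it.
    p = binomial_tail_p_value(1000, 25, 0.01)
    print(f"p-value under the uniform null: {p:.3g}")
```

A small p-value here would suggest that the region is denser than uniform scatter would explain. Note that this sketch ignores the fact that the region is usually chosen after looking at the data, which is exactly the kind of complication that a dedicated significance test for mined patterns has to address.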