Issue No.03 - May/June (2008 vol.12)
Josh Eno, University of Arkansas
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MIC.2008.55
Synthetic data sets can be useful for repeatable regression testing and for providing realistic, but not real, data to third parties for testing new software. In some cases, it is desirable that the synthetic data set be realistic, preserving various properties of the original data. Several synthetic data generators produce data that superficially matches known characteristics of the original data. This paper shows how to generate data that exhibits some of the same hidden patterns that data mining algorithms can discover, in particular decision tree patterns.
Synthetic data generation, data mining, decision trees
Josh Eno, "Generating Synthetic Data to Match Data Mining Patterns", IEEE Internet Computing, vol. 12, no. 3, pp. 78-82, May/June 2008, doi:10.1109/MIC.2008.55