19th IEEE International Conference on Tools with Artificial Intelligence - Vol.2 (ICTAI 2007)
Mining Data with Rare Events: A Case Study
Paris, France
October 29-31, 2007
ISBN: 0-7695-3015-X
The performance of classification models can be negatively impacted if the data on which they are trained contains very rare events. While recent research has investigated the issue of class imbalance, few if any studies address issues related to the handling of extreme imbalance (rare events), where the minority class can account for as little as 0.1% of the training data. This work investigates the effect of dataset size and class distribution on classification performance when examples from the minority class are rare. In addition, we compare the performance improvement achieved by acquiring additional examples to that of applying data sampling. Our results demonstrate that data sampling is very effective at alleviating the problem of rare events.
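To make the idea of data sampling concrete, the following is a minimal sketch of random undersampling, one common sampling technique. The abstract does not say which sampling methods the paper evaluates, so the choice of technique, the function name `random_undersample`, its parameters, and the toy label list are all illustrative assumptions rather than the authors' procedure.

```python
import random


def random_undersample(labels, minority_label, target_minority_fraction, seed=0):
    """Return sorted indices that keep every minority example and a random
    subset of majority examples, so that the minority class makes up roughly
    `target_minority_fraction` of the result.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 < target_minority_fraction < 1.0:
        raise ValueError("target_minority_fraction must be between 0 and 1")

    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]

    # Number of majority examples to keep for the requested class distribution.
    n_keep = round(
        len(minority) * (1.0 - target_minority_fraction) / target_minority_fraction
    )
    n_keep = min(n_keep, len(majority))  # cannot keep more than exist

    kept_majority = rng.sample(majority, n_keep)
    return sorted(minority + kept_majority)


# Toy data (assumed): 10,000 examples with 10 rare events, i.e. 0.1% minority.
labels = [1] * 10 + [0] * 9990
idx = random_undersample(labels, minority_label=1, target_minority_fraction=0.25)
minority_share = sum(labels[i] for i in idx) / len(idx)
```

In this toy setting all 10 rare events are retained and majority examples are discarded until the minority share reaches the requested fraction. Undersampling trades away majority data for a more balanced class distribution; oversampling the minority class is the usual alternative when discarding data is undesirable.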
Citation:
Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, Amri Napolitano, "Mining Data with Rare Events: A Case Study," 19th IEEE International Conference on Tools with Artificial Intelligence - Vol. 2 (ICTAI 2007), pp. 132-139, 2007