2012 IEEE 12th International Conference on Data Mining Workshops (2012)
Brussels, Belgium
Dec. 10, 2012
In this paper we explore the possibility of automatic model selection in the supervised learning framework using prediction intervals. First, we compare two families of non-parametric approaches to constructing prediction intervals for arbitrary regression models. The first family is based on the idea of explaining the total prediction error as the sum of the model's error and the error caused by noise inherent to the data; the two are estimated independently. The second family assumes local similarity of the data and estimates the prediction intervals from the sample's nearest neighbors. The comparison shows that the first family strives to produce valid prediction intervals, whereas the second family strives for optimality. We propose a statistic for model selection that measures the discrepancy between valid and optimal prediction intervals. Experiments performed on a set of artificial datasets strongly support the hypothesis that, for the correct model, this discrepancy is minimal among competing models.
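The abstract does not spell out the estimators, so the following is only a minimal sketch of the general idea behind the nearest-neighbor family and of comparing two sets of intervals. Everything in it is an assumption made for illustration: the function names (`empirical_quantile`, `knn_prediction_interval`, `interval_discrepancy`), the use of residual quantiles over the k nearest training points, the definition of a residual as true value minus prediction, and the mean absolute difference of bounds as a discrepancy measure. It is not the statistic proposed in the paper.

```python
import math


def empirical_quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def knn_prediction_interval(train_x, train_residuals, query_x, prediction,
                            k=5, alpha=0.1):
    """Hypothetical nearest-neighbor interval around `prediction`.

    train_x is a list of feature tuples; train_residuals holds
    (true value - model prediction) for each training point.
    """
    # Rank training points by Euclidean distance to the query point.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query_x))
    local = [train_residuals[i] for i in order[:k]]
    # Shift the prediction by the local residual quantiles.
    lower = prediction + empirical_quantile(local, alpha / 2)
    upper = prediction + empirical_quantile(local, 1 - alpha / 2)
    return lower, upper


def interval_discrepancy(intervals_a, intervals_b):
    """Assumed measure: mean absolute difference between interval bounds."""
    total = 0.0
    for (lo_a, hi_a), (lo_b, hi_b) in zip(intervals_a, intervals_b):
        total += abs(lo_a - lo_b) + abs(hi_a - hi_b)
    return total / len(intervals_a)
```

Under these assumptions, one would compute intervals for each candidate model with both families on held-out data and prefer the model for which `interval_discrepancy` is smallest, which mirrors the hypothesis stated in the abstract.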
Predictive models, Noise, Radio frequency, Data models, Neural networks, Computational modeling, Training, Estimation error, Machine learning, Supervised learning, Regression analysis
Darko Pevec, Igor Kononenko, "Model Selection with Combining Valid and Optimal Prediction Intervals", 2012 IEEE 12th International Conference on Data Mining Workshops, pp. 653-658, 2012, doi:10.1109/ICDMW.2012.165