Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
Barzan Mozafari , Computer Science Department, University of California at Los Angeles, USA
Carlo Zaniolo , Computer Science Department, University of California at Los Angeles, USA
To cope with bursty arrivals of high-volume data, a DSMS has to shed load while minimizing the degradation of Quality of Service (QoS). In this paper, we show that this problem can be formalized as a classical optimization task from operations research, in ways that accommodate different requirements for multiple users, different query sensitivities to load shedding, and different penalty functions. Standard non-linear programming algorithms are adequate for non-critical situations, but for severe overloads, we propose a more efficient algorithm that runs in linear time, without compromising optimality. Our approach is applicable to a large class of queries including traditional SQL aggregates, statistical aggregates (e.g., quantiles), and data mining functions, such as k-means, naive Bayesian classifiers, decision trees, and frequent pattern discovery (where we can even specify a different error bound for each pattern). In fact, we show that these aggregate queries are special instances of a broader class of functions, that we call reciprocal-error aggregates, for which the proposed methods apply with full generality. Finally, we propose a novel architecture for supporting load shedding in an extensible system, where users can write arbitrary User Defined Aggregates (UDA), and thus confirm our analytical findings with several experiments executed on an actual DSMS.
Barzan Mozafari, Carlo Zaniolo, "Optimal load shedding with aggregates and mining queries", ICDE, 2010, 2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013 IEEE 29th International Conference on Data Engineering (ICDE) 2010, pp. 76-88, doi:10.1109/ICDE.2010.5447867