Issue No. 08 - Aug. (2016 vol. 28)
ISSN: 1041-4347
pp: 2041-2056
Mei Bai, College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Junchang Xin, College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Guoren Wang, College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Luming Zhang, School of Computing, National University of Singapore, Singapore
Roger Zimmermann, School of Computing, National University of Singapore, Singapore
Ye Yuan, College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Xindong Wu, Department of Computer Science, University of Vermont, Burlington, VT 05405
ABSTRACT
A representative skyline contains $k$ skyline points that can represent its corresponding full skyline. The existing measuring criteria for $k$ representative skylines are specifically designed for static data, and they cannot effectively handle streaming data. In this paper, we focus on the problem of calculating the $k$ representative skyline over data streams. First, we propose a new criterion for choosing $k$ skyline points as the $k$ representative skyline in data stream environments, termed the $k$ largest dominance skyline ($k$-LDS), which is representative of the entire data set and is highly stable over the streaming data. Second, we propose an efficient exact algorithm, called the Prefix-based Algorithm (PBA), to solve the $k$-LDS problem in a 2-dimensional space. The time complexity of PBA is only $\mathcal{O}((M-k)\times k)$, where $M$ is the size of the full skyline set. Third, the $k$-LDS problem for a $d$-dimensional ($d \ge 3$) space turns out to be very complex. Therefore, a greedy algorithm is designed to answer $k$-LDS queries. To further accelerate the calculation, we propose an $\epsilon$-greedy algorithm that achieves an approximation factor of $\frac{1}{(1+\epsilon)}(1-\frac{1}{\sqrt{e}})$. Experimental results on both synthetic and real-world data show that our $k$-LDS significantly outperforms its competitors in data stream environments. Furthermore, we demonstrate that the proposed $\epsilon$-greedy algorithm can solve $k$-LDS efficiently and with competitive accuracy.
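To make the greedy idea concrete, the following is a minimal sketch under stated assumptions, not the paper's PBA or $\epsilon$-greedy algorithm. The abstract does not give the formal $k$-LDS objective, so the sketch assumes the goal is to pick $k$ skyline points that together dominate as many points of the data set as possible, with smaller values preferred in every dimension. It works on a static list of points and ignores the sliding window. The names `dominates`, `skyline`, and `greedy_k_representatives` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Assumed convention: p dominates q if p is no larger in every
    dimension and strictly smaller in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Brute-force skyline: the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def greedy_k_representatives(points: Sequence[Point], k: int) -> Tuple[List[Point], int]:
    """Greedily pick up to k skyline points, each time taking the one that
    dominates the most points not yet dominated by an earlier pick.
    Returns the chosen points and the number of points they dominate."""
    # Deduplicate skyline points while keeping their original order.
    candidates = list(dict.fromkeys(skyline(points)))
    # Indices of the points each candidate dominates.
    dominated_by = {
        s: {i for i, q in enumerate(points) if dominates(s, q)}
        for s in candidates
    }
    chosen: List[Point] = []
    covered: Set[int] = set()
    while candidates and len(chosen) < k:
        best = max(candidates, key=lambda s: len(dominated_by[s] - covered))
        chosen.append(best)
        covered |= dominated_by[best]
        candidates.remove(best)
    return chosen, len(covered)


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 6), (6, 2)]
    print(skyline(data))
    print(greedy_k_representatives(data, 1))
```

This is the standard greedy heuristic for coverage-style objectives. The brute-force dominance tests cost quadratic time in the number of points, which is acceptable for illustration but is exactly the kind of cost a streaming algorithm would need to avoid.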
INDEX TERMS
Time complexity, Greedy algorithms, Approximation algorithms, Acceleration, Algorithm design and analysis, Indexes, Decision making
CITATION

M. Bai et al., "Discovering the $k$ Representative Skyline Over a Sliding Window," in IEEE Transactions on Knowledge & Data Engineering, vol. 28, no. 8, pp. 2041-2056, 2016.
doi:10.1109/TKDE.2016.2546242