Issue No. 08 - Aug. (2016 vol. 28)
ISSN: 1041-4347
pp: 2041-2056
Mei Bai, College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Junchang Xin, College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Guoren Wang, College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Luming Zhang, School of Computing, National University of Singapore, Singapore
Roger Zimmermann, School of Computing, National University of Singapore, Singapore
Ye Yuan, College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Xindong Wu, Department of Computer Science, University of Vermont, Burlington, VT 05405
ABSTRACT
A representative skyline contains $k$ skyline points that can represent its corresponding full skyline. The existing criteria for measuring $k$ representative skylines are designed specifically for static data and cannot effectively handle streaming data. In this paper, we focus on the problem of calculating the $k$ representative skyline over data streams. First, we propose a new criterion for choosing $k$ skyline points as the $k$ representative skyline in data stream environments, termed the $k$ largest dominance skyline ($k$-LDS), which is representative of the entire data set and is highly stable over streaming data. Second, we propose an efficient exact algorithm, called the Prefix-based Algorithm (PBA), to solve the $k$-LDS problem in a 2-dimensional space. The time complexity of PBA is only $\mathcal{O}((M-k)\times k)$, where $M$ is the size of the full skyline set. Third, the $k$-LDS problem in a $d$-dimensional ($d\ge 3$) space turns out to be very complex. Therefore, a greedy algorithm is designed to answer $k$-LDS queries. To further accelerate the calculation, we propose an $\epsilon$-greedy algorithm that achieves an approximation factor of $\frac{1}{(1+\epsilon)}(1-\frac{1}{\sqrt{e}})$. Experimental results on both synthetic and real-world data show that our $k$-LDS significantly outperforms its competitors in data stream environments. Furthermore, we demonstrate that the proposed $\epsilon$-greedy algorithm can solve $k$-LDS efficiently and with competitive accuracy.
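As a rough illustration of the greedy idea mentioned in the abstract, the following Python sketch selects $k$ skyline points from a static point set. It rests on several assumptions that the abstract does not spell out: smaller values are taken to be better in every dimension, and the objective is assumed to be the number of points dominated by at least one chosen skyline point. The function names `dominates`, `skyline`, and `greedy_k_lds` are hypothetical. This is a plain quadratic-time greedy, not the paper's PBA or $\epsilon$-greedy algorithm, and it does not handle a sliding window.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller values are preferred in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def greedy_k_lds(points: Sequence[Point], k: int) -> Tuple[List[Point], int]:
    """Greedily pick up to k skyline points to maximize the assumed objective:
    the number of points dominated by at least one chosen point."""
    sky = skyline(points)
    # For each skyline point, the indices of the points it dominates.
    dom_sets: List[Set[int]] = [
        {j for j, q in enumerate(points) if dominates(s, q)} for s in sky
    ]
    covered: Set[int] = set()
    chosen: List[Point] = []
    remaining = list(range(len(sky)))
    while remaining and len(chosen) < k:
        # Pick the skyline point with the largest marginal gain in coverage.
        best = max(remaining, key=lambda i: len(dom_sets[i] - covered))
        chosen.append(sky[best])
        covered |= dom_sets[best]
        remaining.remove(best)
    return chosen, len(covered)


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 6), (6, 2)]
    print(skyline(data))
    print(greedy_k_lds(data, 1))
```

Under the assumed objective, each greedy step adds the skyline point that dominates the most not-yet-covered points, which is the standard greedy scheme for coverage-style problems; the approximation factor quoted in the abstract applies to the authors' algorithm, not necessarily to this sketch.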
INDEX TERMS
Time complexity, Greedy algorithms, Approximation algorithms, Acceleration, Algorithm design and analysis, Indexes, Decision making
CITATION

M. Bai et al., "Discovering the $k$ Representative Skyline Over a Sliding Window," in IEEE Transactions on Knowledge & Data Engineering, vol. 28, no. 8, pp. 2041-2056, 2016.
doi:10.1109/TKDE.2016.2546242