Issue No. 04 - April (2010 vol. 22)
ISSN: 1041-4347
pp: 493-507
Ying Zhang , University of New South Wales and NICTA, Sydney
Xuemin Lin , University of New South Wales and NICTA, Sydney
Yidong Yuan , University of New South Wales and NICTA, Sydney
Masaru Kitsuregawa , University of Tokyo, Tokyo
Xiaofang Zhou , University of Queensland, Brisbane
Jeffrey Xu Yu , Chinese University of Hong Kong, Hong Kong
Duplicates often arise in data streams when data are projected onto a subspace or when the same object is recorded multiple times. Without the assumption that observed data elements are unique, many conventional aggregate computation problems need further investigation because they are duplicate-sensitive. In this paper, we present novel, space-efficient, one-scan algorithms that continuously maintain duplicate-insensitive order sketches, so that rank-based queries can be approximately processed with a relative rank error guarantee $\epsilon$ in the presence of duplicates. Besides being space-efficient, the proposed algorithms are time-efficient and highly accurate. Moreover, our techniques can be applied directly to the heavy hitter problem over distinct elements and to existing fault-tolerant distributed communication techniques. A comprehensive performance study demonstrates that our algorithms can support real-time computation over high-speed data streams.
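To make the problem statement concrete, the following is a minimal Python sketch of what a duplicate-insensitive rank query asks for. It is an exact, brute-force baseline that stores every distinct element, not the space-efficient sketch algorithms of the paper. The function names are hypothetical, and the acceptance test assumes that the relative rank error guarantee means a returned element whose true distinct rank r' satisfies |r' - r| <= epsilon * r for a queried rank r.

```python
def distinct_rank(stream, value):
    """1-based rank of `value` among the distinct elements of `stream`:
    the number of distinct elements less than or equal to `value`."""
    return sum(1 for x in set(stream) if x <= value)


def exact_distinct_quantile(stream, r):
    """Element of rank r (1-based) among the distinct elements.
    Brute force: keeps all distinct elements in memory."""
    distinct_sorted = sorted(set(stream))
    return distinct_sorted[r - 1]


def within_relative_rank_error(stream, r, answer, epsilon):
    """Assumed acceptance test: the true distinct rank of `answer`
    may deviate from the queried rank r by at most epsilon * r."""
    true_rank = distinct_rank(stream, answer)
    return abs(true_rank - r) <= epsilon * r


# Hypothetical stream with repeated recordings of the same values.
stream = [5, 3, 5, 9, 3, 7, 5, 1]

# Duplicate-sensitive view: the 3rd smallest of all 8 observations.
naive_third = sorted(stream)[2]

# Duplicate-insensitive view: the 3rd smallest of the distinct values 1, 3, 5, 7, 9.
distinct_third = exact_distinct_quantile(stream, 3)
```

On this stream the two views disagree on the third-ranked element, which is why rank-based aggregates are duplicate-sensitive. An approximate algorithm would replace the stored set of distinct values with a compact sketch while still passing a check like `within_relative_rank_error`.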
Order statistic, data stream, duplicate insensitive, relative error.

Y. Yuan, J. Xu Yu, X. Lin, Y. Zhang, X. Zhou and M. Kitsuregawa, "Duplicate-Insensitive Order Statistics Computation over Data Streams," in IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 4, pp. 493-507, 2010.