2009 42nd Hawaii International Conference on System Sciences (2009)
Waikoloa, Big Island, Hawaii
Jan. 5, 2009 to Jan. 8, 2009
Within XML data streams, markup, as defined e.g. in a DTD, is used not only for structuring large amounts of data but also for efficiently searching, accessing, and processing the required parts of the data streams. However, when huge amounts of XML data are involved, data reduction or compression techniques that still allow the required parts of the data to be found quickly may become crucial for data processing. We present a data reduction and compression technique for XML data streams that not only significantly reduces the amount of data but also allows efficient data processing without requiring full decompression. Our data reduction technique combines sub-tree sharing with the removal of structure that is already known from a DTD. We have done extensive performance evaluations comparing our compression technique with other approaches to XML compression, and we show that it outperforms not only the other XML compression techniques but also string compression techniques like gzip, which do not support query processing on compressed data.
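The sub-tree sharing idea mentioned above can be illustrated with a minimal sketch. It assumes the generic hash-consing approach: structurally identical subtrees receive one shared ID, which turns the element tree into a smaller DAG. This is not the authors' KST signature encoding, and it leaves out DTD-based structure removal entirely. The helper names `share_subtrees` and `count_elements` are hypothetical, and text content and attributes are deliberately ignored so that only the markup structure is shared.

```python
import xml.etree.ElementTree as ET


def share_subtrees(root):
    """Assign shared IDs so that structurally identical subtrees reuse one entry.

    Hypothetical helper for illustration: only tag names and child order are
    considered; text and attributes are ignored. Returns (root_id, table),
    where table[i] is the (tag, child_ids) tuple of shared node i.
    """
    ids = {}    # (tag, child_ids) -> shared node ID
    table = []  # shared node ID -> (tag, child_ids)

    def visit(elem):
        # Children are visited first, so a node's key is built from child IDs.
        key = (elem.tag, tuple(visit(child) for child in elem))
        if key not in ids:
            ids[key] = len(table)
            table.append(key)
        return ids[key]

    return visit(root), table


def count_elements(root):
    # Element.iter() yields the root itself plus all descendants.
    return sum(1 for _ in root.iter())


doc = ET.fromstring(
    "<lib>"
    "<book><title/><author/></book>"
    "<book><title/><author/></book>"
    "<book><title/><author/><author/></book>"
    "</lib>"
)
root_id, table = share_subtrees(doc)
print(count_elements(doc), len(table))  # prints "11 5"
```

Here the 11 elements of the tree collapse into 5 shared nodes, because the two identical `book` subtrees and all repeated `title` and `author` leaves are stored once. A query can still navigate the shared table without expanding the tree back to its original size.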
R. Hartel, S. Bottcher and C. Messinger, "XML Stream Data Reduction by Shared KST Signatures," 2009 42nd Hawaii International Conference on System Sciences (HICSS), Waikoloa, Big Island, Hawaii, 2009, pp. 1-10.