Efficient Algorithms for the Inverse Sort Transform
November 2007 (vol. 56 no. 11)
pp. 1564-1574
Ge Nong, IEEE
Sen Zhang, IEEE
As an important variant of the Burrows-Wheeler Transform (BWT), the Sort Transform (ST) can speed up the transformation by sorting only a portion of the matrix. However, because the currently known inverse ST algorithms need to retrieve the complete k-order contexts and use hash tables, they are less efficient than the inverse BWT. In this paper, we propose three fast and memory-efficient inverse ST algorithms. The first algorithm uses two auxiliary vectors to replace the hash tables. The algorithm achieves O(kN) time and space complexities for a text of N characters under the context order k. The second uses two additional compact "alternate vectors" to further eliminate the need to restore all the k-order contexts and achieve O(N) space complexity. And the third uses a "doubling technique" to further reduce the time complexity to O(N log_2 k). The hallmark of these three algorithms is that they can invert ST in a manner similar to inverting BWT in that they all make use of precalculated auxiliary mapping vectors and require no hash tables. These unifying algorithms can also better explain the connection between the BWT and the ST: their forward components can not only be performed by the same algorithm framework, but their respective inverse components can also be efficiently conducted by the unifying algorithm framework proposed in the present work.
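To make the terms in the abstract concrete, the following is a minimal Python sketch, not the algorithms proposed in the paper. It assumes the usual textbook definitions: the forward ST of order k sorts the rotations of the text by their first k characters only, breaking ties by original position, and the BWT is the special case where k covers the whole text. The inverse shown is the standard BWT inversion driven by a precalculated mapping vector, which is the style of inversion the abstract compares against. The function names `forward_transform` and `inverse_bwt` are hypothetical and chosen only for this illustration.

```python
def forward_transform(text, k=None):
    """Return (last_column, primary_index) for the order-k Sort Transform.

    Assumption: rotations are compared on their first k characters only,
    and ties keep their original position order (Python's sort is stable).
    With k=None the full rotation is compared, which gives the BWT.
    """
    n = len(text)
    if k is None:
        k = n
    doubled = text + text  # lets us slice a rotation prefix without wrapping
    order = sorted(range(n), key=lambda i: doubled[i:i + k])
    # The last character of the rotation starting at i is text[i - 1].
    last = "".join(text[i - 1] for i in order)
    return last, order.index(0)


def inverse_bwt(last, primary):
    """Invert a full BWT using a precalculated mapping vector (no hash tables)."""
    n = len(last)
    # Count each character, then find where its run starts in the first column.
    counts = {}
    for ch in last:
        counts[ch] = counts.get(ch, 0) + 1
    start, total = {}, 0
    for ch in sorted(counts):
        start[ch] = total
        total += counts[ch]
    # mapping[i] is the row whose rotation begins with the character last[i].
    mapping, seen = [0] * n, {}
    for i, ch in enumerate(last):
        mapping[i] = start[ch] + seen.get(ch, 0)
        seen[ch] = seen.get(ch, 0) + 1
    # Walk the mapping from the primary row, emitting the text back to front.
    out, row = [""] * n, primary
    for j in range(n - 1, -1, -1):
        out[j] = last[row]
        row = mapping[row]
    return "".join(out)


bwt_last, bwt_index = forward_transform("banana")
restored = inverse_bwt(bwt_last, bwt_index)
st2_last, st2_index = forward_transform("banana", k=2)
```

In this sketch the order-2 output for "banana" differs from the full BWT output, so the plain BWT mapping cannot be applied to it directly. That gap is what an inverse ST algorithm has to bridge using information about the k-order contexts.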

Index Terms:
Burrows-Wheeler transform, inverse sort transform, limit-order contexts, algorithm design, data compression.
Citation:
Ge Nong, Sen Zhang, "Efficient Algorithms for the Inverse Sort Transform," IEEE Transactions on Computers, vol. 56, no. 11, pp. 1564-1574, Nov. 2007, doi:10.1109/TC.2007.70762