Data Compression Conference (dcc 2008) March 25-March 27 ISBN: 978-0-7695-3121-2
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DCC.2008.22
Burrows-Wheeler Transform (BWT) is used as the main part in block compression which has a good balance of speed and compression ratio. Suffix arrays are used in the coding phase of BWT and we focus on creating them for an alphabet larger than 256 symbols. The motivation for this work has been software project XBW-- an application for compression of large XML files using word- and syllable-based BWT. The role of BWT is to reorder input before applying other algorithms. We describe and implement three families of algorithms for encoding. The first is inspired by the work of Sadakane and further improved by Larsson. The second family includes algorithm by Seward and algorithm by Itoh further improved by Kao. Finally we present algorithm by Karkkainen and Sanders for constructing suffix arrays in linear time.
Index Terms:
suffix array sorting, Burrows-Wheeler transform, text compression, word-based compression
Citation:
Radovan ?est?, Jan L?nsk?, Michal ?emlicka, "Suffix Array for Large Alphabet," dcc, pp.543, Data Compression Conference (dcc 2008), 2008 Usage of this product signifies your acceptance of the Terms of Use. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||