We propose a novel fast and space efficient linear suffix array construction algorithm (SACA) to break the performance and design bottlenecks for the existing linear SACAs. By sampling the fixed-size d-critical substrings to divide-and-conquer the problem, our new algorithm is very simple, for which a fully-functioning sample implementation is embodied in only about 100 lines of C code. The experimental results on the Canterbury and Manzini-Ferragina corpora show that our algorithm outperforms boththe K\"arkk\"ainen-Sanders (KS) and the Ko-Aluru (KA) algorithms: compared with the KS, ours can be more than twice faster and use more than 50% fewer space; compared with the KA, ours can be 9% faster and use 40% fewer space. To approach the lightweight space extreme, we further improve our linear algorithm to use an extra working space of only 0.25n+O(1) bytes to construct the suffix array for any size-n string of a constant or integer alphabet,where the characters of an integer alphabet are in [0..n-1]. Besides using less space, our lightweight linear algorithm stil lruns more than 1.5 times faster than the KS algorithm in the experiments.
Index Terms:
Suffix array, algorithm, linear complexity
Citation:
Sen Zhang, Ge Nong, "Fast and Space Efficient Linear Suffix Array Construction," dcc, pp.553, Data Compression Conference (dcc 2008), 2008