loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Data Compression Conference (dcc 2008)
March 25-March 27
ISBN: 978-0-7695-3121-2
We propose a novel fast and space efficient linear suffix array construction algorithm (SACA) to break the performance and design bottlenecks for the existing linear SACAs. By sampling the fixed-size d-critical substrings to divide-and-conquer the problem, our new algorithm is very simple, for which a fully-functioning sample implementation is embodied in only about 100 lines of C code. The experimental results on the Canterbury and Manzini-Ferragina corpora show that our algorithm outperforms boththe K\"arkk\"ainen-Sanders (KS) and the Ko-Aluru (KA) algorithms: compared with the KS, ours can be more than twice faster and use more than 50% fewer space; compared with the KA, ours can be 9% faster and use 40% fewer space. To approach the lightweight space extreme, we further improve our linear algorithm to use an extra working space of only 0.25n+O(1) bytes to construct the suffix array for any size-n string of a constant or integer alphabet,where the characters of an integer alphabet are in [0..n-1]. Besides using less space, our lightweight linear algorithm stil lruns more than 1.5 times faster than the KS algorithm in the experiments.
Index Terms:
Suffix array, algorithm, linear complexity
Citation:
Sen Zhang, Ge Nong, "Fast and Space Efficient Linear Suffix Array Construction," dcc, pp.553, Data Compression Conference (dcc 2008), 2008
Usage of this product signifies your acceptance of the Terms of Use.