String Processing and Information Retrieval, International Symposium on (1999)
Cancun, Mexico
Sept. 21, 1999 to Sept. 24, 1999
ISBN: 0-7695-0268-7
pp: 81
Hozumi Tanaka, Tokyo Institute of Technology
Hideo Itoh, Ricoh Company, Ltd.
ABSTRACT
The suffix array is a string-indexing structure and a memory-efficient alternative to the suffix tree. It has many advantages for text processing. Here we propose an efficient algorithm for sorting suffixes, which we call the two-stage suffix sort. One of its key ideas is to exploit the specific relationships between adjacent suffixes. Our algorithm makes it possible to use the suffix array for much larger texts and suggests new areas of application. Our experiments on several text data sets (including 514 MB of Japanese newspaper text) demonstrate that our algorithm is 4.5 to 6.9 times faster than Quicksort, and 2.5 to 3.6 times faster than Sadakane's algorithm, which is considered the fastest algorithm in previous work.
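To illustrate the structure the abstract refers to, here is a minimal Python sketch of a suffix array built with a plain comparison sort and then queried by binary search. It is not the two-stage suffix sort, whose details the abstract does not give; it only shows what is being constructed and why sorted suffixes support text search. The function names `build_suffix_array` and `find_occurrences` are hypothetical and chosen for this example.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text in lexicographic order."""
    # Naive comparison sort over whole suffixes: simple, but far too slow
    # and memory-hungry for large texts. Illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of pattern in text, using binary search on sa."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Because the suffixes are stored in sorted order, every occurrence of a pattern occupies one contiguous range of the array, which is why two binary searches are enough to locate them all. The cost of the sort itself is what a dedicated suffix-sorting algorithm aims to reduce.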
INDEX TERMS
suffix array, indexing, sorting, text search, string processing
CITATION
Hozumi Tanaka, Hideo Itoh, "An Efficient Method for in Memory Construction of Suffix Arrays", String Processing and Information Retrieval, International Symposium on, pp. 81, 1999, doi:10.1109/SPIRE.1999.796581