A Trie Compaction Algorithm for a Large Set of Keys
June 1996 (vol. 8 no. 3)
pp. 476-491

Abstract—Trie structures are frequently used in applications such as natural language dictionaries, database systems, and compilers. However, for a huge key set the total number of states (and of transitions between them) in a trie becomes so large that the space cost may be unacceptable. To resolve this drawback, this paper presents a new scheme, called the "two-trie," that supports efficient retrieval, insertion, and deletion over such key sets. The essential idea is to construct two tries, one for front compression (sharing common prefixes) and one for rear compression (sharing common suffixes) of the keys, which is similar to a DAWG (Directed Acyclic Word-Graph). Unlike a DAWG, however, the two-trie can uniquely determine the information associated with each key. Two types of data structures are introduced for an efficient implementation of the two-trie. Theoretical and experimental observations show that the presented method is more practical than existing ones with respect to support for dynamic key sets, storage of information associated with keys, and compression of transitions.
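The front/rear split described in the abstract can be illustrated with a small, static sketch. It is not the paper's algorithm or either of its two data structures (which also support insertion and deletion). It assumes a hypothetical `TwoTrie` class built once from a dict, an end marker `#` that never occurs inside keys, and a simple splitting rule: each key's front part runs up to the first character that separates it from every other key. The remainder of the key is stored reversed in a single rear trie, so keys with the same ending share nodes, while each key still ends at its own front-trie leaf that holds its value. This mirrors the abstract's contrast with a DAWG, whose merged states cannot by themselves identify which key reached them.

```python
from typing import Dict, Optional, Tuple

END = "#"  # assumed end-of-key marker; keys must not contain it


class RearNode:
    """Rear-trie node; suffixes are stored reversed so equal endings share nodes."""

    def __init__(self, label: str = "", parent: Optional["RearNode"] = None):
        self.label = label  # character on the edge from the parent
        self.parent = parent
        self.children: Dict[str, "RearNode"] = {}

    def spell(self) -> str:
        """Recover the stored suffix in forward order by climbing to the root."""
        chars, node = [], self
        while node.parent is not None:
            chars.append(node.label)
            node = node.parent
        return "".join(chars)


class FrontNode:
    """Front-trie node; each leaf is reached by exactly one key and holds its value."""

    def __init__(self):
        self.children: Dict[str, "FrontNode"] = {}
        self.leaf: Optional[Tuple[object, RearNode]] = None


def _lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


class TwoTrie:
    """Hypothetical static two-trie sketch: build once from a dict, then look up."""

    def __init__(self, items: Dict[str, object]):
        self.front, self.rear = FrontNode(), RearNode()
        keys = sorted(k + END for k in items)
        for i, k in enumerate(keys):
            # In sorted order, the longest prefix k shares with any other key
            # is the one it shares with an adjacent key.
            left = _lcp(k, keys[i - 1]) if i > 0 else 0
            right = _lcp(k, keys[i + 1]) if i + 1 < len(keys) else 0
            p = max(left, right) + 1  # k[:p] is a prefix of no other key
            node = self.front
            for ch in k[:p]:
                node = node.children.setdefault(ch, FrontNode())
            node.leaf = (items[k[:-1]], self._add_suffix(k[p:]))

    def _add_suffix(self, suffix: str) -> RearNode:
        """Insert the reversed suffix into the shared rear trie."""
        node = self.rear
        for ch in reversed(suffix):
            if ch not in node.children:
                node.children[ch] = RearNode(ch, node)
            node = node.children[ch]
        return node

    def get(self, key: str) -> Optional[object]:
        k = key + END
        node = self.front
        for i, ch in enumerate(k):
            node = node.children.get(ch)
            if node is None:
                return None
            if node.leaf is not None:
                value, rear_node = node.leaf
                # The unread part of k must equal the shared suffix.
                return value if k[i + 1:] == rear_node.spell() else None
        return None

    def rear_size(self) -> int:
        """Number of rear-trie nodes, excluding the root."""
        count, stack = 0, [self.rear]
        while stack:
            node = stack.pop()
            count += len(node.children)
            stack.extend(node.children.values())
        return count


t = TwoTrie({"ration": 1, "station": 2})
print(t.get("station"), t.get("nation"), t.rear_size())  # prints: 2 None 7
```

For the two keys in the usage line, the front trie holds only `r` and `s`, while the suffixes `ation#` and `tation#` share six rear-trie nodes, so the rear trie needs 7 nodes instead of 13 separately stored characters. Because front leaves are never shared, each value is still found unambiguously.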

Index Terms:
Key search techniques, trie structures, digital search, key retrieval algorithm, data structure, natural language dictionaries.
Jun-ichi Aoe, Katsushi Morimoto, Masami Shishibori, Ki-Hong Park, "A Trie Compaction Algorithm for a Large Set of Keys," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 3, pp. 476-491, June 1996, doi:10.1109/69.506713