Fifth International Conference on Information Technology: New Generations (itng 2008)
An Experiment Study on Text Transformation for Compression Using Stoplists and Frequent Words
April 07-April 09
ISBN: 978-0-7695-3099-4
The paper presents a new text transform algorithm suitable for embedding in compression algorithms. To improve text compression, the algorithm replaces words with predefined codes. Instead of using a huge dictionary of exhaustive word lists as in previous work, the new algorithm uses a list of stop words (a stoplist) and/or frequent words. The research devised different encoding schemes for such a list and then tested these schemes with different compression algorithms on standard texts. The results show that each scheme improves compression when used with specific compression algorithms.
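The general idea of replacing listed words with short codes before compression can be illustrated with a minimal sketch. This is not the paper's encoding scheme, which the abstract does not specify: the stoplist contents, the marker character, the single-letter codes, and the function names `transform`, `inverse`, and `compressed_size` are all assumptions made for illustration.

```python
import re
import zlib

# Hypothetical stoplist; the paper's actual word list is not given in the abstract.
STOPLIST = ["the", "of", "and", "to", "in", "a", "is", "that"]

# Assumed escape marker: a control character that must not occur in the input.
MARKER = "\x01"

# Each listed word maps to the marker followed by one lowercase letter.
ENCODE = {word: MARKER + chr(ord("a") + i) for i, word in enumerate(STOPLIST)}
DECODE = {code: word for word, code in ENCODE.items()}

WORD = re.compile(r"[A-Za-z]+")
CODE = re.compile(re.escape(MARKER) + r"[a-z]")


def transform(text):
    """Replace exact (case-sensitive) stoplist words with their short codes."""
    if MARKER in text:
        raise ValueError("input already contains the marker character")
    return WORD.sub(lambda m: ENCODE.get(m.group(0), m.group(0)), text)


def inverse(text):
    """Restore the original text from a transformed string."""
    return CODE.sub(lambda m: DECODE[m.group(0)], text)


def compressed_size(text):
    """Size in bytes after zlib compression, for comparing the two forms."""
    return len(zlib.compress(text.encode("utf-8")))


sample = "the cat is in the hat"
encoded = transform(sample)
restored = inverse(encoded)
```

Calling `compressed_size` on both `sample` and `encoded` gives one way to compare a transformed text against the original under a particular compressor. Whether the transform helps depends on the text and on the compressor, which is the question the paper's experiments address.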
Index Terms:
Text transformation, Text preprocessing, Star encoding, LPT, RLPT, SCLPT, LIPT
Citation:
Jirapond Tadrat, Veera Boonjing, "An Experiment Study on Text Transformation for Compression Using Stoplists and Frequent Words," itng, pp.709-713, Fifth International Conference on Information Technology: New Generations (itng 2008), 2008