Sixth Mexican International Conference on Computer Science (ENC'05)
Combining Structural and Textual Contexts for Compressing Semistructured Databases
Puebla, Mexico
September 26-30, 2005
ISBN: 0-7695-2454-0
Joaquin Adiego, Universidad de Valladolid, Valladolid, Spain
Pablo De la Fuente, Universidad de Valladolid, Valladolid, Spain
Gonzalo Navarro, Universidad de Chile, Santiago, Chile
We describe a compression technique for semistructured documents, called SCMPPM, which combines the Prediction by Partial Matching (PPM) technique with the Structural Contexts Model (SCM) technique. SCMPPM exploits the context information that is usually implicit in the structure of the text. The idea is to use a separate PPM model to compress the text that lies inside each structure type (e.g., each distinct XML tag). The intuition is that texts belonging to a given structure type should follow similar distributions, which differ from those of other structure types, and that this should allow PPM to make better predictions. We test our idea against plain PPM modelling, as well as against other structure-aware techniques. Results show that the new compression method obtains significant improvements in compression ratios.
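The idea of keeping one model per structure type can be illustrated with a small sketch. It is not the authors' SCMPPM implementation: a simple adaptive order-1 byte model with add-one smoothing stands in for PPM, only the text content is costed (tags themselves are ignored), and the names `Order1Model`, `split_by_tag`, and `estimated_bits` are hypothetical. The sketch estimates how many bits an ideal entropy coder would need when all text shares one model versus when each tag gets its own model.

```python
import math
import re
from collections import defaultdict


class Order1Model:
    """Adaptive order-1 byte model with add-one smoothing.

    Assumption: a deliberately simple stand-in for a PPM model.
    """

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)
        self.prev = 0  # previous byte seen by this model (its context)

    def cost_and_update(self, symbol):
        """Return the ideal code length of `symbol` in bits, then learn it."""
        ctx = self.counts[self.prev]
        p = (ctx[symbol] + 1) / (self.totals[self.prev] + 256)
        ctx[symbol] += 1
        self.totals[self.prev] += 1
        self.prev = symbol
        return -math.log2(p)


def split_by_tag(xml):
    """Yield (innermost_tag, text) pairs for non-blank text segments.

    Toy tokenizer: no attributes with '>' inside, no self-closing tags,
    no comments or CDATA.
    """
    stack = []
    for m in re.finditer(r"<(/?)(\w+)[^>]*>|([^<]+)", xml):
        closing, name, text = m.groups()
        if text is not None:
            if stack and text.strip():
                yield stack[-1], text
        elif closing:
            if stack:
                stack.pop()
        else:
            stack.append(name)


def estimated_bits(xml, per_tag):
    """Estimate bits for the text content with one model per tag or one shared model."""
    models = defaultdict(Order1Model)
    total = 0.0
    for tag, text in split_by_tag(xml):
        model = models[tag if per_tag else "*"]
        for byte in text.encode("utf-8"):
            total += model.cost_and_update(byte)
    return total


if __name__ == "__main__":
    record = "<rec><name>Ana Perez</name><phone>555-0101</phone></rec>"
    doc = "<db>" + record * 200 + "</db>"
    print("shared model :", round(estimated_bits(doc, per_tag=False)), "bits")
    print("model per tag:", round(estimated_bits(doc, per_tag=True)), "bits")
```

Whether the per-tag variant comes out smaller depends on the data: separate models pay a learning cost for each tag, so the benefit is expected mainly when texts under different tags have clearly different statistics.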
Index Terms:
PPM, Compression Model, Semistructured Documents.
Citation:
Joaquin Adiego, Pablo De la Fuente, Gonzalo Navarro, "Combining Structural and Textual Contexts for Compressing Semistructured Databases," enc, pp.68-73, Sixth Mexican International Conference on Computer Science (ENC'05), 2005