
Wee Keong Ng, Chinya V. Ravishankar, "Block-Oriented Compression Techniques for Large Statistical Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 2, pp. 314-328, March-April 1997. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 1041-4347. DOI: http://doi.ieeecomputersociety.org/10.1109/69.591455
Keywords: Database compression, data compression, physical organization, statistical database.
Abstract—Disk I/O has long been a performance bottleneck for very large databases. Database compression can be used to reduce disk I/O bandwidth requirements for large data transfers. In this paper, we explore the compression of large statistical databases and propose techniques for organizing the compressed data such that standard database operations such as retrievals, inserts, deletes and modifications are supported. We examine the applicability and performance of three methods. Two of these are adaptations of existing methods, but the third, called Tuple Differential Coding (TDC) [16], is a new method that allows conventional access mechanisms to be used with the compressed data to provide efficient access. We demonstrate how the performance of queries that involve large data transfers can be improved with these database compression techniques.
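The abstract names TDC without describing its mechanics. The title of [16] describes it as representing records as differences between sorted domain ordinals that stand for field values. The following is a minimal sketch of that general idea only, not the authors' actual algorithm or block layout. It assumes each attribute value has already been mapped to an integer ordinal within a finite domain; the function names and the `domain_sizes` parameter are hypothetical, introduced here for illustration.

```python
from itertools import accumulate


def tuple_to_ordinal(tup, domain_sizes):
    """Mixed-radix encode a tuple of attribute ordinals into one integer."""
    n = 0
    for value, size in zip(tup, domain_sizes):
        if not 0 <= value < size:
            raise ValueError(f"ordinal {value} outside domain of size {size}")
        n = n * size + value
    return n


def ordinal_to_tuple(n, domain_sizes):
    """Invert tuple_to_ordinal for the same domain sizes."""
    out = []
    for size in reversed(domain_sizes):
        n, value = divmod(n, size)
        out.append(value)
    return tuple(reversed(out))


def encode_block(tuples, domain_sizes):
    """Sort the tuple ordinals; keep the first one and successive differences."""
    ordinals = sorted(tuple_to_ordinal(t, domain_sizes) for t in tuples)
    if not ordinals:
        return []
    return [ordinals[0]] + [b - a for a, b in zip(ordinals, ordinals[1:])]


def decode_block(deltas, domain_sizes):
    """Rebuild the sorted tuples by running-summing the differences."""
    return [ordinal_to_tuple(n, domain_sizes) for n in accumulate(deltas)]


# Hypothetical example: three attributes with domain sizes 4, 3 and 5.
sizes = (4, 3, 5)
rows = [(1, 2, 3), (0, 1, 4), (1, 2, 4), (3, 0, 0)]
deltas = encode_block(rows, sizes)
restored = decode_block(deltas, sizes)
```

In this sketch, tuples that are close together in sorted order produce small differences, which a variable-length integer code could then store in fewer bits than the original tuples. How the paper actually organizes the differences into blocks and supports inserts, deletes, and modifications is not stated in the abstract and is not modeled here.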
[1] P. Alsberg, "Space and Time Savings through Large Database Compression and Dynamic Restructuring," Proc. IEEE, vol. 63, pp. 1,114-1,122, Aug. 1975.
[2] M.A. Bassiouni, "Data Compression in Scientific and Statistical Databases," IEEE Trans. Software Eng., vol. 11, no. 10, pp. 1,047-1,058, Oct. 1985.
[3] M.A. Bassiouni and K. Hazboun, "Utilization of Character Reference Locality for Efficient Storage of Databases," Proc. Second Int'l Workshop Statistical Database Management, pp. 338-344, Sept. 1983.
[4] D.S. Batory, "On Searching Transposed Files," ACM Trans. Database Systems, vol. 4, no. 4, pp. 531-544, Dec. 1979.
[5] D.S. Batory, "Index Coding: A Compression Technique for Large Statistical Databases," Proc. Second Int'l Workshop Statistical Database Management, pp. 306-314, Sept. 1983.
[6] T.C. Bell, J.G. Cleary, and I.H. Witten, Text Compression. Englewood Cliffs, N.J.: Prentice Hall, 1990.
[7] G.V. Cormack, "Data Compression on a Database System," Comm. ACM, vol. 28, no. 12, pp. 1,336-1,342, Dec. 1985.
[8] S.J. Eggers and A. Shoshani, "Efficient Access of Compressed Data," Proc. Sixth Int'l Conf. Very Large Databases, pp. 205-211, 1980.
[9] S.J. Eggers, F. Olken, and A. Shoshani, "A Compression Technique for Large Statistical Databases," Proc. Seventh Int'l Conf. Very Large Data Bases, pp. 424-434, 1981.
[10] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. New York: W.H. Freeman, 1979.
[11] S.W. Golomb, "Run-Length Encoding," IEEE Trans. Information Theory, vol. IT-12, no. 3, pp. 399-401, 1966.
[12] R. Katz, G. Gibson, and D. Patterson, "Disk System Architectures for High Performance Computing," Proc. IEEE, vol. 77, no. 12, pp. 1,842-1,858, Dec. 1989.
[13] G.G. Langdon, "An Introduction to Arithmetic Coding," IBM J. Research and Development, vol. 28, no. 2, pp. 135-149, 1984.
[14] J.Z. Li, D. Rotem, and H.K.T. Wong, "A New Compression Method with Fast Searching on Large Databases," Proc. 13th Int'l Conf. Very Large Databases, pp. 311-318, 1987.
[15] W.K. Ng and C.V. Ravishankar, "Attribute Enumerative Coding: A Compression Technique for Tuple Data Structures," Proc. Fourth Data Compression Conf., p. 461, Snowbird, Utah, Mar. 29-31, 1994.
[16] W.K. Ng and C.V. Ravishankar, "Data Compression System and Method Representing Records as Differences between Sorted Domain Ordinals Representing Field Values," U.S. Patent No. 5,603,022, Feb. 1997.
[17] W.K. Ng and C.V. Ravishankar, "A Physical Storage Model for Efficient Statistical Query Processing," Proc. Seventh IEEE Int'l Working Conf. Statistical and Scientific Databases, pp. 97-106, Charlottesville, Va., Sept. 28-30, 1994.
[18] W.K. Ng and C.V. Ravishankar, "Relational Database Compression Using Augmented Vector Quantization," Proc. 11th IEEE Int'l Conf. Data Eng., pp. 540-549, Taipei, Taiwan, Mar. 6-10, 1995.
[19] "Census of Population and Housing, 1990: Public Use Microdata Samples U.S.," machine readable data files prepared by the Bureau of the Census. Washington, D.C., 1992.
[20] J.J. Rissanen and G.G. Langdon, "Universal Modeling and Coding," IEEE Trans. Information Theory, vol. 27, no. 1, pp. 12-23, 1981.
[21] F. Rubin, "Experiments in Text File Compression," Comm. ACM, vol. 19, no. 11, pp. 617-623, Nov. 1976.
[22] D.G. Severance, "A Practitioner's Guide to Database Compression: Tutorial," Information Systems, vol. 8, no. 1, pp. 51-62, 1983.
[23] C.E. Shannon, "A Mathematical Theory of Communication," Bell System Technical J., vol. 27, no. 3, pp. 379-423, 1948.
[24] A. Shoshani and H.K.T. Wong, "Statistical and Scientific Database Issues," IEEE Trans. Software Eng., Oct. 1985.
[25] H. Tanaka and A. Leon-Garcia, "Efficient Run-Length Encodings," IEEE Trans. Information Theory, vol. 28, no. 6, pp. 880-890, June 1982.
[26] T.A. Welch, "A Technique for High Performance Data Compression," Computer, vol. 17, no. 6, pp. 8-19, June 1984.
[27] G. Wiederhold, Database Design, Computer Science Series, second ed. New York: McGraw Hill, 1983.
[28] R.N. Williams, Adaptive Data Compression. Boston: Kluwer Academic, 1991.
[29] H.K.T. Wong, H.F. Liu, F. Olken, D. Rotem, and L. Wong, "Bit Transposed Files," Proc. 11th Int'l Conf. Very Large Databases, pp. 448-457, 1985.
[30] J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Trans. Information Theory, vol. 23, no. 3, pp. 337-343, 1977.
[31] J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Trans. Information Theory, vol. 24, no. 5, pp. 530-536, 1978.