
ASCII Text  
Christopher Jermaine, "Online Random Shuffling of Large Database Tables," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 1, pp. 73-84, January 2007.  
BibTex  
@article{ 10.1109/TKDE.2007.13,  
  author = {Christopher Jermaine},  
  title = {Online Random Shuffling of Large Database Tables},  
  journal = {IEEE Transactions on Knowledge and Data Engineering},  
  volume = {19},  
  number = {1},  
  issn = {1041-4347},  
  year = {2007},  
  pages = {73-84},  
  doi = {http://doi.ieeecomputersociety.org/10.1109/TKDE.2007.13},  
  publisher = {IEEE Computer Society},  
  address = {Los Alamitos, CA, USA},  
}  
RefWorks Procite/RefMan/Endnote  
TY  - JOUR  
JO  - IEEE Transactions on Knowledge and Data Engineering  
TI  - Online Random Shuffling of Large Database Tables  
IS  - 1  
SN  - 1041-4347  
SP  - 73  
EP  - 84  
EPD - 73-84  
A1  - Christopher Jermaine  
PY  - 2007  
KW  - Sampling methods  
KW  - database systems.  
VL  - 19  
JA  - IEEE Transactions on Knowledge and Data Engineering  
ER  -  
[1] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge Univ. Press, 2004.
[2] A.R. Barron, “The Convergence in Information of Probability Density Estimators,” Proc. IEEE Symp. Information Theory, vol. 38, pp. 1437-1454, 1988.
[3] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An Efficient Data Clustering Method for Very Large Databases,” Proc. ACM SIGMOD Conf., pp. 103-114, 1996.
[4] J.M. Hellerstein, P.J. Haas, and H.J. Wang, “Online Aggregation,” Proc. ACM SIGMOD Conf., pp. 171-182, 1997.
[5] P.J. Haas and J.M. Hellerstein, “Ripple Joins for Online Aggregation,” Proc. ACM SIGMOD Conf., pp. 287-298, 1999.
[6] P.J. Haas and C. Koenig, “A Bi-Level Bernoulli Scheme for Database Sampling,” Proc. ACM SIGMOD Conf., pp. 275-286, 2004.
[7] S. Chaudhuri, G. Das, and U. Srivastava, “Effective Use of Block-Level Sampling in Statistics Estimation,” Proc. ACM SIGMOD Conf., pp. 287-298, 2004.
[8] H. Garcia-Molina, J.D. Ullman, and J. Widom, Database System Implementation. Prentice-Hall, 2000.
[9] C. Jermaine, A. Datta, and E. Omiecinski, “A Novel Index Supporting High Volume Data Warehouse Insertion,” Proc. Very Large Databases Conf., pp. 235-246, 1999.
[10] C. Jermaine, E. Omiecinski, and W.G. Yee, “The Partitioned Exponential File for Database Storage Management,” VLDB J., 2006.
[11] H.V. Jagadish, P.P.S. Narayan, S. Seshadri, S. Sudarshan, and R. Kanneganti, “Incremental Organization for Data Recording and Warehousing,” Proc. Very Large Databases Conf., pp. 16-25, 1997.
[12] F. Olken and D. Rotem, “Random Sampling from B+ Trees,” Proc. Very Large Databases Conf., pp. 269-277, 1989.
[13] F. Olken and D. Rotem, “Random Sampling from Database Files: A Survey,” Proc. Int'l Conf. Scientific and Statistical Database Management, pp. 92-111, 1990.
[14] G. Antoshenkov, “Random Sampling from Pseudo-Ranked B+Trees,” Proc. Very Large Databases Conf., pp. 375-382, 1992.
[15] P.E. O'Neil, E. Cheng, D. Gawlick, and E.J. O'Neil, “The Log-Structured Merge-Tree (LSM-Tree),” Acta Informatica, vol. 33, no. 4, pp. 351-385, 1996.
[16] L. Arge, “The Buffer Tree: A New Technique for Optimal I/O-Algorithms (Extended Abstract),” Proc. Int'l Workshop Algorithms and Data Structures, pp. 334-345, 1995.
[17] W.G. Cochran, Sampling Techniques. Wiley Series in Probability and Statistics, 1977.
[18] G. Luo, C. Ellmann, P.J. Haas, and J.F. Naughton, “A Scalable Hash Ripple Join Algorithm,” Proc. ACM SIGMOD Conf., pp. 252-262, 2002.
[19] N.L. Johnson, S. Kotz, and A.W. Kemp, Univariate Discrete Distributions, second ed. John Wiley and Sons, 1994.
[20] P.S. Bradley, U.M. Fayyad, and C. Reina, “Scaling Clustering Algorithms to Large Databases,” Proc. Fourth Int'l Conf. Knowledge Discovery and Data Mining, pp. 9-15, 1998.
[21] T. Scheffer and S. Wrobel, “Finding the Most Interesting Patterns in a Database Quickly by Using Sequential Sampling,” J. Machine Learning Research, vol. 3, pp. 833-862, 2002.
[22] C. Meek, B. Thiesson, and D. Heckerman, “The Learning-Curve Sampling Method Applied to Model-Based Clustering,” J. Machine Learning Research, vol. 2, pp. 397-418, 2002.
[23] P. Domingos and G. Hulten, “A General Method for Scaling Up Machine Learning Algorithms and Its Application to Clustering,” Proc. Int'l Conf. Machine Learning, pp. 106-113, 2001.
[24] F.J. Provost, D. Jensen, and T. Oates, “Efficient Progressive Sampling,” Proc. Int'l Conf. Knowledge Discovery and Data Mining, pp. 23-32, 1999.
[25] M. Saar-Tsechansky and F.J. Provost, “Active Sampling for Class Probability Estimation and Ranking,” Machine Learning, vol. 54, no. 2, pp. 153-178, 2004.
[26] D.W.L. Cheung, J. Han, V. Ng, and C.Y. Wong, “Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique,” Proc. Int'l Conf. Data Eng., pp. 106-114, 1996.
[27] J.X. Yu, Z. Chong, H. Lu, and A. Zhou, “False Positive or False Negative: Mining Frequent Itemsets from High Speed Transactional Data Streams,” Proc. Very Large Databases Conf., pp. 204-215, 2004.
[28] E.L. Lehmann, Testing Statistical Hypotheses. Springer Texts in Statistics, 1997.
[29] C.E. Sarndal, B. Swensson, and J. Wretman, Model Assisted Survey Sampling. Springer Series in Statistics, 2003.
[30] http://stat.fsu.edu/pubdiehard/, 2006.
[31] A. Kawaguchi, D. Lieuwen, I.S. Mumick, D. Quass, and K.A. Ross, “Concurrency Control Theory for Deferred Materialized Views,” Proc. Int'l Conf. Database Theory, pp. 306-320, 1997.