Issue No.01 - January (2007 vol.19)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2007.13
Many applications require a randomized ordering of input data. Examples include algorithms for online aggregation, data mining, and various randomized algorithms. Most existing work seems to assume that accessing the records from a large database in a randomized order is not a difficult problem. However, it turns out to be extremely difficult in practice. Using existing methods, randomization is extremely expensive either at the front end (as data are loaded) or at the back end (as data are queried). This paper presents a simple file structure that supports both efficient, online random shuffling of a large database and efficient online sampling or randomization of the database when it is queried. The key innovation of our method is the introduction of a small degree of carefully controlled, rigorously monitored nonrandomness into the file.
Sampling methods, database systems.
Christopher Jermaine, "Online Random Shuffling of Large Database Tables", IEEE Transactions on Knowledge & Data Engineering, vol. 19, no. 1, pp. 73-84, January 2007, doi:10.1109/TKDE.2007.13