External Sorting: Run Formation Revisited
July/August 2003 (vol. 15 no. 4)
pp. 961-972
Per-Åke Larson, IEEE Computer Society

Abstract: External mergesort begins with a run formation phase that creates the initial sorted runs. Run formation can be done by a load-sort-store algorithm or by replacement selection. A load-sort-store algorithm repeatedly fills available memory with input records, sorts them, and writes the result to a run file. Replacement selection produces longer runs than load-sort-store algorithms and completely overlaps sorting and I/O, but it has poor locality of reference, which causes frequent cache misses, and the classical algorithm works only for fixed-length records. This paper introduces batched replacement selection: a cache-conscious version of replacement selection that also works for variable-length records. The new algorithm resembles AlphaSort in that it creates small in-memory runs and merges them to form the output runs. Its performance is experimentally compared with three other run formation algorithms: classical replacement selection, Quicksort, and AlphaSort. The experiments show that batched replacement selection is considerably faster than classical replacement selection. For small records (100 bytes on average), CPU time was reduced by about 50 percent and elapsed time by 47-63 percent. It was also consistently faster than Quicksort, but it did not always outperform AlphaSort. Replacement selection produces fewer runs than Quicksort and AlphaSort. The experiments confirmed that this reduces the merge time, whereas the effect on the overall sort time depends on the number of disks available.
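The abstract contrasts two classical run formation strategies. As a rough illustration only, the Python sketch below implements textbook load-sort-store and classical replacement selection over in-memory keys, where `memory` (a hypothetical parameter) counts how many records fit in the workspace. It ignores I/O, variable-length records, and cache behavior, and it is not the batched replacement selection algorithm introduced in the paper.

```python
import heapq
from typing import Iterable, List, Tuple


def load_sort_store(records: Iterable[int], memory: int) -> List[List[int]]:
    """Load-sort-store: fill memory, sort it, emit one run, repeat.

    Assumption: records are fixed-size keys and `memory` >= 1 is the
    number of records the workspace can hold.
    """
    runs: List[List[int]] = []
    buffer: List[int] = []
    for rec in records:
        buffer.append(rec)
        if len(buffer) == memory:
            runs.append(sorted(buffer))  # sort the full workspace, "write" the run
            buffer = []
    if buffer:
        runs.append(sorted(buffer))  # final, partially filled workspace
    return runs


def replacement_selection(records: Iterable[int], memory: int) -> List[List[int]]:
    """Classical replacement selection using a min-heap keyed on (run, key).

    Each time the smallest eligible record is output, the next input record
    takes its place. If that record is smaller than the key just written,
    it cannot extend the current run and is tagged for the next run.
    """
    it = iter(records)
    heap: List[Tuple[int, int]] = []
    for rec in it:  # initial load of the workspace
        heap.append((0, rec))
        if len(heap) == memory:
            break
    heapq.heapify(heap)

    runs: List[List[int]] = []
    current_run = 0
    while heap:
        run_no, key = heapq.heappop(heap)
        if not runs or run_no != current_run:
            runs.append([])  # all records of the previous run are exhausted
            current_run = run_no
        runs[-1].append(key)
        nxt = next(it, None)
        if nxt is not None:
            # Same run if the new key does not break sorted order, else next run.
            tag = run_no if nxt >= key else run_no + 1
            heapq.heappush(heap, (tag, nxt))
    return runs


if __name__ == "__main__":
    import random

    random.seed(1)
    data = [random.randint(0, 10**6) for _ in range(10_000)]
    # Load-sort-store: exactly one run per 100 records, so 100 runs.
    print("load-sort-store runs:", len(load_sort_store(data, 100)))
    # Replacement selection: on random input, typically about half as many.
    print("replacement selection runs:", len(replacement_selection(data, 100)))
```

Keying the heap on the pair (run number, key) makes records tagged for the next run sort after every record of the current run, so a single priority queue suffices. On random input, runs produced this way average about twice the workspace size (Knuth, Volume 3), which is the "longer runs" advantage the abstract refers to; the random-access heap operations are also the source of the poor cache locality it mentions.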

Index Terms:
External sorting, merge sort, replacement selection, run formation.
Per-Åke Larson, "External Sorting: Run Formation Revisited," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 961-972, July-Aug. 2003, doi:10.1109/TKDE.2003.1209012