Speeding Up External Mergesort
April 1996 (vol. 8 no. 2)
pp. 322-332

Abstract—External mergesort is normally implemented so that each run is stored contiguously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses, with the aim of reducing seek overhead during merging. The new reading strategy precomputes the order in which data blocks are to be read, based on where they are located on disk and when they are needed for merging. Extra buffer space makes it possible to read blocks in an order that reduces seek overhead, instead of reading them exactly in the order they are needed for merging. A detailed simulation model was used to compare the two layout strategies and three reading strategies (double buffering, forecasting, and the new strategy). The effects of using multiple work disks were also investigated. We found that, in most cases, interleaved layout does not improve performance, but that the new reading strategy consistently performs better than double buffering and forecasting.
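To make the contrast between reading in consumption order and reading with extra buffer space more concrete, here is a minimal Python sketch. It is not the authors' algorithm: the `Block` model, the toy disk addresses, seek cost measured as plain address distance, and the greedy nearest-block rule in `lookahead_schedule` are all illustrative assumptions. `consumption_order` derives the order in which a merge needs blocks from the last key of each block (the information forecasting relies on), and `lookahead_schedule` reorders reads within a bounded window of upcoming blocks to shorten head movement.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    run: int      # which sorted run the block belongs to
    index: int    # position of the block within its run
    address: int  # hypothetical disk address of the block


def consumption_order(last_keys, address_of):
    """Order in which a merge needs blocks (forecasting-style).

    last_keys[r][i] is the largest key in block i of run r. Every run's first
    block is needed at the start; block i > 0 of run r is needed once the merge
    passes last_keys[r][i - 1], i.e. when block i - 1 of that run is exhausted.
    Ties are broken by run number.
    """
    entries = []
    for r, keys in enumerate(last_keys):
        for i in range(len(keys)):
            trigger = -math.inf if i == 0 else keys[i - 1]
            entries.append((trigger, r, Block(r, i, address_of(r, i))))
    entries.sort(key=lambda e: (e[0], e[1]))
    return [blk for _, _, blk in entries]


def seek_distance(blocks, start=0):
    """Total head movement, in block addresses, to read blocks in this order."""
    head, total = start, 0
    for blk in blocks:
        total += abs(blk.address - head)
        head = blk.address
    return total


def lookahead_schedule(order, window, start=0):
    """Greedy read order using `window` positions of lookahead (hypothetical).

    Candidates are the unread blocks whose consumption positions lie in
    [first_unread, first_unread + window); the one closest to the current head
    is read next. Blocks read early all fall inside that window, so at most
    window - 1 of them wait in memory, assuming the merge consumes every block
    before the first unread one. window=1 reproduces the consumption order.
    """
    read = [False] * len(order)
    schedule, head, first = [], start, 0
    while first < len(order):
        end = min(first + window, len(order))
        candidates = [p for p in range(first, end) if not read[p]]
        pos = min(candidates, key=lambda p: abs(order[p].address - head))
        read[pos] = True
        schedule.append(order[pos])
        head = order[pos].address
        while first < len(order) and read[first]:
            first += 1  # advance past blocks that are already in memory
    return schedule


def contiguous_address(run, index, blocks_per_run=3):
    # Conventional layout: each run occupies one contiguous stretch of disk.
    return run * blocks_per_run + index


if __name__ == "__main__":
    last_keys = [[10, 40, 70], [20, 50, 80], [30, 60, 90]]
    order = consumption_order(last_keys, contiguous_address)
    print(seek_distance(order))                         # 28
    print(seek_distance(lookahead_schedule(order, 3)))  # 14
```

On this toy input, reading in exact consumption order moves the head 28 block addresses, while a lookahead window of 3 brings it down to 14. The sketch ignores rotational latency, transfer time, multiple disks, and the actual timing of consumption, all of which a detailed simulation model such as the one described above would have to capture.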

Index Terms:
Sorting, external sorting, mergesort, run placement, buffering strategy.
LuoQuan Zheng, Per-Åke Larson, "Speeding Up External Mergesort," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 2, pp. 322-332, April 1996, doi:10.1109/69.494169
