Speeding Up External Mergesort
April 1996 (vol. 8 no. 2)
pp. 322-332

Abstract—External mergesort is normally implemented so that each run is stored contiguously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses. This is done in the hope that interleaving will reduce seek overhead during merging. The new reading strategy precomputes the order in which data blocks are to be read according to where they are located on disk and when they are needed for merging. Extra buffer space makes it possible to read blocks in an order that reduces seek overhead, instead of reading them exactly in the order they are needed for merging. A detailed simulation model was used to compare the two layout strategies and three reading strategies. The effects of using multiple work disks were also investigated. We found that, in most cases, interleaved layout does not improve performance, but that the new reading strategy consistently performs better than double buffering and forecasting.
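The two ideas in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or simulation model. It is a simplified illustration under stated assumptions: disk addresses are plain block numbers, seek cost is the absolute distance between consecutive addresses, and the reading strategy is approximated by sorting each window of `buffers` upcoming blocks by address. All function names here are hypothetical.

```python
from typing import List


def contiguous_address(run: int, block: int, blocks_per_run: int) -> int:
    """Address of a block when each run is stored contiguously on disk."""
    return run * blocks_per_run + block


def interleaved_address(run: int, block: int, num_runs: int) -> int:
    """Address of a block when blocks of different runs are interleaved."""
    return block * num_runs + run


def seek_distance(addresses: List[int], start: int = 0) -> int:
    """Total head movement (assumed cost model: |next - current|)."""
    total, head = 0, start
    for address in addresses:
        total += abs(address - head)
        head = address
    return total


def plan_reads(needed_order: List[int], buffers: int) -> List[int]:
    """Reorder reads within windows of `buffers` blocks.

    `needed_order` lists block addresses in the order merging consumes them.
    Each window is read in ascending address order, so a block is never read
    later than the end of its own window. This is a simplified stand-in for
    precomputing a read order, not the strategy evaluated in the paper.
    """
    if buffers < 1:
        raise ValueError("buffers must be at least 1")
    plan: List[int] = []
    for i in range(0, len(needed_order), buffers):
        plan.extend(sorted(needed_order[i:i + buffers]))
    return plan


# Assumed workload: 3 runs of 4 blocks, consumed round-robin (idealized).
NUM_RUNS, BLOCKS_PER_RUN = 3, 4
consumption = [(run, block)
               for block in range(BLOCKS_PER_RUN)
               for run in range(NUM_RUNS)]

contiguous = [contiguous_address(r, b, BLOCKS_PER_RUN) for r, b in consumption]
interleaved = [interleaved_address(r, b, NUM_RUNS) for r, b in consumption]

print("contiguous, read in needed order:", seek_distance(contiguous))
print("contiguous, planned with 6 buffers:",
      seek_distance(plan_reads(contiguous, 6)))
print("interleaved, read in needed order:", seek_distance(interleaved))
```

The round-robin consumption order is the best case for interleaving. In a real merge the order depends on the key distribution, which is consistent with the abstract's finding that interleaved layout does not help in most cases.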


Index Terms:
Sorting, external sorting, mergesort, run placement, buffering strategy.
LuoQuan Zheng, Per-Åke Larson, "Speeding Up External Mergesort," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 2, pp. 322-332, April 1996, doi:10.1109/69.494169