Issue No. 06 - December (1994 vol. 6)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/69.334883
<p>Efficient algorithms for processing large volumes of data are very important for both relational and newer object-oriented database systems. Many query-processing operations, e.g., intersection, join, and duplicate elimination, can be implemented using either sort- or hash-based algorithms. The early relational database systems employed only sort-based algorithms. In the last decade, hash-based algorithms have gained acceptance and popularity, and they are often considered generally superior to sort-based algorithms such as merge-join. In this article, we compare the concepts behind sort- and hash-based query-processing algorithms and conclude that (1) many dualities exist between the two types of algorithms, (2) their costs differ mostly by percentages rather than by factors, (3) several special cases exist that favor one or the other choice, and (4) there is a strong reason why both hash- and sort-based algorithms should be available in a query-processing system. Our conclusions are supported by experiments performed using the Volcano query execution engine.</p>
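To make the comparison concrete, the following is a minimal in-memory sketch of the two join styles the abstract contrasts. It is an illustration under stated assumptions, not the article's or Volcano's implementation: the function names `merge_join` and `hash_join`, the tuple-based rows, and the `key` parameter are all hypothetical, and the sketch ignores the external-memory concerns (run generation, partitioning, overflow) that govern real costs.

```python
from collections import defaultdict


def merge_join(left, right, key=lambda row: row[0]):
    """Sort-based equi-join (illustrative): sort both inputs, then merge."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of right rows sharing this key value.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # Pair every left row in the matching run with that right run.
            while i < len(left) and key(left[i]) == kl:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    return out


def hash_join(build, probe, key=lambda row: row[0]):
    """Hash-based equi-join (illustrative): build a table, then probe it."""
    table = defaultdict(list)
    for b in build:
        table[key(b)].append(b)
    out = []
    for p in probe:
        # .get avoids inserting empty buckets for unmatched probe keys.
        for b in table.get(key(p), []):
            out.append((b, p))
    return out
```

Both functions return the same set of matching pairs for the same inputs; only the output order differs, since the merge join emits pairs in key order while the hash join follows the order of the probe input.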
query processing; sorting; file organisation; relational databases; object-oriented databases; database theory; relational database systems; object-oriented database systems; query-processing operations; sort-based algorithms; hash-based algorithms; intersections; joins; duplicate elimination; merge-join algorithm; dualities; costs; Volcano query execution engine; value matching; performance
G. Graefe, L. Shapiro and A. Linville, "Sort vs. Hash Revisited," in IEEE Transactions on Knowledge & Data Engineering, vol. 6, no. 6, pp. 934-944, 1994.