Physical Join Operators in SQL Server – Hash Operator

Hash Match used when multiple lookups are the 2nd best alternative

In Query 9 below, both inputs are not small enough (P ~500 rows, SOD ~120,000 rows) to make nested loops efficient. It’s interesting to see that even though Product ID is indexed, nested loops will probably not be a good choice here. The main reason is that the index on Product ID does not cover the query and OrderQty will need to be looked up for each order which sums up to ~120,000 lookups.

Text Box: SELECT	SOD.SalesOrderID,
            P.Name,
            P.color,
            SOD.OrderQty
            FROM		Production.Product AS P
            INNER JOIN
            Sales.SalesOrderDetail AS SOD
            ON P.ProductID = SOD.ProductID

 

 


Query 9

Execution Plan 9

*Exercise: In the demo code, you will find the same query without the OrderQty column in the select list (Query 9b). Try to guess what join operator will the optimizer choose when the need for lookups is eliminated? Try it…


Physical vs. Logical scan order

I would also like to draw your attention to another interesting property of the execution plan. If you look at the properties of either index scan above, you will see the value ‘false’ for the scan property ‘ordered’.

This means that the storage engine is not required to follow the logical chain links between the index leaf pages that point to the ‘next’ and ‘previous’ pages. Since the order of the retrieval of the rows is of no significance to the hash match operator, the storage engine will optimize the scan performance by scanning the index pages in their physical order (by following the IAM pages). This might prove to be a significant performance gain, especially for indexes with a high level of logical fragmentation. Go back and look at this property for index scans of the execution plans of the merge examples above.

Another thing we should note about hash joins is the fact that, in contrast to both nested loops and merge operators where the join operator may immediately start outputting joined rows for further processing, no rows can be outputted when using hash match until the whole ‘blue input’ is fully hashed.

 

Summary

In a nutshell, we can sum up the main properties of the three physical operators available to the SQL Server optimizer when carrying out Equi-Inner-Joins.

 

Nested Loops

Merge

Hash

Good choice when

Small outer input

Inner input well indexed

Large pre-sorted inputs

Sorting required anyway

Large inputs

Inputs not well indexed

CPU consumption

Low

Low

*Unless requires sorting

High

Memory usage

Low

Low

*Unless requires sorting

High

Logical reads

High

Low

Low

* ‘Hidden’ cost of probes

Output matched rows

Fast

Fast

Slow

I hope this short discussion of the seemingly simple SQL construct raised your curiosity. There are many more aspects and considerations to joins which were not even mentioned in this article. This is a fascinating and highly complex subject. If you are interested to dive deeper and learn more on joins and their implementation in SQL Server, I highly recommend Craig Freedman’s SQL Server Blog on MSDN. Craig has published a series of excellent, in depth articles regarding many more aspects and types of joins.

I wish to thank my Editor, Bill Griffin for the excellent editing of this article and to Michael Zilberstein for a great technical review.

]]>

Leave a Reply

Your email address will not be published. Required fields are marked *