Hash Match used when multiple lookups are the 2^nd best alternative

In Query 9 below, both inputs are not small enough (P ~500 rows, SOD ~120,000 rows) to make nested loops efficient. It’s interesting to see that even though Product ID is indexed, nested loops will probably not be a good choice here. The main reason is that the index on Product ID does not cover the query and OrderQty will need to be looked up for each order which sums up to ~120,000 lookups.

Query 9

Execution Plan 9

*Exercise: In the demo code, you will find the same query without the OrderQty column in the select list (Query 9b). Try to guess what join operator will the optimizer choose when the need for lookups is eliminated? Try it…

Physical vs. Logical scan order

I would also like to draw your attention to another interesting property of the execution plan. If you look at the properties of either index scan above, you will see the value ‘false’ for the scan property ‘ordered’.

This means that the storage engine is not required to follow the logical chain links between the index leaf pages that point to the ‘next’ and ‘previous’ pages. Since the order of the retrieval of the rows is of no significance to the hash match operator, the storage engine will optimize the scan performance by scanning the index pages in their physical order (by following the IAM pages). This might prove to be a significant performance gain, especially for indexes with a high level of logical fragmentation. Go back and look at this property for index scans of the execution plans of the merge examples above.

Another thing we should note about hash joins is the fact that, in contrast to both nested loops and merge operators where the join operator may immediately start outputting joined rows for further processing, no rows can be outputted when using hash match until the whole ‘blue input’ is fully hashed.

Summary

In a nutshell, we can sum up the main properties of the three physical operators available to the SQL Server optimizer when carrying out Equi-Inner-Joins.

	Nested Loops	Merge	Hash
Good choice when	Small outer input Inner input well indexed	Large pre-sorted inputs Sorting required anyway	Large inputs Inputs not well indexed
CPU consumption	Low	Low *Unless requires sorting	High
Memory usage	Low	Low *Unless requires sorting	High
Logical reads	High	Low	Low * ‘Hidden’ cost of probes
Output matched rows	Fast	Fast	Slow

I hope this short discussion of the seemingly simple SQL construct raised your curiosity. There are many more aspects and considerations to joins which were not even mentioned in this article. This is a fascinating and highly complex subject. If you are interested to dive deeper and learn more on joins and their implementation in SQL Server, I highly recommend Craig Freedman’s SQL Server Blog on MSDN. Craig has published a series of excellent, in depth articles regarding many more aspects and types of joins.

I wish to thank my Editor, Bill Griffin for the excellent editing of this article and to Michael Zilberstein for a great technical review.

]]>

Physical Join Operators in SQL Server – Hash Operator

Hash Match used when multiple lookups are the 2^nd best alternative

Physical vs. Logical scan order

Summary

Leave a comment Cancel reply

Hash Match used when multiple lookups are the 2nd best alternative

Physical vs. Logical scan order

Summary

Leave a comment Cancel reply

Hash Match used when multiple lookups are the 2^nd best alternative