Physical Join Operators in SQL Server – Merge Operator
In Part I of this series on physical join operators we looked at Nested Loops. We now turn our attention to the Merge operator.
For this article series I am using an analogy of two sets of standard playing cards. One set with
a blue back and another with a red back that need to be ‘joined’
together according to various join conditions. The two sets of cards simply
represent the rows from the two joined tables, the red input and the blue input
– “Now Neo – which input do you choose”?
A merge operator can be used only when both sets of rows are
pre-sorted according to the join expression(s). For example, a Product
table index and Order Detail table index both sorted by Product ID
(recall Execution Plan 3 from the Nested Loops
articles). The algorithm is extremely
simple, elegant, and efficient:
Flowchart 2 – Merge
Since the rows are sorted according to the join
expression(s), we can immediately begin the matching process. Simply get the
first row from the blue input and the first row from the red input. If they
match, output them and continue to the next row from the red input. If not,
fetch the next row from the blue input and repeat the processes until all rows
from the blue input have been processed. If we were to join our cards this way,
we would first lay both sets on the table, sorted according to the join
condition, let’s say suit and rank. We will probably spread them one row below
the other and simply start picking the cards, moving from left to right on both
rows and matching as we progress. Of course, we could decide to sort them just
for this purpose even if they weren’t sorted to begin with.
Merge is a highly efficient operator
The merge join is probably the most efficient of all three
operators. It combines the advantage of hash match where the actual data needs
to be accessed only once with the advantages of nested loops – low CPU
consumption and enabling of fast output of matched rows for further query
processing. Moreover, it tops them both by eliminating the potential for
a-sequential page flushes. Since the days of SQL Server 6.5, I have witnessed
how the query optimizer tends to favor merge joins more and more with every new
release. In SQL 2000, we would see merge joins almost exclusively for joins
that had the appropriately pre-sorted indexes, and for queries that included an
ORDER BY clause that required the sort. In SQL 2008 the optimizer cleverly
realizes that the advantages of this join operator justify pre-sorting of one
or both inputs just for the sake of using merge in many more cases than
previous versions did.
Let’s check out a few examples that use the Merge join operator.
Merge join with Clustered Indexes
The most obvious example, as can be seen in Query
5 , is a join that uses the keys of the clustered indexes of both tables.
Since a clustered index is actually the table itself, sorted in the order of
the clustered index key(s), the clustered index covers all queries and can be
used to retrieve any (non BLOB) column in the table you specify in your SELECT
clause. Even the SELECT * in the query below will not require any additional
table lookups. But remember that there is still a penalty to pay for retrieving
all columns unnecessarily as both tables will need to be fully loaded into
memory and sent over the network.
Pages: 1 2