Physical Join Operators in SQL Server – Nested Loops
SQL Server implements three different physical operators to perform joins. In this article series we will examine how each operators works, its advantages and challenges. We will try to understand the logic behind the optimizer’s decisions on which operator to use for various joins using (semi) real life examples and how to avoid common pitfalls.
I’m sure you are all very familiar with T-SQL joins so I’ll skip the introductory part this time. For this article, I’ll focus on the most common type of join, the ‘E-I-J’ (or ‘Equi-Inner-Join’) which is just a cool sounding name for an inner join that uses only equality predicates for the join conditions. With sadness, I will skip some extremely interesting issues concerning joins such as logical processing order of outer joins, NULL value issues, join parallelism and others. This article focuses on helping you make your E-I-J joins faster. We’ll accomplish this by helping you understand how the SQL Server optimizer decides which physical operators it will use to carry out your query joins. We’ll show you situations where the SQL Server optimizer is occasionally tricked into choosing a slower method by the characteristics of the data and SQL you wrote. If you understand these pitfalls, you can code to overcome them, thus speeding up your queries – and impressing your colleagues and bosses.
As amazing as the SQL Server optimizer can sometimes be, it’s not making its decisions intuitively. It’s using the information it has to work with. Recall that a join is basically ‘looking for matching rows’ from two inputs, based on the join condition. The creators of the optimizer have made three techniques available to it for carrying out these joins:
- Nested Loops
- Merge Hash
For this article series I will use an analogy of two sets of standard playing cards. One set with a blue back and another with a red back that need to be ‘joined’ together according to various join conditions. The two sets of cards simply represent the rows from the two joined tables, the red input and the blue input - “Now Neo – which input do you choose”?
Probably the simplest and most obvious operator is the nested loops operator. The algorithm for this operator is as follows:
The ‘outer loop’ consists of going through all rows from the blue input and for each row; some mysterious ‘inner operator’ is performed to find the matching rows from the red input. If we would use this operator to join our sets of cards, we would browse through all the blue-back cards and for each one we would have to go through all the red-back cards to find its matches. Not the most efficient way of matching cards but it probably would be most people’s first choice… if there are only a handful of cards to match.
Let’s take a closer look and see when nested loops are a good choice. The first parameter to consider, as with any iterative loop, is the number of required iterations. For nested loops to be efficient, it requires at least one relatively small row set to be used as the outer loop input that determines the number of iterations. Remember that this does not necessarily reflect the number of rows in the table but the number of rows that satisfy the query filters.
The second parameter to consider is the (intentionally) vague “Find matching rows in red input” part in Flow Chart 1 above. Assuming that we do have a small input for the outer loop, hence a small number of required iterations, we now must consider how much work is required to actually find the matching rows in every iteration. ”Find matching rows” might consist of a highly efficient index seek but it might require a full table scan if the join columns are not properly indexed; making nested loops a far less optimal plan. Since it is very common for joins to be performed on one-to-many relationships, and since the parent node in these relationships is often the smaller input (the ‘one’ in the ‘one-to-many’), and since this parent node is always indexed (must be a primary key or a unique constraint), the well known best practice rule that requires that FK columns should be indexed, now makes perfect sense, doesn’t it?
Let’s see a few examples:
Small outer loop with a well indexed inner input
The most obvious case for nested loops is when one input is very small (is 1 row small enough?) and the other input is ideally indexed for the join. In Execution Plan 1 you can see that the optimizer retrieved the single row for product 870 from the Product table using a PK clustered index seek (needed to retrieve the Product Name) and all the matching Order IDs were retrieved using the non clustered index on the Product ID column of the Order Detail table.