Beware of the ‘Estimation Error Devil’
One of the most common pitfalls of the query optimizer is under-estimating the number of required iterations for a nested loops operator due to (partial list):
- Statistical deviations
- Outdated statistical data
- ‘Hard to estimate’ predicates such as compound expressions, sub queries, functions or type conversions on filter columns
- Multiple predicates estimation errors
When analyzing a poorly performing query, one of the first things I do is to look at the ‘Estimated number of rows’ vs. the ‘Actual number of rows’ figures of the outer input of the nested loop operator. The outer input is the top one in the graphical query execution plan. See for example Execution Plan 4 . I’ve seen many cases where the optimizer estimated that only a few dozen iterations will be required, making nested loops operator a very good choice for the join, but in fact tens (or even hundreds) of thousands of rows satisfied the filters, causing the query to perform very badly.
For Query 4 below, the optimizer estimated that ~1.3 products will satisfy the query filters when in fact, 44 products did. This seemingly small error led the optimizer to believe that only 616 rows will be matched from the Order Detail table and that estimation made it choose the same plan as the previous example, using an index seek and additional lookups. The optimizer estimated that it will require only 616 key lookups when in fact, 20,574 were required. This might seem at first like a flaw in the optimizer, but only to the degree that mind reading is hard to program. You’ll see why in the answer to Challenge #1 at the end of this section.
*Challenge #1: Can you guess what misled the optimizer to make the estimation error? See the answer at the end of this section.
*Exercise: The demo code contains Query 4 as shown here, plus hints that force the optimizer to use nested loops, hash match or merge joins respectively (Queries 4b, 4c and 4d). Use the demo code to execute these query variations. Trace the executions with profiler to benchmark these alternatives and see if the optimizer was correct to prefer nested loops for this query. Remember that logical reads might be misleading so pay close attention to both CPU and duration when evaluating the queries’ true efficiency.
Wicked a-sequential page cache flushes
One more issue we need to consider with nested loops is the issue of sequential vs. a-sequential page access patterns. You might recall from my previous article that nested loops are characterized with a high number of logical reads, as the same data page might be accessed multiple times. For small row sets, this is usually not an issue as the pages are cached once and subsequent reads are performed in memory. But, if the system experiences memory pressure and in the common case where the outer loop consists of a large number of rows where the distribution of the data (in respect to the order of the join columns retrieved for the outer loop) within the pages is more or less random, a real performance nightmare might occur when a page is repeatedly flushed from the buffer cache only to be physically retrieved a few seconds later for retrieval of another row, for the same join! If you were the query optimizer – go with the flow for a moment – how would you avoid such a performance ‘nightmare’?
Feel free to discuss your ideas for potential solutions in the comments section.
*Challenge #1 answer: The filter consists of three predicates. If you inspected each predicate by itself (armed with the statistical distribution histograms of the values in each column), you would see that the predicates are highly selective, meaning that they filter out a large percentage of the rows as the values used are very close to the maximum values for each of the columns. Using AND logical operators for these three highly selective predicates leads the optimizer to estimate that only a few rows will satisfy all three predicates. This is correct from a statistical perspective under the assumption that there is no correlation between the predicates. But… unbeknownst to the optimizer, the values in these three predicates are actually closely related. The query filters for the products that take the longest to manufacture, making them the products with the highest cost, and naturally the ones with the highest list price. So it’s more or less the same group of products that satisfy each predicate. These and other conditions which make selectivity extremely hard to estimate are much more common in production systems than most people would think.
I want to use this opportunity to congratulate the Microsoft SQL Server optimizer team for producing an unbelievably intelligent optimization engine which is (IMHO) by far the best of its kind on the market, and it just keeps getting better with every new version. It is getting harder and harder for me to come up with such examples where I manage to make it err.
If you are interested to dive deeper and learn more on joins and their implementation in SQL Server, I highly recommend Craig Freedman’s SQL Server Blog on MSDN. Craig has published a series of excellent, in depth articles regarding many more aspects and types of joins.
Part II of this series continues with a look at the Merge Operator