SQL Server Optimizer – Under the Hood Part 1

When viewing the Execution Plan in SSMS, you can reveal the following details about this stream aggregate operator by hovering your mouse over its icon:

Figure 2 – Properties of Stream Aggregate operator [2]

Look at the Group By section at the bottom. It indicates a grouping of the output by SalesPersonID!? This seems to contradict the query’s explicit instruction to group by First and Last Name. Now look at Output List section just above it. Ah… there is a hint… The Output list, a list of SalesPersonIDs, is only a partial aggregate (“Partialagg1005”). So the Optimizer knows that the aggregation (group by) task is not yet complete.

What is it up to?

Let’s step into the Optimizer’s shoes for a minute. It does not possess human logic. Rather, it works with the facts available to it, and deduces its best approach.

What facts is it working with?

·         There are only a handful of sales persons that made orders. 17 to be precise.

Note:  The count of sales people is available from the distribution statistics on the SalesPersonID column. To confirm this count for yourself, run the following query:

-- Verify Number of Sales Person Who Had Orders
SELECT	COUNT(DISTINCT SalesPersonID)
FROM	Sales.SalesOrderHeader
GO

Teaser:    If you look at the ‘Estimated Number of rows’, you will see that the Optimizer estimated that 36 rows will be returned when in fact only 18 were returned. Do you think this is a reasonable estimation error? Can you guess why such errors happen and perhaps how to avoid them?
Quiz:    Why does the ‘Actual Number of Rows’ indicate 18 rows when we know for a fact that there are only 17 sales person that performed orders? Where did we ‘lose’ a sales person?
Feel free to share your thoughts in the comments section below.

·         The query requires only the count of the orders per sales person name, and no data about individual orders.

·         There is an index on SalesPersonID.

·         There are 31,465 orders made by these 17 people.

·         The SalesPersonID is being joined to the BusinessEntityID using an equity predicate.

·         The BusinessEntityID is the primary key of the Person table.

From these last two points, the Optimizer can safely conclude that grouping by SalesPersonID will create a super-set of any other potential grouping that will be performed on any columns from the Person table.

Now things are getting clearer.

If the SQL Server Optimizer were to follow the simplest approach, as detailed in the ‘expected steps’ list above, it would have joined all of the 31,465 rows from the SalesOrderHeader table with the Person table. But it was smart enough to recognize an opportunity to perform a colossal shortcut, and it seized it.

Pre-grouping by SalesPersonID (the Partial Aggregate) enabled the Optimizer to join only 17 output-rows from the scan on SalesOrderheader table instead of all 31,465 rows. It already had the order count for each sales person, which it easily obtained by performing an ordered scan on the SalesPersonID index. This low row count also made the nested loops operator the clear winner in the contest for best join operator, as only 17 index-seek iterations were required for the join – a huge time saver.

After the join was performed, the only remaining task was to sort and group these 17 group rows again. A child’s play. The second grouping had to be performed in case there were two different sales people (each with his/her unique SalesPersonID) who had the same first and last names. This would have (perhaps undesirably) summed together the order counts for the company’s two men named “John Smith”, and perhaps for the first time made “his” sales figures top the charts.

The second grouping (aggregation) is clearly detailed in the properties of the second Stream Aggregate operator [3] we saw in the Actual Execution Plan (Figure ).

Description: C:UsersAMI~1.DBSAppDataLocalTempSNAGHTML6d475ed.PNG

Figure 3 – Properties of Stream Aggregate operator [3]

In the Output List section of this details box, you can see the reference to a global aggregate operation (“globalagg1006”).  This indicates that the Optimizer is performing the final Group By action – that is, by the First Name and Last Name columns requested in the original query. The relationship between the partial aggregate and the global aggregate can be seen in the ‘Defined Values’ row in the properties window of the stream aggregate operator:

Description: C:UsersAMI~1.DBSAppDataLocalTempSNAGHTML1554060.PNG

Figure 4 – Full Properties Window of Stream Aggregate operator [3]

The order counts of each sales person ID are summed up into the new (sub) groups, by the sales person names.  In our example, the two groups happen to be identical.

Tip: Opening the properties window: In the graphical execution plan pane of the query window in SSMS, click to select the left ‘stream aggregate’ operator (3) and then press F4 to open its fully detailed properties window.

Makes you wonder just how much smarter the SQL Optimizer can be?  That will be explored in the second part of the “Under the Hood” series, when we’ll see even more delicious treats from the kitchen of the SQL Server query Optimizer team.

Your comments in the section below are most welcome!

]]>

Leave a comment

Your email address will not be published.