SQL Server XML Statistics and Execution Plans

Because of the very high estimated row count from the XML source, the optimizer chooses an execution plan with a table scan of TableA instead of an index seek, and a hash join instead of a loop join. Both the scan and the hash join carry high costs that are not suited to low row count operations.
Suppose it is known that the true row count from the XML will always be much lower, for example 10 rows. Employing a TOP 10 in the XML query yields the following plan.

The TOP n clause gives SQL Server a hard upper limit on the row count feeding into the join, so the optimizer generates the proper execution plan for the anticipated row count.
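As a rough illustration of the TOP technique, the sketch below wraps the XML shredding in a derived table with TOP (10) before joining to TableA. The XML shape, the variable @x, and the columns ID and Value are assumptions made for this example only; they do not come from the queries behind the plans discussed here.

```sql
-- Hypothetical XML document and column names, for illustration only.
DECLARE @x xml = N'<rows><row id="1"/><row id="2"/></rows>';

SELECT a.ID, a.Value
FROM (
    -- TOP (10) caps the number of rows the optimizer expects from the XML source.
    SELECT TOP (10) r.n.value('@id', 'int') AS ID
    FROM @x.nodes('/rows/row') AS r(n)
) AS x
JOIN dbo.TableA AS a          -- assumed table with an index on ID
    ON a.ID = x.ID;
```

Note that TOP also truncates the result, so this is only safe when the XML is known never to contain more than n rows.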

The plan below shows an INSERT into a table using a SELECT from XML. Because of the high row count estimate, the execution plan shows a Sort operator before each index.

With the TOP clause, the optimizer knows the row count is lower and does not perform the Sort.
It is unclear whether this has any actual performance impact. There may be a significant fixed cost for setting up the Sort operations even if only a few rows actually require sorting.

If it is not possible to use the TOP clause to hint the row count to SQL Server, another option is to first insert the rows from the XML query into an intermediate table variable or temp table, so that the subsequent query uses the intermediate table. There is significant overhead in employing an intermediate table, which may or may not be offset by achieving an otherwise lower cost execution plan.
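A minimal sketch of the intermediate table approach follows, using a temp table. The names #xml_rows, @x, dbo.TableA, ID, and Value are hypothetical and chosen only for this example.

```sql
-- Hypothetical XML document and object names, for illustration only.
DECLARE @x xml = N'<rows><row id="1"/><row id="2"/></rows>';

CREATE TABLE #xml_rows (ID int NOT NULL PRIMARY KEY);

-- Shred the XML once into the intermediate table.
INSERT INTO #xml_rows (ID)
SELECT r.n.value('@id', 'int')
FROM @x.nodes('/rows/row') AS r(n);

-- The join now reads from the temp table rather than the XML source.
SELECT a.ID, a.Value
FROM #xml_rows AS i
JOIN dbo.TableA AS a          -- assumed table with an index on ID
    ON a.ID = i.ID;

DROP TABLE #xml_rows;
```

A temp table carries statistics and a table variable does not, so the two may lead to different row estimates; which one gives the better plan in a given case should be checked against the actual plans.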

The intermediate table approach may be worthwhile if it can prevent the optimizer from performing a large table scan or expensive hash or many-to-many merge join operation. If only the Sort operation is eliminated, it may not be worthwhile. Both of these assumptions should be tested.

The intermediate table approach should also be considered to eliminate XML queries inside cursors and WHILE loops.
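One way this might look is sketched below: the XML is shredded once before the loop, and the cursor then iterates over the intermediate table instead of querying the XML on each pass. All object names and the per-row PRINT are placeholders assumed for the example.

```sql
-- Hypothetical XML document and object names, for illustration only.
DECLARE @x xml = N'<rows><row id="1"/><row id="2"/></rows>';
DECLARE @id int;

CREATE TABLE #xml_ids (ID int NOT NULL PRIMARY KEY);

-- Single XML query, executed once outside the loop.
INSERT INTO #xml_ids (ID)
SELECT r.n.value('@id', 'int')
FROM @x.nodes('/rows/row') AS r(n);

-- The cursor reads the temp table, not the XML.
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID FROM #xml_ids ORDER BY ID;

OPEN c;
FETCH NEXT FROM c INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @id;                -- placeholder for the real per-row work
    FETCH NEXT FROM c INTO @id;
END
CLOSE c;
DEALLOCATE c;

DROP TABLE #xml_ids;
```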

 
