SQL Server XML Statistics and Execution Plans
One of SQL Server’s deficiencies when using XML queries is the lack of statistics capability in the XML driver. This is surprising because the ODBC API contains a function for providing statistics on remote data sources. SQL Server instead falls back to fixed, assumed row counts for remote servers, and these are unreasonably high for XML in a transaction processing environment. A query to a remote SQL Server, however, does pass back the requested statistics data.
The example below shows a stand-alone SELECT query with no WHERE clause SARG. The estimated row count from the remote scan is 10,000 rows.
The example below shows the same query, this time with a WHERE clause specified. There are indexes that could be used, but the plan instead applies a filter operation, which reduces the estimated row count to 1,000 rows.
In simple, single-table XML queries, very high estimated row counts do not cause performance problems. The only visible effect is that the reported plan cost is much higher than that of a normal index seek returning few rows.
A more serious issue can occur if two XML tables are joined, as shown below.
The row count estimate for each individual source remains 10,000 rows. However, SQL Server apparently assumes only a single distinct value in the join column, so the estimated output of the join is 100M rows (10K x 10K) instead of the 10,000 rows a one-to-one join would yield. Note that the join type is a many-to-many merge join, which may cause performance problems in tempdb. Another possible problem is that the high plan cost may result in a parallel execution plan when the actual query involves relatively few rows.
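The arithmetic behind this estimate can be sketched with the textbook equi-join formula: the product of the two input row counts divided by the number of distinct join-key values. This is an illustration only, not SQL Server's actual cardinality estimator; the function name `estimate_join_rows` and the constant `ASSUMED_REMOTE_ROWS` are hypothetical names introduced here.

```python
# Illustrative only: a textbook equi-join cardinality estimate,
# not SQL Server's internal estimator.

ASSUMED_REMOTE_ROWS = 10_000  # fixed default row count described in the text


def estimate_join_rows(left_rows: int, right_rows: int, distinct_values: int) -> int:
    """Estimate join output as left * right / distinct join-key values."""
    if distinct_values < 1:
        raise ValueError("distinct_values must be at least 1")
    return left_rows * right_rows // distinct_values


# With no statistics, assume every row shares a single join-key value.
worst_case = estimate_join_rows(ASSUMED_REMOTE_ROWS, ASSUMED_REMOTE_ROWS, 1)

# With statistics showing a unique key on each side (a one-to-one join).
one_to_one = estimate_join_rows(
    ASSUMED_REMOTE_ROWS, ASSUMED_REMOTE_ROWS, ASSUMED_REMOTE_ROWS
)

print(worst_case)  # 100000000
print(one_to_one)  # 10000
```

The four-orders-of-magnitude gap between the two results is what inflates the plan cost and pushes the optimizer toward a many-to-many merge join or a parallel plan.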
Additional serious problems can occur when joining XML results to normal tables, as in the following example. The SELECT query for the plan shown below has a SARG on an indexed column.