A First Look at Execution Plan Costs in Yukon Beta 1
Now, one might speculate on why the old bookmark lookup cost for each additional row might be rated as much more expensive that the incremental cost of scanning one page in a table scan. Recall that in the early 1990’s, memory was very expensive compared to today. A 16MB memory module cost around $600. A reasonable assumption at that time would have been that a typical database server may have only sufficient memory to cache a small fraction of the overall data. Then each row in a bookmark lookup would likely involve a disk IO. Furthermore, the bookmark lookup for each successive row in a single query probably does not involve successive rows in the table, so each successive row may involve data residing in different pages. A table scan on the other hand, could potentially make frequent use of reading entire extents (eight pages) in a single disk IO. It is possible that the physical disk IO cost depends more on the number of distinct IO operations than amount of data. If so, then this is one possible explanation for the bookmark lookup cost per incremental row relative to the table scan cost per incremental page.
The above explanation is purely speculative. However, if it were the case, then it would also suggest that the cost formulas depend on the percentage of required data residing in the buffer cache. In this case, the cost formula should depend on an indicator that reflects the likelihood of finding data in cache, such as the buffer cache hit ratio, rather than the amount memory in the system. A server with 512MB of memory and a 256MB database is much likely to have a high buffer cache hit ration than a server with 4GB of memory and a 100GB database. So it is quite easy to understand that absolute memory size alone is a poor choice on which to vary the plan cost formulas.
The SQL Server 7.0 and 2000 bookmark lookup cost formulas have been replaced by the old loop join cost formulas, which are now used for both bookmark lookups and loop joins. The cost formulas for the complete loop join have their own unusual behaviors. The loop join component itself is relatively simple formula of 0.00000418 CPU cost per row. The complexity is in the inner source (IS) index seek operation. First, there is at least three separate observed behaviors for the loop join inner source cost formula, as shown in Figure 5.
Figure 5. Known loop join cost formulas.
The first case applies to certain situations where a search argument (SARG) is specified on both tables involved in the join, where the index on the IS leads with the IS SARG, followed by the join condition with the result that the range of data required from the IS is in a small number of pages. The second case was observed where a SARG was specified only on the outer source (OS) table, the IS table has an index leading with the join condition. The third observed case applies to where a SARG was specified on both tables, as in the first case, but the inner source was not limited to small range. In all three of the above cases, the incremental cost per row for the entire loop join operation is approximately 0.00015 per row for the first 132 rows. For greater than 132 rows, the second and third cases exhibits a very sharp jump in the plan cost, by as much 30 times in going from 132 to 133 rows. There is no rational reason for a loop join involving 133 rows to be 30 times more expensive than a loop join on the same tables involving 132 rows. The first case does not exhibit the unusual jump at between 132 and 133 rows, but continues at the approximate incremental cost of 0.00015 per row. There is one more known case where the IS table is small, and the incremental cost per row is 0.0000796.
For bookmark lookup operations, the cost formula that applies is the second case, because there is no explicit SARG on the IS table, and the IS table cannot be small if many rows are involved. The net result in the change from SQL Server 7.0 and 2000 to Yukon is that the bookmark lookup cost has changed from being several times more expensive than the incremental table scan page cost to being several times less expensive, except if more than 132 rows are involved.
The table below summarizes the approximate plan costs and actual observed costs per incremental row for bookmark lookups and loop join operations, and the incremental cost per page in a table scan (not in rowlock mode). The plan costs for the table scan and loop join applies for SQL Server 7.0, 2000 and Yukon Beta 1. The bookmark lookup plan costs below applies for SQL Server 7.0 and 2000. The bookmark lookup plan costs for Yukon Beta 1 is the same as the loop join cost shown below for up to 132 rows. For 133 or more rows, the loop join plan cost per row is much higher. The actual costs below apply for SQL Server 2000. No measurements have been taken on Yukon at this time.
|Plan Cost||Actual Cost|
|Bookmark – Heap (old)||~0.00625||~11K||~13K|
|Bookmark – Clust. (old)||~0.00625||~15K||~21K|
|Loop Join (=132 rows)||~0.00015||~17K||~24K|
|Table Scan (per page)||~0.0007407||~24K||~24K|
The plan costs are in unspecified units. Actual cost are in CPU-Cycles, more precisely, time units relative the CPU clock. For the Pentium III platforms, CPU clock speeds in the range of 600 to 733MHz. For the Xeon platforms, CPU clock speeds of 2.0 and 2.4GHz were used.
The actual cost for the incremental bookmark lookup and loop join row and the incremental table scan page are not drastically different. There are differences between the Pentium III and Xeon processors. The bookmark lookup cost is moderately lower for heap organized tables than for tables with a clustered index.
The net change from SQL Server 7.0 and 2000 to Yukon Beta 1 is that the cross-over point from an index seek and bookmark lookup execution plan to a table used to be grossly underestimated to being possible overestimated, except if more than 132 rows are involved. The consolidation of the bookmark lookup operation with the loop join is in principle a valid and positive change. However, the loop join operation is in serious need of recalibration. To be more accurate, slightly different formulas should be used between the RID (heap organized tables) and clustered index seek. While this difference is not large, if the graphical execution plan makes effort to show different operations, certainly the cost formulas can also use the most appropriate values. It is also desirable if the plan cost formulas can take into account differences in the processor architecture. Another desirable feature is if the cost formula is based on the cache hit ratio rather than raw memory size. Hopefully, the final release version of Yukon will implement at least some of the necessary improvements in the plan cost formulas.
Published with the express written permission of the author. Copyright 2003 Joe Chang. All rights reserved.
Pages: 1 2