Large Data Operations in SQL Server

At the 1154MB server memory configuration, with sufficient memory to hold the entire table and both indexes, once the data is loaded into memory, there is no disk activity on all four queries. The in-memory table scan rate of ~93,000 pages per second is the expected performance for a single 2.4GHz Xeon processor (with max degree of parallelism 1) where the cost per page of a table scan is approximately 25K CPU-cycles. Both of the Index Seek and Bookmark Lookup plans are much lower cost than the table scan when all data is in memory. The Bookmark Lookup for sequential rows is about four times faster than the table scan and the distributed rows case is just less than 3 times faster than the table scan.

The execution plan cost formula (>1GB) rates the index seek followed by a bookmark lookup to be 3.6 times more expensive (in execution time) than a table scan. The execution plan cost does not take into account whether the data is currently in memory, or whether the data can fit in memory, even though these factors have enormous impact of the cost of the query. It is reasonable for SQL Server not to take into account the location distribution of data because the SQL Server statistics data does keep this information.

When the entire data must be read from disk, the actual table scan cost, employing a sequential disk read, is 16 times faster than the index and bookmark plan using effectively random disk I/O. Compare this to the 3.6X advantage predicted by the execution plan cost formula. The 15K RPM disk used here probably has 2 times the random I/O capability of a 7200RPM drive available 8 years ago, but in the range of 8 times the sequential throughput of the older drives. It is quite possible that an older 7200 RPM disk would have produced exactly the result predicted by the execution plan.

If the number disk drives for the data partition were increased, both sequential and random I/O performance would increase by approximately the same degree initially. When there are sufficient drives to reach a sequential transfer rate of 800MB/sec (10X higher than the current transfer rate), the table scan performance should become limited by the processor speed. However, there is good reason to believe that more disk drives will continue to increase the random I/O performance to well over 10,000 I/O per sec.

Update Queries

The same four types of queries are tested for the Update statement. The first pair of queries updates a sequential block of 100,000 rows. The second pair updates 100,000 rows where each row is in a different 8KB page. A checkpoint is executed immediately after the Update statement. The combined Update and Checkpoint time determines the actual time required to write changes to disk. Even though a Query Analyzer session may show the statement as completed, the combined time is a more accurate assessment of the true cost.

— Sequential rows, index seek

UPDATE m SET randMoney = 1.0 FROM M3C_01 m WHERE SeqID = 91

GO

CHECKPOINT

GO

— Sequential rows, table scan

UPDATE m SET randMoney = 1.0 FROM M3C_01 m WITH(INDEX(0)) WHERE SeqID = 91

GO

CHECKPOINT

GO

— Distributed rows, index seek

UPDATE M3C_01 SET randMoney = 5000.0 WHERE DistID = 91

GO

CHECKPOINT

GO

— Distributed rows, table scan

UPDATE m SET randMoney = 1.0 FROM M3C_01 m WITH(INDEX(0)) WHERE DistID = 5

GO

CHECKPOINT

GO

The execution plan for the first pair of Update statements are shown below. Notice that the plan without hints uses an Index Seek to find the rows to be updated. The index can only identify the rows that need to be updated, but does not actually load the data page into memory. There is no indication that the first plan actually takes into account the cost for retrieving the actual data page that needs to be modified. Before SQL Server can write to a page, the page first has to be loaded into the buffer cache. The hint INDEX(0) in the second plan is a directive to use a table scan.

The Index Seek has the same cost as in the Select query. The Top and Compute Scalar operations each add 0.010000 CPU cost. The Table Update operation has 0.010068 I/O cost and a CPU cost of 0.100000 corresponding to cost of 0.0000001 per row. As discussed earlier, the execution plan shows the same Update I/O cost regardless of the number of rows or pages involved. The Index Seek cannot guarantee that the rows updated are in order.

The detail boxes below are for the Update statement using a hint to force a Table Scan. The Table Scan cost detail is the same as in the Select statement. In this case, the Top operation adds a cost of 4.809997 even though the I/O and CPU costs are the same as in the previous Update operation. The Table Update operator has the same cost structure in both cases.

Continues…

Leave a comment

Your email address will not be published.