Large Data Operations in SQL Server

The net result of the execution plan cost formulas is that the default plan using an Index Seek is less expensive than the Table Scan plan by a factor of 238. This is entirely due to the difference in cost between an Index Seek for the 100,000 rows in a high-density index and a Table Scan of 101,012 pages and 10M rows. The table below shows the actual Update and the subsequent checkpoint time for 256M and 1154M configurations.

UPDATE query – 100K rows

Sequential rows

Sequential rows

Distributed rows

Distributed rows

256M server mem

Index

Table Scan

Index

Table Scan

Query time (sec)

1.3

12.6

476.6

28

Checkpoint time (sec)

0.4

0.6

14.5

8

Rows /sec

57,471

7,576

203

2,778

1154MB server mem

Query time (sec)

0.8

1.3

0.9

1.5

Checkpoint time (sec)

0.2

0.1

23

23

Rows /sec

100,000

71,429

4,184

4,082

The index plan is indeed faster when all modified rows are in adjacent pages, but is much slower if the modified rows are distributed across many pages. Since SQL Server statistics do not determine the distribution of the affected pages, the execution plan cost formula should not assume that rows are in adjacent pages. There is no expectation for rows from a common nonclustered index value to be stored in adjacent rows. SQL Server however, always chose the index plan if an index on the SARG exists because the execution plan does not take into account the cost of retrieving the data pages in a manner similar to the Bookmark Lookup operation. It is clear that the true cost does reflect the cost of retrieving the actual data row. This discrepancy could lead to a very slow plan compared to the table scan method when no indexes are present.

A simple change of the UPDATE query in the SET condition to randMoney = randMoney + 1.0 yields the plan show below. Now the default plan uses a table scan, and forcing the index on the SARG shows a plan with the bookmark lookup operation.

Delete Queries

The delete queries are shown below. The first two queries involve rows in sequential pages, and the second pair involves rows distributed across all pages as in the other examples.

— Sequential rows, index seek

DELETE M3C_01 WHERE SeqID = 91

GO

CHECKPOINT

GO

— Sequential rows, table scan

DELETE m FROM M3C_01 m WITH(INDEX(0)) WHERE SeqID = 71

GO

CHECKPOINT

GO

— Distributed rows, index seek

DELETE M3C_01 WHERE DistID = 91

GO

CHECKPOINT

GO

— Distributed rows, table scan

DELETE m FROM M3C_01 m WITH(INDEX(0)) WHERE DistID = 27

GO

CHECKPOINT

GO

Without hints, the execution plan uses an Index Seek. The I/O cost for reading the actual data page and row is not taken into account by the execution plan, similar to the Update statement. Forcing a table scan shows a plan that is much more expensive. After the Table Delete, there is Spool operation followed by Index Delete operations for the two indexes on this table.

The Index Seek and Table Scan operations are the same as the previous Select and Update queries. The Top operation in each case of the two above Delete statements are also the same as the corresponding Update statements. Unlike the Update, the Delete has no Compute Scalar operation. However, the Index Seek has cost of 0.250619, the Top add 0.010000 for a total of 0.260619, the Table Delete adds 0.110068, yet the Table Delete subtree cost is 0.380687, so somehow an additional amount of 0.010000 crept into the Table Delete subtree cost. The Table Delete operation has a fixed I/O cost 0.010068 irrespective of the number of rows and pages involved.

Continues…

Leave a comment

Your email address will not be published.