Large Data Operations in SQL Server

Delete query – 100K rows

Sequential rows

Sequential rows

Distributed rows

Distributed rows

256M server mem

Index

Table Scan

Index

Table Scan

Query time (sec)

4.8

88.52

282

41

Checkpoint time (sec)

8.4

4.52

8.4

14

Rows / sec

7,576

1,075

340

1,800

1154MB server mem

Query time (sec)

4.1

6.4

4.2

5.3

Checkpoint time (sec)

3.7

3.9

28.6

28.6

Rows /sec

12,821

9,708

3,048

2,949

The Delete plan for the same table without indexes is shown below. The component operation costs are the same as the corresponding operations in the plan discussed earlier.

The measured query and checkpoint times for a 100,000 row Delete from a 10M row, 101,012 page table are shown below.

Delete query, no index 100K rows

Sequential rows

Distributed rows

256M server mem

Table Scan

Table Scan

Query time (sec)

11.5

26

Checkpoint time (sec)

0.1

4

Rows / sec

8,621

3,300

1154MB server mem

Query time (sec)

1.9

1.5

Checkpoint time (sec)

0.2

22

Rows /sec

47,619

4,255

As expected, the Delete operation on a table without indexes (which must use a Table Scan) is faster than a Delete using the Table Scan plan that must also delete from indexes. However, the difference in cost between the two, accounting for the cost of the index delete, is not large, except for the unusual behavior in deleting from sequential rows involving a table scan from disk.

Many people have noticed that it can be faster to update or delete a large number of rows by first dropping existing indexes, update or delete from the unindexed table, and then recreating the indexes, than updating or deleting from a table with indexes in-place. It is now clear that the reason for this is that the execution plan does not account for the cost of retrieving the data pages and incorrectly uses an Index Seek plan when a Table Scan plan is much more efficient. The index build time for the two indexes used in the examples above are around 90 seconds total (30 sec for the SeqID and 60 sec for DistID). An Update or Delete with indexes in place using a hint to force a Table Scan will in most cases is faster than dropping the indexes before the Update/Delete, and then recreating the indexes afterwards.

At 256M server memory, the Delete query time is slightly longer than the Select query for sequential rows, and approximately 2.5 times longer for distributed rows. The checkpoint is negligible for sequential rows were relatively few pages are involved (1010 data and 185 index), but more substantial if many pages must be written to.

Summary

Several important observations can be made from the examination of the SQL Server execution plans, and the actual query costs for both in-memory and disk intensive configurations.

There is good reason to believe that the SQL Server execution plan cost values have units of time in seconds and are based on at least partially disk bound operations. The cost values were fixed in the SQL Server 7.0 period and have not been updated. The SQL Server optimizer under estimates current generation sequential disk transfer rates relative to random disk I/O times. When data resides in memory, the Bookmark Lookup cost per row is less expensive than the Table Scan cost per page. Query time will depend significantly on whether the affected data currently resides in memory or disk access is required. For disk intensive operations, the characteristics of the disk sequential and random I/O performance are also important. The media technology of current disk drives allows a 7200RPM drive to have similar sequential transfer performance compared to a high-end 15K drive, but only one-half the random I/O performance.

In principle, a much better execution plan for the specific circumstance can be derived if it can be determined whether the data is currently in memory. Even a less accurate estimate using recent Buffer Cache Hit Ratio and an estimate of the size of the data relative to the Target Server Memory should produce a better plan than rigidly applying fixed estimates based on system memory alone.

The most important observation is that the SQL Server Insert, Update and Delete operations may not properly account for the full I/O costs involved. This can cause Update and Delete statements to use an Index Seek plan that is substantially less efficient than a Table Scan plan. It is not necessary to drop indexes prior to large deletes, and pay the cost of recreating indexes afterwards. A simple index hint to force a Table Scan is the best solution in many cases.

]]>

Leave a comment

Your email address will not be published.