Understanding Scans And Seeks


How Do These Operations Work?

It took me some time to understand what the seek and scan operations are and how they both work.
DBA’s who have used scan operations could have noticed a note similar to this: “Scanning a index, entirely or only a range“. Yes, that is the key. There are many blogs, articles and forum replies saying “scan touches every row in the table/index whether or not it qualifies.” That is wrong! Scan is a type of operation which starts from somewhere (usually either from the beginning or the end) and scans the index until the conditions are met – in my case until the required row count is met.
The seek operation basically goes directly to a position. By reading the root and possible intermediate pages, the seek operation identifies where the data resides and retrieves that data.
Contrary to the thinking of many, the number of rows searched does not make any
difference in determining whether a scan or seek should be used.

Let us try a different example:
SELECT TransactionID

FROM Production.TransactionHistory

SELECT TransactionID

FROM Production.TransactionHistory

WHERE ProductID >= 1

Both queries return all rows. Still the first query uses index scan and the
second uses index seek. By adding the where clause we may be able to force SQL
Server to use Index seek. Still the cost is same in the execution plan. The
pages read too the same.

(113443 row(s) affected)
Table 'TransactionHistory'. Scan count 1, logical reads 159, physical reads 0, read-ahead reads 0

(1 row(s) affected)

(113398 row(s) affected)
Table 'TransactionHistory'. Scan count 1, logical reads 159, physical reads 0, read-ahead reads 0

(1 row(s) affected)

Interestingly, the description given on the seek operator says this: “Scan
a particular range of rows…”

Even though Microsoft decided to use the word Scan , (and I believe it is with a
purpose) this article uses the word search when it comes to the actual event 
to avoid confusion over the logical and physical operator.

Note that seek is not limited to a single row or page. It can occur when a
particular range is searched. It is true that both seek and scan operations
search the index/table, but when there is some ambiguity the scan operation is
selected. On the problem I had, SQL Server had no clue as to where to start so
it selected the scan operation. But when the WHERE clause is added, that
ambiguity got removed leads to the selection of the seek operation.

 

Is Seek Better Than Scan?

Most of the time, the examples used for comparing seek and scan are related to usage of the right index. If the proper index in not in place, SQL Server may be forced to scan the entire table/index, to get the rows required by the query if the query expects only a few rows. By adding the proper index, the query will be able to use the query to read only the required data.
Additionally, when there is some ambiguity over the data to be read, there is a possibility of reading more pages than required. There is also a high possibility of additional operations like filter to be used along with scan operations. Due to these reasons seek is generally faster than scan. But this is not always the case – in certain scenarios scan performs as well as seek.
Even though seek is faster when the search criteria is covered in an index and the rows searched are smaller when compared to the table, scan is the best choice when a larger percentage of the data is searched.

In our last example, both queries needed to search
the entire table as the minimum ProductID available in the table was 1. However,
as the second query clearly defines where to start the operation, it will use
the seek operation whereas due to the lack of a WHERE condition, the first query
will force a scan operation. However, it didn’t make any difference. 

Users will see a similar execution plan even when another condition is added. Please note that both operators were executed only once. This could be identified with ‘Estimated number of executions’ and ‘number of executions’ in the query plan.

SELECT TransactionID
FROM Production.TransactionHistory
WHERE TransactionID >= 100000
      AND TransactionDate = ’01 Jan 2004′
ORDER BY TransactionID

SELECT TransactionID
FROM Production.TransactionHistory
WHERE TransactionDate = ’01 Jan 2004′
ORDER BY TransactionID

Please note that the new condition in the where clause is not covered by any
index. That means SQL Server will need to read all pages covered by the
clustered index and filter out the rows not matching the criteria. In other
words, it may read and discard some pages. However, as the first query gives a
clue on where to start the operation, it becomes a seek operation.

Note: Please ignore the missing index hint as it is
irrelevant to the content of the article.

When there is ambiguity over where to start the search, it is called “Scan” and
otherwise “Seek”. However, when there is an ambiguity over where to stop it does
not impact the decision of using scan or seek.

SELECT TransactionID

FROM Production.TransactionHistory

WHERE ProductID / 3
= 0

SELECT TransactionID

FROM Production.TransactionHistory

WHERE ProductID > 1

      AND ProductID / 3
= 0


As I use an arithmetic operation here against the ProductID to verify where the
search ends, I am forcing SQL Server to read all pages (in this case index page)
to validate all the data. (Note: Using mathematical operators or functions used
against a column  generally forces SQL Server to use a scan) The second
query, even though it reads the same number of pages, uses a seek operation
because of the additional condition. This shows, that even though SQL Server
needs to search all data pages, if it can determine the start of the search, it
will choose a seek operation.

But users can see that the number of pages read remain same

Continues…

Leave a comment

Your email address will not be published.