Understanding Scans And Seeks



How Do These Operations Work?

It took me some time to understand what the seek and scan operations are and how they both work.
DBA’s who have used scan operations could have noticed a note similar to this: “Scanning a index, entirely or only a range“. Yes, that is the key. There are many blogs, articles and forum replies saying “scan touches every row in the table/index whether or not it qualifies.” That is wrong! Scan is a type of operation which starts from somewhere (usually either from the beginning or the end) and scans the index until the conditions are met – in my case until the required row count is met.
The seek operation basically goes directly to a position. By reading the root and possible intermediate pages, the seek operation identifies where the data resides and retrieves that data. Contrary to the thinking of many, the number of rows searched does not make any difference in determining whether a scan or seek should be used.

Let us try a different example:
SELECT TransactionID
FROM Production.TransactionHistory

SELECT TransactionID
FROM Production.TransactionHistory
WHERE ProductID >= 1


Both queries return all rows. Still the first query uses index scan and the second uses index seek. By adding the where clause we may be able to force SQL Server to use Index seek. Still the cost is same in the execution plan. The pages read too the same.

(113443 row(s) affected)
Table 'TransactionHistory'. Scan count 1, logical reads 159, physical reads 0, read-ahead reads 0

(1 row(s) affected)

(113398 row(s) affected)
Table 'TransactionHistory'. Scan count 1, logical reads 159, physical reads 0, read-ahead reads 0

(1 row(s) affected)
Interestingly, the description given on the seek operator says this: “Scan a particular range of rows…”

Even though Microsoft decided to use the word Scan , (and I believe it is with a purpose) this article uses the word search when it comes to the actual event  to avoid confusion over the logical and physical operator.
Note that seek is not limited to a single row or page. It can occur when a particular range is searched. It is true that both seek and scan operations search the index/table, but when there is some ambiguity the scan operation is selected. On the problem I had, SQL Server had no clue as to where to start so it selected the scan operation. But when the WHERE clause is added, that ambiguity got removed leads to the selection of the seek operation.

 

Is Seek Better Than Scan?

Most of the time, the examples used for comparing seek and scan are related to usage of the right index. If the proper index in not in place, SQL Server may be forced to scan the entire table/index, to get the rows required by the query if the query expects only a few rows. By adding the proper index, the query will be able to use the query to read only the required data.
Additionally, when there is some ambiguity over the data to be read, there is a possibility of reading more pages than required. There is also a high possibility of additional operations like filter to be used along with scan operations. Due to these reasons seek is generally faster than scan. But this is not always the case – in certain scenarios scan performs as well as seek.
Even though seek is faster when the search criteria is covered in an index and the rows searched are smaller when compared to the table, scan is the best choice when a larger percentage of the data is searched.

In our last example, both queries needed to search the entire table as the minimum ProductID available in the table was 1. However, as the second query clearly defines where to start the operation, it will use the seek operation whereas due to the lack of a WHERE condition, the first query will force a scan operation. However, it didn’t make any difference. 

Users will see a similar execution plan even when another condition is added. Please note that both operators were executed only once. This could be identified with ‘Estimated number of executions’ and ‘number of executions’ in the query plan.

SELECT TransactionID
FROM Production.TransactionHistory
WHERE TransactionID >= 100000
      AND TransactionDate = ’01 Jan 2004′
ORDER BY TransactionID

SELECT TransactionID
FROM Production.TransactionHistory
WHERE TransactionDate = ’01 Jan 2004′
ORDER BY TransactionID

Please note that the new condition in the where clause is not covered by any index. That means SQL Server will need to read all pages covered by the clustered index and filter out the rows not matching the criteria. In other words, it may read and discard some pages. However, as the first query gives a clue on where to start the operation, it becomes a seek operation.

Note: Please ignore the missing index hint as it is irrelevant to the content of the article.
When there is ambiguity over where to start the search, it is called “Scan” and otherwise “Seek”. However, when there is an ambiguity over where to stop it does not impact the decision of using scan or seek.

SELECT TransactionID
FROM Production.TransactionHistory
WHERE ProductID / 3 = 0

SELECT TransactionID
FROM Production.TransactionHistory
WHERE ProductID > 1
      AND ProductID / 3 = 0


As I use an arithmetic operation here against the ProductID to verify where the search ends, I am forcing SQL Server to read all pages (in this case index page) to validate all the data. (Note: Using mathematical operators or functions used against a column  generally forces SQL Server to use a scan) The second query, even though it reads the same number of pages, uses a seek operation because of the additional condition. This shows, that even though SQL Server needs to search all data pages, if it can determine the start of the search, it will choose a seek operation.
But users can see that the number of pages read remain same

Continues…

Pages: 1 2 3




Related Articles :

  • No Related Articles Found

No comments yet... Be the first to leave a reply!

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |