SQL Server Performance

Where does data in an index

Discussion in 'Performance Tuning for DBAs' started by rach_kai@hotmail.com, Sep 5, 2006.

  1. Hello,

    I am currently studying the architecture of indexes. However, I came to a certain point which got me confused.

    For example, I have a table with 2 columns A and b
    I create a clustered index on A

    Assume that the query optimizer will use the clustered index.
    1.For clustered index, what data does the intermediate node hold? Are they row locators or do they contain the actual value of A?

    2. Does the statement "the leaf node contains the actual data" mean that the leaf node of the clustered index is also the data page? And the index page would not contain a duplicate data page in this case?

    3. If the database engine uses a clustered index, will it always use the leaf node of the index?

    Second example, I have a table with 2 columns A and b and C
    I create a nonclustered index on A and B, no clustered index has been defined. I issue the select statement: SELECT A, B from table

    Assume that the query optimizer will use the nonclustered index.
    1. For nonclustered index, what data does the intermediate node hold? Are they row locators or do they contain the actual values of A and B? How about the leaf level? Are they row locators or can they contain the actual values of A and B. I've read articles stating that if i include all index keys in my query, then the database engine will only traverse the index and not the actual data. Does this mean that in issuing the query above, I no longer need to go to the actual page (using the row locator found in the leaf node).

    I hope I am able to state my questions clearly. These are architecture-related questions and I've found some confusing documentations so far. Anyone's expert insights will surely be of great help. Thank you!
  2. FrankKalis Moderator

    Have you already read about this topic in the SQL Server help file Books Online? It should cover all your questions. [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br /><br />--<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
  3. Hello Frank,

    Actually, yes. The SQL Books Online pretty much explain what each node contains. However, it does not explicitly state how much of the table's data can already be contained in the index (so the database engine need not go to the actual data page (in the clustered index's case) or to the row locator or clustered index key (in the nonclustered index keys).

    To make my question clearer... say I have table with column A and column A values are from 1 to 10,000,000. If I place a clustered index on A (and assume that SQL will only create one intermediate/branch node), will the nodes contain all values 1 to 10,000,000 (divided among different intermediate nodes of same level), say node1 contains (1,2,...1000)? Or will it contain only a certain sequence, say node1 (1,500,1000), in which case if I need to query a range, i'd still need to get to the data page?
  4. FrankKalis Moderator

    Here's a quote from "Inside SQL Server 2000" by Kalen Delaney:<br /><blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><br />An index consists of a tree with a root from which the navigation begins, possible intermediate index levels, and bottom-level leaf pages. You use the index to find the correct leaf page. The number of levels in an index will vary depending on the number of rows in the table and the size of the key column or columns for the index. If you create an index using a large key, fewer entries will fit on a page, so more pages (and possibly more levels) will be needed for the index. On a qualified select, update, or delete, the correct leaf page will be the lowest page of the tree in which one or more rows with the specified key or keys reside. A qualified operation is one that affects only specific rows that satisfy the conditions of a WHERE clause, as opposed to accessing the whole table. In any index, whether clustered or nonclustered, the leaf level contains every key value, in key sequence.<br /><br />Clustered Indexes<br />The leaf level of a clustered index contains the data pages, not just the index keys. Another way to say this is that the data itself is part of the clustered index. A clustered index keeps the data in a table ordered around the key. The data pages in the table are kept in a doubly linked list called the page chain. The order of pages in the page chain, and the order of rows on the data pages, is the order of the index key or keys. Deciding which key to cluster on is an important performance consideration. When the index is traversed to the leaf level, the data itself has been retrieved, not simply pointed to.<br /><br />Because the actual page chain for the data pages can be ordered in only one way, a table can have only one clustered index. The query optimizer strongly favors a clustered index because such an index allows the data to be found directly at the leaf level. Because it defines the actual order of the data, a clustered index allows especially fast access for queries looking for a range of values. The query optimizer detects that only a certain range of data pages must be scanned.<br /><br />Most tables should have a clustered index. If your table will have only one index, it generally should be clustered. Many documents describing SQL Server indexes will tell you that the clustered index physically stores the data in sorted order. This can be misleading if you think of physical storage as the disk itself. If a clustered index had to keep the data on the actual disk in a particular order, it could be prohibitively expensive to make changes. If a page got too full and had to be split in two, all the data on all the succeeding pages would have to be moved down. Sorted order in a clustered index simply means that the data page chain is in order. If SQL Server follows the page chain, it can access each row in clustered index order, but new pages can be added by simply adjusting the links in the page chain. I'll tell you more about page splitting and moving rows in Chapter 9 when I discuss data modification.<br /><br />In SQL Server 2000, all clustered indexes are unique. If you build a clustered index without specifying the unique keyword, SQL Server forces uniqueness by adding a uniqueifier to the rows when necessary. This uniqueifier is a 4-byte value added as an additional sort key to only the rows that have duplicates of their primary sort key. You'll be able to see this extra value when we look at the actual structure of index rows later in this chapter.<br /><br />Nonclustered Indexes<br />In a nonclustered index, the lowest level of the tree (the leaf level) contains a bookmark that tells SQL Server where to find the data row corresponding to the key in the index. As you saw in Chapter 3, a bookmark can take one of two forms. If the table has a clustered index, the bookmark is the clustered index key for the corresponding data row. If the table is a heap (in other words, it has no clustered index), the bookmark is a RID, which is an actual row locator in the form File#<img src='/community/emoticons/emotion-4.gif' alt=':p' />age#<img src='/community/emoticons/emotion-7.gif' alt=':S' />lot#. (In contrast, in a clustered index, the leaf page is the data page.)<br /><br />The presence or absence of a nonclustered index doesn't affect how the data pages are organized, so you're not restricted to having only one nonclustered index per table, as is the case with clustered indexes. Each table can include as many as 249 nonclustered indexes, but you'll usually want to have far fewer than that.<br /><br />When you search for data using a nonclustered index, the index is traversed and then SQL Server retrieves the record or records pointed to by the leaf-level indexes. For example, if you're looking for a data page using an index with a depth of three—a root page, one intermediate page, and the leaf page—all three index pages must be traversed. If the leaf level contains a clustered index key, all the levels of the clustered index then have to be traversed to locate the specific row. The clustered index will probably also have three levels, but in this case remember that the leaf level is the data itself. There are two additional index levels separate from the data, typically one less than the number of levels needed for a nonclustered index. The data page still must be retrieved, but because it has been exactly identified there's no need to scan the entire table. Still, it takes six logical I/O operations to get one data page. You can see that a nonclustered index is a win only if it's highly selective.<br /><br />Figure 8-2 illustrates this process without showing you the individual levels of the B-trees. I want to find the first name for the employee named Anson, and I have a nonclustered index on the last name and a clustered index on the employee ID. The nonclustered index uses the clustered keys as its bookmarks. Searching the index for Anson, SQL Server finds that the associated clustered index key is 7. It then traverses the clustered index looking for the row with a key of 7, and it finds Kim as the first name in the row I'm looking for.<br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br /><br />--<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
  5. Hello Frank,

    This is very helpful. The explanation from the book is very clear. =)

    Now, getting the same example from above. If I dont have a clustered index on EmployeeID and my query is plainly:

    SELECT LastName from TableA where LastName = 'Anson'

    Do i need to go to the leaf page for the row locator? Or will the index key already satisfy my query? I'm asking this in iteration of my question above "I've read articles stating that if i include all index keys in my query, then the database engine will only traverse the index and not the actual data."

    Thank you so much Frank for patiently answering my questions. =)
  6. FrankKalis Moderator

    You are speaking of a very special case of query where there is an index "covering" the query. Check out mmarovic's explanation of covering indexes here:http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=16508

    Also have a look here:http://www.sql-server-performance.com/covering_indexes.asp

    Covering indexes are probably the fastest way to retrieve data. SQL Server doesn't have to touch the data pages, but is able to satisfy a query by navigating the index. So, what you have read in other articles is true as long as we speak of covering indexes. When you change your query to

    SELECT someothercolumn, LastName FROM TableA WHERE LastName = 'Anson'

    an index alone cannot satisfy the query. So SQL Server will search the index to the leaf level, find the index key there along with a hint where to find the corresponding data. The form of that hint depends on the presence on a clustered index. If there is also a clustered index on that table, the hint is the clustering index key. If there isn't a clustered index present, the hint is the RID.

    --
    Frank Kalis
    Moderator
    Microsoft SQL Server MVP
    Webmaster:http://www.insidesql.de
  7. Hi Frank,

    Finally! It's all clear to me. This would surely help me in probing into my indexing strategies. Thanks a lot, Frank. You have been very, very, very helpful. =)
  8. FrankKalis Moderator

    Thanks. [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br /><br />Here is some more reading material:<br /<a target="_blank" href=http://support.microsoft.com/default.aspx?scid=kb;EN-US;325024>http://support.microsoft.com/default.aspx?scid=kb;EN-US;325024</a><br /<a target="_blank" href=http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/statquery.asp>http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/statquery.asp</a><br /><br /><br />--<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>

Share This Page