Composite clustered index | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Composite clustered index

Want to check my thinking with you folks… I have a table with a clustered composite index, consisting of 3 columns, which together form a unique key. For illustration, the columns are C1, C2 & C3. Distinct counts of values for columns are C1 425, C2 300,000 & C3 4,000,000 C3 is effectively number of seconds since 01/01/1970. The usage of the table is typically, insert a row, do something else, then update it. Currently, the index columns are ordered C3,C1,C2. Fill factor of 90%. My thinking is that this composite index is better ordered C1,C2,C3. My reasoning is that having C3 as the leading column, biases all the inserts towards one side of the indexes underlying B-tree, causing page splits. Also, there’ll be a bunch of "wasted" space across the tree, as the values going into C3 only ever get bigger (like an identity), so the space due to the fill factor in lower values never gets used. Monitoring has only shown inserts & updates and selects of single rows. I’ve tested the select of a single row with both configurations of the index, and the cost is identical. Welcome your thoughts. http://www.tradeco.co.uk
The general recommendation is not to make a clustered index a composite as ever non-clustered index will grow in size relative to the width of the clustered index columns involved, this is due to the space concern in the database. The statistics are based on the first column which should be you’re most unique data or you run the risk of a different index being used when the composite is more appropriate. BTW how big is the database? Satya SKJ
Microsoft SQL Server MVP
Writer, Contributing Editor & Moderator
http://www.SQL-Server-Performance.Com
This posting is provided AS IS with no rights for the sake of knowledge sharing. Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.
Satya, Yeah, I take the point about not having composite clustered indexes. "The statistics are based on the first column which should be you’re most unique data or you run the risk of a different index being used when the composite is more appropriate." Hmmm, again, I’ve read the same, but I’ve had the same situation with another table for example, txn_date_in_seconds, county, city. Assuming (as was the case) that there are more cities than counties, when I changed the index order from txn_date_in_seconds, county, city to county, city, txn_date_in_seconds the select queries (both for single rows and for a date range) went from using index scans & aggregates to index seeks – reducing query cost by 90%. Thoughts? I wonder if there’s a misconception about what the "first" column is? Am trying to think this through in terms of B-tree searching. If the "first" part of the search is to find one vlue out of 425, then that’s a quick way of eliminating 424/425ths of the remaining tree??? http://www.tradeco.co.uk
Sry – database is 7Gb. Both tables I’m talking about have about 4 million rows. http://www.tradeco.co.uk
How quickly the rows are returned with this query? When you said changed the index order, have you recompiled & reindexed or did some sort of benchmarking testing on returning the rows. Again that change of order depends on the datatype used for City & county (which Ipresume varchar). So if the rows are volatile in this case then index might slow down and having fewer indexes is better in this case. Satya SKJ
Microsoft SQL Server MVP
Writer, Contributing Editor & Moderator
http://www.SQL-Server-Performance.Com
This posting is provided AS IS with no rights for the sake of knowledge sharing. Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.
"How quickly the rows are returned with this query?" The second case? Almost instantaneously. "When you said changed the index order, have you recompiled & reindexed" Drop and re-create PK constraint. "of benchmarking testing on returning the rows" Ran query against each PK configuration – query plan shows 90% IO cost reduction For county and city – fixed length – char 4. http://www.tradeco.co.uk
I guess you know this one:<a target="_blank" href=http://msdn2.microsoft.com/en-us/library/ms190639.aspx>http://msdn2.microsoft.com/en-us/library/ms190639.aspx</a><br /><br />As with everything, these are only general guidelines to be verified (or not) in your own specific environment. When you find, that your second approach gives you good performance, I would go for it. [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><br />–<br />Frank Kalis<br />Microsoft SQL Server MVP<br />Contributing Editor, Writer & Forum Moderator<a target="_blank" href=http://www.sql-server-performance.com>http://www.sql-server-performance.com</a><br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
Yeah Frank, that’s the page that had me scratching my head. In the county, city, seconds case, I have "between" searches on the seconds column, so that page suggests seconds should be first. http://www.tradeco.co.uk
Btw, just read a little bit closer through the link.<br /><blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><br />rows with subsequent indexed values are guaranteed to be physically adjacent<br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br />This is NOT true. How could it? If it was true, it would mean that rows that are inserted are pushed around on disk until the engine finds their place, which in turn can change again when the next row needs to be placed on disk. Sounds to me, that SQL Server would be very busy with itself in this case and heavy INSERT activity. I also wonder how this can be true in case a table is spread on multiple disks. [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><br />To the best of my knowledge, SQL Server tries to be as adjacent as possible (at least upon reindexing), but not not at all costs. The above case would be far too costly. However, what is guaranteed to be correct is the page chain, by which records in the file can be located, even if it means that the disk heads needs to spin back and forth. Itzik Ben-Gan has posted some time ago some very interesting comments on clustered indices:<a target="_blank" href=http://www.sqlmag.com/article/articleid/92886/sql_server_blog_92886.html>http://www.sqlmag.com/article/articleid/92886/sql_server_blog_92886.html</a><br /><br />Sorry if I am drifting off-topic a bit. [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><br />–<br />Frank Kalis<br />Microsoft SQL Server MVP<br />Contributing Editor, Writer & Forum Moderator<a target="_blank" href=http://www.sql-server-performance.com>http://www.sql-server-performance.com</a><br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
BTW, for the non-clustered indexes, the "first column which should be you’re most unique data" applies. Tested it out for a secondary index on another date expressed in seconds on the same table (call it processed_date) http://www.tradeco.co.uk
Aah, just found the links I had in mind:
http://www.sqlskills.com/blogs/kimberly/2007/02/02/TheClusteredIndexDebateContinues.aspx
http://www.sqlskills.com/blogs/kimb…lusteringKeyTheClusteredIndexDebateagain.aspx
Frank Kalis
Microsoft SQL Server MVP
Contributing Editor, Writer & Forum Moderatorhttp://www.sql-server-performance.com
Webmaster:http://www.insidesql.de
Frank – not read those articles yet, but I guess you’re right. But in the case of non-clustered indices, it could be true for little cost – because you’d just split the index page…. http://www.tradeco.co.uk
Indexes, no doubt you will get more from Kimberly’s articles that too for SQL 2005. <br /<a target="_blank" href=http://www.sql-server-performance.com/indexes_not_equal.asp>http://www.sql-server-performance.com/indexes_not_equal.asp</a> fyi to revise basics [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />].<br /><br /><b>Satya SKJ</b><br />Microsoft SQL Server MVP<br />Writer, Contributing Editor & Moderator<br /<a target="_blank" href=http://www.SQL-Server-Performance.Com>http://www.SQL-Server-Performance.Com</a><br /><center><font color="teal"><font size="1">This posting is provided AS IS with no rights for the sake of <i>knowledge sharing. <hr noshade size="1">Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.</i></font id="size1"></font id="teal"></center>
Oh I dunno – Kimberley is melting my head <img src=’/community/emoticons/emotion-5.gif’ alt=’;)‘ /><br /><br />So I’m gonna have a look at the query plans of the county, city, seconds again, and perhaps post ’em back here.<br /><br /><br />Maybe you were having a dig at me "revise the basics".<br /><br />But I think this is the key bit from that article<br /><br />"Non-clustered indexes are great if the index is highly selective, especially if you will be returning one record"<br /><br />Well, the county, city, seconds <b>is</b> highly selective – for any given combination, only 1 row is returned.<br /><br />I think that maybe it’s the tree structure that’s the issue when it comes to a composite index.<br /><br /><br />edit – actually, I missed the point – that article discusses non-clustered indexes – mine’s clustered.<br /><br /><br /><br /<a target="_blank" href=http://www.tradeco.co.uk>http://www.tradeco.co.uk</a>
[<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />] Hm, I think I have a good question to ask Kimberly in Tech-ed by taking your thread as an example.<br /><br /><b>Satya SKJ</b><br />Microsoft SQL Server MVP<br />Writer, Contributing Editor & Moderator<br /<a target="_blank" href=http://www.SQL-Server-Performance.Com>http://www.SQL-Server-Performance.Com</a><br /><center><font color="teal"><font size="1">This posting is provided AS IS with no rights for the sake of <i>knowledge sharing. <hr noshade size="1">Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.</i></font id="size1"></font id="teal"></center>
Ahhhhhh!!!!! Right, I’ve been misleading. In the county, city, seconds example, the original index of txn_date_in_seconds, county, city was not clustered. I changed it to clustered, and the performance was worse. So then I changed the order to county, city, txn_date_in_seconds with it clustered, and the performance was good. I didn’t do a non-clustered clustered direct comparison (am working on that now) http://www.tradeco.co.uk
Please keep us informed here. I’m always interested in such stuff. [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><br />–<br />Frank Kalis<br />Microsoft SQL Server MVP<br />Contributing Editor, Writer & Forum Moderator<a target="_blank" href=http://www.sql-server-performance.com>http://www.sql-server-performance.com</a><br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
quote:Originally posted by twoboats Ahhhhhh!!!!! Right, I’ve been misleading.
In the county, city, seconds example, the original index of
txn_date_in_seconds, county, city
was not clustered.
I changed it to clustered, and the performance was worse.
So then I changed the order to
county, city, txn_date_in_seconds
with it clustered, and the performance was good.
I didn’t do a non-clustered clustered direct comparison (am working on that now)
http://www.tradeco.co.uk

It might be because of the way data is sorted…
There are only <500 countries in the world and each country will have few duplicate city names then seconds… If you do the opposite you may need to search more pages/extents….
If the clustered index is unique you should not get the different times…
Is Clustered index is unique? MohammedU.
Moderator
SQL-Server-Performance.com All postings are provided “AS IS” with no warranties for accuracy.

I would suggest to compare the queries with that changeover of clustered & non-clustered using query hints with the "Display Estimated Execution Plan." Overall it depends upon the statistics built upon the columns and in this case you might need to recreate the stats by rebuilding that index or update stats, though not in the case of SQL 2005 anyway. Satya SKJ
Microsoft SQL Server MVP
Writer, Contributing Editor & Moderator
http://www.SQL-Server-Performance.Com
This posting is provided AS IS with no rights for the sake of knowledge sharing. Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.
Mohammed – index is unique. As to whether it will be clustered or not, have yet to decide. http://www.tradeco.co.uk
Have decided to go for a clustered index on the txn_date_in_seconds column. Reasons: 1 Not convinced I have all the facts about column order in a composite clustered index
2 This meets the criterea in the article Satya pointed out, and the other guy’s thoughts too
3 Any (reasonably formed) clustered index on the table is improving query performance (although that’s not difficult given the poor logical design of db and quality of code). Having served most of my time on 6.5, it goes against the grain setting up a potential hot spot for inserts at the "end" of the table, but will monitor it. Thanks for thoughts so far. http://www.tradeco.co.uk
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><br />Having served most of my time on 6.5, it goes against the grain setting up a potential hot spot for inserts at the "end" of the table, but will monitor it.<br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br />Kimberly adresses this issue in one of the links I mentioned. [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><br />–<br />Frank Kalis<br />Microsoft SQL Server MVP<br />Contributing Editor, Writer & Forum Moderator<a target="_blank" href=http://www.sql-server-performance.com>http://www.sql-server-performance.com</a><br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
Frank – yeah, I saw that, and it made me think some. If you have a clustered index on an incrementing column, wouldn’t having a fill factor become largely redundant i.e. all the new pages you need would be at the end of the table, so having space available throught the index tree be pointless (or at least, add overhead in reorganising it)?
http://www.tradeco.co.uk
How can I miss to share my knowledge [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />] and new findings. [<img src=’/community/emoticons/emotion-5.gif’ alt=’;)‘ />]<br /><blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by FrankKalis</i><br /><br />Please keep us informed here. I’m always interested in such stuff. [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><br />–<br />Frank Kalis<br />Microsoft SQL Server MVP<br />Contributing Editor, Writer & Forum Moderator<a target="_blank" href=http://www.sql-server-performance.com>http://www.sql-server-performance.com</a><br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br /><br /><b>Satya SKJ</b><br />Microsoft SQL Server MVP<br />Writer, Contributing Editor & Moderator<br /<a target="_blank" href=http://www.SQL-Server-Performance.Com>http://www.SQL-Server-Performance.Com</a><br /><center><font color="teal"><font size="1">This posting is provided AS IS with no rights for the sake of <i>knowledge sharing. <hr noshade size="1">Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.</i></font id="size1"></font id="teal"></center>
Staya – anything from Kimberley on this then? Tradeco wholesalers, drop shippers & trade network
]]>