Performance hit using UPDATE CASCADE? | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Performance hit using UPDATE CASCADE?

Hi,
I need to use UPDATE CASCADE on a table. Is this a major performance hit? It will be a quite small database, so I expect it won’t. But is there a problem with huge databases? Thanks!
Do you need data integrity in a database? Well of course you do – that’s what it’s all about. There are those who prefer not to use natural keys, and use IDENTITY as a substitute key and as the RK for FK relationships, because it is extremely unlikely to change over time, so cascading updates will not be necessary. On the other hand, it means your queries must always include the RK table if you need to filter on the natural key. If your database is a "black box" to the end user, then of course you can go for the solution that performs best. If users can run queries themselves, then you can give them views that present only the natural keys, or you can opt for natural keys. With natural keys, it is good practice to add an IDENTITY column and make that the clustered index, before adding a primary key or unique constraint to the natural key.
quote:Originally posted by Adriaan
With natural keys, it is good practice to add an IDENTITY column and make that the clustered index, before adding a primary key or unique constraint to the natural key.
You have my vote too. Roji. P. Thomas
Microsoft SQL Server MVP
http://toponewithties.blogspot.com

Me three. [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><br />–<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
Hey guys, isn’t this supposed to be a forum for discussion?[<img src=’/community/emoticons/emotion-5.gif’ alt=’;)‘ />]
Who said that discussion always has to be controverse? [<img src=’/community/emoticons/emotion-4.gif’ alt=’:p‘ />]<br /><br />–<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by Adriaan</i><br /><br />Hey guys, isn’t this supposed to be a forum for discussion?[<img src=’/community/emoticons/emotion-5.gif’ alt=’;)‘ />]<br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br />You expected some of us to sound like CELKO ? <br /><br /<a target="_blank" href=http://tinyurl.com/rwlzh>http://tinyurl.com/rwlzh</a><br /><br />Roji. P. Thomas<br />Microsoft SQL Server MVP<br /<a target="_blank" href=http://toponewithties.blogspot.com>http://toponewithties.blogspot.com</a><br />
quote:Originally posted by Roji. P. Thomas
You expected some of us to sound like CELKO ?
Or am I beginning to sound like Celko myself? Please let me know.
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by Adriaan</i><br />Or am I beginning to sound like Celko myself? <i>Please</i> let me know.<br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br />Not really. But you have the potential [<img src=’/community/emoticons/emotion-2.gif’ alt=’:D‘ />]<br /><br />Roji. P. Thomas<br />Microsoft SQL Server MVP<br /<a target="_blank" href=http://toponewithties.blogspot.com>http://toponewithties.blogspot.com</a><br />
Hey guys, today seems to be a drifting off-topic day. Don’t let us hijack yet another thread. [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><br />–<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
I was already afraid I might. Keep me on the straight and narrow!
…. DELETED because I re-read question and realized I was answer CASCADE DELETE, not update. Sorry Panic, Chaos, Disorder … my work here is done –unknown
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by FrankKalis</i><br /><br />Hey guys, today seems to be a drifting off-topic day. Don’t let us hijack yet another thread. [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br /><br />So we have the consensus that we will leave this thread and find another victim.[<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><br />Roji. P. Thomas<br />Microsoft SQL Server MVP<br /<a target="_blank" href=http://toponewithties.blogspot.com>http://toponewithties.blogspot.com</a><br />
quote:Originally posted by Roji. P. Thomas
quote:Originally posted by Adriaan
With natural keys, it is good practice to add an IDENTITY column and make that the clustered index, before adding a primary key or unique constraint to the natural key.
You have my vote too. Roji. P. Thomas
Microsoft SQL Server MVP
http://toponewithties.blogspot.com

Care to explain why you would want data sorted by an arbitrary (identity) value which is unlikely to be used for sorting? I’m interested to know the advatange, other than that the clustering index key would be an int. Panic, Chaos, Disorder … my work here is done –unknown
quote:Originally posted by SQL_Guess
Care to explain why you would want data sorted by an arbitrary (identity) value which is unlikely to be used for sorting? I’m interested to know the advatange, other than that the clustering index key would be an int.

Do you mind if I point you to a URL by Kimberly L Tripp where the reasons are decently documneted. ? Ever-increasing clustering key – the Clustered Index Debate……….again! Roji. P. Thomas
Microsoft SQL Server MVP
http://toponewithties.blogspot.com

The advantage is that new entries will always be added at the end of the clustered index. If you have a clustered index on a natural key, new entries are more likely to have a non-incremental value, which means they can probably not be added at the end. Also, a numeric column is usually small in width, which is ideal for a clustered index. Remember that a clustered index is the founding stone for all other indexes on the same table. Don’t confuse the sorting order of data in an index with the sorting order of data in your queries. Even when the natural key is set as the clustered PK of your table, then you still have to use ORDER BY in your queries to get the results in that order – otherwise there is no guarantee of the order of the data.
quote:Originally posted by Roji. P. Thomas
quote:Originally posted by SQL_Guess
Care to explain why you would want data sorted by an arbitrary (identity) value which is unlikely to be used for sorting? I’m interested to know the advatange, other than that the clustering index key would be an int.

Do you mind if I point you to a URL by Kimberly L Tripp where the reasons are decently documneted. ? Ever-increasing clustering key – the Clustered Index Debate……….again! Roji. P. Thomas
Microsoft SQL Server MVP
http://toponewithties.blogspot.com

quote:Originally posted by Adriaan
The advantage is that new entries will always be added at the end of the clustered index. If you have a clustered index on a natural key, new entries are more likely to have a non-incremental value, which means they can probably not be added at the end. Also, a numeric column is usually small in width, which is ideal for a clustered index. Remember that a clustered index is the founding stone for all other indexes on the same table. Don’t confuse the sorting order of data in an index with the sorting order of data in your queries. Even when the natural key is set as the clustered PK of your table, then you still have to use ORDER BY in your queries to get the results in that order – otherwise there is no guarantee of the order of the data.
Thanks guys. I’ve always been under the impression that the best use of a clustering index, where possible, is to arrange your data in the most frequent order requirement – for example if data is always accessed in data order, then that would be a good clustering key, since the overhead of using that data type as a clustering key is outwieghed by the benefits when doing sorts… I’ll have a read of those threads. It’s always good to elarn something new! I do know that the Order by is a requirement, but my understanding is that a clustering index on the ORDER BY key, where possible, will give significant advantages in access path. Panic, Chaos, Disorder … my work here is done –unknown
quote:Originally posted by SQL_Guess
Care to explain why you would want data sorted by an arbitrary (identity) value which is unlikely to be used for sorting? I’m interested to know the advatange, other than that the clustering index key would be an int.

Also note that, its just a myth that, the data is sorted in the order of clustered index. The Facts About Clustered Indexes Roji. P. Thomas
Microsoft SQL Server MVP
http://toponewithties.blogspot.com

Dang, again thanks, Roji. Neither was I aware of David Portas blog, nor that by Itzik. —
Frank Kalis
Moderator
Microsoft SQL Server MVP
Webmaster:http://www.insidesql.de
quote:I’ve always been under the impression that the best use of a clustering index, where possible, is to arrange your data in the most frequent order requirement – for example if data is always accessed in data order, then that would be a good clustering key, since the overhead of using that data type as a clustering key is outwieghed by the benefits when doing sorts
BOL adds to the confusion: "A clustered index is particularly efficient on columns that are often searched for ranges of values." This probably holds true for static data.
quote:Originally posted by Adriaan
BOL adds to the confusion: "A clustered index is particularly efficient on columns that are often searched for ranges of values." This probably holds true for static data.
Also true about a static unique chronological datetime column. Roji. P. Thomas
Microsoft SQL Server MVP
http://toponewithties.blogspot.com

Not to sound like Celko <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /> — but I’ll jump in and argue that this is not actually a good idea across the board. It might be useful in certain cirumstances, but I think Kimberly’s argument misses some key points.
merrillaldrich, You made the jump, now you take the plunge: what is "not actually a good idea across the board"? Inquiring minds want to know!
OK, I’ll do my best to explain my thinking – I’m not a complete expert, so I might well be wrong about some nuance of this. As I understand it the idea behind the sequential clustered index is two-fold: 1. Prevent both page splits and the fragmentation that comes from page splits, because both the splitting action and the fragmentation degrade performance. 2. Speed up joins and selection and make them less expensive by providing keys that are narrow (take less memory) and can be processed more rapidly (e.g. Int comparison is faster than string comparison). That all sounds great, but I don’t think it helps in all cases. Consider this: If I have a table clustered on an ever-increasing and unchangeable column (like Identity) then it is true there will be fewer page splits on the table object itself as the table gets inserts / updates. I think the only splits would occur as the result of variable-length columns’ content being modified, which could be counterbalanced with appropriate fill factor. But, there must be indexes. The table will have to have indexes, because the clustering column is not meaningful, therefore selections from the table will need to have indexes if they are to perform at all well. Those indexes are stored in index order, so they will fragment as the table is modified, even though the table is clustered on identity. Page splits — and more indexes will mean more of them. There must also be a natural key, which then has to have a separate unique index, which will have to be maintained and could exhibit fragmentation if they keys are subject to change. Lastly, consider whether the prevalent joins in use on the system join on the identity column or on the natural key. If the natural key is the join criteria, then all processing has to happen against a secondary index that refers back to the table via identity (or a table scan). This is not too nice. The base table is never in an order that helps the engine join to it, it’s always in another, arbitrary order. This isn’t a problem as far as select + order by, as has already been pointed out, because the ordering is not intended to help delivere ordered results — it does seem like a problem for joining to the table, though, forcing the system through an extra index to get to the data. In summary, adding an identity column arbitrarily will have these consequences: 1. A additional index will have to be created / maintained / defragged for the real key
2. All selections that use the natural key will have to operate via that index to get to the data, rather than fetching it directly from the table
3. Joins against the table using the natural key will likewise have to hit the index first rather than just the table. These concerns indicate to me that it’s not automatically a net gain. I read a really interesting blog on this very topic which I will try to locate again and post here.
Here’s that article – long, but very interesting: http://www.simple-talk.com/sql/performance/sql-server-performance-testing/ A success story for composite-key performance
quote:Originally posted by merrillaldrich OK, I’ll do my best to explain my thinking – I’m not a complete expert, so I might well be wrong about some nuance of this.
Same for me too
quote:
As I understand it the idea behind the sequential clustered index is two-fold: 1. Prevent both page splits and the fragmentation that comes from page splits, because both the splitting action and the fragmentation degrade performance. 2. Speed up joins and selection and make them less expensive by providing keys that are narrow (take less memory) and can be processed more rapidly (e.g. Int comparison is faster than string comparison).
The idea here is that you will define your surrogate identity key both as clustered and PRIMARY KEY. and you will be defining a UNIQUE Constraint/INDEX on the natural key. This helps you having relatively smaller data and index sizes.
quote:
That all sounds great, but I don’t think it helps in all cases.
Completely agree. There are cases where a different approach will be more efficient.
quote:
Consider this: If I have a table clustered on an ever-increasing and unchangeable column (like Identity) then it is true there will be fewer page splits on the table object itself as the table gets inserts / updates. I think the only splits would occur as the result of variable-length columns’ content being modified, which could be counterbalanced with appropriate fill factor. But, there must be indexes. The table will have to have indexes, because the clustering column is not meaningful, therefore selections from the table will need to have indexes if they are to perform at all well. Those indexes are stored in index order, so they will fragment as the table is modified, even though the table is clustered on identity. Page splits — and more indexes will mean more of them.
This argument holds true even if you have clustered index on your natural key. That doesnt stop you from the fragmentation of non-clustered indexes.
quote:
There must also be a natural key, which then has to have a separate unique index, which will have to be maintained and could exhibit fragmentation if they keys are subject to change.
Fragmentation of a non-clustered index is not as bad as the fragmentation of a clustered index.
quote:
Lastly, consider whether the prevalent joins in use on the system join on the identity column or on the natural key. If the natural key is the join criteria, then all processing has to happen against a secondary index that refers back to the table via identity (or a table scan). This is not too nice. The base table is never in an order that helps the engine join to it, it’s always in another, arbitrary order. This isn’t a problem as far as select + order by, as has already been pointed out, because the ordering is not intended to help delivere ordered results — it does seem like a problem for joining to the table, though, forcing the system through an extra index to get to the data.
Since we define the surrogate key as the PRIMARY KEY, most of the joins should be based on the surrogate key. And Joins on the natural key can use the index on the natural key. A performance problem may arise when the join has to be performed on the index on natural key and then do a book mark look up on the data pages to get the desired data. But then this is true in the case of every non-clustered index. And this can be solved either by a covering index or index intersection.
quote:
In summary, adding an identity column arbitrarily will have these consequences: 1. A additional index will have to be created / maintained / defragged for the real key
True
quote:
2. All selections that use the natural key will have to operate via that index to get to the data, rather than fetching it directly from the table
Not a problem. Using a small index is always better than using the clustered index.
quote:
3. Joins against the table using the natural key will likewise have to hit the index first rather than just the table.
In case of Bookmark lookup, see the options mentioned above.
Roji. P. Thomas
Microsoft SQL Server MVP
http://toponewithties.blogspot.com

quote:
1. A additional index will have to be created / maintained / defragged for the real key
True. But "naturally" compact as so not that much storage space wasted.
quote:
2. All selections that use the natural key will have to operate via that index to get to the data, rather than fetching it directly from the table
Yes, but since the natural key is likely to be very selective by nature, you’re likely to get an index seek along with the jump to the table. Which should be more effective than a scan.
quote:
3. Joins against the table using the natural key will likewise have to hit the index first rather than just the table.
When you implement the "concept" of artificial keys with "IDENTITY" PRIMARY KEYS you do this all over the place, not just for one single table. In this case it is not very "smart" to join one the natural key, instead of joining on the artificial key. —
Frank Kalis
Moderator
Microsoft SQL Server MVP
Webmaster:http://www.insidesql.de
All valid points. I wonder what you guys think of the article I posted (if you have time to look it over) ?
Aside from page splitting and fragmentation argument, the main dissadvantage of natural primary keys is that key is subject to updates and when ‘natural foreign key’ points to ‘natural primary key’ and ‘natural primary key’ is changed you have to update rows in related tables all over the place. *
[hijacking]
Just for the record: I agree with everything Adriaan said in this thread. There is joke in Serbia about the child that never said a word untill he was 3 years old. All of suddent he asked for salt and explained later that he was mute because everything was perfect.
[/hijacking]
Also true – and bringing us back to the thread topic <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />. I wonder if anyone else has done a side-by-side test (it would take quite a lot of work to build two versions of a system, one with and one without, to test them)
Merill, I like the article. I alreday knew that composite primary key is good idea in scenario described. It was used in former company on a few big tables. Bad idea is to have natural keys, especially composite natural primary keys.
]]>