SQL Server Performance Forum – Threads Archive
singe primary key better than composite pk?Hi !
http://www.sql-server-performance.com/clustered_indexes.asp Reading this article i made a conclusion that it much wizer use singe int column as a primary key than combine couple of columes for unique identifier. Otherwise you can’t create clustered index on composite PK.
And this can lead to bad performance with a big tables.
Am i right?
True in terms of performance.
Get yourself a cup of coffee, sit back, relax and enjoy this post from the great Joe Celko on the MS newsgroups [<img src=’/community/emoticons/emotion-1.gif’ alt=’‘ />]<br /><br /><blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><br />>> I’m trying to change an existing field [sic] to an<br />IDENTITY field [sic], but I’m getting syntax errors. It’s got to be<br />simple, but somewhere I’m missing something. <<<br /><br />What you are missing is the basic concepts of the relational model.<br />Columns are not fields; rows are not records; tables are not files. An<br />IDENTITY property cannot be a key by definition. A key is a subset of<br />attributes that uniquely define an entity in your data model. <br /><br />The IDENTITY column is a holdover from the early programming language<br />which were very close to the hardware. For example, the fields in a<br />COBOL or FORTRAN program were assumed to be physically located in main<br />storage in the order they were declared in the program. <br /><br />The early SQLs were based on existing file systems. The data was kept<br />in physically contiguous disk pages, in physically contiguous rows, made<br />up of physically contiguous columns. In short, just like a deck of<br />punch cards or a magnetic tape. <br /><br />But physically contiguous storage is only one way of building a<br />relational database and it is not always the best one. But aside from<br />that, the whole idea of a relational database is that user is not<br />supposed to know how things are stored at all, much less write code that<br />depends on the particular physical representation in a particular<br />release of a particular product. <br /><br />One of the biggest errors is the IDENTITY column (actually property) in<br />the Sybase family (SQL Server and Sybase). People actually program with<br />this "feature" and even use it as the primary key for the table! Now,<br />let’s go into painful details as to why this thing is bad. <br /><br />The practical considerations are that IDENTITY is proprietary and<br />non-portable, so you know that you will have maintenance problems when<br />you change releases or port your system to other products. <br /><br />But let’s look at the logical problems. First try to create a table<br />with two columns and try to make them both IDENTITY. If you cannot<br />declare more than one column to be of a certain datatype, then that<br />thing is not a datatype at all, by definition. It is a property which<br />belongs to the PHYSICAL table, not the data in the table. <br /><br />Next, create a table with one column and make it an IDENTITY. Now try<br />to insert, update and delete different numbers from it. If you cannot<br />insert, update and delete rows from a table, then it is not a table by<br />definition. <br /><br />Finally create a simple table with one IDENTITY and a few other columns.<br />Use a few statements like <br /><br />INSERT INTO Foobar (a, b, c) VALUES (‘a1’, ‘b1’, ‘c1’); <br />INSERT INTO Foobar (a, b, c) VALUES (‘a2’, ‘b2’, ‘c2’); <br />INSERT INTO Foobar (a, b, c) VALUES (‘a3’, ‘b3’, ‘c3′); <br /><br />to put a few rows into the table and notice that the IDENTITY<br />sequentially numbered them in the order they were presented. If you<br />delete a row, the gap in the sequence is not filled in and the sequence<br />continues from the highest number that has ever been used in that column<br />in that particular table. <br /><br />But now use a statement with a query expression in it, like this:<br /><br />INSERT INTO Foobar (a, b, c)<br />SELECT x, y, z<br /> FROM Floob;<br /><br />Since a query result is a table, and a table is a set which has no<br />ordering, what should the IDENTITY numbers be? The entire, whole,<br />completed set is presented to Foobar all at once, not a row at a time.<br />There are (n!) ways to number (n) rows, so which one do you pick? The<br />answer has been to use whatever the physical order of the result set<br />happened to be. That non-relational phrase "physical order" again. <br /><br />But it is actually worse than that. If the same query is executed<br />again, but with new statistics or after an index has been dropped or<br />added, the new execution plan could bring the result set back in a<br />different physical order. Can you explain from a logical model why the<br />same rows in the second query get different IDENTITY numbers? In the<br />relational model, they should be treated the same if all the values of<br />all the attributes are identical. <br /><br />Using IDENTITY as a primary key is a sign that there is no data model,<br />only an imitation of a sequential file system. Since this number exists<br />only as a result of the state of particular piece of hardware at a<br />particular time, how do you verify that an entity has such a number in<br />the reality you are modeling? <br /><br />To quote from Dr. Codd: "..Database users may cause the system to<br />generate or delete a surrogate, but they have no control over its value,<br />nor is its value ever displayed to them …"(Dr. Codd in ACM TODS, pp<br />409-410) and Codd, E. (1979), Extending the database relational model to<br />capture more meaning. ACM Transactions on Database Systems, 4(4). pp.<br />397-434. This means that a surogate ought ot act like an index; created<br />by the user, managed by the system and NEVER seen by a user. That means<br />never used in queries. <br /><br />Codd also wrote the following:<br /><br />"There are three difficulties in employing user-controlled keys as<br />permanent surrogates for entities.<br /><br />(1) The actual values of user-controlled keys are determined by users<br />and must therefore be subject to change by them (e.g. if two companies<br />merge, the two employee databases might be combined with the result that<br />some or all of the serial numbers might be changed.).<br /><br />(2) Two relations may have user-controlled keys defined on distinct<br />domains (e.g. one uses social security, while the other uses employee<br />serial numbers) and yet the entities denoted are the same.<br /><br />(3) It may be necessary to carry information about an entity either<br />before it has been assigned a user-controlled key value or after it has<br />ceased tohave one (e.g. and applicant for a job and a retiree).<br /><br />These difficulties have the important consequence that an equi-join on<br />common key values may not yield the same result as a join on common<br />entities. A solution – proposed in part  and more fully in  – is<br />to introduce entity domains which contain system-assigned surrogates.<br />Database users may cause the system to generate or delete a surrogate,<br />but they have no control over its value, nor is its value ever displayed<br />to them….." (Codd in ACM TODS, pp 409-410).<br /><br />References<br /><br />Codd, E. (1979), Extending the database relational model to capture more<br />meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434<br />& lt;br />The most common use that a Newbie makes of IDENTITY is to use it as a<br />record number (under the error that a record nubmer is a key!), so that<br />he does not have to think about keys, DRI, check digits, proper data<br />types, international standards and all that hard stuff. <br /><br />While this was meant as an abstract example, I also fear that you have<br />not read ISO-11179 because of the silly, redundant, dangerous prefixes<br />on your code. <br /><br />–CELKO–<br /><hr height="1" noshade id="quote"></blockquote id="quote"></font id="quote"><br /><br />Take it for what it is worth [<img src=’/community/emoticons/emotion-1.gif’ alt=’‘ />]<br /><br /><br />Frank<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
Frank Thanks for the reference which explains as a whole.
Keep it up. _________
I actually never used identity as a primary key.
But just now i realize why it is good idea generate
primary key instead of using identity.
Hi ya, Whilst in theory an identity column is not really a datatype and is not portable, in practice they are a good/easy way to get great performance out of database. Surrogate keys are merely a physical implementation of a logical relationship. I have been using identity fields as primary keys for almost a decade now and they have never let me down. Cheers
I agree with Twan. The identity property is in the product for a good reason, it’s there to be used, it has usefulness, and it does not necessarily cause problems. I think the theory needs to catch up with the practice. Tom Pullen
DBA, Oxfam GB
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><br />I agree with Twan. The identity property is in the product for a good reason, it’s there to be used, it has usefulness, and it does not necessarily cause problems. I think the theory needs to catch up with the practice.<br /><hr height="1" noshade id="quote"></blockquote id="quote"></font id="quote"><br />[<img src=’/community/emoticons/emotion-2.gif’ alt=’‘ />] now that we are alone….<br /><br />I do too. Makes life much easier<br /><br />Frank<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
Actually i remember i had a problem with identity column during
replication when i actually created the structure of db first and then replicated data.
Once there was any kind of error identity sequence whas broken and i had to drop identity
workaround and set identity up again.
I found it too much headache and i guess since there i always generate single int column
rather than use identity. I was always wandering if it is better to use natural PK ,but now i realise that natural key very often to big ,and sometimes require a couple of column .
After reading this article listed above i realise that using single int is the best.
You can still use IDENTITY by Using NOT FOR REPLICATION option. If you are using transactional replication with the immediate-updating Subscribers option, do not use the IDENTITY NOT FOR REPLICATION design. Instead, create the IDENTITY property at the Publisher only, and have the Subscriber use just the base data type (for example, int). Then, the next identity value is always generated at the Publisher. _________
Also an ideal PK should NEVER be updated… <img src=’/community/emoticons/emotion-5.gif’ alt=’;-)’ /><br /><br />Cheers<br />Twan
Sure but nothing in this life is ideal…