SQL Server Performance Forum – Threads Archive
Identity vs. Unique Identifier PerformanceI want to add an identity or unique identifier column to a table in order to ensure that each row in the table is unique. From a performance perspective, which one should I use?
One of the key aspects of database table design is to ensure what is called entity integrity. What this means is that you need to ensure that each row in your database tables is unique. If you don’t take the proper precautions, it is possible that one or more rows of your table might be the same. If this happens, what does this really mean (duplicate records)?
One of the more common ways to guarantee entity integrity is to create a primary key. A primary key, which is an index created on one or more columns, is used to guarantee that no duplicate records are entered into a table. If you try to enter a duplicate record, SQL Server won’t let you and will give you an error message. Some tables lend themselves to uniqueness and it is easy to find a column that you know will always be unique, and adding a primary key to the column is no problem. But in other cases, there is no single column that is unique. Instead, it may take two, three, four, or more columns to uniquely identify a column. If you want, you can create a primary key on multiple columns. Unfortunately, creating a primary key on multiple columns has a major drawback. And that is the index that is created to enforce each row’s uniqueness may get very large. The wider the index, the larger the physical index will be. This can hurt performance because it takes SQL Server more resources to maintain or to query wider indexes than it takes it to maintain or to query narrow indexes. So is there any way to avoid creating primary keys with wide indexes? Yes, and that brings us to the point of our question. Instead of creating a primary key with a wide index, what you can do is to add either an identity or a unique identifier column to the table, and use this column as the primary key. This will greatly reduce the width of the index created, reducing the physical size of the index, and speeding up SQL Server’s access to the table. What an identify field or a unique identify column does is to automatically create a unique value for each row created in your database. You don’t have to worry about creating these values for yourself, this is done automatically by SQL Server. But what you must decide is whether to use an identify field or a unique identity column. Ideally, from a performance perspective, you want to use the one that creates the smallest possible physical index in order to maximize performance. So which one of these types of columns should you use? Let’s first look at identify values. An identify value is an automatically incrementing integer that starts counting from a base seed value (such as 1, or 100, or 1000, whatever integer you choose), and increments by a value you specify, which is called an increment (such as 1, -1, 10, whatever integer you choose). The default seed is 1, and the default increment is 1. Because the values are never the same (they increment), an identify value often makes for a good column on which to base a primary key. When you create an identity column in a table, you must also specify the data type as some form of integer. As you probably know, SQL Server supports several different integer types, each with different sizes (widths). They include: Bigint 8 bytes -263 to 263-1 (SQL 2000 only)
Int 4 bytes -2,147,483,648 to 2,147,483,647
Smallint 2 bytes -32,768 to 32,767
Tinyint 1 byte 0 to 255
When creating an identify column, always select the narrowest integer type that will meet your needs. If an Int will do, then don’t use Bigint, as you will just unnecessary be increasing the physical size of the index, which can hurt SQL Server’s performance. A unique identifier column, like an identify column, is used to create unique values for each column in your table. But instead of using an incrementing value, instead it creates a unique value based on a special internal algorithm that creates what are called a Globally Unique Identifier (GUID), which is 16 bytes in size. Right away, you should notice which unique column type–identify or unique identifier–will provide the best performance as a primary key. Because the unique identifier column will always be 16 bytes wide, it will always create a primary key index that is larger than any of the identity column options, and unless you have a special reason that is outside the scope of this question, should not be used as the basis as a primary key. Instead, use one of the variations of the identity column, selecting the smallest one that meets your data’s needs. This way, you can ensure that index created by the primary key is as small as it can, helping to ensure overall better performance of your database, and at the same time ensuring entity integrity of your database tables.
one drawback of using identity columns as primary key is that all inserts go to the end of table, and last page of course. so when multiple users try to insert their row against the table there is feight to get the last page lock. one the other hand alwayes inserting to last page has an advantage if your index is clustered. sql server do not need to reorganize its B-tree for those inserts.
like real life, there is always a trade of.
one benefit of using identity columns as primery key is that if your index is clustered