Identity Property Range Checking in SQL Server
The IDENTITY property on a column of a numerical data type is a frequently used way to get a system-generated "unique" value for each row in a table, and such a column is in turn a popular choice for the PRIMARY KEY constraint. Most of the time one would choose the data type int for the underlying column. However, the IDENTITY property can be defined on any integer-like data type, and even on the decimal data type as long as the chosen scale is 0.

By default, SQL Server consumes only the positive half of the range (the conventional seed is 1) unless you specify otherwise. Starting with a negative seed value is perfectly fine for SQL Server, and by doing so you essentially double the range of possible values for most of the available data types. It may hurt one's aesthetic sense, but if you take negative values into account, this gives you the following range of possible values:

tinyint     0 to 255
smallint    -32,768 to 32,767
int         -2,147,483,648 to 2,147,483,647
bigint      -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
If you decide to use a decimal data type such as decimal(38, 0), this gives you a range of -(10^38 - 1) to 10^38 - 1 possible values, which for almost any practical purpose should be more than enough.
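To illustrate the negative-seed trick: seeding an int IDENTITY column at the minimum value of the type makes the whole range available. The table name dbo.full_range below is just a made-up example.

```sql
--dbo.full_range is a hypothetical example table.
--Seeding at the minimum int value makes the full type range usable,
--roughly doubling what the conventional seed of 1 gives you.
CREATE TABLE dbo.full_range (
    col1 int IDENTITY(-2147483648, 1) PRIMARY KEY
);

--The first row gets -2147483648, the next -2147483647, and so on.
INSERT INTO dbo.full_range DEFAULT VALUES;
```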
But what can actually happen if you are about to exceed this range?
Let’s create a very simple test case:
CREATE TABLE dbo.id_overflow (
    col1 int IDENTITY(2147483647, 1)
);
GO
The above script creates a new table dbo.id_overflow with only one column, col1. This column is of type int with the IDENTITY property defined on it, and the seed value is set to the maximum value for the int type, 2147483647. I picked int arbitrarily; I could have chosen any other eligible data type and the result would be the same. So, when we now insert into this table, the very first insert statement will succeed, while any subsequent one will fail with an arithmetic overflow error.
--This insert will succeed
INSERT INTO dbo.id_overflow DEFAULT VALUES;

--This insert will fail
INSERT INTO dbo.id_overflow DEFAULT VALUES;

(1 row(s) affected)
Msg 8115, Level 16, State 1, Line 2
Arithmetic overflow error converting IDENTITY to data type int.
Arithmetic overflow occurred.
So far everything is as expected, and when we look at the content of the table we see only the one row from the first insert.
SELECT * FROM dbo.id_overflow;

col1
-----------
2147483647

(1 row(s) affected)
But what do you do in such a case? You can't insert any more rows into this table. Even if there are gaps in the sequence of the existing IDENTITY values, those gaps won't be reused automatically. Once a value has been allocated, SQL Server no longer cares about it; if an insert fails for whatever reason, the freshly allocated value is simply gone.
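If you suspect a table is close to its limit, or that failed inserts have burned a large block of values, you can inspect (and, with great care, reset) the current identity value with DBCC CHECKIDENT. A minimal sketch; the RESEED value 25000 is a made-up example:

```sql
--Report the current identity value and the current maximum column
--value without changing anything.
DBCC CHECKIDENT ('dbo.id_overflow', NORESEED);

--CAUTION: reseeding is only safe when you are certain that no rows
--exist between the new seed and the old current identity value.
--25000 is a made-up example value.
--DBCC CHECKIDENT ('dbo.id_overflow', RESEED, 25000);
```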
Essentially, the only feasible solution to this problem is to move to a "bigger" data type. A very simplified script to change the data type in our example to bigint would look like this:
IF OBJECT_ID('dbo.id_overflow') IS NOT NULL
    DROP TABLE dbo.id_overflow;
GO

CREATE TABLE dbo.id_overflow (
    col1 int IDENTITY(2147483647, 1)
);
GO

--This insert will succeed
INSERT INTO dbo.id_overflow DEFAULT VALUES;

--Now change the data type to a bigger one.
ALTER TABLE dbo.id_overflow ALTER COLUMN col1 bigint;

--This insert will now succeed as well
INSERT INTO dbo.id_overflow DEFAULT VALUES;

SELECT * FROM dbo.id_overflow;
If you run this batch, it finishes without an error and yields the expected result set of two rows. But, as mentioned above, a change script for almost any real-world database would be much more complex: indexes would have to be rebuilt, referencing tables and foreign key constraints adjusted, code that assigns the value of that column to a variable of type int revised, etc…

It is not hard to predict that you're in deep trouble when this table is one of the main tables in your database and is referenced by many other tables and/or in many places in your code.
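To get a first impression of how big such a change would be, you can at least enumerate the foreign keys that reference the table up front. A small sketch against the example table from above:

```sql
--List all foreign key constraints that reference dbo.id_overflow,
--i.e. the constraints a data type change would have to deal with.
SELECT
    fk.name                                 AS foreign_key_name,
    OBJECT_SCHEMA_NAME(fk.parent_object_id) AS referencing_schema,
    OBJECT_NAME(fk.parent_object_id)        AS referencing_table
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.id_overflow');
```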
I was bitten by a similar scenario not that long ago. Fortunately, it was "only" a lookup table with an IDENTITY column of type smallint. And I was fortunate that I could simply reseed the IDENTITY value, because the last 7,000+ inserts had failed due to a misunderstanding between the developers of the calling application and me about how a certain parameter of a procedure should be used. But it was still enough trouble for me to decide to write a small check script that is now part of my weekly scripts and that lists all tables with such an IDENTITY column, along with the last value consumed and the buffer I have left before I run out of values again. Here it is:
;WITH TypeRange AS (
    SELECT 'bigint' AS [name], 9223372036854775807 AS MaxValue, -9223372036854775808 AS MinValue
    UNION ALL
    SELECT 'int', 2147483647, -2147483648
    UNION ALL
    SELECT 'smallint', 32767, -32768
    UNION ALL
    SELECT 'tinyint', 255, 0
),
IdentBuffer AS (
    SELECT
        OBJECT_SCHEMA_NAME(IC.object_id) AS [schema_name],
        O.name AS table_name,
        IC.name AS column_name,
        T.name AS data_type,
        CAST(IC.seed_value AS decimal(38, 0)) AS seed_value,
        IC.increment_value,
        CAST(IC.last_value AS decimal(38, 0)) AS last_value,
        CAST(TR.MaxValue AS decimal(38, 0))
            - CAST(ISNULL(IC.last_value, 0) AS decimal(38, 0)) AS [buffer],
        CAST(CASE WHEN IC.seed_value < 0
                  THEN TR.MaxValue - TR.MinValue
                  ELSE TR.MaxValue
             END AS decimal(38, 0)) AS full_type_range,
        TR.MaxValue AS max_type_value
    FROM sys.identity_columns IC
    JOIN sys.types T ON IC.system_type_id = T.system_type_id
    JOIN sys.objects O ON IC.object_id = O.object_id
    JOIN TypeRange TR ON T.name = TR.name
    WHERE O.is_ms_shipped = 0
)
SELECT
    IdentBuffer.[schema_name],
    IdentBuffer.table_name,
    IdentBuffer.column_name,
    IdentBuffer.data_type,
    IdentBuffer.seed_value,
    IdentBuffer.increment_value,
    IdentBuffer.last_value,
    IdentBuffer.max_type_value,
    IdentBuffer.full_type_range,
    IdentBuffer.buffer,
    CASE WHEN IdentBuffer.seed_value < 0
         THEN (-1 * IdentBuffer.seed_value + IdentBuffer.last_value)
              / IdentBuffer.full_type_range
         ELSE (IdentBuffer.last_value * 1.0) / IdentBuffer.full_type_range
    END AS [identityvalue_consumption_in_percent]
FROM IdentBuffer
ORDER BY [identityvalue_consumption_in_percent] DESC;
Since SQL Server 2005 it has been really easy to get this information. As you can see from the script, I have omitted the decimal(38, 0) alternative; for me, a bigint column with a negative seed value covers more values than I will possibly ever need.
I got into the habit of running this daily to monitor how many values we have left in the buffer before it blows up again, and to get a feeling for how urgent it is to tackle the inevitable changes to the database. A possible variation would be to send out an alert when a certain threshold is reached, but that I leave up to your imagination.
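As one possible starting point for such an alert, here is a rough sketch that assumes Database Mail is configured; the profile name 'DBA', the recipient address, and the 80% threshold are all placeholders you would replace with your own values:

```sql
--Hypothetical alerting wrapper: @consumption would be populated from
--the monitoring query above for the worst-consuming table.
DECLARE @consumption decimal(10, 4) = 0.85;  --placeholder value

IF @consumption > 0.80
BEGIN
    --'DBA' is an assumed Database Mail profile name.
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'DBA',
        @recipients   = 'dba-team@example.com',
        @subject      = 'IDENTITY range consumption above 80%',
        @body         = 'At least one IDENTITY column has consumed more than 80% of its available range.';
END;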