Natural vs. Surrogate Keys in SQL Server : Getting the Proper Perspective

Performance Testing

Three different examples highlighting three different
aspects of the performance issue are tested.  As you will see, neither
synthetic nor natural keys win in all cases.  Note: the examples shown here are
‘corner cases’, designed to highlight performance differences.  In real world
databases, the differences you see are likely to be smaller than those shown
here.

Test Case 1: OLTP Data Insertion

A sample employee table is created, using SSN as primary
key.  Two client sessions ( on separate machines) are simultaneously started,
each inserting 100,000 rows of random test data.    The test is then rerun with
an IDENTITY column as primary key, and a unique constraint added on SSN.

Test Table 1: Natural Key

CREATE
TABLE Employees

(

      SSN               CHAR(9) PRIMARY KEY,

      Firstname         VARCHAR(50),

      Lastname          VARCHAR(50),

      Date1             DATETIME NOT NULL DEFAULT GETDATE(),

      Int1              INTEGER
NOT NULL DEFAULT 0,

      Char1             Varchar(50),

)

Test Table 2: Surrogate Key

CREATE
TABLE Employees

(

      EmployeeID        INT IDENTITY PRIMARY KEY,

      SSN               CHAR(9) UNIQUE NOT NULL,

      Firstname         VARCHAR(50),

      Lastname          VARCHAR(50),

      Date1             DATETIME NOT NULL DEFAULT GETDATE(),

      Int1              INTEGER
NOT NULL DEFAULT 0,

      Char1             Varchar(50),

)

Test Results (average of three runs)

·        
Natural Key Insert:          39.1
sec.
·        
Surrogate Key Insert:     46.5 sec.   (19%
slower)

Before I ran this test, I expected the vast majority of any
performance differential would be due maintaining two indexes, rather than
one.  However, when I reran it without the unique constraint on SSN, the
difference was almost exactly half the original, meaning contention on the
IDENTITY column is a significant factor.

Test Case #2: Narrowing a Wide Index

A query is used to join a child table to its parent by a 40-byte
three-column foreign key.   The child table contains one million rows, and the
parent 300K rows.   Key values are created randomly.  The test is then rerun
using a 4-byte IDENTITY column to perform the JOIN.  To force an index seek
(rather than an index scan) a WHERE clause is used to limit rows retrieved to 1%
of the table total.

Table Schema:

CREATE
TABLE Parent

(

      SurrogateKey      INT IDENTITY NOT NULL,

      NatKeyPart1       CHAR(32) NOT NULL,

      NatKeyPart2       INT NOT NULL,

      NatKeyPart3       INT NOT NULL,

      MiscData          VARCHAR(100),

)

CREATE
INDEX ix_Parent ON
Parent(ID)

CREATE
INDEX ix_ParentParts ON
Parent(KeyPart1,KeyPart2,KeyPart3)

CREATE
TABLE Child

(

      SurrogateKey      INT NOT NULL,

      NatKeyPart1       CHAR(32) NOT NULL,

      NatKeyPart2       INT NOT NULL,

      NatKeyPart3       INT NOT NULL,

      ChildData         VARCHAR(100) PRIMARY KEY CLUSTERED

)

Test Results (average of three runs)

·        
Natural Key Insert:          11.1 sec. (21%
slower)
·        
Surrogate Key Insert:     9.2 sec.  

Test Case #3: Join Elimination via Natural Key

The same data and methodology from Test Case #2 are used. 
The difference here is that the query references only natural key values from
the Parent table, rather than all columns.  This means when the natural key is
used as the foreign key, no join to the Parent table is necessary; the query
can be fully serviced by the Child table.

Test Results (average of three runs):

·        
Natural Key Insert:          2.9 sec

·        
Surrogate Key Insert:       9.2 sec.   (320%
slower)

Conclusion

This article has hopefully demonstrated that the choice of
natural vs. surrogate key is a complex issue, with no single answer that fits
all scenarios.  Armed with the information above, however, you can make better
decisions about which is the best solution for your own particular needs.

]]>

Leave a comment

Your email address will not be published.