Natural vs. Surrogate Keys in SQL Server : Getting the Proper Perspective


Performance Testing

Three different examples highlighting three different aspects of the performance issue are tested.  As you will see, neither synthetic nor natural keys win in all cases.  Note: the examples shown here are ‘corner cases’, designed to highlight performance differences.  In real world databases, the differences you see are likely to be smaller than those shown here.

Test Case 1: OLTP Data Insertion

A sample employee table is created, using SSN as primary key.  Two client sessions ( on separate machines) are simultaneously started, each inserting 100,000 rows of random test data.    The test is then rerun with an IDENTITY column as primary key, and a unique constraint added on SSN.

Test Table 1: Natural Key

CREATE TABLE Employees

(

      SSN               CHAR(9) PRIMARY KEY,

      Firstname         VARCHAR(50),

      Lastname          VARCHAR(50),

      Date1             DATETIME NOT NULL DEFAULT GETDATE(),

      Int1              INTEGER NOT NULL DEFAULT 0,

      Char1             Varchar(50),

)

Test Table 2: Surrogate Key

CREATE TABLE Employees

(

      EmployeeID        INT IDENTITY PRIMARY KEY,

      SSN               CHAR(9) UNIQUE NOT NULL,

      Firstname         VARCHAR(50),

      Lastname          VARCHAR(50),

      Date1             DATETIME NOT NULL DEFAULT GETDATE(),

      Int1              INTEGER NOT NULL DEFAULT 0,

      Char1             Varchar(50),

)

Test Results (average of three runs)

·         Natural Key Insert:          39.1 sec.
·         Surrogate Key Insert:     46.5 sec.   (19% slower)

Before I ran this test, I expected the vast majority of any performance differential would be due maintaining two indexes, rather than one.  However, when I reran it without the unique constraint on SSN, the difference was almost exactly half the original, meaning contention on the IDENTITY column is a significant factor.

Test Case #2: Narrowing a Wide Index

A query is used to join a child table to its parent by a 40-byte three-column foreign key.   The child table contains one million rows, and the parent 300K rows.   Key values are created randomly.  The test is then rerun using a 4-byte IDENTITY column to perform the JOIN.  To force an index seek (rather than an index scan) a WHERE clause is used to limit rows retrieved to 1% of the table total.

Table Schema:

CREATE TABLE Parent

(

      SurrogateKey      INT IDENTITY NOT NULL,

      NatKeyPart1       CHAR(32) NOT NULL,

      NatKeyPart2       INT NOT NULL,

      NatKeyPart3       INT NOT NULL,

      MiscData          VARCHAR(100),

)

CREATE INDEX ix_Parent ON Parent(ID)

CREATE INDEX ix_ParentParts ON Parent(KeyPart1,KeyPart2,KeyPart3)

CREATE TABLE Child

(

      SurrogateKey      INT NOT NULL,

      NatKeyPart1       CHAR(32) NOT NULL,

      NatKeyPart2       INT NOT NULL,

      NatKeyPart3       INT NOT NULL,

      ChildData         VARCHAR(100) PRIMARY KEY CLUSTERED

)

Test Results (average of three runs)

·         Natural Key Insert:          11.1 sec. (21% slower)
·         Surrogate Key Insert:     9.2 sec.  

Test Case #3: Join Elimination via Natural Key

The same data and methodology from Test Case #2 are used.  The difference here is that the query references only natural key values from the Parent table, rather than all columns.  This means when the natural key is used as the foreign key, no join to the Parent table is necessary; the query can be fully serviced by the Child table.

Test Results (average of three runs):

·         Natural Key Insert:          2.9 sec
·         Surrogate Key Insert:       9.2 sec.   (320% slower)

Conclusion

This article has hopefully demonstrated that the choice of natural vs. surrogate key is a complex issue, with no single answer that fits all scenarios.  Armed with the information above, however, you can make better decisions about which is the best solution for your own particular needs.

Pages: 1 2




Related Articles :

  • No Related Articles Found

One Response to “Natural vs. Surrogate Keys in SQL Server : Getting the Proper Perspective”

  1. If the db is to be replicated sql server will add surrogate key for you if you don’t already have one

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |