Denormalization in SQL Server for Fun and Profit

Almost from birth, database developers are taught that their databases must be normalized.  In many shops, failing to fully normalize can result in anything from public ridicule to exile to the company’s Siberian office.  Rarely discussed are the significant benefits that can accrue from intentionally denormalizing portions of a database schema.  Myths about denormalization abound, such as:

  • A normalized schema is always more stable and maintainable than a denormalized one.
  • The only benefit of denormalization is increased performance.
  • The performance increases from denormalization aren’t worth the drawbacks.

This article will address the first two points (I’ll tackle the final point in the second part of this series).  Other than for increased performance, when might you want to intentionally denormalize your structure?  A primary reason is to “future-proof” your application against changes in business logic that would otherwise force significant schema modifications.

Let’s look at a simple example.  You’re designing a database for a pizza store.  Each customer’s order contains one or more pizzas, and each order is assigned to a delivery driver.  In normal form, your schema looks like:

Table: Orders
  • Customer
  • Driver
  • Amount

Table: OrderItems
  • Order
  • Pizza Type
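
In T-SQL, a minimal version of this normalized schema might look like the sketch below.  The column names come from the tables above; the data types, the OrderID surrogate key, and the composite key on OrderItems are assumptions added purely for illustration.

    CREATE TABLE Orders (
        OrderID   int IDENTITY(1,1) PRIMARY KEY,
        Customer  varchar(100) NOT NULL,
        Driver    varchar(100) NOT NULL,
        Amount    money        NOT NULL
    );

    CREATE TABLE OrderItems (
        OrderID    int NOT NULL REFERENCES Orders (OrderID),
        PizzaType  varchar(50) NOT NULL,
        PRIMARY KEY (OrderID, PizzaType)
    );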

Planning Ahead.  Let’s say you’ve heard the owner is considering a new delivery model.  To increase customer satisfaction, every pizza will be boxed and sent for delivery the moment it comes out of the oven, even if other pizzas in the order are still baking.

Since you’re a savvy developer, you plan for this and denormalize your data structure.  Though today the driver column is functionally dependent only on the order itself, you cross your fingers, take a deep breath, and violate Second Normal Form by placing it in the OrderItems table.  There—you’ve just future-proofed your application.  Orders can now have multiple drivers.
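
In terms of the illustrative DDL above, the change is simply to move the assumed Driver column down one level, for example:

    -- Driver now describes an individual pizza, not the whole order.
    ALTER TABLE Orders     DROP COLUMN Driver;
    ALTER TABLE OrderItems ADD Driver varchar(100) NOT NULL DEFAULT '';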

Your denormalization has introduced a small update anomaly (if an order’s driver changes, you have to update multiple rows rather than just one), but if the probability of the delivery-model change is high, this is well worth the cost.  Such anomalies are typical when denormalizing, but they’re usually a small problem, and one that can be handled automatically via triggers, constraints, or other means.  For instance, in this case you can create (or modify the existing) update stored procedure for Orders to cascade the change into OrderItems.  Alternatively, you can create an UPDATE trigger on OrderItems that ensures all rows within one order have the same driver.  When the rule changes in the future, just remove the trigger—no need to update your tables or any queries that reference them.
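
As a concrete sketch, again using the illustrative column names above (the trigger name is made up), the second option might look like this:

    -- Reject any update that leaves one order with more than one driver.
    CREATE TRIGGER trg_OrderItems_OneDriverPerOrder
    ON OrderItems
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        IF EXISTS (
            SELECT oi.OrderID
            FROM   OrderItems AS oi
            JOIN   inserted   AS i ON i.OrderID = oi.OrderID
            GROUP BY oi.OrderID
            HAVING COUNT(DISTINCT oi.Driver) > 1
        )
        BEGIN
            RAISERROR ('All items in an order must have the same driver.', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END;

Dropping this trigger later is a one-line change, and none of the queries that read OrderItems need to know it ever existed.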

Now let’s consider a slightly more complex (and somewhat more realistic) case.   Imagine an application to manage student and teacher assignments for an elementary school.    A sample schema might be:

Table: Teachers
  • Teacher (PK)
  • Classroom

Table: Students
  • Student (PK)
  • Teacher (FK)
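
Expressed as T-SQL (data types assumed for illustration), the normalized schema is simply:

    CREATE TABLE Teachers (
        Teacher    varchar(100) PRIMARY KEY,
        Classroom  varchar(20)  NOT NULL
    );

    CREATE TABLE Students (
        Student    varchar(100) PRIMARY KEY,
        Teacher    varchar(100) NOT NULL REFERENCES Teachers (Teacher)
    );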

Planning Ahead.  You happen to know that other elementary schools in the region are assigning secondary teachers to some classrooms.  You decide to support this in advance within your schema.  How would you do it via denormalization?  The ugly “repeating groups” approach of adding a “Teacher2” column is one option, but not one that should appeal to you.  Far better to make the classroom itself the primary key and move teachers to a child table:

Table: Classrooms
  • Classroom (PK)

Table: Teachers
  • Teacher (PK)
  • Classroom (FK)

Table: Students
  • Student (PK)
  • Classroom (FK)
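
A sketch of the same structure in T-SQL, with the same illustrative data types as before:

    CREATE TABLE Classrooms (
        Classroom  varchar(20) PRIMARY KEY
    );

    CREATE TABLE Teachers (
        Teacher    varchar(100) PRIMARY KEY,
        Classroom  varchar(20)  NOT NULL REFERENCES Classrooms (Classroom)
    );

    CREATE TABLE Students (
        Student    varchar(100) PRIMARY KEY,
        Classroom  varchar(20)  NOT NULL REFERENCES Classrooms (Classroom)
    );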

As before, this denormalization creates a problem we need to address.  In the future, the school may support multiple teachers in one classroom, but today that’s an error.  You solve that by the simple expedient of adding a unique constraint on the Classroom FK in the Teachers table.  When the business rule changes in the future, you simply remove the constraint.  Voila!  A far better solution than having to significantly alter your views, queries, and stored procs to conform to a new schema.
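
With the sketch above, today's one-teacher-per-classroom rule is a single constraint (the constraint name is, again, just illustrative), and retiring it later is a single statement:

    -- Today's rule: at most one teacher per classroom.
    ALTER TABLE Teachers
        ADD CONSTRAINT UQ_Teachers_Classroom UNIQUE (Classroom);

    -- When secondary teachers arrive, simply drop the constraint:
    -- ALTER TABLE Teachers DROP CONSTRAINT UQ_Teachers_Classroom;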

Continues…
