Denormalization in SQL Server for Fun and Profit

Future-proofing an application via denormalization can also
involve the removal of tables, or rather reverse-decomposing multiple tables
into one.  Imagine a book store chain has individual stores, each of which
carries specific genres of books, and handles only specific publishers.  Due to
contractual obligations, if a store carries a genre, it must do so for all
publishers It deals with. 

Fully normalizing this relationship requires three tables. 
With sample data, they might appear like this:

Table: StoreGenres

Store

Genre

Books R US

Mystery

Books R US

Science Fiction

Downtown Books

Textbooks

Downtown Books

Horror

Book Nook

Self Help

                                              

Table: StorePublishers

Store

Publisher

Books R US

Acme Publishing

Books R US

TGA Distributors

Downtown Books

TGA Distributors

Downtown Books

Arkham House

Book Nook

Acme Publishing

Book Nook

TGA Distributors

Table: PublisherGenres

Publisher

Genre

Acme Publishing

Mystery

Acme Publishing

Self Help

TGA Distributors

Textbooks

TGA Distributors

Mystery

TGA Distributors

Science Fiction

TGA Distributors

Self Help

Arkham House

Horror

The list of genres a store carries from each publisher is
generated with the following query:

SELECT
sp.Store, sp.Publisher, pg.Genre

FROM
StorePublishers sp

JOIN
PublisherGenres pg ON sp.Publisher = pg.Publisher

JOIN
StoreGenres sg ON sg.Store
= sp.Store AND sg.Genre = pg.Genre

Planning Ahead.  What happens when the sales contract
expires, and stores are free to carry only specific genres from individual
publishers?  In that case, the above schema won’t work – if Downtown Books
wants to carry mysteries from Acme, they’ll be forced into carrying Acme’s self-help
books also. 

To handle this contingency, you reverse-decompose the three
tables into one:

StorePublisherGenres

Books R US

Acme Publishing

Mystery

Books R US

TGA Distributors

Mystery

Books R US

TGA Distributors

Science Fiction

Downtown Books

TGA Distributors

Textbooks

Downtown Books

Arkham House

Horror

Book Nook

Acme Publishing

Self Help

Book Nook

TGA Distributors

Self Help

Note: this table data is nothing more than the results of
the above query.  But by physically materializing it, you allow individual
genres from each publisher to be inserted or deleted.  You’ve future-proofed
the application to allow each to vary independently.   (You’ve also improved
performance somewhat as well—you’ve recast a three-way JOIN into a single table
query).

As in the prior examples, your denormalization has created a
minor update anomaly.  If Downtown Books adds Acme Publishing to their list of
publishers, you have to insert two new records, not one – at least until that
contract expires.  Of course, you can always ensure consistency through a
trigger on the table or an update stored procedure.   When the contract
expires, simply update the trigger, rather than replacing all three tables, and
the queries that reference them.

In these simple examples, you could say we’re not really
denormalizing as much as we’re simply “pre-normalizing”.   In other words,
while we’re technically violating normal form according to  the current
business requirements, once the anticipated changes take place, our schema will
again be normalized.  However, for more complex cases, this isn’t always true. 
 

To see this, let’s return to our example of teachers and
students, with the current business rule that each student is assigned to a
single teacher and classroom, and the future possibility that backup teachers
will be assigned to each class.   What if there was another possibility- that
the school might instead “split” each classroom into two, and assign each
subclass its own teacher and students.  Can we create a schema that allows for
the possibility of both this and the teacher/backup teacher we already
considered?

One possibility is to add a second foreign key into the
students table:

Table: Students

Student (PK)

Teacher (Nullable FK)

Classroom( Nullable FK)

One of the FKs relates to teachers, the other to
classrooms.  In the case of students assigned to a class with multiple
teachers, the second FK is used and the first is left null.   If students are
assigned to a teacher in a “subclass”, the reverse is true.  

Continues…

Leave a comment

Your email address will not be published.