Denormalization in SQL Server for Fun and Profit

Future-proofing an application via denormalization can also involve the removal of tables, or rather reverse-decomposing multiple tables into one.  Imagine a book store chain has individual stores, each of which carries specific genres of books, and handles only specific publishers.  Due to contractual obligations, if a store carries a genre, it must do so for all publishers It deals with. 

Fully normalizing this relationship requires three tables.  With sample data, they might appear like this:

Table: StoreGenres

Store

Genre

Books R US

Mystery

Books R US

Science Fiction

Downtown Books

Textbooks

Downtown Books

Horror

Book Nook

Self Help

                                              

Table: StorePublishers

Store

Publisher

Books R US

Acme Publishing

Books R US

TGA Distributors

Downtown Books

TGA Distributors

Downtown Books

Arkham House

Book Nook

Acme Publishing

Book Nook

TGA Distributors

Table: PublisherGenres

Publisher

Genre

Acme Publishing

Mystery

Acme Publishing

Self Help

TGA Distributors

Textbooks

TGA Distributors

Mystery

TGA Distributors

Science Fiction

TGA Distributors

Self Help

Arkham House

Horror

The list of genres a store carries from each publisher is generated with the following query:

SELECT sp.Store, sp.Publisher, pg.Genre

FROM StorePublishers sp

JOIN PublisherGenres pg ON sp.Publisher = pg.Publisher

JOIN StoreGenres sg ON sg.Store = sp.Store AND sg.Genre = pg.Genre

Planning Ahead.  What happens when the sales contract expires, and stores are free to carry only specific genres from individual publishers?  In that case, the above schema won’t work – if Downtown Books wants to carry mysteries from Acme, they’ll be forced into carrying Acme’s self-help books also. 

To handle this contingency, you reverse-decompose the three tables into one:

StorePublisherGenres

Books R US

Acme Publishing

Mystery

Books R US

TGA Distributors

Mystery

Books R US

TGA Distributors

Science Fiction

Downtown Books

TGA Distributors

Textbooks

Downtown Books

Arkham House

Horror

Book Nook

Acme Publishing

Self Help

Book Nook

TGA Distributors

Self Help

Note: this table data is nothing more than the results of the above query.  But by physically materializing it, you allow individual genres from each publisher to be inserted or deleted.  You’ve future-proofed the application to allow each to vary independently.   (You’ve also improved performance somewhat as well—you’ve recast a three-way JOIN into a single table query).

As in the prior examples, your denormalization has created a minor update anomaly.  If Downtown Books adds Acme Publishing to their list of publishers, you have to insert two new records, not one – at least until that contract expires.  Of course, you can always ensure consistency through a trigger on the table or an update stored procedure.   When the contract expires, simply update the trigger, rather than replacing all three tables, and the queries that reference them.

In these simple examples, you could say we’re not really denormalizing as much as we’re simply “pre-normalizing”.   In other words, while we’re technically violating normal form according to  the current business requirements, once the anticipated changes take place, our schema will again be normalized.  However, for more complex cases, this isn’t always true.   

To see this, let’s return to our example of teachers and students, with the current business rule that each student is assigned to a single teacher and classroom, and the future possibility that backup teachers will be assigned to each class.   What if there was another possibility- that the school might instead “split” each classroom into two, and assign each subclass its own teacher and students.  Can we create a schema that allows for the possibility of both this and the teacher/backup teacher we already considered?

One possibility is to add a second foreign key into the students table:

Table: Students

Student (PK)

Teacher (Nullable FK)

Classroom( Nullable FK)

One of the FKs relates to teachers, the other to classrooms.  In the case of students assigned to a class with multiple teachers, the second FK is used and the first is left null.   If students are assigned to a teacher in a “subclass”, the reverse is true.  

Continues…

Pages: 1 2 3




Related Articles :

No comments yet... Be the first to leave a reply!

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |