Future-proofing an application via denormalization can also
involve the removal of tables, or rather reverse-decomposing multiple tables
into one. Imagine a book store chain has individual stores, each of which
carries specific genres of books, and handles only specific publishers. Due to
contractual obligations, if a store carries a genre, it must do so for all
publishers It deals with.
Fully normalizing this relationship requires three tables.
With sample data, they might appear like this:
Table: StoreGenres |
|
Store |
Genre |
Books R US |
Mystery |
Books R US |
Science Fiction |
Downtown Books |
Textbooks |
Downtown Books |
Horror |
Book Nook |
Self Help |
Table: StorePublishers |
|
Store |
Publisher |
Books R US |
Acme Publishing |
Books R US |
TGA Distributors |
Downtown Books |
TGA Distributors |
Downtown Books |
Arkham House |
Book Nook |
Acme Publishing |
Book Nook |
TGA Distributors |
Table: PublisherGenres |
|
Publisher |
Genre |
Acme Publishing |
Mystery |
Acme Publishing |
Self Help |
TGA Distributors |
Textbooks |
TGA Distributors |
Mystery |
TGA Distributors |
Science Fiction |
TGA Distributors |
Self Help |
Arkham House |
Horror |
The list of genres a store carries from each publisher is
generated with the following query:
SELECT
sp.Store, sp.Publisher, pg.Genre
FROM
StorePublishers sp
JOIN
PublisherGenres pg ON sp.Publisher = pg.Publisher
JOIN
StoreGenres sg ON sg.Store
= sp.Store AND sg.Genre = pg.Genre
Planning Ahead. What happens when the sales contract
expires, and stores are free to carry only specific genres from individual
publishers? In that case, the above schema won’t work – if Downtown Books
wants to carry mysteries from Acme, they’ll be forced into carrying Acme’s self-help
books also.
To handle this contingency, you reverse-decompose the three
tables into one:
StorePublisherGenres |
||
Books R US |
Acme Publishing |
Mystery |
Books R US |
TGA Distributors |
Mystery |
Books R US |
TGA Distributors |
Science Fiction |
Downtown Books |
TGA Distributors |
Textbooks |
Downtown Books |
Arkham House |
Horror |
Book Nook |
Acme Publishing |
Self Help |
Book Nook |
TGA Distributors |
Self Help |
Note: this table data is nothing more than the results of
the above query. But by physically materializing it, you allow individual
genres from each publisher to be inserted or deleted. You’ve future-proofed
the application to allow each to vary independently. (You’ve also improved
performance somewhat as well—you’ve recast a three-way JOIN into a single table
query).
As in the prior examples, your denormalization has created a
minor update anomaly. If Downtown Books adds Acme Publishing to their list of
publishers, you have to insert two new records, not one – at least until that
contract expires. Of course, you can always ensure consistency through a
trigger on the table or an update stored procedure. When the contract
expires, simply update the trigger, rather than replacing all three tables, and
the queries that reference them.
In these simple examples, you could say we’re not really
denormalizing as much as we’re simply “pre-normalizing”. In other words,
while we’re technically violating normal form according to the current
business requirements, once the anticipated changes take place, our schema will
again be normalized. However, for more complex cases, this isn’t always true.
To see this, let’s return to our example of teachers and
students, with the current business rule that each student is assigned to a
single teacher and classroom, and the future possibility that backup teachers
will be assigned to each class. What if there was another possibility- that
the school might instead “split” each classroom into two, and assign each
subclass its own teacher and students. Can we create a schema that allows for
the possibility of both this and the teacher/backup teacher we already
considered?
One possibility is to add a second foreign key into the
students table:
Table: Students |
Student (PK) |
Teacher (Nullable FK) |
Classroom( Nullable FK) |
One of the FKs relates to teachers, the other to
classrooms. In the case of students assigned to a class with multiple
teachers, the second FK is used and the first is left null. If students are
assigned to a teacher in a “subclass”, the reverse is true.