SQL Server Performance Forum – Threads Archive
Normalize vs non-normalize

I am trying to decide whether I should normalize a table or not. Say I have a table named ‘A’ that has 12 columns, four of which contain repeating data (think of lookup values like state or product category). Each of these four columns could technically become a separate table through normalization. Table ‘A’ could end up with 1 million rows, and the read/write activity on it could be a 50/50 or 40/60 split. Generally speaking, I want to normalize a table so there is no redundant data, which makes updates efficient because there is only one place to update. However, this also complicates queries, since more joins are required across the smaller tables. Is there a general guideline or statistic I can check to decide whether to normalize? If the read/write activity on this table is the deciding factor, at what read/write ratio does normalization pay off? Wingman
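To make the trade-off concrete, here is a minimal sketch of what normalising one of those four lookup columns would look like. All table and column names here are invented for illustration; the original post does not name its columns.

```sql
-- Before: the lookup text is stored directly on A,
-- repeated on every one of the ~1 million rows.
CREATE TABLE A_before (
    ID              int IDENTITY PRIMARY KEY,
    StateName       varchar(50),
    ProductCategory varchar(50)
    -- ... eight other columns
);

-- After: each lookup becomes a small table, and A keeps
-- only an integer foreign key. Updating a state name now
-- touches one row instead of potentially thousands.
CREATE TABLE State (
    StateID   int IDENTITY PRIMARY KEY,
    StateName varchar(50) NOT NULL UNIQUE
);

CREATE TABLE A_after (
    ID      int IDENTITY PRIMARY KEY,
    StateID int NOT NULL REFERENCES State (StateID)
    -- ... plus foreign keys for the other three lookup columns
);
```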
I forget the exact terminology, but there is a difference between repeating data and redundant data. Repeating is when the same type of foreign key is mentioned more than once for an entity. Redundant is when a detail that depends on a foreign key is stored on the entity itself. Your description seems to mix up the distinction.
Ok, fair enough. Using your explanation, let me re-phrase my question: when should I turn a table with redundant data (the 4 columns I described in my original post) into one with foreign keys in the main table plus 4 extra small tables? Please read the situation in my original post again. wingman
quote:Originally posted by Adriaan I forget the terminology, but you have repeating data and you have redundant data. Repeating is when you have the same type of foreign key mentioned more than once for an entity. Redundant is when you have a detail that depends on a foreign key, but it is stored on the entity. Your description seems to mess up the distinction.
Hi, Read these articles, maybe they help you to take a decision. http://www.dmreview.com/article_sub.cfm?articleId=5251
You should normalize; otherwise you will have much bigger problems than writing a few more joins.
Are we talking about an OLAP or an OLTP database here?
Both have slightly different approaches. If this is OLTP, your first goal should be to properly normalise your schema, that is, bring it into Third Normal Form (3NF). Only as a last resort (assuming proper indexes and proper queries are in place) should you consider breaking the theoretical rules and denormalising. You shouldn’t design your schema with denormalisation in mind right from the start. If we’re talking about OLAP, this might be a bit different, but I am not familiar with OLAP, so I can’t say much about how to design such a schema. —
Microsoft SQL Server MVP
Heute schon gebloggt?http://www.insidesql.de/blogs
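For what it’s worth, the extra joins that the normalised design costs on the read side are usually cheap when the lookup tables are small and the foreign key columns are indexed. A hypothetical read query against the normalised schema might look like this (table and column names are my own assumptions, not from the thread):

```sql
-- Reading back the lookup text costs one join per lookup table.
-- With small lookup tables and indexed FK columns, SQL Server
-- resolves these joins very cheaply.
SELECT a.ID, s.StateName, c.CategoryName
FROM A AS a
JOIN State           AS s ON s.StateID    = a.StateID
JOIN ProductCategory AS c ON c.CategoryID = a.CategoryID
WHERE s.StateName = 'Texas';
```

Meanwhile the write side benefits directly: renaming a category becomes a single-row UPDATE on the lookup table instead of a mass update across the million-row table.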
In any case, be careful how far you normalize (as Frank hinted with 3NF) and whether or not to use surrogate keys – too many of those and you have a query nightmare. Also make sure you have clearly defined entities, not "catch-all" generic categories where you put varying numbers of parameters in child tables – another query nightmare.
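To illustrate the "catch-all" query nightmare Adriaan warns about: when attributes are stored as generic name/value rows in a child table (sometimes called the entity–attribute–value pattern), every query has to pivot those rows back into columns. A hedged sketch with invented table names:

```sql
-- Anti-pattern: generic Param child table holding varying
-- numbers of name/value pairs per entity. Just to get one
-- row per entity with real columns, every query must pivot:
SELECT e.EntityID,
       MAX(CASE WHEN p.Name = 'State'    THEN p.Value END) AS State,
       MAX(CASE WHEN p.Name = 'Category' THEN p.Value END) AS Category
FROM Entity AS e
JOIN Param  AS p ON p.EntityID = e.EntityID
GROUP BY e.EntityID;
```

Every new attribute means editing every such query, and the optimizer gets little to work with – which is why clearly defined entities with real columns are preferable.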