SQL Server Performance Forum – Threads Archive
DataWarehouse: Star Schema or Normalised database?Hi
I need some advice on designing a DataWarehouse…
We are currently using 2000 (upgrading to 2005 soon) for out OLTP database which is currently about 90GB. Currently we run stats, etc. over a read only copy of our live database which is kept upto date by log shipping. Unfortunately, the queries being run over the read only are taking longer and longer, so was decided we need a Data Warehouse so that we can run these stats queries quicker, and have more historical data to play with (because we delete quote a lot of data from the OLTP at the moment), and also to allow some data mining type queries. As I don’t have any experience in this area, we have a Consultant in to help guide us on this. After reading up on data warehouses, I was convinced that we would be designing the data warehouse database with a star or snowflake schema. However, the consultant who has been involved in designing data warehouses for years, for some big companies, and for different databases, says that we shouldn’t be using star schema. We are basically going down the route of the data warehouse database being a normalised database very similar to the OLTP database. I am quietly concerned about this, and don’t really have the experience to challenge this decision. Does anyone else have experience of this?… is there valid reasons to use a normalised database for data warehouse as opposed to star schema?… would anyone else have concerns about doing this?
I thought that we would probably be using Analysis Services over the data warehouse database… would Analysis Services be okay creating cubes over a normailised DB or does it need to be star/snowflake schema? Thanks in advance for any help or advice.
I would be more than quietly concerned about paying a Data Warehouse consultant who’s recommendation is make the Data Warehouse more like the OLTP database. 1. OLTP Design – OLTP databases are designed to allow input/update of tables as rapidly as possible to permit the business to operate. As such you try and normalize to remove data redundancy and you need granular data access. 2. OLAP Design – OLAP database design is intended to reflect static unchanging data into summaries. As such you naturally try to remove normalization because it speeds the reporting process, and you don’t need the granularity. 3. OLTP Reporting Design – I have not heard anything that would justify even considering, thinking about, pondering or paying a consultant to move you to Data Warehousing. What I heard you say was that your queries for data were taking longer and longer to run. Queries can be easily optimized and sped up using additional indexes, indexed views, pre-joined tables, etc. Given that you are already moving your data to a read only server, you could add as many indexes as you would like to the tables without impacting the OLTP inserts/updates which would speed things up for you. You could also create indexed views, or pre-joined tables for certain types of data and modify your queries to use those instead. 4. The reason for creating cubes and moving to Data Warehousing is the business case that OLTP reporting can’t resolve. ie Business Analysts who need to do ad-hoc queries quickly to make decisions about the "state of the business". For those needs OLTP type reporting tools just won’t work. Not that they are to slow, they just won’t work. We do all of the heavy lifting to merge data, prepare cubes, identify facts/dimensions etc, so that they can sit at their desk and say what if I looked at the data inside out what will that show me. "Oh my goodness now I see a trend. Quick now let me take just this segment and show me the data upside down." 5. As for the recommendation my concern would be this … The performance issue is that in having the data normalized you have to do a lot of joins to pull it together and summarize it. If you don’t pre-summarize it (roll it up) and you don’t force data redundancy to avoid having to do joins, then what is the point? "Woo – hoo I just charged you huge dollars to create a slow data warehouse to replace your slow OLTP reporting server. But be sure to thank me since at least you can say it is a data warehouse." Wait that would probably benefit the consultant as opposed to your company. Have you actually checked any of this persons references of these big companies he’s consulted to before? 6. In all fairness to this consultant – The data of the real world very seldom falls into our computer in either a STAR or SNOWFLAKE pattern. There tends to be some STAR looking things, and some SNOWFLAKE like patterns, and some SNOWFLAKE’s that are really STARS with a 6’th point on them etc. But if it looks like an OLTP Duck, and it performs like and OLTP Duck, I’d check the tail feathers to see if it is an OLTP Duck, before paying this guy for delivering a Golden Goose.
At our company we are undergoing similiar challanges. We are stearing towards a normalized design EDW with star schemas and DataMarts built from those structures. The decision is based on many factors such as the magnitude of the warehouse estimated at 300 + TeraBytes and the complexity and diversity of the company. Most successful large scale (hundred + Terabyte) implementations are built around the normalized design. However 90GB is just barely begining to touch the iceburg in this space so druer’s comments are valid in your case. DW is big business and DW assessement is critical seek advice from various resources if possible and don’t base your decision on just one example. Raulie
Per Bill Inmon: – Data Warehouse database contains compony wide data in 3rd normal form. There is complete history of changes contained there.
– Data Marts are department and/or business area specific and their data source is dw database where all company wide data are integrated and cleaned. They contain just as much history as it is required for business they support. Their typical structure is star or snowflake schema.
Hi all Enterprise Datawarehouse (EDW) is a normalized structure. Dimensional Star/Snowflake Schemas are not normalized. All constitute Datawarehousing just have different design approaches. Raulie
Hi Raulie, by DW database I mean EDW.
Hi mmavoric I thought that is what you meant just was not sure and wanted to make sure the distinction was made. Here are some good links on the subject by the way. http://www.intelligententerprise.com/showArticle.jhtml?articleID=17800088
Raulie, thanks for the great articles. [<img src=’/community/emoticons/emotion-1.gif’ alt=’‘ />]