SQL Server Performance

Surrogate keys

Discussion in 'SQL Server 2005 Analysis Services' started by johnslg, Aug 2, 2007.

  1. johnslg New Member

    Nearly everything I read tells me to use integer surrogate keys for the dimension tables, combine all surrogate keys as the composite primary key in the fact table, and include business keys as attributes in the fact table and dimension tables.
    All of my data is coming from a single OLTP db in which each table has a bigint identity PK. In my fact table I'm going to have 8 or 9 bigint surrogate keys along with the same number of bigint business keys. Each dimension is going to have a bigint surrogate key along with a bigint business key.
    Wouldn't it be more efficient just to use my business keys as PK<->FK and cut the width of my tables?
    Thanks.
  2. satya Moderator

    Just revisiting the concept of surrogate keys and their use, it is the primary key for a dimension table and is independent of any keys provided by source data systems. The original key for each record is carried in the dimension table but is not used as the primary key. Special keys are used for date and time dimensions, but these keys differ from surrogate keys used for other dimension tables.
    The IDENTITY property and IDENTITY function can be used to create identity columns in tables and to manage series of generated numeric keys. IDENTITY functionality is more useful in surrogate key management than uniqueidentifier GUIDs.
    There used to be an article in this regard, if I found will post here back.
  3. martins New Member

    Hi,
    The whole idea of surrogate keys are to avoid using the business keys of your dimensions in the fact table. It is therefore not always necessary to have the business keys in your fact table as well.
    The bigint identity in your OLTP database was probably set up to be the primary key for a reason, but to successfully create a dimensional model you will have to find out what data columns/business data in each table uniquely identifies a record...and use that as business key in your dimension.
    Depending on the type of facts in your fact table, it will not always be the case that the combination of all surrogate keys will form the composite primary key. Think of financial transactions for instance...each financial transaction should have a transaction number which identifies it uniquely, irrespective to which supplier it might have been paid to/received from.
    Your assumptions from what you have read might work in a vey simplistic model, but in a real business scenario it might cause you a lot of problems later.
    Hope it helps,
    Martin.
  4. Raulie New Member

    To anwser your question NO, never use business key (Natural Key) in lieu of true surrogate keys for dimensions as this can impose major problems in your design.
  5. dineshasanka Moderator

    Yes, but in case of date dimensions i have used surrgate keys like 20070101 for date 2007/1/1. Purpose of doing this is to partition data. From that we could increase performance
  6. johnslg New Member

    [quote user="dineshasanka"]
    Yes, but in case of date dimensions i have used surrgate keys like 20070101 for date 2007/1/1. Purpose of doing this is to partition data. From that we could increase performance[/quote]
    When you partition data in your OLAP do you partition fact table and all dimensions or just the fact table? I am going to have at least three different date hierarchies in my date dimension so I'm not sure using one as the key like you do will be advisable.
    I've also read (and the MS test referred to) situations where data was partitioned and the current day (period) was stored ROLAP while the remainder was MOLAP. I'm going to have several hundred million rows of data with tens of thousand additions per day. What is the advantage of this scenario? (Needless to say I am concerned about performance and latency.)
    Thanks
  7. Raulie New Member

    Johnslg
    Why would you want to put three different date hierarchies in one time dimension??? As far as partitioning I would just partition the fact table when you join the fact to dimension tables it will only join to dimension rows that correspond to the fact table partitions. Make sure you index all PKs and FKs.
  8. johnslg New Member

    I have to slice sales by order date, scheduled delivery date, and actual delivery date. I would think that would require three different date hierarchies.
  9. Raulie New Member

    Ok sounds like your Time Dimension will play "Dimensional Roles" on the Fact Table. In this design you can build only one Time Dimension, and out of this Time Dimension you can create multiple views and join each individual View (From the Time Dim) to each cooresponding Fact Table time column. This scenario is very common best practice in dimensional design. You would never want build multiple hierarchies in one Dimension.
  10. Raulie New Member

    In the case of Time dimensions the Surrogate Key is best represented as an integer like dineshasanka mentioned (20071001) sometimes this will be referred to as a smart key, but never use the Natural key which takes many forms 2007/1/1, 2007-01-01, Jan-1-2007 etc..

Share This Page