SQL Server Performance

join multiple times on same table in one query

Discussion in 'ALL SQL SERVER QUESTIONS' started by xsean, Jan 16, 2012.

  1. xsean New Member

    i have 2 tables: Customers and Variables. Below are the fields for each tables.

    Variables Table:
    VariableID int PK
    VariableType varchar(20)
    VariableValue varchar(50)

    Customers Table:
    CustomerID int PK
    CustomerName varchar(100)
    Gender int
    Occupation int
    MaritalStatus int
    Religion int

    Sample Variables Data:
    VariableID VariableType VariableValue
    1 Gender Male
    2 Gender Female
    3 Occupation Doctor
    4 Occupation Engineer
    5 Occupation Director
    6 MaritalStatus Single
    7 MaritalStatus Married
    8 MaritalStatus Widowed
    9 Religion Buddhist
    10 Religion Christian

    Sample Customers Data:
    CustomerID CustomerName Gender Occupation MaritalStatus Religion
    1 Alexander Smith 1 3 6 10
    2 Kelvin Woo 1 5 7 9
    3 Stacy White 2 3 8 10

    The Gender, Occupation, MaritalStatus, Religion in Customers Table will join back to Variables Table to get VariableValue.

    I'm not sure this is a good approach but is easy to maintain as I can change the VariableValue and add new VariableType if there is new Occupation.

    My concern is joining multiple times on same table in one query will greatly hurt the query performance?

    If yes, any suggestion to solve this issue? I hope can get the balance between flexibility and performance. I get flexiblity but not performance on the above design.
  2. davidfarr Member

    Technically, your design appears to violate the 2nd Normal Form of design rules.
    This does not mean that the design cannot work. It just makes some queries more tricky, and possibly confusing to other staff that run queries on these tables.

    For example; When inserting or updating VariableID values in the Customers table, you would need to first determine the correct [VariableID] values from the Variables table.
    To do this you need to match on the two varchar fields [VariableType] and [VariableValue] to get the correct [VariableID] value. This can become a performance issue if millions of records are involved.
    You would also be advised to create a unique composite index on these two fields to ensure that the combined field value of each record remains unique (Occupation | Doctor).
    If you are doing an insert or update from a list with inconsistent or incorrect spelling (Occupation | Dokter) then this can get tricky.

    Also consider what your SELECT statement would look like:
    SELECT t1.CustomerID, t1.CustomerName, t2.VariableValue as Gender, t3.VariableValue as Occupation, t4.VariableValue as MaritalStatus, t5.VariableValue as Religion
    FROM Customers t1
    LEFT JOIN Variables t2 on t1.Gender = t2.VariableID
    LEFT JOIN Variables t3 on t1.Occupation = t3.VariableID
    LEFT JOIN Variables t4 on t1.MaritalStatus = t4.VariableID
    LEFT JOIN Variables t5 on t1.Religion = t5.VariableID
    The above statement gives a correct result, and should execute quite fast, but consider that this kind of object naming and field naming is confusing to people who are familiar with more normalized designs.

    With correct indexing on your design; I would expect almost no performance problem at all.
    There is no absolute right or wrong choice here; If you (or your client/company) feels that there is an unacceptable performance difference then you may consider a different design.
    I would say to keep this design if you can, because the advantage of the flexibility probably outweighs the performance cost.

Share This Page