SQL Server Performance

Multiple tables on same side of "On" clause

Discussion in 'SQL Server 2005 General Developer Questions' started by lmeinke, Nov 6, 2009.

  1. lmeinke New Member

    Hello All,
    I have a query that uses 6 joins to produce a results set. It is in the form of:
    Select A.field1, B.field2, C.field3, etc....
    From Table A
    Inner Join Table B
    On A.field1 + A.field2 = B.field1 + B.field2
    Inner Join Table C
    On A.field1 + A.field2 = C.field1 + C.field2
    Inner Join Table D
    On A.field1 = D.field1 --this is needed to get D.field99
    Inner Join Table E
    On A.field1 + D.field99 = E.field1 + E.field2
    Inner Join Table D
    On A.field1 + D.field99 = F.field1 + F.field2
    This query runs as expected and the results have been verified. But in looking back at it I confused as to how the SQL Server engine could process (reliably) two different tables on the same side of the join statement. It seems as though this would not work because the order of processing the joins is left up to SQL Server.
    Should this work?
    Would this work in other SQL databases?
    Is there a better way?....CTE’s etc….
    Thank you,
    Lonnie
  2. Adriaan New Member

    In an ON clause, you can refer to any table that is mentioned before this clause.
    You should avoid concatenation of column values when filtering or joining, as you may get unwanted matches. For instance if the concatenated string is abcd, then this could be the result of any of these five combinations of values:
    <blank> + abcd
    a + bcd
    ab + cd
    abc + d
    abcd + <blank>
    ... and this can be on either side of the = sign, so you could be matching a+bcd on abc+d ...
    You need to use proper logical statements:
    INNER JOIN D ON A.field1 = E.field1 AND D.field99 = E.field2
  3. lmeinke New Member

    Adriaan,
    Thank you for the response. In the tables I'm dealing with the 'a' column is a unique vendor ID and the 'b' column is one of 14 distribution centers. I understand you logic with the abcd, a+bcd example but it doesn't seem exactly applicable as the columns and tables are named explicitly in order. In looking at the two statements I'm not clear why they would behave differently. If you could elaborate a bit more I would greatly appreciate it.
    Thank you,
    Lonnie
  4. Adriaan New Member

    It really doesn't matter that the columns are named in order!
    The problem is that you're forcing SQL to concatenate the values from the columns, and to compare these concatenated values, instead of the original column values.
    You might have vendor id's a, ab, and abc, and distribution center id's bcd, cd and d. With the concatenation, you will get false matches because abcd matches a+bcd and ab+cd and abc+d.
    This is a type of mistake that sometimes pops up on this forum, more typically when a correlated subquery is required in a WHERE clause.
  5. lmeinke New Member

    I'm sorry but I'm still not following what you are trying so hard to tell me.
    example:
    a.field1 + D.field99 = E.field1 + E.Field2 --reality vendor1dc3 = vendor1dc3
    If SQL server is intelligent enough to process the joins neccesarry for this statement to work reliably (in other words the relation of table a & d) how can it evaluate to anything else. I can only apologize for my ignorance but I need a more throrough explanation or some pointers to reference material that may have such examples.
    Thank you,
    Lonnie
  6. Adriaan New Member

    Yes, SQL is intelligent enough to do the concatenation. But the concatenation may lead to ambiguity, which would result in false matches.
    Perhaps you're lucky in that the data doesn't actually present the patterns that I showed, but your SQL coding doesn't rule out those false matches.
    You need two separate comparisons, one for each set of fields. Do not concatenate!
    Another reason not to use concatenation: it means SQL probably cannot use indexes to support the join operation.
  7. lmeinke New Member

    Adriaan,
    Thank you for your responses. I've got the message loud and clear that concatenations are not recommended :). I'm an example person so I'll have to dig for some examples about the ambiguity you speak of.
    Thank you again.
    Lonnie
  8. Adriaan New Member

    Just split the concatenations:
    INNER JOIN D ON A.field1+D.field99 = E.Field1 + E.field2
    INNER JOIN D ON A.field1 = E.field1 AND D.field99 = E.field2
  9. i2lovefishing New Member

    Try this, if it makes sense......

    CREATE TABLE [dbo].[zz_Rates](
    [EmployeeID] [varchar](10) NULL,
    [ProjectID] [varchar](10) NULL,
    [HourlyRate] [money] NULL
    )

    CREATE TABLE [dbo].[zz_Hours](
    [EmployeeID] [varchar](10) NULL,
    [ProjectID] [varchar](10) NULL,
    [HoursWorked] [int] NULL
    )

    INSERT INTO zz_Rates VALUES ('02', '02', 11.00)
    INSERT INTO zz_Rates VALUES ('02', '0202', 13.00)
    INSERT INTO zz_Rates VALUES ('0202', '02', 30.00)
    INSERT INTO zz_Rates VALUES ('0202', '0202', 35.00)

    INSERT INTO zz_Hours VALUES ('02', '02', 50)
    INSERT INTO zz_Hours VALUES ('02', '0202', 80)
    INSERT INTO zz_Hours VALUES ('0202', '02', 8)
    INSERT INTO zz_Hours VALUES ('0202', '0202', 16)
    SELECT a.employeeid, sum(a.HourlyRate * b.HoursWorked) as wages
    FROM zz_Rates a join zz_Hours b on a.employeeid = b.employeeid and a.projectid = b.projectid
    GROUP BY a.employeeid
    SELECT a.employeeid, sum(a.HourlyRate * b.HoursWorked) as wages
    FROM zz_Rates a join zz_Hours b on a.employeeid + a.projectid = b.employeeid + b.projectid
    GROUP By a.employeeid

Share This Page