SQL Server Performance

Disagreement with a developer...

Discussion in 'SQL Server 2005 General Developer Questions' started by jholovacs, Oct 9, 2007.

  1. jholovacs New Member

    consider the following query:
    SELECT a.field1, b.field2
    FROM dbo.tableA a
    LEFT JOIN dbo.tableB b
    ON a.field3 = b.field3
    WHERE
    b.field4 IN ('A', 'B', 'C')
    Can anyone tell me if there would ever be an advantage to this being a left join operation? Also, the developer I'm discussing this with is confident that the WHERE clause is evaluated before the JOIN and I don't see how that could even be possible, but I don't want to unequivocally state he is wrong unless I can show him somehow.
    Supporting info/ documentation would be appreciated.

  2. thomas New Member

    The question of left join or not is more a question of, is it correct or not?. left join is not something you do for 'an advantage', it's something you do because it's logical, because you have a 1:many relationship, because it has meaning with your data.
  3. thomas New Member

    P.S for outer joins, the WHERE is evaluated AFTER the join.
  4. jholovacs New Member

    The performance issue is an aside for me... of course I'd prefer to know which is faster, but from a query operation standpoint, I'd really like to know how the query is analyzed in the optimizer. My first reaction as the developer was explaining why he did this sort of thing pegged my "BS-o-meter", but he swears by it. I'm pretty sure he's smoking something, but I need to be able to show him the error of his ways.
  5. FrankKalis Moderator

    [quote user="jholovacs"]
    consider the following query:
    SELECT a.field1, b.field2
    FROM dbo.tableA a
    LEFT JOIN dbo.tableB b
    ON a.field3 = b.field3
    WHERE
    b.field4 IN ('A', 'B', 'C')
    Can anyone tell me if there would ever be an advantage to this being a left join operation? Also, the developer I'm discussing this with is confident that the WHERE clause is evaluated before the JOIN and I don't see how that could even be possible, but I don't want to unequivocally state he is wrong unless I can show him somehow.
    Supporting info/ documentation would be appreciated.
    [/quote]
    The LEFT JOIN here is senseless! Since the WHERE clause is referring to the unpreserved tableB the LEFT JOIN is effectivly turned into an INNER JOIN and any halfway smart optimizer will rewrite such a query to an INNER JOIN. I have a more detailed example here: http://sql-server-performance.com/Community/forums/p/12669/70997.aspx#70997
    On the logical level is FROM evaluated before WHERE and any implementation is free to optimize things as long as the final resultset is guaranteed to be correct.
    To be honest, this is actually quite basic stuff, but I guess it will help reading Itzik Ben-Gan's "Inside SQL Server 2005" books.
  6. jholovacs New Member

    yeah, it didn't make sense to me either, which is why I asked him about it in the first place. He patiently explained to me that the WHERE clause was evaluated before the JOIN operation and filtered the result set for the JOIN, and it was a LEFT JOIN because that made it faster; the implication was that I was an idiot for being a DBA and not knowing this to be the case.
    I politely told him I thought he was dead flat wrong, but I really need to be able to prove it to be able to shut him up.
  7. FrankKalis Moderator

    Wrong! You're the DBA. It is him who should prove his hypothesis by hard facts. I would ask him to run his query and show the execution plan. All bets that SQL Server will show an INNER JOIN and that you get exactly the same plan for the equivalent query using the INNER JOIN. So, with two most likely identical execution plans, why should one variante run faster than the other?
    Here's a "management summary" of me posts in the other thread I mentioned above:USE tempDB
    IF OBJECT_ID('#Table1', 'table') > 0
    DROP TABLE #Table1, #Table2
    CREATE TABLE #Table1(tid INT, c1 CHAR)
    CREATE TABLE #Table2(tid INT, c2 CHAR)INSERT
    INTO #Table1 SELECT 1, 'a'
    UNION ALL SELECT 2, 'b'
    UNION ALL SELECT 3, 'c'
    UNION ALL SELECT 4, 'd'INSERT
    INTO #Table2
    SELECT 1, 'x'
    UNION ALL SELECT 2, 'y'
    UNION ALL SELECT 3, 'z'SELECT
    t1.*, t2.c2
    FROM #Table1 t1
    LEFT OUTER JOIN #Table2 t2
    ON t1.tid=t2.tid
    WHERE t2.c2='x'
    SELECT t1.*, t2.c2
    FROM #Table1 t1
    INNER JOIN #Table2 t2
    ON t1.tid=t2.tid
    WHERE t2.c2='x'
    tid c1 c2
    ----------- ---- ----
    1 a x
    (1 row(s) affected)
    tid c1 c2
    ----------- ---- ----
    1 a x
    (1 row(s) affected)
    Both statements produce the same resultset. Here are the execution plans:
    StmtText
    ---------------------------------------------------------------------------------------------------------------------
    SELECT t1.*, t2.c2
    FROM #Table1 t1
    LEFT OUTER JOIN #Table2 t2
    ON t1.tid=t2.tid
    WHERE t2.c2='x'
    (1 row(s) affected)
    StmtText
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------
    |--Hash Match(Inner Join, HASH:([t2].[tid])=([t1].[tid]), RESIDUAL:([tempdb].[dbo].[#Table2].[tid] as [t2].[tid]=[tempdb].[dbo].[#Table1].[tid] as [t1].[tid]))
    |--Table Scan(OBJECT:([tempdb].[dbo].[#Table2] AS [t2]), WHERE:([tempdb].[dbo].[#Table2].[c2] as [t2].[c2]='x'))
    |--Table Scan(OBJECT:([tempdb].[dbo].[#Table1] AS [t1]))
    (3 row(s) affected)
    StmtText
    ---------------------------------------------------------------------------------------------------------------
    SELECT t1.*, t2.c2
    FROM #Table1 t1
    INNER JOIN #Table2 t2
    ON t1.tid=t2.tid
    WHERE t2.c2='x'
    (1 row(s) affected)
    StmtText
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------
    |--Hash Match(Inner Join, HASH:([t2].[tid])=([t1].[tid]), RESIDUAL:([tempdb].[dbo].[#Table2].[tid] as [t2].[tid]=[tempdb].[dbo].[#Table1].[tid] as [t1].[tid]))
    |--Table Scan(OBJECT:([tempdb].[dbo].[#Table2] AS [t2]), WHERE:([tempdb].[dbo].[#Table2].[c2] as [t2].[c2]='x'))
    |--Table Scan(OBJECT:([tempdb].[dbo].[#Table1] AS [t1]))
    (3 row(s) affected)
    On the logical level the FROM clause is evaluated first. That includes JOINs. The WHERE clause is logically evaluated thereafter. So, the filtering happens after the JOIN has been processed. There are many, many threads in almost any RDBMS community about this. Just google, for example, the MS public newsgroups. Here's just one example: http://groups.google.de/group/microsoft.public.sqlserver.programming/msg/ee85c52a7c1c909d
    As I mentioned before, Itzik has dedicated two chapters in one of his books about SQL Server 2005 to logical and physical query processing. Tell your developer to get a copy of these books before making such funny statements again.
  8. jholovacs New Member

    Thanks that's exactly what I was looking for.
  9. FrankKalis Moderator

    Have fun. [:)]
  10. evilDBA New Member

    Is he already fired? :)
  11. jholovacs New Member

    We don't fire our developers when they screw something up. We make them fix it; it's far more painful that way.
    :)
  12. DilliGrg Member

    [quote user="jholovacs"]
    We don't fire our developers when they screw something up. We make them fix it; it's far more painful that way.
    :)
    [/quote]
    Hmmm... I never thought about it.[;)]

Share This Page