SQL Server Performance

filter criteria in Where clause Vs Along with Join

Discussion in 'T-SQL Performance Tuning for Developers' started by ramasubbup, Nov 27, 2005.

  1. ramasubbup New Member


    Hi,

    Which query will run fast?

    query 1
    --------

    select a.*
    from a
    inner join b
    on a.c1=b.c1
    where a.c2 IS NULL

    query 2
    -----------
    select a.*
    from a
    inner join b
    on a.c1=b.c1
    and a.c2 IS NULL


    How SS will select the records?

    Will it does the Join operation first and filters the records specified in WHERE clause?
    or filters the records specified in WHERE clause and does the Join?

    thanks
    Ramasubbu






  2. ranjitjain New Member

    type this and run your Query
    SET STATISTICS IO ON

    Also check for execution plan generated.
    Mostly both will be same.
  3. Madhivanan Moderator

    I think query1 is more efficient

    Madhivanan

    Failing to plan is Planning to fail
  4. ramasubbup New Member


    In the first query JOIN is happening first and WHERE CLAUSE is applied over the results
    from the JOIN.

    In the second query JOIN and WHERE both are happening at the same time.
    Second should be faster no?

  5. Madhivanan Moderator

    Well. For Inner joins there wont be any differences but when you use Left Join

    Set the Execution Plan and test these

    USE NorthWind

    Select O.CustomerId,O.OrderId,OD.UnitPrice from Orders O left join [Order Details] OD
    on O.OrderId=OD.OrderId and O.CustomerId='Chops'

    Select O.CustomerId,O.OrderId,OD.UnitPrice from Orders O left join [Order Details] OD
    on O.OrderId=OD.OrderId where O.CustomerId='Chops'


    Madhivanan

    Failing to plan is Planning to fail
  6. ramasubbup New Member

    Yes, that's TRUE.

    In the LEFT JOIN, the where condition is applied to the result sets,
    which shows that join operations happes first and WHERE condition is
    applied over the results.

    I don't know if it is same for INNER JOIN.

    Execution plans are same doesn't guarantee the same time.

    For less number of records there is no significant differences.
    For huge volume of data, we have to go for BEST option.

    Can anyone know how it happens in the back ground, for both the queries?


  7. FrankKalis Moderator

    Why do you think that same execution plans don't guarantee the same execution time?
    In a clean and sterile test scenario on a separated machine all executions expect the first one are likely to run equally long ceteris paribus. However, surely in a production environment there are a lot of other things to consider. Network traffic, concurrency...

    Here's a posting by Joe Celko about how a SELECT works

    quote:
    Here is how a SELECT works in SQL ... at least in theory. Real products
    will optimize things when they can.


    a) Start in the FROM clause and build a working table from all of the
    joins, unions, intersections, and whatever other table constructors are
    there. The table expression> AS <correlation name> option allows you
    give a name to this working table which you then have to use for the
    rest of the containing query.


    b) Go to the WHERE clause and remove rows that do not pass criteria;
    that is, that do not test to TRUE (reject UNKNOWN and FALSE). The WHERE
    clause is applied to the working set in the FROM clause.


    c) Go to the optional GROUP BY clause, make groups and reduce each
    group to a single row, replacing the original working table with the new
    grouped table. The rows of a grouped table must be group
    characteristics: (1) a grouping column (2) a statistic about the group
    (i.e. aggregate functions) (3) a function or (4) an expression made up
    those three items.


    d) Go to the optional HAVING clause and apply it against the grouped
    working table; if there was no GROUP BY clause, treat the entire table
    as one group.


    e) Go to the SELECT clause and construct the expressions in the list.
    This means that the scalar subqueries, function calls and expressions in
    the SELECT are done after all the other clauses are done. The "AS"
    operator can also give names to expressions in the SELECT list. These
    new names come into existence all at once, but after the WHERE clause,
    GROUP BY clause and HAVING clause has been executed; you cannot use them
    in the SELECT list or the WHERE clause for that reason.


    If there is a SELECT DISTINCT, then redundant duplicate rows are
    removed. For purposes of defining a duplicate row, NULLs are treated as
    matching (just like in the GROUP BY).


    f) Nested query expressions follow the usual scoping rules you would
    expect from a block structured language like C, Pascal, Algol, etc.
    Namely, the innermost queries can reference columns and tables in the
    queries in which they are contained.
    The most important sentence here is just right at the beginning

    quote:
    Real products will optimize things when they can.
    You can read this in the way that as long as the results are the same, any shortcut is allowed.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    Heute schon gebloggt?http://www.insidesql.de/blogs
    Ich unterstütze PASS Deutschland e.V. http://www.sqlpass.de)
  8. ramasubbup New Member

    Thanks for that.

    The Step B, applies the WHERE clause on the intermediate table.

    If we change the queries(QUERIES USING ONLY INNER JOINs) as
    stated below, we can avoid STEP B....hence there should be some performance gain.

    Remove the WHERE clause. Shift the conditions specified in the WHERE clause to JOIN conditions
    ( As done in QUERY 2)


    Correct me if I am wrong.
  9. mmarovic Active Member

    You are wrong. Filtering (join condition) has to be applied anyway. Query Optimizer will try to find the best scenario when to apply condition. It doesn't affect its decision if condition is put in inner join clause or in where clause.
  10. druer New Member

    Using the LEFT OUTER JOIN can produce a completely different set of results than the WHERE clause would (as is the case for the example Madhivanan put up as an example from Northwind). Thus better/worse performance can't really be considered. (I'm still baffled by why the example returns rows that are for customers other than 'Chops' though, but it definitely returns them.)

    As indicated in the example from Madhivanan the INNER JOIN clause yields an identical execution plan as the WHERE clause and the times shown are identical both in the Query Analyzer and reported in Profiler.
  11. Adriaan New Member

    You usually see WHERE MyColumn IS NULL clauses with RIGHT JOIN or LEFT JOIN, where MyColumn is one of the join columns of the outer table.

    This way the query will find all rows from the inner table that do not have a matching row on the outer table.

    You can always rewrite an 'unmatched join' as a NOT EXISTS subquery in the WHERE clause. This often results in better response times - but it depends.

Share This Page