SQL Server Performance Forum – Threads Archive

filter criteria in Where clause Vs Along with Join

Hi, Which query will run fast? query 1
——– select a.*
from a
inner join b
on a.c1=b.c1
where a.c2 IS NULL query 2
———–
select a.*
from a
inner join b
on a.c1=b.c1
and a.c2 IS NULL
How SS will select the records? Will it does the Join operation first and filters the records specified in WHERE clause?
or filters the records specified in WHERE clause and does the Join? thanks
Ramasubbu

type this and run your Query
SET STATISTICS IO ON Also check for execution plan generated.
Mostly both will be same.

I think query1 is more efficient Madhivanan Failing to plan is Planning to fail

In the first query JOIN is happening first and WHERE CLAUSE is applied over the results
from the JOIN. In the second query JOIN and WHERE both are happening at the same time.
Second should be faster no?

Well. For Inner joins there wont be any differences but when you use Left Join Set the Execution Plan and test these USE NorthWind Select O.CustomerId,O.OrderId,OD.UnitPrice from Orders O left join [Order Details] OD
on O.OrderId=OD.OrderId and O.CustomerId=’Chops’ Select O.CustomerId,O.OrderId,OD.UnitPrice from Orders O left join [Order Details] OD
on O.OrderId=OD.OrderId where O.CustomerId=’Chops’
Madhivanan Failing to plan is Planning to fail

Yes, that’s TRUE. In the LEFT JOIN, the where condition is applied to the result sets,
which shows that join operations happes first and WHERE condition is
applied over the results. I don’t know if it is same for INNER JOIN. Execution plans are same doesn’t guarantee the same time. For less number of records there is no significant differences.
For huge volume of data, we have to go for BEST option. Can anyone know how it happens in the back ground, for both the queries?

Why do you think that same execution plans don’t guarantee the same execution time?
In a clean and sterile test scenario on a separated machine all executions expect the first one are likely to run equally long ceteris paribus. However, surely in a production environment there are a lot of other things to consider. Network traffic, concurrency… Here’s a posting by Joe Celko about how a SELECT works

quote:
Here is how a SELECT works in SQL … at least in theory. Real products
will optimize things when they can.
a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors are
there. The table expression> AS <correlation name> option allows you
give a name to this working table which you then have to use for the
rest of the containing query.
b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (reject UNKNOWN and FALSE). The WHERE
clause is applied to the working set in the FROM clause.
c) Go to the optional GROUP BY clause, make groups and reduce each
group to a single row, replacing the original working table with the new
grouped table. The rows of a grouped table must be group
characteristics: (1) a grouping column (2) a statistic about the group
(i.e. aggregate functions) (3) a function or (4) an expression made up
those three items.
d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.
e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions in
the SELECT are done after all the other clauses are done. The "AS"
operator can also give names to expressions in the SELECT list. These
new names come into existence all at once, but after the WHERE clause,
GROUP BY clause and HAVING clause has been executed; you cannot use them
in the SELECT list or the WHERE clause for that reason.
If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated as
matching (just like in the GROUP BY).
f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.

The most important sentence here is just right at the beginning

quote:
Real products will optimize things when they can.

You can read this in the way that as long as the results are the same, any shortcut is allowed. —
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt?http://www.insidesql.de/blogs
Ich unterstÃ¼tze PASS Deutschland e.V. http://www.sqlpass.de)

Thanks for that. The Step B, applies the WHERE clause on the intermediate table. If we change the queries(QUERIES USING ONLY INNER JOINs) as
stated below, we can avoid STEP B….hence there should be some performance gain. Remove the WHERE clause. Shift the conditions specified in the WHERE clause to JOIN conditions
( As done in QUERY 2)
Correct me if I am wrong.

You are wrong. Filtering (join condition) has to be applied anyway. Query Optimizer will try to find the best scenario when to apply condition. It doesn’t affect its decision if condition is put in inner join clause or in where clause.

Using the LEFT OUTER JOIN can produce a completely different set of results than the WHERE clause would (as is the case for the example Madhivanan put up as an example from Northwind). Thus better/worse performance can’t really be considered. (I’m still baffled by why the example returns rows that are for customers other than ‘Chops’ though, but it definitely returns them.) As indicated in the example from Madhivanan the INNER JOIN clause yields an identical execution plan as the WHERE clause and the times shown are identical both in the Query Analyzer and reported in Profiler.

You usually see WHERE MyColumn IS NULL clauses with RIGHT JOIN or LEFT JOIN, where MyColumn is one of the join columns of the outer table. This way the query will find all rows from the inner table that do not have a matching row on the outer table. You can always rewrite an ‘unmatched join’ as a NOT EXISTS subquery in the WHERE clause. This often results in better response times – but it depends.

]]>