understanding query plan | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

understanding query plan

hi,<br />i am running the folowing query:<br />SELECT T1.a, T1.b , T1.c, T1.d, T2.a<br />From T1 Inner Join T2<br />ON T1.b = T2.b <br />Where T1.a = 1 (a range, not a single row)<br /><br />T1: 5.5 million rows,<br /> has clustered index on column a<br /><br />T2: 13 million rows,<br /> has clustered index on column b and non-clustered index on colum a (not a covering index)<br /><br />when i look at the plan i see the the optimizer does NOT use the clustered index T2(b) to join the result from T1 (where the clustered index on column a is used), but instead, uses the non-clustetred index T2(a) which does not cover b column.<br />the optinizer choose hash join and it is faster than if i force plan to use the clusterded index by WITH hint or by LOOP join hint.<br />Why is it better not to use in that case the clustered index on a join column. when T2.a is not selected, the optimizer, again, does not use the clustered index but instead use another non-clusterd index on column that is not selected in the query?<br /><br />the plan:<br /> |–Parallelism(Gather Streams)<br /> |–Hash Match(Inner Join, HASH<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[Expr1004])=([T2].<b>), RESIDUAL<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[Expr1004]=[T2].<b>))<br /> |–Bitmap(HASH<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[Expr1004]), DEFINE<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[Bitmap1005]))<br /> | |–Parallelism(Repartition Streams, PARTITION COLUMNS<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[Expr1004]))<br /> | |–Compute Scalar(DEFINE<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[Expr1004]=Convert([T1].<b>)))<br /> | |–Clustered Index Seek(OBJECT<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[datamart].[dbo].[T1].[a]), SEEK<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[T1].[a]=1) ORDERED FORWARD)<br /> |–Parallelism(Repartition Streams, PARTITION COLUMNS<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[T2].<b>), WHERE<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />PROBE([Bitmap1005])=TRUE))<br /> |–Index Scan(OBJECT<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[datamart].[dbo].[T2].[idx_suppression]))<br /><br /><br />any new information will be appreciated.<br />tahnks.<br /><br /><br />(config: SQL Server 2000: 4 CPU, 3.5GB RAM)
HOw about the performance of the query used, rather from query plan? Satya SKJ
A hash join is generally much slower than a loop join, and should be avoided, which you already know. But because of this, you would think that the optimizer would be smart enough to figure this out and use the clustered index, not the non-clustered index. But unfortunately, this is not always the case, sometimes the optimizer makes mistakes, and the use of a hash join indicates to me that it is making a mistake, since a loop join is usually superior. Generally speaking, the optimizer only uses a hash join if there are no other choices, but in your case their is. So one of two things is most likely happening. Either the statistics are outdated and the optimizer is making a bad decision because of the out dated statistics, or the optimizer is simply making a mistake. If I was you, my first step would be to update the statistics and see if this fixes the problem. But if it doesn’t, then I would use a hint in the query to make it perform properly. Then I would continue to watch this query over time, and checking what happens after you apply a new service pack. Often, Microsoft will include fixes in a service pack that might (or might not) fix the problem. So by checking this query after adding a service pack, you will then determine if you still need the hint or not. Generally, I avoid hints, but sometimes they are justified, as is perhaps in your case.
——————
Brad M. McGehee
Webmaster
SQL-Server-Performance.Com
]]>