different joins | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

different joins

Hi, I’m doing my master’s thesis in SQL Server 2000 where I analyze and try to optimize some searches to a database. I have a query to a table with the columns (col1 int, col2 int, col3 int). The table contains 7000000 rows.
The question looks like:
SELECT DISTINCT T1.col1
FROM table1 T1
inner join table1 T2
on T1.col1 = T2.col1
inner join table1 T3
on T1.col1 = T3.col1
inner join table1 T4
on T1.col1 = T4.col1
inner join table1 T5
on T1.col1 = T5.col1
inner join table1 T6
on T1.col1 = T6.col1
inner join table1 T7
on T1.col1 = T7.col1
inner join table1 T8
on T1.col1 = T8.col1
inner join table1 T9
on T1.col1 = T9.col1
inner join table1 T10
on T1.col1 = T10.col1
inner join table1 T11
on T1.col1 = T10.col1
inner join table1 T12
on T1.col1 = T10.col1
WHERE T1.col2 = 1234
AND T2.col2 = 2345
AND T3.col2 = 3456
AND T4.col2 = 4567
AND T5.col2 = 5678
AND T6.col2 = 6789
AND T7.col2 = 7890
AND T8.col2 = 8901
AND T9.col2 = 9012
AND T10.col2=10000
AND T11.col2=12000
AND T12.col2=14000 I had a clustered index on (col1, col2, col3) and a nonclustered on (col2, col1)
The execution plan shows that the nonclustered index is used. When I changed the clustered index to (col2, col1, col3), execution plan still uses the nonclustered. Why? Without any join hints, execution plan joins six of T1-T12 with hash match, then this is sorted and then merge join is used to the last of T1-T12.
I tried join hints and got the result that the query with the hash match-hint executes 1 second faster than the query used without hints, but the querycost is the same. Why is the hash/merge-method choosen instead of the hash-method? If I remove "inner join table T12 …" and "T12.col2=14000" execution plan uses hash match to join all of T1 to T11. Why does it use merge when there’s more than 11 joins? I’m grateful for all help.
Hi ya, hmm not sure if I can answer all the questions you have… a clustered index is not actually a very efficient index… it’s lowest level is the actual data in the table, so an index scan like the one you’re doing results in a what is effectively a full table scan. in terms of the hash versus merge join I can only assume that there is a threshold that you are reaching in the optimiser in terms of attempting to further optimise the execution plan. does the following query produce a different plan? Cheers
Twan SELECT DISTINCT T1.col1
FROM table1 T1
inner join table1 T2
on T1.col1 = T2.col1
AND T2.col2 = 2345
inner join table1 T3
on T1.col1 = T3.col1
AND T3.col2 = 3456
inner join table1 T4
on T1.col1 = T4.col1
AND T4.col2 = 4567
inner join table1 T5
on T1.col1 = T5.col1
AND T5.col2 = 5678
inner join table1 T6
on T1.col1 = T6.col1
AND T6.col2 = 6789
inner join table1 T7
on T1.col1 = T7.col1
AND T7.col2 = 7890
inner join table1 T8
on T1.col1 = T8.col1
AND T8.col2 = 8901
inner join table1 T9
on T1.col1 = T9.col1
AND T9.col2 = 9012
inner join table1 T10
on T1.col1 = T10.col1
AND T10.col2=10000
inner join table1 T11
on T1.col1 = T10.col1
AND T11.col2=12000
inner join table1 T12
on T1.col1 = T10.col1
AND T12.col2=14000
WHERE T1.col2 = 1234

Hi, I got the same execution plan with that code… here is how it looks
(SA = Stream Aggregate, MJ=Merge Join, HM=Hash Match, IS=Index Seek) SELECT
|
SA
|
MJ – IS
|
MJ – IS
|
MJ – IS
|
MJ – IS
|
MJ – IS
|
MJ – IS
|
SORT
|
HM – IS
|
HM – IS
|
HM – IS
|
HM – IS
|
HM – IS
|
IS I thought that a clustered index on (col2, col1, col3) is sorted in that order so the data will be found faster than with a nonclustered… at least in my query. Have I misunderstood indexes?
Okay, I might have missed something in your maze of JOINs, but as far as I see it’s quite logical that SQL Server uses the nonclustered index. Since your query only asks for col1 and col2 in different variations, the nonclustered index is covering your query, that means it is a covered index in the case and is actually one of the fastest methods for data retrieval. SQL Server doesn’t even need to touch the data pages, but only needs to traverse the nonclustered index to satisfy your query. As for the joining strategy I tend to agree with Twan. And yes, you seem to have some misunderstanding about indexes. Both kinds of indexes are basically the same. They are arranged in B-trees and simply speaken, differ only at the leaf level. While the clustered index contains there the actual data pages, the nonclustered index contains there a pointer which "points" to the physical location. It would lead too far here, to explain indexes in detail, so I suggest you read up this topic in BOL and come back with further questions.
–Frank
http://www.insidesql.de

If I have one clustered index (col2, col1, col3) and one nonclustered index also on (col2, col1, col3), the nonclustered is choosen. What I’m trying to find out is why the nonclustered performs better, than the clustered index, although the nonclustered has a "longer way" to go to find the actual data. (clustered index has the data in the leaf level and nonlustered has the data in the data pages which are pointed from the leaf level) Thanks
The key is that the nonclustered index is a covering index in your query.
SQL Server doesn’t need to touch the actual data pages, but only need to traverse the b-tree of the nonclustered index to satisfy the query and therefore can avoid roundtrips and therefore is faster than a clustered index in your case.
You’re right that nonclustered indexes usually have a longer way to go. That’s why they only make sense when your SARG is highly selective. HTH

–Frank
http://www.insidesql.de

]]>