Sort order after the result of a hash join? | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Sort order after the result of a hash join?

Can some clever spark out there explain why, when I have a sorted stream (eg. using clustered index) which I then perform a hash join, result in an unordered stream? In the example below, basket table and basket_item table both have clustered indexed sorted by basket_id asc. without the inner hash join between product and basket, the query is happy merge joining without requiring a sort. As soon as I put the the hash join in it performs the hash join on the sorted stream (basket_item) – but then has to sort the resultant stream before merge joining to the basket table. I have over 800M rows in the basket table – so this is not a good plan! The query below is a little contrived to demonstrate the problem – my "real" query does not involve taking the min(major_department_code) – but if anyone has any idea how to remove this redundant sort, then please let me know.
I have considered that it may have been to do with a parallel process, but seting MAXDOP to 1 does not resolve the query either. I’m sure there may be a technical reason for this, but the hash join process is described as "use each row from the top input to build a hash table, and each row in the bottom to probe into the hash table, outputing all matched results" – my "top" input is the product table – and my bottom query is the basket_item – so why, if the bottom is sorted, would it want to re-sort itself after the hash join?
select bb.basket_id
, bi.min_mdc
from mart.basket bb
left merge join
    select      a.basket_id
    ,          min_mdc = min(p.major_department_code)
    from        mart.product p
    inner hash join mart.basket_item a
    on          p.product_id = a.product_id
    group by    a.basket_id
) bi
on  bi.basket_id = bb.basket_id

You are using an Index Scan here, and thus it is a VERY efficient query to begin with since it isn’t even touching the base table at all. And a Hash is still probably the most efficient way to resolve this query. Try forcing a LOOP JOIN and MERGE JOIN to see if it changes.

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |