SQL Server Performance

Sort order after the result of a hash join?

Discussion in 'SQL Server 2008 T-SQL Performance Tuning' started by Gary ranson, Nov 15, 2011.

  1. Gary ranson New Member

    Can some clever spark out there explain why, when I have a sorted stream (eg. using clustered index) which I then perform a hash join, result in an unordered stream?

    In the example below, basket table and basket_item table both have clustered indexed sorted by basket_id asc. without the inner hash join between product and basket, the query is happy merge joining without requiring a sort.

    As soon as I put the the hash join in it performs the hash join on the sorted stream (basket_item) - but then has to sort the resultant stream before merge joining to the basket table. I have over 800M rows in the basket table - so this is not a good plan!

    The query below is a little contrived to demonstrate the problem - my "real" query does not involve taking the min(major_department_code) - but if anyone has any idea how to remove this redundant sort, then please let me know.
    I have considered that it may have been to do with a parallel process, but seting MAXDOP to 1 does not resolve the query either.

    I'm sure there may be a technical reason for this, but the hash join process is described as "use each row from the top input to build a hash table, and each row in the bottom to probe into the hash table, outputing all matched results" - my "top" input is the product table - and my bottom query is the basket_item - so why, if the bottom is sorted, would it want to re-sort itself after the hash join?

    select bb.basket_id
    , bi.min_mdc
    from mart.basket bb
    left merge join
        select      a.basket_id
        ,          min_mdc = min(p.major_department_code)
        from        mart.product p
        inner hash join mart.basket_item a
        on          p.product_id = a.product_id
        group by    a.basket_id
    ) bi
    on  bi.basket_id = bb.basket_id

  2. satya Moderator

    You are using an Index Scan here, and thus it is a VERY efficient query to begin with since it isn't even touching the base table at all. And a Hash is still probably the most efficient way to resolve this query. Try forcing a LOOP JOIN and MERGE JOIN to see if it changes.

Share This Page