SQL Server Performance

Strange performance problem on bigint column

Discussion in 'SQL Server 2005 Performance Tuning for DBAs' started by mccabe, Jan 16, 2008.

  1. mccabe New Member

    Hi!
    I have a join which looks something like this:
    SELECT ...
    FROM @memberselection t
    INNER JOIN [order] o
    ON o.member_id = t.member_id
    INNER JOIN transactions tr
    ON tr.order_id = o.order_id
    For a large set in @memberselection this takes forever. For a selection of about 1,000 members it takes 30 seconds.
    The PK on @memberselection is member_id.
    I have an index on the order table on member_id. member_id is VARCHAR(30).
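    Roughly, the setup looks like this (other columns left out, and the index name is just made up for this post):
    DECLARE @memberselection TABLE
    (
        member_id VARCHAR(30) NOT NULL PRIMARY KEY,
        customer_id INT NOT NULL
    )
    -- existing index on the order table used by the join above
    CREATE INDEX ix_order_member_id ON [order] (member_id)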
    If I extend the join between order and @memberselection as below, the time drops to less than 1 second, even for larger sets!
    SELECT ...
    FROM @memberselection t
    INNER JOIN [order] o
    ON o.member_id = t.member_id
    AND o.customer_id = t.customer_id -- <--- added join condition
    INNER JOIN transactions tr
    ON tr.order_id = o.order_id
    There is an index on the order table on customer_id, and customer_id is INTEGER.
    I thought this big difference was because the server has an easier time dealing with INTs than with VARCHARs, so I tried changing member_id to BIGINT instead (the values are too large for INT).
    The time for the query without customer_id then dropped to 14 seconds, about half of what it was with the VARCHAR column.
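    For reference, the type change itself was roughly this; an existing index on the column has to be dropped and recreated around it (index name made up, NOT NULL assumed):
    DROP INDEX ix_order_member_id ON [order]
    ALTER TABLE [order] ALTER COLUMN member_id BIGINT NOT NULL
    CREATE INDEX ix_order_member_id ON [order] (member_id)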

    Should there really be such a big difference in processing time when joining on INT vs BIGINT columns!?
    I cannot see any other differences (except that the query plan when joining on customer_id needs an extra step to look up member_id, so it should really be slower!).
    In both cases it uses the expected index and the indices are identical except one starts with member_id and one with customer_id.

    I can of course change all my joins to include customer_id, but I'm interested in why this phenomenon happens in the first place...
    I should also mention that statistics are updated and the query cache is cleared between each test.
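    What I mean by that is something along these lines (a sketch, not the exact script; the buffer-cache flush is optional):
    UPDATE STATISTICS [order] WITH FULLSCAN   -- refresh statistics on the order table
    DBCC FREEPROCCACHE                        -- drop cached query plans
    DBCC DROPCLEANBUFFERS                     -- clear the buffer cache so every run starts cold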

    cheers
    /Linus

  2. Adriaan New Member

    The major difference would be between VARCHAR and numeric, not between INT and BIGINT. There is simply more processing involved in joining on character data. That's why the common suggestion is to add a surrogate key, such as an identity column, which you can refer to in any FK relationship and which is the better option for joins anyway.
    When you changed the VARCHAR number to a BIGINT, you probably had to drop some markers in the number, like hyphens. With a surrogate key, the VARCHAR column can stay as it is.
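    As a sketch, assuming the member numbers live in a member table (all names here are only an example):
    -- numeric surrogate key on the member table
    ALTER TABLE member ADD member_key INT IDENTITY(1,1) NOT NULL
    -- matching column on the order table
    ALTER TABLE [order] ADD member_key INT NULL
    GO
    -- fill it from the existing VARCHAR relationship, then index it
    UPDATE o
    SET o.member_key = m.member_key
    FROM [order] o
    INNER JOIN member m
    ON m.member_id = o.member_id
    CREATE INDEX ix_order_member_key ON [order] (member_key)
    From then on you join order to member (or to your selection table) on member_key instead of the VARCHAR member_id.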
  3. mccabe New Member

    Hi
    Thanks for the reply!

    In this case I did not have to lose any data (the member numbers were already digits only), so the change was OK for me. Adding a surrogate key would be an option, but if I need to change all the code anyway I might as well just add the joins on customer_id, and the (performance) problem is solved...
    The question I really want answered is why there is such a difference when joining on the BIGINT column instead of the INT column...
    A colleague suggested it might be because the customer_id data is much nicer than the member_id data: customer_id is pretty much an unbroken sequence of numbers, whereas member_id has a lot of holes in the series (probably only about every 8th number is used). Could this be the reason?
    /L
  4. Adriaan New Member

    Compare execution plans. I expect that member_id is highly selective on the Order table.
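    For example, run both versions with something like this turned on and compare the output (or use "Include Actual Execution Plan", Ctrl+M, in Management Studio):
    SET STATISTICS IO ON     -- logical reads per table
    SET STATISTICS TIME ON   -- CPU and elapsed time
    -- ... run the slow version and the fast version here ...
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF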
  5. mccabe New Member

    Sorry, I do not understand exactly what you mean.
    The query plans are nearly identical.
    /L

  6. Adriaan New Member

    You said: "In both cases it uses the expected index and the indices are identical except one starts with member_id and one with customer_id."
    And the best-performing plan uses the index that starts with member_id? That would fit with the selectivity of member_id on your order table.
  7. mccabe New Member

    No, I think you misunderstood me...
    When joining only on member_id, it uses the index on member_id and the query is slow.
    When the join also includes customer_id, it uses the index on customer_id and the query is fast... and as I mentioned, there is a BIG difference between slow and fast here :)
    /L

  8. Adriaan New Member

    Sorry about that. If the join on customer_id is performing better, then it is probably because there are more repeating values for member_id than for customer_id, so customer_id is more selective.
    Just goes to show you can design an index according to some very clean logic, but the actual data stored in the table can still have an unexpected impact on performance.
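    A quick way to check that, using the column names from your queries:
    SELECT COUNT(*)                    AS total_rows,
           COUNT(DISTINCT member_id)   AS distinct_members,
           COUNT(DISTINCT customer_id) AS distinct_customers
    FROM [order]
    The closer a distinct count is to the total row count, the more selective that column is.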
  9. mccabe New Member

    Hi!
    This seems like a valid explanation to me.
    But then I learned that I can check the index density, and the density for member_id (the badly performing one) is even lower than the one for customer_id (both are quite close to zero, though). If it really were a selectivity problem, shouldn't the index density show it?
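    (This is the kind of check I mean; the index names are made up here:)
    DBCC SHOW_STATISTICS ('order', ix_order_member_id)    -- "All density" value, lower = more selective
    DBCC SHOW_STATISTICS ('order', ix_order_customer_id)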
    cheers
    /Linus


  10. Adriaan New Member

    I think we're back to the initial point here - a join on numeric columns will outperform a join on alphanumeric columns.
    Not sure why we got sidetracked here. ;)
  11. mccabe New Member

    Hehe... yeah, OK... I buy the alphanumeric explanation, but I still don't get the 15 vs. 1 second difference between BIGINT and INT...
    /L
