SQL Server Performance

NOT IN & IN Differences

Discussion in 'SQL Server 2005 T-SQL Performance Tuning' started by chrismcv, Jun 19, 2008.

  1. chrismcv New Member

    I've encountered an issue with a query, and am somewhat stumped by it.
    SELECT stockID, stockDescription, stockTotalPrice,
    WHERE stockInUse=1 AND stockLocationID = 2
    AND stockID NOT IN (/* A very long list of items (800 stock ID's)*/)
    This takes about 3 minutes to run on a reasonable sized database. However, by nesting the IN statement as below, it takes approximately 1 second.... When I Show the Execution plan on SSMS it says query one cost is 43% and query two is 57%, relative to a batch with just the two statements. I don't really understand that either.
    SELECT stockID, stockDescription, stockTotalPrice,
    WHERE stockInUse=1 AND stockLocationID = 2
    AND stockID NOT IN (SELECT stockID FROM tblStock WHERE stockID IN (/* A very long list of items (800 stock ID's)*/))
    Any thoughts would be most welcome.
  2. Adriaan New Member

    NOT IN is known to be a poor performer. And a query with 800 filter values can definitely have poor performance all by itself.
    How many matches does your table actually have for those 800 values? And how many non-matches?
    Finally, if your list of 800 filter values is applied to a FK column, then this may be a clumsy way of filtering on a (number of) common value(s) on column(s) on corresponding rows in the RK table.
  3. chrismcv New Member

    thanks for the reply....
    I appreciate NOT IN performance isn't good to start with, but the list is coming from a client application and the information isn't available elsewhere in the database - so I think its the best way to achieve the result I'm after.
    There will be 800 matches and 50000 non matches in the data that I have currently.
    The 800 filter list is applied to a PK column - I'm not sure what the RK table is?
    My curiosity is more around why the second query which does the same thing returns so much faster than the first?
    Thanks again,
  4. FrankKalis Moderator

    How have you done this testing? I wouldn't be suprised if this observation is due to data caching that lets the second query appear to be faster. Have you cleared the cache between both statements?
  5. chrismcv New Member

    I have done it in testing...
    I have swapped the order of queries round and get the same timings, so am happy to eliminate testing.
  6. Expansion New Member

    I have two comments:
    1. the querycost in SSMS doesn't match your timing. That can be an issue with Windows regional Options. If your setting for decimal symbol is a , (comma) in stead of . (dot) SSMS doesn't show the decimals right. First change your your Regional Option to English or decimal symbol to . and then start SSMS.
    2. look into the queryplan if there are differences. For instance with one query it may probe every NOT IN value in the table and with the other it may make a intermediate resultset and join that to test the existence of a value.
    Succes, Cor Westra
  7. cascred New Member

    The second query is using an index for the NOT IN Clause. I'm willing to bet that there is an index on the stockID column in tblStock. The first query compares to /* A very long list of items (800 stock ID's)*/ and this is an unordred, unindexed list.
    What would be even better yet is to create another table with your very long list of 800 items in an indexed column in that table and use that table in your query.

Share This Page