SQL Server Performance

way to replace 'Exist along with subquery' in a where clause

Discussion in 'SQL Server 2005 T-SQL Performance Tuning' started by WingSzeto, Oct 2, 2008.

  1. WingSzeto Member

    I have a query as follow:
    select distinct emp_id, emp_name from tblEmployee a
    inner join tblEmploy_type_link b on a.emp_id = b.emp_id
    inner join tblGeneral_Group c on b.group_id = c.group_id
    where b.status in ( 1,2,3)
    and exists (select 1 from tblPaid_hours d
    where d.emp_id = a.emp_id and d.endtime >= '2008-08-01' and d.startime >= '2008-08-31')
    My qestion is that if there is a way to rewrite this so I don't use the 'exists subquery' part in the where clause. I use 'exists' because the query only needs to know if the employee has record(s) in the tblPaid_hours for the specified period. There are no field needed to be output from the tblPaid_Hours. The reason for the rewrite is because this subquery causes the majority of the slow down. TblPaid_hours has 9 million records and it has index on starttime, index on endtime and index on emp_id. However, the execution plan chose to use clustered index scan to do the "endtime >= '2008-08-01' and d.startime >= '2008-08-31' " part. The clustered index on tblPaid_hours is a field with identity. Also, please keep in mind that employee could have no entry in tblPaid_hours.
    If I create an index for both Endtime and Starttime, the execution time is reduced from 41 seconds to 9 seconds. But I do like to know if there is a better query design for it first.
    Any help on this is very much appreciated.
  2. Adriaan New Member

    You don't mention if there's an index on empid in tblPaid_hours - would seem like an important one.
    Did you try an IN subquery?
    AND a.empid IN (select d.empid from tblPaid_hours ........)
  3. WingSzeto Member

    Thank for the reply. I did mention there is an index on emp_id in tblPaid_hours in my original email. Maybe I should clarify more.. All the fields participated in the join and where clause has an index on it. But they all contain single field.
    Regarding trying an IN subquery, I thought using 'Exists' would be much more efficient because it doesn't need to evalue all the records, I thought as soon as the query engine detects a record satsifies the condition, it would return 'True' and done. Please advice if my understanding is not correct.
  4. Adriaan New Member

    Oops, yes - you did mention the index on emp_id.
    The idea behind EXISTS is that it will perform better if there are multiple matches for the correlated values. Adding more criteria on the subquery kind of negates that, unless you have a proper index - for instance by INCLUDE-ing the dates with the index on emp_id.
    By the way, those criteria seem a bit off: d.endtime >= '2008-08-01' and d.startime >= '2008-08-31'.
    With an endtime after Aug 1, and a startime after Aug 31, there's every chance there are no matches, which is a situation where EXISTS won't perform better than IN.
    (***) You can also do a LEFT JOIN against a derived table - which can be the subquery that you have, with part of the WHERE clause moved to the ON clause of the join, and then IS NULL criteria for the join column of the derived table, in the main WHERE statement.
    *** ... or an INNER JOIN with no criteria for the join column, to find matching rows - the idea with the left join and IS NULL criteria is for finding unmatched rows.
  5. WingSzeto Member

    Oops.... I have a typo there in my query. The part, d.starttime >='2008-08-31', should be d.starttime <='2008-08-31'.
    Yes, I thought about using the left join, but it returns more records than I want even though I use the distinct already. Maybe my left join wasn't right. I will redo it again and see what happens.
    I can't do inner join because there is a potential that no record exists for that user for a specified time period.
  6. Kobojunkie New Member

    If what you are trying to do is retrieve only the emp_id and emp_name of the employees and you are not selecting any field from table2. You only need records that have entries in the second table within the specified period, then an inner join makes sense to me.
    Distinct should work in this case since the emp_id and emp_name fields being selected are in a table A that does not contain different IDs for the same employee.
  7. Adriaan New Member

    Wingman - the date criteria still seem a little weird, but if you're certain ...
  8. Kobojunkie New Member

    Did you try thisselect
    emp_id, emp_name
    from tblEmployee a
    inner join tblEmploy_type_link b
    on a.emp_id = b.emp_id
    inner join tblGeneral_Group c
    on b.group_id = c.group_id
    and b.status in ( 1,2,3)
    inner join tblPaid_hours d
    on d.emp_id = a.emp_id
    and d.endtime >= '2008-08-01' and d.startime >= '2008-08-31'

Share This Page