I have a query as follow: select distinct emp_id, emp_name from tblEmployee a inner join tblEmploy_type_link b on a.emp_id = b.emp_id inner join tblGeneral_Group c on b.group_id = c.group_id where b.status in ( 1,2,3) and exists (select 1 from tblPaid_hours d where d.emp_id = a.emp_id and d.endtime >= '2008-08-01' and d.startime >= '2008-08-31') My qestion is that if there is a way to rewrite this so I don't use the 'exists subquery' part in the where clause. I use 'exists' because the query only needs to know if the employee has record(s) in the tblPaid_hours for the specified period. There are no field needed to be output from the tblPaid_Hours. The reason for the rewrite is because this subquery causes the majority of the slow down. TblPaid_hours has 9 million records and it has index on starttime, index on endtime and index on emp_id. However, the execution plan chose to use clustered index scan to do the "endtime >= '2008-08-01' and d.startime >= '2008-08-31' " part. The clustered index on tblPaid_hours is a field with identity. Also, please keep in mind that employee could have no entry in tblPaid_hours. If I create an index for both Endtime and Starttime, the execution time is reduced from 41 seconds to 9 seconds. But I do like to know if there is a better query design for it first. Any help on this is very much appreciated. wingman
You don't mention if there's an index on empid in tblPaid_hours - would seem like an important one. Did you try an IN subquery? AND a.empid IN (select d.empid from tblPaid_hours ........)
Adriaan, Thank for the reply. I did mention there is an index on emp_id in tblPaid_hours in my original email. Maybe I should clarify more.. All the fields participated in the join and where clause has an index on it. But they all contain single field. Regarding trying an IN subquery, I thought using 'Exists' would be much more efficient because it doesn't need to evalue all the records, I thought as soon as the query engine detects a record satsifies the condition, it would return 'True' and done. Please advice if my understanding is not correct. wingman
Oops, yes - you did mention the index on emp_id. The idea behind EXISTS is that it will perform better if there are multiple matches for the correlated values. Adding more criteria on the subquery kind of negates that, unless you have a proper index - for instance by INCLUDE-ing the dates with the index on emp_id. By the way, those criteria seem a bit off: d.endtime >= '2008-08-01' and d.startime >= '2008-08-31'. With an endtime after Aug 1, and a startime after Aug 31, there's every chance there are no matches, which is a situation where EXISTS won't perform better than IN. (***) You can also do a LEFT JOIN against a derived table - which can be the subquery that you have, with part of the WHERE clause moved to the ON clause of the join, and then IS NULL criteria for the join column of the derived table, in the main WHERE statement. *** ... or an INNER JOIN with no criteria for the join column, to find matching rows - the idea with the left join and IS NULL criteria is for finding unmatched rows.
Oops.... I have a typo there in my query. The part, d.starttime >='2008-08-31', should be d.starttime <='2008-08-31'. Yes, I thought about using the left join, but it returns more records than I want even though I use the distinct already. Maybe my left join wasn't right. I will redo it again and see what happens. I can't do inner join because there is a potential that no record exists for that user for a specified time period. wingman
If what you are trying to do is retrieve only the emp_id and emp_name of the employees and you are not selecting any field from table2. You only need records that have entries in the second table within the specified period, then an inner join makes sense to me. Distinct should work in this case since the emp_id and emp_name fields being selected are in a table A that does not contain different IDs for the same employee.
Did you try thisselect distinct emp_id, emp_name from tblEmployee a inner join tblEmploy_type_link b on a.emp_id = b.emp_id inner join tblGeneral_Group c on b.group_id = c.group_id and b.status in ( 1,2,3) inner join tblPaid_hours d on d.emp_id = a.emp_id and d.endtime >= '2008-08-01' and d.startime >= '2008-08-31'