Date range values – Between Months, years | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Date range values – Between Months, years

Its a situation where we have about 120M records from a view (extracted from 2-3 tables) one of the filed is DATE_ADDED (smalldatetime). We have data for about 3 years (2003, 2004, 2005). Now the requirements is pull the records based on month by month and year by year to dispaly in reports. In general the noramll SQL query would be, –I am trying to get the records for year 2003
SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
WHERE DATE_ADDED >= ‘2003-01-01’ AND DATE_ADDED <= ‘2003-12-31’ or SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
WHERE YEAR(DATE_ADDED) = 2003 or SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
WHERE DATEPART(yyyy, DATE_ADDED) = 2003 BTW the DATE_ADDED has index on it. If I execute the above SQLs it takes 3:05 minutes to get the results. I also tried first convert the string expression ‘2003-01-01’ to a date time value and otherway around date to string and comapred, still the performance is same. Other thing I tried was getting the year part using DATEPART(yyyy,…) and YEAR(’01/12/2003′), the effect is same. Do some one have solution for this? Thanks,
Chandra Paladugu
You could consider making the index on DATE_ADDED a clustered index. If the table already has a clustered index defined on other columns, then of course you have to consider the ramifications of changing the design.
We already have a Clustered Index on other field which is the ID. I don’t know what the design ideas means? I think the issue here is the number of records. For the sake of testing I already removed the cluster index on ID field and added to DATE_ADDED, it imporved the performance a bit, but still I am clocking at 2:25 minutes. This table has just 6 columns/fields – ID, DATE_ADDED are among them – not too many though . There is nothing else I could think of doing further normalization because none of the fields data are repeated. Chandra Paladugu
You’re not satisfied with a 15% drop in query time? Remember this is a table with 120M rows. Does the DATE_ADDED field include the time, or is it 00:00:000 on all records? If you do see the time, you could create a shadow field that copies the DATE_ADDED date without the time component, and have a clustered index on that.
Thanks for your replies Adriaan.<br /><br />I work in a Search Engine realted filed, where we measure response time in milli/nano seconds, Obeviously I am not satisfied <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />.<br /><br />Yes Sir, the DATA_ADDED field does not include time, all time stamp section is 00:00:000.<br />What worries me more is that the number of records will grow a lot this year, that’s why the tables were created for each year seperately but merged by creating a view. Even if I try to access a single (year) table without a view the perfromance is pretty much same.<br /><br />Chandra Paladugu
You may want to try partinioned views, so whenever you’re looking for 2003 data, you’ll be accessing 2003 table. Look for "Partitioned Views" in BOL. 1. For example, you’ll create a table for each year SAMPLE_TABLE_2000, SAMPLE_TABLE_2001, SAMPLE_TABLE_2002, SAMPLE_TABLE_2003, etc. The key here is to create a CHECK CONSTRAINT on each table where you’ll check for the DATE_ADDED in your table year.
CONSTRAINT (CK_SAMPLE_TABLE2000) CHECK (DATE_ADDED) >= ’01/01/2000′ AND (DATE_ADDED) <= ’12/31/2000′)
2. You will create a partitioned view "SAMPLE_TABLE" which will be the only way the user can query your underlying tables (SAMPLE_TABLE_2000,SAMPLE_TABLE_2001, SAMPLE_TABLE_2002, SAMPLE_TABLE_2003, etc). The code in your partitioned view will look something like SELECT * FROM SAMPLE_TABLE_2000
UNION ALL
SELECT * FROM SAMPLE_TABLE_2001
UNION ALL
SELECT * FROM SAMPLE_TABLE_2002
UNION ALL
SELECT * FROM SAMPLE_TABLE_2003 With the partitioned view you can create any other index you may need in the case you are running the MS Enterprise SQL 2000. Another thing you may want to look into is moving these tables to another physical drive in the case the entire database is located in a drive that is very busy.
I would second Adriaan. It might be a good idea to create the clustered index on the datetime column. What is th actual view definition look like? —
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Ich unterstütze PASS Deutschland e.V. http://www.sqlpass.de)

Also, when I read back the original question … Why are you querying a view? I have a sneaky feeling that if you query the underlying tables directly, you might get a better response time (criteria get applied before the JOINs are set up, etc.). Else check the view definition itself for perfomance issues, columns that you do not need in this query, etc. etc. As your query centers on year + month, you might consider adding an INT column that gives you that in yyyymm format, and use this for your clustered index.
quote:–I am trying to get the records for year 2003
SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
WHERE DATE_ADDED >= ‘2003-01-01’ AND DATE_ADDED <= ‘2003-12-31’ or SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
WHERE YEAR(DATE_ADDED) = 2003 or SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
WHERE DATEPART(yyyy, DATE_ADDED) = 2003
Which one query takes 3 minutes? What datatype is date_added column? It should be smallDateTime. Why do you need row for each single row from that year? I would understand aggregations but on detail level you may try to select top N or paging.
quote:Originally posted by saleyoun You may want to try partinioned views, so whenever you’re looking for 2003 data, you’ll be accessing 2003 table. Look for "Partitioned Views" in BOL. 1. For example, you’ll create a table for each year SAMPLE_TABLE_2000, SAMPLE_TABLE_2001, SAMPLE_TABLE_2002, SAMPLE_TABLE_2003, etc. The key here is to create a CHECK CONSTRAINT on each table where you’ll check for the DATE_ADDED in your table year.
CONSTRAINT (CK_SAMPLE_TABLE2000) CHECK (DATE_ADDED) >= ’01/01/2000′ AND (DATE_ADDED) <= ’12/31/2000′)
2. You will create a partitioned view "SAMPLE_TABLE" which will be the only way the user can query your underlying tables (SAMPLE_TABLE_2000,SAMPLE_TABLE_2001, SAMPLE_TABLE_2002, SAMPLE_TABLE_2003, etc). The code in your partitioned view will look something like SELECT * FROM SAMPLE_TABLE_2000
UNION ALL
SELECT * FROM SAMPLE_TABLE_2001
UNION ALL
SELECT * FROM SAMPLE_TABLE_2002
UNION ALL
SELECT * FROM SAMPLE_TABLE_2003 With the partitioned view you can create any other index you may need in the case you are running the MS Enterprise SQL 2000. Another thing you may want to look into is moving these tables to another physical drive in the case the entire database is located in a drive that is very busy.

The terminology we are using may be different… but what is described above is exactly what we are doing. Well regarding the replication & hardware, the data is replicated to 4-6 servers, infact our report server points at the replication server with the following h/w – Dual 3.4 Ghz, 4GB RAM. Chandra Paladugu
quote:Originally posted by mmarovic
quote:–I am trying to get the records for year 2003
SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
WHERE DATE_ADDED >= ‘2003-01-01’ AND DATE_ADDED <= ‘2003-12-31’ or SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
WHERE YEAR(DATE_ADDED) = 2003 or SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
WHERE DATEPART(yyyy, DATE_ADDED) = 2003
Which one query takes 3 minutes? What datatype is date_added column? It should be smallDateTime. Why do you need row for each single row from that year? I would understand aggregations but on detail level you may try to select top N or paging.

All the above taking approx same amount of time 3 min, the data type is smalldatetime and the time portion of it is 00:00:000. As said in the beginning it is in a report generation process, where I have to do cumulative and/or aggregations. Chandra Paladugu
Why don’t you use sql aggregate functions to produce aggregations instead of passing huge number of rows to the client (if I understand correctly what you said)?
Yes sir, I am using aggregate functions. Following is the exact query I am using, of course I replaced original names with fictitious values. SELECT st.ID as CustNo, Sum(IsNull(SearchCount,0)) as SearchTotal
FROM DbName.dbo.SampleTable st
WHERE st.DATE_ADDED >= ‘2003-01-01’ AND st.DATE_ADDED <= ‘2003-12-31’
AND st.CustId=’ppp’
GROUP BY st.SearchNo I don’t know the following might look odd/stupid for some experts…I just found a way of making it to work after several trial and errors with resposne times about 1-2 seconds which is truly acceptable for us. Until I find another good clean way of doing, probably I will stick to it for time being. Here what I did, I created new small table (DummyDate) in that I pumped in dates from 01/01/2002 to 07/01/2005 (Today), so that it will have a all the dates and the data look like this, 01/01/2003
01/02/2003
…..
…..
…..
07/01/2005 If I do the math how big this table (DummyDate)can grow…..
No Years * 365 (days) = Total Records
10 * 365 = 3650 I modified the query as follows…. which works fine. SELECT ID as CustNo, Sum(IsNull(SearchCount,0)) as SearchTotal
FROM DbName.dbo.SampleTable WHERE st.DATE_ADDED in (select DummyDate from DateTable where DummyDate >= ‘2003-01-01’ AND DummyDate <= ‘2003-12-31’)
AND CustId=’ppp’
GROUP BY SearchNo Note: Pl pardon my typos.
Chandra Paladugu
What index is used for both queries? You group by column that is not part of select list, but you don’t have aggregate functions in each expression of select list, I mean you have st.ID. How is that possible? If you give us correct query, we may be able to help. Cheers.
quote:Originally posted by mmarovic What index is used for both queries? You group by column that is not part of select list, but you don’t have aggregate functions in each expression of select list, I mean you have st.ID. How is that possible? If you give us correct query, we may be able to help. Cheers.

Sorry it was a cut and paste error, just to simplify the query further I also removed aliases. Pl see the revised query below… SELECT ID, Sum(IsNull(SearchCount,0))
FROM DbName.dbo.SampleTable WHERE DATE_ADDED in (select DummyDate from DateTable where DummyDate >= ‘2003-01-01’ AND DummyDate <= ‘2003-12-31’)
AND CustId=’ppp’
GROUP BY ID

Ok. What does execution plan look like? Are we talking about indexed view?
I uploaded the image to my site at tripod, please see the execution plan at
http://cpaladugu.tripod.com/
Can you text version of execution plan please? Both for original query and modified one.
]]>