SQL Server Performance

Date range values - Between Months, years

Discussion in 'T-SQL Performance Tuning for Developers' started by cspaladugu, Jun 30, 2005.

  1. cspaladugu New Member

    Its a situation where we have about 120M records from a view (extracted from 2-3 tables) one of the filed is DATE_ADDED (smalldatetime). We have data for about 3 years (2003, 2004, 2005). Now the requirements is pull the records based on month by month and year by year to dispaly in reports.

    In general the noramll SQL query would be,

    --I am trying to get the records for year 2003
    SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
    WHERE DATE_ADDED >= '2003-01-01' AND DATE_ADDED <= '2003-12-31'

    or

    SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
    WHERE YEAR(DATE_ADDED) = 2003

    or

    SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
    WHERE DATEPART(yyyy, DATE_ADDED) = 2003

    BTW the DATE_ADDED has index on it.

    If I execute the above SQLs it takes 3:05 minutes to get the results.

    I also tried first convert the string expression '2003-01-01' to a date time value and otherway around date to string and comapred, still the performance is same.

    Other thing I tried was getting the year part using DATEPART(yyyy,...) and YEAR('01/12/2003'), the effect is same.

    Do some one have solution for this?

    Thanks,


    Chandra Paladugu
  2. Adriaan New Member

    You could consider making the index on DATE_ADDED a clustered index. If the table already has a clustered index defined on other columns, then of course you have to consider the ramifications of changing the design.
  3. cspaladugu New Member

    We already have a Clustered Index on other field which is the ID. I don't know what the design ideas means? I think the issue here is the number of records. For the sake of testing I already removed the cluster index on ID field and added to DATE_ADDED, it imporved the performance a bit, but still I am clocking at 2:25 minutes.

    This table has just 6 columns/fields - ID, DATE_ADDED are among them - not too many though . There is nothing else I could think of doing further normalization because none of the fields data are repeated.

    Chandra Paladugu
  4. Adriaan New Member

    You're not satisfied with a 15% drop in query time? Remember this is a table with 120M rows.

    Does the DATE_ADDED field include the time, or is it 00:00:000 on all records? If you do see the time, you could create a shadow field that copies the DATE_ADDED date without the time component, and have a clustered index on that.
  5. cspaladugu New Member

    Thanks for your replies Adriaan.<br /><br />I work in a Search Engine realted filed, where we measure response time in milli/nano seconds, Obeviously I am not satisfied <img src='/community/emoticons/emotion-1.gif' alt=':)' />.<br /><br />Yes Sir, the DATA_ADDED field does not include time, all time stamp section is 00:00:000.<br />What worries me more is that the number of records will grow a lot this year, that's why the tables were created for each year seperately but merged by creating a view. Even if I try to access a single (year) table without a view the perfromance is pretty much same.<br /><br />Chandra Paladugu
  6. saleyoun New Member

    You may want to try partinioned views, so whenever you're looking for 2003 data, you'll be accessing 2003 table. Look for "Partitioned Views" in BOL.

    1. For example, you'll create a table for each year SAMPLE_TABLE_2000, SAMPLE_TABLE_2001, SAMPLE_TABLE_2002, SAMPLE_TABLE_2003, etc. The key here is to create a CHECK CONSTRAINT on each table where you'll check for the DATE_ADDED in your table year.
    CONSTRAINT (CK_SAMPLE_TABLE2000) CHECK (DATE_ADDED) >= '01/01/2000' AND (DATE_ADDED) <= '12/31/2000')


    2. You will create a partitioned view "SAMPLE_TABLE" which will be the only way the user can query your underlying tables (SAMPLE_TABLE_2000,SAMPLE_TABLE_2001, SAMPLE_TABLE_2002, SAMPLE_TABLE_2003, etc). The code in your partitioned view will look something like

    SELECT * FROM SAMPLE_TABLE_2000
    UNION ALL
    SELECT * FROM SAMPLE_TABLE_2001
    UNION ALL
    SELECT * FROM SAMPLE_TABLE_2002
    UNION ALL
    SELECT * FROM SAMPLE_TABLE_2003

    With the partitioned view you can create any other index you may need in the case you are running the MS Enterprise SQL 2000. Another thing you may want to look into is moving these tables to another physical drive in the case the entire database is located in a drive that is very busy.

  7. FrankKalis Moderator

    I would second Adriaan. It might be a good idea to create the clustered index on the datetime column. What is th actual view definition look like?

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    Ich unterstütze PASS Deutschland e.V. http://www.sqlpass.de)
  8. Adriaan New Member

    Also, when I read back the original question ...

    Why are you querying a view? I have a sneaky feeling that if you query the underlying tables directly, you might get a better response time (criteria get applied before the JOINs are set up, etc.).

    Else check the view definition itself for perfomance issues, columns that you do not need in this query, etc. etc.

    As your query centers on year + month, you might consider adding an INT column that gives you that in yyyymm format, and use this for your clustered index.
  9. mmarovic Active Member

    quote:--I am trying to get the records for year 2003
    SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
    WHERE DATE_ADDED >= '2003-01-01' AND DATE_ADDED <= '2003-12-31'

    or

    SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
    WHERE YEAR(DATE_ADDED) = 2003

    or

    SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
    WHERE DATEPART(yyyy, DATE_ADDED) = 2003
    Which one query takes 3 minutes? What datatype is date_added column? It should be smallDateTime. Why do you need row for each single row from that year? I would understand aggregations but on detail level you may try to select top N or paging.
  10. cspaladugu New Member

    quote:Originally posted by saleyoun

    You may want to try partinioned views, so whenever you're looking for 2003 data, you'll be accessing 2003 table. Look for "Partitioned Views" in BOL.

    1. For example, you'll create a table for each year SAMPLE_TABLE_2000, SAMPLE_TABLE_2001, SAMPLE_TABLE_2002, SAMPLE_TABLE_2003, etc. The key here is to create a CHECK CONSTRAINT on each table where you'll check for the DATE_ADDED in your table year.
    CONSTRAINT (CK_SAMPLE_TABLE2000) CHECK (DATE_ADDED) >= '01/01/2000' AND (DATE_ADDED) <= '12/31/2000')


    2. You will create a partitioned view "SAMPLE_TABLE" which will be the only way the user can query your underlying tables (SAMPLE_TABLE_2000,SAMPLE_TABLE_2001, SAMPLE_TABLE_2002, SAMPLE_TABLE_2003, etc). The code in your partitioned view will look something like

    SELECT * FROM SAMPLE_TABLE_2000
    UNION ALL
    SELECT * FROM SAMPLE_TABLE_2001
    UNION ALL
    SELECT * FROM SAMPLE_TABLE_2002
    UNION ALL
    SELECT * FROM SAMPLE_TABLE_2003

    With the partitioned view you can create any other index you may need in the case you are running the MS Enterprise SQL 2000. Another thing you may want to look into is moving these tables to another physical drive in the case the entire database is located in a drive that is very busy.



    The terminology we are using may be different... but what is described above is exactly what we are doing. Well regarding the replication & hardware, the data is replicated to 4-6 servers, infact our report server points at the replication server with the following h/w - Dual 3.4 Ghz, 4GB RAM.

    Chandra Paladugu
  11. cspaladugu New Member

    quote:Originally posted by mmarovic


    quote:--I am trying to get the records for year 2003
    SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
    WHERE DATE_ADDED >= '2003-01-01' AND DATE_ADDED <= '2003-12-31'

    or

    SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
    WHERE YEAR(DATE_ADDED) = 2003

    or

    SELECT ID, DATE_ADDED FROM SAMPLE_TABLE
    WHERE DATEPART(yyyy, DATE_ADDED) = 2003
    Which one query takes 3 minutes? What datatype is date_added column? It should be smallDateTime. Why do you need row for each single row from that year? I would understand aggregations but on detail level you may try to select top N or paging.

    All the above taking approx same amount of time 3 min, the data type is smalldatetime and the time portion of it is 00:00:000. As said in the beginning it is in a report generation process, where I have to do cumulative and/or aggregations.

    Chandra Paladugu
  12. mmarovic Active Member

    Why don't you use sql aggregate functions to produce aggregations instead of passing huge number of rows to the client (if I understand correctly what you said)?
  13. cspaladugu New Member

    Yes sir, I am using aggregate functions. Following is the exact query I am using, of course I replaced original names with fictitious values.

    SELECT st.ID as CustNo, Sum(IsNull(SearchCount,0)) as SearchTotal
    FROM DbName.dbo.SampleTable st
    WHERE st.DATE_ADDED >= '2003-01-01' AND st.DATE_ADDED <= '2003-12-31'
    AND st.CustId='ppp'
    GROUP BY st.SearchNo

    I don't know the following might look odd/stupid for some experts...I just found a way of making it to work after several trial and errors with resposne times about 1-2 seconds which is truly acceptable for us. Until I find another good clean way of doing, probably I will stick to it for time being.

    Here what I did, I created new small table (DummyDate) in that I pumped in dates from 01/01/2002 to 07/01/2005 (Today), so that it will have a all the dates and the data look like this,

    01/01/2003
    01/02/2003
    .....
    .....
    .....
    07/01/2005

    If I do the math how big this table (DummyDate)can grow.....
    No Years * 365 (days) = Total Records
    10 * 365 = 3650

    I modified the query as follows.... which works fine.

    SELECT ID as CustNo, Sum(IsNull(SearchCount,0)) as SearchTotal
    FROM DbName.dbo.SampleTable WHERE st.DATE_ADDED in (select DummyDate from DateTable where DummyDate >= '2003-01-01' AND DummyDate <= '2003-12-31')
    AND CustId='ppp'
    GROUP BY SearchNo

    Note: Pl pardon my typos.


    Chandra Paladugu
  14. mmarovic Active Member

    What index is used for both queries? You group by column that is not part of select list, but you don't have aggregate functions in each expression of select list, I mean you have st.ID. How is that possible? If you give us correct query, we may be able to help.

    Cheers.
  15. cspaladugu New Member

    quote:Originally posted by mmarovic

    What index is used for both queries? You group by column that is not part of select list, but you don't have aggregate functions in each expression of select list, I mean you have st.ID. How is that possible? If you give us correct query, we may be able to help.

    Cheers.

    Sorry it was a cut and paste error, just to simplify the query further I also removed aliases. Pl see the revised query below...

    SELECT ID, Sum(IsNull(SearchCount,0))
    FROM DbName.dbo.SampleTable WHERE DATE_ADDED in (select DummyDate from DateTable where DummyDate >= '2003-01-01' AND DummyDate <= '2003-12-31')
    AND CustId='ppp'
    GROUP BY ID
  16. mmarovic Active Member

    Ok. What does execution plan look like? Are we talking about indexed view?
  17. cspaladugu New Member

    I uploaded the image to my site at tripod, please see the execution plan at
    http://cpaladugu.tripod.com/
  18. mmarovic Active Member

    Can you text version of execution plan please? Both for original query and modified one.

Share This Page