SQL Server Performance

Implementing an Archiving solution in SSIS

Discussion in 'SQL Server 2005 Integration Services' started by gdmort, Jan 29, 2009.

  1. gdmort New Member

    Hi All,
    I'm busy trying to implement an SSIS package solution to move data from 100s of tables in an operational database into an archive database.
    The 2 databases have exactly the same structure in terms of tables etc.
    Currently each table in the operational database has a flag column which I use to expire data when new data is loaded. I have a table that lists the tables in the database, along with an archive flag showing whether or not a table needs archiving.
    My SSIS package currently iterates through all the tables that need archiving (in a foreach loop container).
    For each table I check in the archive database if we already have data for a date that we are trying to archive. If so, I delete that data using an "Execute SQL task" on a "DELETE FROM ... WHERE..." query that I build using a script component.
    I then copy the data across to the archive database using an "Execute SQL task" on an "INSERT INTO ... SELECT FROM" query that I build using a script component. I then do a row count check to make sure all the data has been transferred and then delete the data from the operational database.
    I then write an entry to a logging table stating how many rows were archived and for what dates.
    The deletes are taking a long time and I need to speed the whole process up. One thing I'm about to try is doing the deletes in batches to avoid filling the log. I've been doing some reading on improving SSIS performance, and it seems as though many optimisations can be done if data flow components are used. However, my solution doesn't use any.
    I'm looking for any suggestions I can get to improve the performance. I don't mind starting from scratch if SSIS is not the right way to go.
    Thanks in advance.
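    The batched-delete idea mentioned above can be sketched like this (a minimal sketch only; the table name is borrowed from a later post, the batch size and flag value are illustrative, and it assumes the DELETE TOP syntax available in SQL Server 2005):

    ```sql
    -- Delete expired rows in batches so each batch commits separately
    -- and the transaction log space can be reused between batches.
    -- Table/column names and the batch size are illustrative.
    DECLARE @rows INT;
    SET @rows = 1;

    WHILE @rows > 0
    BEGIN
        DELETE TOP (50000)
        FROM dbo.ADAPTIV_Deal
        WHERE StatusFlag = 2;      -- expired rows only

        SET @rows = @@ROWCOUNT;   -- 0 when no qualifying rows remain
    END
    ```

    With the database in SIMPLE recovery (or with frequent log backups in FULL recovery), this keeps the log from growing in proportion to the total number of rows deleted.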

  2. ndinakar Member

    Deleting in small batches is definitely one way. Another way is, instead of deleting from the table where status = 1 (let's say 1 = expired), you might want to insert into the archive table first, then delete from the source table by joining with the archive table:
    DELETE TS
    FROM TableSource TS
    JOIN TableArchive TA ON TS.somecol = TA.somecol
    WHERE TS.status = 1
    It will find the rows faster, as an index seek could be used to locate them. Test this first, though.
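    The full insert-then-delete pattern suggested above might look like this (table, column, and database names are placeholders, assuming the two tables share a key column):

    ```sql
    -- Copy expired rows to the archive first...
    INSERT INTO ArchiveDB.dbo.TableArchive (somecol, othercol)
    SELECT somecol, othercol
    FROM   dbo.TableSource
    WHERE  status = 1;

    -- ...then delete from the source by joining to what was archived,
    -- so the delete can locate rows via an index seek on the join key.
    DELETE TS
    FROM   dbo.TableSource AS TS
    JOIN   ArchiveDB.dbo.TableArchive AS TA
           ON TS.somecol = TA.somecol
    WHERE  TS.status = 1;
    ```

    Doing the insert first also means the row-count check can be done before anything is removed from the operational database.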
  3. Saurabh Srivastava New Member

    Your objective: "move data from 100s of tables in an operational database into an archive database" in an optimized manner.
    So, before jumping to any conclusion or suggestion, I would like to know:
    1) How much data are we talking about?
    2) When you tested your options, how long did the deletes take?
    3) Are there any existing performance issues on that server?
    4) What is the log file size before and after the delete?
    Thanks
    Saurabh
  4. gdmort New Member

    Thanks for the quick responses. I will try doing the delete with a join later today.

    In response to the other questions:

    1) The archive database will be about 400GB. I expect to archive between 10 and 20GB daily.

    2) I ran the following update statement:

    Update ADAPTIV_Deal
    Set StatusFlag = 2
    Where StatusFlag <> 2 and BusDate <> '2008-12-23 00:00:00.000'

    It ran for 90 minutes and then I stopped it.

    I then tried to run the update in batches by running the following code:

    set rowcount 50000

    updatemore:

    Update ADAPTIV_Deal
    Set StatusFlag = 2
    Where StatusFlag <> 2 and BusDate <> '2008-12-23 00:00:00.000'

    if @@rowcount > 0 goto updatemore

    set rowcount 0

    This also ran for 90 minutes before I stopped it. I assume the delete would also run this long. There is a non-unique, non-clustered index on the BusDate column.
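    The same batching can be written with UPDATE TOP instead of SET ROWCOUNT, which Microsoft deprecated for data-modification statements as of SQL Server 2005. This is only a sketch using the names from the post above; note also that the two `<>` predicates are not sargable, so the index on BusDate cannot be seeked for this query, which may be a factor in the runtime:

    ```sql
    -- Batch the update with UPDATE TOP rather than SET ROWCOUNT
    -- (SET ROWCOUNT is deprecated for INSERT/UPDATE/DELETE).
    DECLARE @rows INT;
    SET @rows = 1;

    WHILE @rows > 0
    BEGIN
        UPDATE TOP (50000) dbo.ADAPTIV_Deal
        SET    StatusFlag = 2
        WHERE  StatusFlag <> 2
           AND BusDate <> '2008-12-23 00:00:00.000';

        SET @rows = @@ROWCOUNT;
    END
    ```

    Because each batch sets StatusFlag = 2, the updated rows drop out of the WHERE clause on the next pass, so the loop terminates once all qualifying rows are flagged.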

    3) No performance issues that I'm aware of.

    4) I didn't check the log sizes. I'll check and rerun the tests.

    Thanks for the help - much appreciated.
  5. Saurabh Srivastava New Member

    Holy cow!!! 90 minutes for 50K rows. Check the error log and the Windows event log for I/O errors, memory warning messages, etc.
  6. satya Moderator

    Did you manage to wrap the steps/TSQL in a stored procedure?
    I believe that will help optimize the process, and since you are dropping a large number of rows based on those conditions, it would be better to UPDATE the STATISTICS on the table too.
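    Wrapping the batched delete and the statistics refresh in a stored procedure, as suggested above, might look like this (a hypothetical sketch; the procedure name, parameter, and table are illustrative, not the poster's actual code):

    ```sql
    -- Hypothetical wrapper for one table's archive-cleanup step.
    CREATE PROCEDURE dbo.usp_ArchiveExpiredDeals
        @BatchSize INT = 50000
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @rows INT;
        SET @rows = 1;

        -- Remove already-archived (expired) rows in batches.
        WHILE @rows > 0
        BEGIN
            DELETE TOP (@BatchSize)
            FROM dbo.ADAPTIV_Deal
            WHERE StatusFlag = 2;

            SET @rows = @@ROWCOUNT;
        END

        -- Refresh statistics after removing a large number of rows,
        -- so the optimizer works from current row counts.
        UPDATE STATISTICS dbo.ADAPTIV_Deal;
    END
    ```

    The SSIS package could then call this procedure from an "Execute SQL task" per table, keeping the plan cached and the T-SQL out of script components.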
