SQL Server Performance

Scaling a billing system: 200,000 records per day

Discussion in 'SQL Server 2005 General Developer Questions' started by afranchimon, Mar 23, 2009.

  1. afranchimon New Member

    We've been asked to rework a billing system that copes with about 200,000 records per day. A billing record represents an incoming message that can be of a certain type, and every message is currently written to one single table (time_sent, entity_id, type_id).
    At the end of the month, finance needs reports showing, for each entity_id, how many messages were handled during that month. It's difficult to adjust the system entirely, so we'd rather not touch the one large billing table; however, it is possible, so any suggestions are welcome. We also need to keep the details of the time each message was sent.
    Now our idea was to add a trigger to this table that writes to a counter table for every day (entity_id, type_id, year, month, day, count), which would obviously already speed up the reports significantly (see the sketch at the end of this post). However, since we have about 5,000 entities and 16 billing types, this table would also grow out of proportion fairly fast. The number of entities and the number of messages are expected to increase by 20-30% per year, and we obviously need to be able to work with this for a good number of years. Therefore we also want a similar table that contains the totals for each month. Our goal is to have a responsive front-end, even when we look in the detail table for a single entity_id on a certain date.
    Obviously the table with the billing details is indexed on the time_sent and entity_id columns. Most of these messages come in chronologically, but this is not guaranteed.
    Now I have a number of questions, and all suggestions are more than welcome.
    a) Is this a good approach? And why/why not?
    b) Any better ideas?
    c) How certain can we be that these triggers (and recursive triggers, if we want to have a table for monthly stats) are executed?
    d) Any pitfalls we should watch out for?
    e) Besides the indexes on the billing detail table, is there another way to increase the performance on this table?
    Any advice is more than welcome and really appreciated.
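    A rough sketch of the counter table and trigger we have in mind (table and column names are assumptions based on the description above; I'll call the big table dbo.billing, and I've collapsed year/month/day into a single bill_date column for brevity):
    [code]
    -- Hypothetical daily counter table.
    CREATE TABLE dbo.daily_billing_counts
    (
        entity_id INT      NOT NULL,
        type_id   INT      NOT NULL,
        bill_date DATETIME NOT NULL,  -- midnight of the day the messages were sent
        msg_count INT      NOT NULL,
        CONSTRAINT PK_daily_billing_counts PRIMARY KEY (entity_id, type_id, bill_date)
    );
    GO

    -- AFTER INSERT trigger on the detail table, written to handle multi-row inserts.
    -- Note: under heavy concurrency this may need extra locking hints (e.g. UPDLOCK).
    CREATE TRIGGER trg_billing_count ON dbo.billing
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Bump counters that already exist for the inserted day/entity/type combinations.
        UPDATE c
        SET    c.msg_count = c.msg_count + i.cnt
        FROM   dbo.daily_billing_counts AS c
        JOIN  (SELECT entity_id, type_id,
                      DATEADD(DAY, DATEDIFF(DAY, 0, time_sent), 0) AS bill_date,
                      COUNT(*) AS cnt
               FROM   inserted
               GROUP BY entity_id, type_id,
                        DATEADD(DAY, DATEDIFF(DAY, 0, time_sent), 0)) AS i
          ON   c.entity_id = i.entity_id
         AND   c.type_id   = i.type_id
         AND   c.bill_date = i.bill_date;

        -- Create counter rows that do not exist yet.
        INSERT dbo.daily_billing_counts (entity_id, type_id, bill_date, msg_count)
        SELECT i.entity_id, i.type_id,
               DATEADD(DAY, DATEDIFF(DAY, 0, i.time_sent), 0),
               COUNT(*)
        FROM   inserted AS i
        WHERE  NOT EXISTS (SELECT 1
                           FROM   dbo.daily_billing_counts AS c
                           WHERE  c.entity_id = i.entity_id
                             AND  c.type_id   = i.type_id
                             AND  c.bill_date = DATEADD(DAY, DATEDIFF(DAY, 0, i.time_sent), 0))
        GROUP BY i.entity_id, i.type_id, DATEADD(DAY, DATEDIFF(DAY, 0, i.time_sent), 0);
    END;
    [/code]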

  2. FrankKalis Moderator

    When finance needs to do reporting on it, they only look at the "historical" data, right? I mean, data that is unlikely to change any longer. If that is the case, you could run their queries under the READ UNCOMMITTED isolation level. That way the reports don't block (or get blocked by) the modification queries running at the same time.
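    For example, a monthly report could look like this (the column names are taken from your first post; the table name dbo.billing and the exact query are just an illustration):
    [code]
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    -- Messages per entity and type for March 2009.
    SELECT entity_id, type_id, COUNT(*) AS msg_count
    FROM   dbo.billing
    WHERE  time_sent >= '20090301'
      AND  time_sent <  '20090401'
    GROUP BY entity_id, type_id;

    -- Or use a table hint if you only want this behaviour for the one table:
    -- SELECT ... FROM dbo.billing WITH (NOLOCK) ...
    [/code]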
    Another way could be to have a procedure that runs once a day and aggregates the daily numbers into another table. Just like your trigger idea, but not for every single insert. Of course, this would need to be agreed upon with your business users.
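    A minimal sketch of such a procedure, assuming a summary table along the lines of the daily_billing_counts table sketched in your first post:
    [code]
    CREATE PROCEDURE dbo.usp_aggregate_daily_billing
        @day DATETIME  -- the day to aggregate, e.g. yesterday
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @from DATETIME, @to DATETIME;
        SET @from = DATEADD(DAY, DATEDIFF(DAY, 0, @day), 0);  -- strip the time of day
        SET @to   = DATEADD(DAY, 1, @from);

        INSERT dbo.daily_billing_counts (entity_id, type_id, bill_date, msg_count)
        SELECT entity_id, type_id, @from, COUNT(*)
        FROM   dbo.billing
        WHERE  time_sent >= @from
          AND  time_sent <  @to
        GROUP BY entity_id, type_id;
    END;
    [/code]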
  3. afranchimon New Member

    Hi Frank,
    Thanks for your reply. Yes, you're right, the data is very unlikely to change. Isolation levels are something we need to look at; that sounds like great advice!
    Running a script at the end of the day is something we considered as well. I am always scared of the results when something goes wrong during these processes, hence the triggers. As long as the database is up, they should always give the required results. When the database is down, our system will halt and no records will be inserted (and it's pretty hard to insert anything with no database running in the first place :).
    Thanks!

  4. FrankKalis Moderator

    [quote user="afranchimon"]
    Running a script at the end of the day is something we considered as well. I am always scared of the results when something goes wrong during these processes, therefore the triggers. As long as the database is up, they should always give the required results. When the database is down, our system will halt and no records will be inserted (and it's pretty hard with no database running in the first place :). [/quote]
    Well, not that this might let you sleep better, but when the database is down, an unfinished batch is likely to be the least of your problems. [:)]
    However, the good thing about doing this exclusively on the database side is that such a batch either succeeds or fails completely; there is nothing in between. And if it fails, it should be repeatable, either from the application side or by an operations team kicking off the process manually again. As far as I can tell, this batch doesn't sound too complicated, so aggregating one day's worth of bills into another table should be just a matter of seconds.
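    To illustrate the "succeeds or fails completely, and can simply be rerun" point, the daily batch could be wrapped like this (again just a sketch, reusing the hypothetical daily_billing_counts table from above):
    [code]
    DECLARE @day DATETIME, @from DATETIME, @to DATETIME;
    SET @day  = DATEADD(DAY, -1, GETDATE());              -- e.g. yesterday
    SET @from = DATEADD(DAY, DATEDIFF(DAY, 0, @day), 0);
    SET @to   = DATEADD(DAY, 1, @from);

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Remove anything a previous (possibly failed) run left behind for this day,
        -- so the batch is safe to kick off again.
        DELETE dbo.daily_billing_counts WHERE bill_date = @from;

        INSERT dbo.daily_billing_counts (entity_id, type_id, bill_date, msg_count)
        SELECT entity_id, type_id, @from, COUNT(*)
        FROM   dbo.billing
        WHERE  time_sent >= @from
          AND  time_sent <  @to
        GROUP BY entity_id, type_id;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

        -- Re-raise so a scheduled job is reported as failed and can be rerun.
        DECLARE @msg NVARCHAR(2048);
        SET @msg = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);
    END CATCH;
    [/code]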

  5. afranchimon New Member

    [quote user="FrankKalis"]
    Well, not that this might let you sleep better, but when the database is down, an unfinished batch is likely to be the least of your problems. [:)]
    However, the good thing about doing this exclusively on the database side is that such a batch either succeeds or fails completely; there is nothing in between. And if it fails, it should be repeatable, either from the application side or by an operations team kicking off the process manually again. As far as I can tell, this batch doesn't sound too complicated, so aggregating one day's worth of bills into another table should be just a matter of seconds.

    [/quote]
    You're absolutely right about that :).
    I actually missed mentioning in my original post that we might make our new front-end accessible to clients as well, so they can monitor their usage in 'realtime'. Realtime is a bit abstract here, as one-day-old data would be good enough. (As they are used to getting their bills at the end of the month, I'm sure they'll be quite pleased to have insight into yesterday's usage.) What I'm trying to say is that batch processing would still meet the requirements.
    Do you have a specific reason why batch processing is a better solution than using triggers? Would it be more reliable? Is it a better practice?
    I am most definitely not an expert in this field, which is why I am asking for advice here.
    Once again, thanks for your efforts!
  6. FrankKalis Moderator

    [quote user="afranchimon"]
    Do you have a specific reason why batch processing is a better solution than using triggers? Would it be more reliable? Is it a better practice?
    [/quote]
    Batch processing has the advantage that you're in control of when to kick the process off, and it can therefore run asynchronously, while triggers always run synchronously as part of the insert. So you could schedule this for off-peak hours.
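    One common way to do that is a SQL Server Agent job. Roughly (the job, database, and procedure names below are just placeholders):
    [code]
    USE msdb;
    GO
    EXEC dbo.sp_add_job         @job_name = N'Aggregate daily billing';

    EXEC dbo.sp_add_jobstep     @job_name = N'Aggregate daily billing',
                                @step_name = N'Run aggregation',
                                @subsystem = N'TSQL',
                                @database_name = N'BillingDB',
                                @command = N'EXEC dbo.usp_aggregate_daily_billing @day = DATEADD(DAY, -1, GETDATE());';

    EXEC dbo.sp_add_jobschedule @job_name = N'Aggregate daily billing',
                                @name = N'Daily at 02:00',
                                @freq_type = 4,              -- daily
                                @freq_interval = 1,
                                @active_start_time = 20000;  -- 02:00:00

    EXEC dbo.sp_add_jobserver   @job_name = N'Aggregate daily billing';
    [/code]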
  7. geebee2 New Member

    My suggestion is to have a table that accumulates new records (an input buffer), then periodically take records from that table into the permanent structure.
    You can either use triggers to build summary data, or do it explicitly.
    I run a logging system for our websites, which handles about 50,000 records (clicks) a day.
    The update process runs about once every 20 minutes or so.
    Before I used this structure, I ran into problems inserting data into a very large table in "real" time.
    Having a simple "input buffer" table makes sure there are no problems saving data as it comes in.
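    A sketch of that pattern (the table names are made up; the billing columns are taken from the first post):
    [code]
    -- Small staging table that takes the incoming inserts.
    CREATE TABLE dbo.billing_buffer
    (
        billing_buffer_id INT IDENTITY(1, 1) PRIMARY KEY,
        time_sent DATETIME NOT NULL,
        entity_id INT      NOT NULL,
        type_id   INT      NOT NULL
    );
    GO

    -- Periodic move into the permanent table (run e.g. every 20 minutes).
    -- The watermark keeps rows that arrive while this runs safe for the next pass.
    DECLARE @max_id INT;
    SELECT @max_id = MAX(billing_buffer_id) FROM dbo.billing_buffer;

    BEGIN TRANSACTION;

    INSERT dbo.billing (time_sent, entity_id, type_id)
    SELECT time_sent, entity_id, type_id
    FROM   dbo.billing_buffer
    WHERE  billing_buffer_id <= @max_id;

    DELETE dbo.billing_buffer WHERE billing_buffer_id <= @max_id;

    COMMIT TRANSACTION;
    [/code]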
  8. afranchimon New Member

    [quote user="geebee2"]
    My suggestion is to have a table that accumulates new records (an input buffer), then periodically take records from that table into the permanent structure.
    You can either use triggers to build summary data, or do it explicitly.
    I run a logging system for our websites, which handles about 50,000 records (clicks) a day.
    The update process runs about once every 20 minutes or so.
    Before I used this structure, I ran into problems inserting data into a very large table in "real" time.
    Having a simple "input buffer" table makes sure there are no problems saving data as it comes in.
    [/quote]
    Thanks for your reply. The input buffer sounds like a good idea; however, we don't really have problems with inserting the data, and as I said, we'd rather not touch the large billing table. But I'll keep it in the back of my mind!
  9. Bob L New Member

    I am wondering if table partitioning would be appropriate in this case. Create new filegroups for each of the partitions, with one partition per type that the finance group needs totals on. I also like the idea of the holding table, where every so often a stored proc (or similar process) is run to move the data from the holding table into the main (partitioned) table. This would greatly speed up any reporting or accumulation on the data in the partitions, and it would also free up the main table for reporting, since normal inserts would be held off until the next process runs. (Separate filegroups for each partition would also let you move a partition to another disk if it grows abnormally large.)
    I would make sure to have nonclustered indexes on the other fields that are important (used for sorting or searching, as in the "WHERE" clause). This would also help the speed of searching and sorting on those fields.
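    For example (the index name and column choice below are just an illustration based on the reporting described earlier):
    [code]
    CREATE NONCLUSTERED INDEX IX_billing_entity_time
    ON dbo.billing (entity_id, time_sent)
    INCLUDE (type_id);
    [/code]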
    Just suggestions. I have not implemented partitioning in production, but it may be a possibility in your case.
    Would like to hear if anyone else thinks this is a bad idea.
    BobL
  10. afranchimon New Member

    Hi Bob, thanks for your reply.
    I wasn't familiar with partitioned tables, but before I answered this post I quickly googled it, and it sounds promising. The only website I found so far is one that used a SQL Server 2005 beta for benchmarks, and there it turned out to be slower than a regular table. But that was a beta. I'm not asking you to google for me, but if you have a good resource with benchmarks I'm really interested. I don't have a lot of time on my hands at the moment but will definitely look into it. Is SQL Server capable of 'porting' an existing table to a partitioned table?
    Thanks!
  11. Bob L New Member

    Yes, you can partition an existing table.
    Steps (after backing up the database):
    1) Create the filegroups (and files) needed to hold the different partitions (stored separately, but still treated as one big normal table)
    2) Create a Partition Function
    3) Create a Partition Scheme
    4) Drop existing clustered index on the existing table
    5) Re-Create the clustered index on the Partition Scheme for that table
    You might also want to partition the indexes, although you could just create a new filegroup for them and recreate all non-clustered indexes for that table on that filegroup, so that they are separate from the data filegroups (then they can be kept on a separate, fast disk group if one is available, for more flexible storage and for speed).
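    A rough T-SQL outline of steps 2-5, assuming (purely as an example) monthly ranges on time_sent and filegroup names created in step 1:
    [code]
    -- 2) Partition function: monthly ranges on time_sent (boundary dates are examples).
    CREATE PARTITION FUNCTION pf_billing_month (DATETIME)
    AS RANGE RIGHT FOR VALUES ('20090101', '20090201', '20090301', '20090401');

    -- 3) Partition scheme mapping each range to a filegroup from step 1.
    CREATE PARTITION SCHEME ps_billing_month
    AS PARTITION pf_billing_month
    TO (fg_old, fg_2009_01, fg_2009_02, fg_2009_03, fg_2009_04);

    -- 4) + 5) Rebuild the existing clustered index on the partition scheme in one step.
    --         (Use the name of the current clustered index; if it backs a primary key,
    --          drop and recreate the constraint instead.)
    CREATE CLUSTERED INDEX IX_billing_time_sent
    ON dbo.billing (time_sent)
    WITH (DROP_EXISTING = ON)
    ON ps_billing_month (time_sent);
    [/code]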
    Hope this helps. Gives some key words to search for, anyway.
    BobL
  12. satya Moderator

    What is the hardware & software configuration of this setup?
    Also, how much memory is available on the machine and allocated to SQL Server?
    What kind of performance issues are you facing: intermittent or regular slowdown of queries?
  13. afranchimon New Member

    [quote user="satya"]
    What is the hardware & software configuration of this setup?
    Also, how much memory is available on the machine and allocated to SQL Server?
    What kind of performance issues are you facing: intermittent or regular slowdown of queries?
    [/quote]
    Thanks for your reply. The server itself has four dual-core 2.8 GHz Xeon processors and 4 GB of RAM, running Windows 2003 x64. SQL Server (32-bit) is not limited in its memory usage and usually takes about 3 GB.
    We basically do not have problems with the performance of the server itself; however, the queries are awfully slow. We are building a new front-end that we want to be responsive, as we might open it up to our clients so they have 'realtime' insight into their usage. I quote 'realtime' because one-day-old data should be good enough (considering the advice I've been given in the rest of the posts).
