Table splitting | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Table splitting

Is it worthwhile (in terms of general performance) to split a large table (2M rows) into smaller tables (couple of tables or maybe dozens)? Concrete similar example:
Manage the accounts of a bank and all it’s branches. In this case, the accounts are frequently updated and new ones created (200K updates and 60K inserts per month). Is it worthwhile to split the accounts by year (2000, 2001, 2003) or by branch?
Or is it best to recreate the table periodically? (What are the options to improve extent fragmentation?)
Recreating the indexes is regularly done but the system’s performance doesn’t improve much.
Archiving old accounts is difficult as the "accounts" are never "really" closed. Sorry for the poor english and the long and vague text. Thanks in advance,
André Cardoso

I’m finding some answers in the FAQ. Sorry for troubling the forum before looking in the FAQ. Thanks,
André Cardoso
2 million tables isn’t really a lot, and the performance should be ok on a decent server. The issue is probably that the indexes are not appropriate for the queries that are run against this table. It is difficult to give proper advice on the indexes without the table structure and the actual queries run against this database, but if you have a clustered index on account number it would probably be best to changed that to a non-clustered index, and create a clustered index on branch and maybe one or two other columns. Having a clustered index on a unique column is usually not a good idea, there are better things you can do with a clustered index.
2 million tables isn’t really a lot, and the performance should be ok on a decent server. The issue is probably that the indexes are not appropriate for the queries that are run against this table. It is difficult to give proper advice on the indexes without the table structure and the actual queries run against this database, but if you have a clustered index on account number it would probably be best to changed that to a non-clustered index, and create a clustered index on branch and maybe one or two other columns. Having a clustered index on a unique column is usually not a good idea, there are better things you can do with a clustered index.
I’ll try the advice of changing the clustered index to the branch and account number. The biggest problem is with the account "movements" (? debits/credits) (~20M). Thanks,
André Cardoso
If your application can cope up by differentiating the data (movement) then you can go ahead.
But you should consider Jacco’s reference for indexes.
FOr slow running queries use PROFILER to find out which will generate the trace and submit the same to Index Tuning Wizard which will give better recommendations on the indexes. Also on the Hardware part run PERFMON and capture set of counters from Hardware and SQL Server. Refer to the homepage of this site about PERFMON & PROFILER information. HTH _________
Satya SKJ

Jacco: > Having a clustered index on a unique column is usually not a good idea Not sure I agree with this. Can you share your reasoning plz ?
Thanks
Andre, I agree with the previous posts in the forum that 2M rows is not too much for a database application. But there can be two exception to this – The table size may be very large which reduces the no. of rows that can fit on a single page hence incersing the IO. The same is applicable for Clustered and Non-Clustered indexes on the same. The second reson may be since u haven’t defined the kind of peformance required form the database so it may be difficul to understand the same. I m saying this because performance is not abstract term. It varies from user to user. Anyway…. I am writing my viws on distributing the data across multiple tables. Generally the bank account numbers are unique across the banking software for different clients. Also the account numbers are generated based on branch. i.e. All the account no.’s for a branch start from the same prefix numbers. If this is the case then I would recommend splitting the tables into branchwise tables with check on account no. and creating a partiotioned view on the tables so that the performance of search queries will increase. Hoewver even if the account no.s are not unique, dividing the data on the basis of branch with ckeck on btanch name and again having a partitioned view seems to be feasible. In this case, the branch name has to be included in the search conditons which is not generally possible. I mean a customer might not be aware as to which branch he is having account with. Chappy, the idea is that having a non-clustered index on the unique column will give a great read performance so if there is another column suitable for clustered index, non clustered index can be created on the unique column. Also having a clustered index on primary ever increaing column like account number will make the last page of the table a hot spot as every user will try to insert data at the last page whrn a new account is created. HTH. Gaurav
I cant say ive ever really encountered a serious hotspot problem, and I tend to use a lot of clustered indices on incrementing fields (maybe my servers dont go under a heavy enough load on those operations). My impression was that the situation had been improved in sql 2000 (tho I cant see what they could do to improve it, based on the theory of why it happens). Also its worth noting that if you do choose to have the clustered index as something else, and its more than one column, this will increase the disk space required for non clustered indices, degrading read performance. Youd need to think carefully to make sure you are not merely relocating the cost.
Chappy, As I said I don’t advocate using clustered indexes on unique columns because usually you can use the clustered index for better purposes. In the example we are talking about here most accounts will probably be accessed on a one-by-one basis, in which case a non-clustered index will serve the purpose. If on the same table you want to run queries to create reports summarized by branch or date, the performance of these queries will be greatly improved by having a clustered index on the right column. Clustered indexes will improve the performance of statements that use groups of rows, like GROUP BY, WHERE…BETWEEN…AND, and WHERE with low cardinality. If you are only trying to retreive one row, which usually done via the Primary Key or another unique column, the difference between using a clustered index and a non-clustered index is not very big. I don’t intend to say that you should never use a clustered index on a unique column, just that you should look if there is not a better use for it, and there often is.
Ok thanks. Was just making sure there wasnt some golden rule I had overlooked!
Ok thanks. Was just making sure there wasnt some golden rule I had overlooked!
Clustered indexes also affect performance when it comes to inserts which I believe is a factor that is quite often overlooked. I recently ran a comparison test for inserts of "clustered index on bigint identity" versus "clustered index on bigint non-identity". The results showed at least a 50% improvement on performance using the identity version.
I totally agree with that. If you have a table that has mostly single row inserts/updates/selects having the clustered index on a unique column is a good idea. If you have to perform operations on groups however it is better to use the clustered index on a combination of the columns that are include in the GROUP BY or WHERE clauses. The inserts/updates/selects will take about twice as long because both the non-clustered index and the clustered index have to be searched, the group operations will work much faster though, because the clustered index only has to be searched once, independent of the number of rows involved.
]]>