SQL Server Performance

Clustered Index

Discussion in 'SQL Server 2005 Performance Tuning for DBAs' started by Chayan, Sep 21, 2007.

  1. Chayan New Member

    I have a table with more than 50,000 records in SQL server 2000 without any index or even primary key on any column. Now i have identified two columns which are together unique ( say- empid, projectid) for the table. Now if i create a clustered index on the above mentioned columns, then my table scans are getting replaced by clustered index seeks in the execution plan for queries but the situation is that frequntly new records might need to be inserted inbetween the sorted clustered records (as when new projects are associated to an existing empid). So it might result into page splits.
    So how advisable is it to have a clustered index in these kind of situations where the clustered indexed column is not monotonically increasing. Does creating one with a lower fill factor and pad index should be considered. What should be the fill factor and pad index values.
    What about creating a clustered index on a column that is not unique and with similar situation as above. As there will be no nonclustered indexes so can this be considered.
    Thanks in advance.
  2. ranjitjain New Member

    Hi,
    You need to find out how frequently data gets updated. Also the percentage of reads and writes.
    You need to try different options like you can create clustered index only on empid instead of composite columns. If you generally filter queries with empids.
    Fill factor can be detrmined perfectly by testing the performance of queries with many values. But keep in mind that it should not be very less otherwise size of index will increase and time of execution as well.
  3. FrankKalis Moderator

    Also, if you are worried about extensive page splits you might consider using a column with the IDENTITY property as PK and clustered index and create an additional UNIQUE constraint on the two columns and see if this combination performs better.
    With only 50,000 rows all in all I wouldn't worry too much.
  4. Chayan New Member

    Thanks Ranjit/Frank for your valuable suggestions.
    Frank, since i generally filter queries with empids, so will having an IDENTITY column with clustered index on it will be effective in search of empids? Secondly please clarify on which two columns you want me to have a unique constraint?
    I have similar situation in other tables too, in some tables there are around 3,00,000 records with 75% duplicate data for empid, and more duplicates expected, with all other situations same as above. The tables are read intensive. So should I consider something here, or leave them to have table scans while execution.
    Thanks in advance.
  5. IDTX2 New Member

    If you create an identitity column to server as a surrogate key and define a clustered index on that column you will have minimal page splits as records will always be added to the end of the table. You can then create a non-clustered index on empid or what ever combination you frequently query to increase performance. Having a wide or concatenated clustering key is usually a bad idea as the clustering key is carried within the leaf pages of all non-clustered indexes.
  6. MichaelB Member

    Also, keep in mind that your clustered index on your composite would read quicker than if you had a the empid in a non-clustered index but like Frank said, on a table with only 50K rows, it doesnt matter much. both will be fast. since inserts are causing page splits, Frank's solution is best for you.
  7. satya Moderator

    I suggest to take help of Profiler here while running queries against this table and use Database tuning advisor for more help on such recommendations.
    You could test the same to see the exection plan and its activity, that will be more baseline performance test in addition what has been referred above.
    http://sqlserver-qa.net/blogs/perftune/archive/tags/indexes/default.aspx links on indexes and their reference.
  8. Raulie New Member

    If 75% of your data in empid will be duplicate then indexing this column will be useless, for your first senario you could create a composite clustered index on empid, projectid given they are integers and not wide columns and see how that works. Also monitor fragmentation on a regular basis to identify page splits.

Share This Page