Downsides of not having clustered indexes in a DB | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive


Once again I turn to your sql-server-performance KB; it has been
helpful several times. Here's the challenge this time: we have an ERP package, which has now been migrated to SQL 2000 (SP2).
Because of the migration of the database to SQL 2000 and the new release
of this ERP software, I am keeping a closer eye on database behavior and
performance. One of the things that really caught my attention is that there are almost
no clustered indexes defined in this database.
There are 1400 tables and only 18 of them have a clustered index. Of
those 18 tables, only 4 are in use.
Apparently we only use 428 tables, and it's a 6 GB database.
All tables do have a primary key defined, and most of them have other
indexes defined as well.
Most of the primary keys are defined on a single increasing numeric
column. I've asked the software vendor why they use practically no
clustered indexes.
Their reply: "We checked it 2 years ago (with SQL 7 SP2) and did not find
suitable solutions for all users, so we didn't implement it."
They claim their software is suitable for SQL Server as well as for Oracle.

As a general rule of thumb, also mentioned on your website, I always put a
clustered index on a table.
One of the reasons is that it's the only way of getting the data pages
reorganized by using DBCC DBREINDEX or DBCC INDEXDEFRAG.
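As a sketch of what I mean, assuming a hypothetical table Orders with a clustered index named IX_Orders_Clustered, the two maintenance options on SQL 2000 look like this:

```sql
-- Check fragmentation first
DBCC SHOWCONTIG ('Orders');

-- Offline rebuild of all indexes on the table;
-- fill factor 90 leaves 10% free space per page
DBCC DBREINDEX ('Orders', '', 90);

-- Online defragmentation of one index at the leaf level
-- (0 = current database)
DBCC INDEXDEFRAG (0, 'Orders', 'IX_Orders_Clustered');
```

On a heap, neither command touches the data pages themselves, since they only work on index pages; only with a clustered index are the data pages part of an index and therefore reorganized.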
I've noticed on some of our own tables in other databases that, I guess through
patterns of use, a heap consumes by far more data pages than a clustered table.
Because of this extreme example I get a little shaky when I see that no clustered
indexes have been defined and no documentation is available to justify it.

Can you provide more technical info about the differences in the way
SQL Server 2000 treats heaps and clustered tables, data-page-wise?

In most discussion groups a finger is always pointed at the downsides
of using clustered indexes, but I've seen no technical issues raised regarding
the use of heaps. Are there technical downsides to using heaps? Is the "by default" absence of clustered indexes standard practice
in ERP databases?

Thanks in advance, and keep up the good work for the
SQL Server community.

BTW: I've already had a response from Brad stating:
"There have been plenty of articles in SQL Server Magazine and on Microsoft's
website, and in many books (like Inside SQL Server 2000), that point out that
heaps are never good. That's why it is a good idea to add a clustered
index to every table (except, perhaps, very, very small tables, but even in
those cases, I still do). I would have to write an article to cover why this
is, although I do describe some of these reasons on my website under
clustered and non-clustered indexes. So, if you are looking for a very
technical, in-depth discussion of this, the best is found in the book
Inside SQL Server. Assuming you are having performance problems with your ERP database, I would
take a Profiler trace of it and run the trace through the Index Tuning Wizard. It will recommend
some of the easiest clustered indexes to add. From there, I would
begin identifying poorly performing queries, finding the tables they hit,
and then adding appropriate clustered indexes to them. If you haven't done so, join the website's new forum and post your question
there for other feedback from members. Brad
For more information, refer to the related article on this site. HTH Satya SKJ
I just ran across another good reason to add a clustered index to every table instead of using a heap.

First, keep in mind that a clustered index physically orders the data in a table based on a single or composite column. Second, a heap (a table without a clustered index) is not stored in any particular physical order.

Whenever you need to query the column or columns used for the clustered index, SQL Server has the ability to read the data in a clustered index sequentially, an extent (8 data pages, or 64 KB) at a time. This makes it very easy for the disk subsystem to read the data quickly from disk. But if a heap is used, even if you add a non-clustered index on an appropriate column or columns, because the data is not physically ordered (unless the index covers the query), SQL Server has to read the data from disk randomly, one 8 KB page at a time. This creates a lot of extra work for the disk subsystem to retrieve the same data.

This is just one of the many reasons why a clustered index should be added to virtually all tables.
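To illustrate the point, here is a minimal sketch; the table and column names (Orders, OrderDate) are assumptions, not from the original posts:

```sql
-- Hypothetical table queried by date range
CREATE TABLE Orders (
    OrderID   int IDENTITY NOT NULL,
    OrderDate datetime     NOT NULL,
    Amount    money        NOT NULL
);

-- Heap + non-clustered index: a range scan on OrderDate walks the
-- index keys in order, but each matching row is fetched with a
-- separate random page read into the heap
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate ON Orders (OrderDate);

-- Clustered index: the rows themselves are stored in OrderDate order,
-- so the same range scan reads the data pages sequentially,
-- an extent at a time
CREATE CLUSTERED INDEX CIX_Orders_OrderDate ON Orders (OrderDate);
```

Note that creating the clustered index also rebuilds any existing non-clustered indexes on the table, since their row pointers change from heap RIDs to clustering keys.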
Brad M. McGehee
Indeed, IO will benefit in this way. Now, what I've done is take one heap table (with indexes) (space = 1,153 MB reserved / 773 MB data / 285 MB index / 95 MB unused) and copy it to my test server.
Then I altered the PK index to be clustered.
The space info now says: 1,644 MB reserved / 697 MB data / 220 MB index / 726 MB unused
(these figures are after DBCC DBREINDEX!)
When I create the same table with the clustered index from the beginning, the space info says: 920 MB reserved / 698 MB data / 221 MB index / 512 KB unused. Any idea why the unused space is that big after the alter to clustered?

Never mind... DBCC UPDATEUSAGE corrected the stats.
Now I can go on testing.
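For anyone reproducing this check, a sketch of the commands involved; the table name Orders is an assumption:

```sql
-- Report size figures for one table; these come from the system
-- tables and can be stale after bulk changes or index rebuilds
EXEC sp_spaceused 'Orders';

-- Correct inaccurate page (and optionally row) counts
-- in the current database (0 = current database)
DBCC UPDATEUSAGE (0, 'Orders') WITH COUNT_ROWS;

-- Or let sp_spaceused run the correction itself before reporting
EXEC sp_spaceused 'Orders', @updateusage = 'TRUE';
```

Large "unused" figures right after an index change are often just this kind of stale bookkeeping rather than genuinely wasted space.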
You should almost always have a clustered index on any table with more than a handful of rows. If you don't have a clustered index, then any query that no non-clustered index can satisfy forces SQL Server to walk through the entire heap until it finds the rows you are after; and even when a non-clustered index is used, each matching row costs an extra lookup into the heap.

There were some performance issues with SQL Server 6.5 and earlier where a clustered index on a monotonically increasing column (i.e., an identity column) could cause a "hot spot", which could lead to locking problems. However, that is no longer an issue with 7.0 and beyond.

One of my clients has two databases that are around 3-4 GB in size, with 700+ tables. In total, maybe a dozen of the tables had clustered indexes. (And similar to your situation, half of those tables weren't actually used!) I added a clustered index to every table, and almost instantly the performance issues cleared up. (Now if only I could stop them using MS Access to do some of their reports, the rest of the issues would go away too...) We had no problems doing this, and some operations that were taking 20-30 minutes now run in under 1 minute!

BTW, a little snippet from Microsoft's KB article Q297861, which also shows why a clustered index can be very beneficial:
quote:In SQL Server 7.0, and later, SQL Server generally optimizes inserts into a heap with the assumption that saving space is more important than performance. That is the tradeoff you choose to make when you decide to leave a table as a heap. Therefore, an insert into a heap often spends time searching for a location to insert a new row. On the other hand, an insert to a clustered table does not have to spend time searching for space. For more details about how the space is allocated, see the "Managing Space Used by Objects" topic in SQL Server Books Online.
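For tables whose primary key already sits on a single increasing column, the conversion is a two-step constraint swap. A minimal sketch, assuming a hypothetical table Orders whose primary key constraint PK_Orders currently backs a non-clustered index:

```sql
-- Drop the existing (non-clustered) primary key constraint...
ALTER TABLE Orders DROP CONSTRAINT PK_Orders;

-- ...and recreate it as clustered, which rebuilds the table in key
-- order and rebuilds all non-clustered indexes as well, since their
-- row pointers change
ALTER TABLE Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID);
```

On a large table the rebuild needs free space on the order of the table's own size, so on a 6 GB production database this is best scheduled per table in a maintenance window.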

Thanks for all the input.