SQL Server Performance

  • Home
  • Articles
  • Forums
  • Tips
  • Training
  • FAQ's
  • Blogs
  • Software
  • Books
  • About Us
RSS Feeds
Sign in | Join


Article Topics

All Articles
Performance Tuning
Audit
Business Intelligence
Clustering
Reporting Services
Developer
General DBA
ASP.NET / ADO.NET

Write for Us

Share your SQL Server knowledge with others and raise your profile in the community More...
Latest Articles

Working with Windows Communication Foundation (WCF)
Transfer Logins Task and Transfer Database Task in SSIS
Practical Database Change Management (Part 2)
Practical Database Change Management (Part 1)

More     
 
Latest FAQ's

ALTER TABLE SWITCH statement failed because column '%.*ls' has data type ...
ALTER TABLE SWITCH statement failed because column '%.*ls' has data type ...
ALTER TABLE SWITCH statement failed. There is no identical index in ...
'%ls' statement failed because the expression identifying partition number for the ...

More     
   
Latest Software Reviews

Spotlight on ApexSQL Diff - Server-based database comparison tool ...
Spotlight on ApexSQL Data Diff - Server-based database comparison tool ...
Spotlight on ApexSQL Doc 2008
ApexSQL Enforce

More     

articles >> performance tuning >> Clustered Indexes in SQL Server: Things You ...

Clustered Indexes in SQL Server: Things You Need to Know

By : Geert Vanhove
Dec 08, 2005

This article covers a few, more advanced topics about the usage of clustered indexes in SQL Server. Not only will I try to convince you of the absolute necessity of using clustered indexes, I'll also give you some tips on how to use them in a not so obvious context.



Why You Need Clustered Indexes!

As you know, the SQL Server engine treats a table without a clustered index, also called a heap, differently from a clustered table. In contrast with a heap table, a clustered table has its data pages ordered in the sequence of its index structure and has its data pages linked in a linked list. A heap table uses the IAM and PFS pages to find a page with enough space to hold a new row if no space is available in the current page (more info in BOL).

To illustrate this difference in behavior, I created a small script you can run for yourself.

-- Init

Use tempdb

SET NOCOUNT ON

IF OBJECT_ID('dbo.TestTable', 'U') IS NOT NULL DROP TABLE dbo.TestTable
CREATE TABLE dbo.TestTable ( GUID_col_indexed uniqueidentifier, CHAR_col char(10), GUID_col uniqueidentifier)
CREATE CLUSTERED INDEX IX1 ON dbo.TestTable ( GUID_col_indexed )
GO
-- Step 1: Load TestTable with 100.000 records
declare @t int
set @t=1
while @t <= 100000
begin
insert TestTable values (newid(), char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65)+ char(rand()*26 + 65), newid())
select @t=@t + 1
end
go
-- Step 2: Select 100 random values
select top 100 * from TestTable order by GUID_col
go
-- Step 3: Delete 1000 random values
delete TestTable where GUID_col in (select top 1000 GUID_col from TestTable order by GUID_col desc)
go
-- Step 4: Insert 1000 more values
declare @t int
set @t=1
while @t <= 1000
begin
insert TestTable values (newid(), char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65)+ char(rand()*26 + 65), newid())
select @t=@t + 1
end

go

This script is split into 4 steps:

  1. Load 100,000 records into a table using randomly generated values for all fields.
  2. Select 100 random values.
  3. Delete 1,000 random values.
  4. Load 1,000 more records.

To generate random values, I used uniqueidentifiers. I did not use the indexes column as SARG's in the queries. I did this to avoid any advantages related to the physical structure of "random" values.

You should run this script twice, first by using the script exactly as it appears above (creating the index as clustered) and then by leaving out the [CLUSTERED] keyword. You will get results —–I hope— comparable to the results I got after a few test runs:

 

CPU

Reads

Duration

 

Heap

Cluster

Heap

Cluster

Heap

Cluster

-- Step 1: Load Table with 100.000 records

5094

4063

374627

281336

5640

4926

-- Step 2: Select 100 random values

78

78

671

942

97

84

-- Step 3: Delete 1000 random values

641

719

9551

7698

811

834

-- Step 4: Insert 1000 more values

218

47

5717

3076

210

45

Even without addressing the specific characteristics of the clustered index, these results clearly show the internal overhead related to heap structures.



Clustered Indexes on uniqueidentifier Columns: Out of the Question! Or Not?

A lot of whitepapers don't encourage you to use clustered indexes on a uniqueidentifier column. Because of its random generation, records will be inserted randomly over all data pages that are part of the file group your table belongs to. This is absolutely an overhead because a lot more extends will be swapped into and out of memory, page splits are more likely to occur, etc.

But in some VLDB environments with high transactional throughput, this can be a solution when IO throughput is a potential bottleneck. Suppose you have a large (>100 GB) database table using a file group spread over 10 physical disks with a clustered index on a [date-inserted] column. When many inserts occur in combination with lots of updates of recently added records, all data throughput will be focused on a set of physically grouped data pages producing a hot spot. The volumes of dirty pages a checkpoint process wants to flush to disk can get too high for one disk to handle. This is where a clustered index on a uniqueidentifier column can help you. Instead of flushing all dirty pages to one disk, IO's will be spread over all physical disks.

Pay special attention to fragmentation using this solution. Because of its random inserts, page splits are more likely to occur causing more fragmentation. When your data volumes are too high for a daily rebuild, perform a general index rebuild every week. I personally prefer the CREATE INDEX … WITH DROP_EXISTING statement to rebuild indexes.



Conclusion

  1. Always use a clustered index on an OLTP database table.
  2. If you consider using a clustered index on a uniqueidentifier column:
    1. Test your solution in a stress test environment.
    2. Rebuild indexes regularly.

        








Home | Peformance Articles | Audit Articles | Business Intelligence Articles | Clustering Articles | Developer Articles | Reporting Services Articles | DBA Articles | ASP.NET / ADO.NET Articles | SQL Server Training Videos | DBA FAQ's | Developer Peformance FAQ's | DBA Peformance FAQ's | Developer FAQ's | Clustering FAQ's | Error Messages | Audit Tool Reviews | Backup Tool Reviews | Coding Tool Reviews | Compare Tool Reviews | Documentation Tool Reviews | Design Tool Reviews | Monitoring Tool Reviews | Log Tool Reviews | Reporting Tool Reviews | Clustering Tool Reviews | Security Tool Reviews | Change Management Tool Reviews | Remote Access Tool Reviews | Book Reviews | Security Tool Reviews | QDPMA Performance Tuning | ADO.NET / ASP.NET | Administration | Analysis/OLAP Services | Application Development | Configuration | Components | ETL | Hardware | High Availability | Hints | Index | Misc | Operating Systems | Performance Tuning | Replication | T-SQL | Views


              © 1999-2008 by T10 Media. All rights reserved