Site sponsored by: Idera Try Idera’s new SQL admin toolset
SQL Server Performance

  • Home
  • Articles
  • Forums
  • Tips
  • Quiz
  • FAQ's
  • Blogs
  • Software
  • Books
  • About Us
RSS Feeds
Sign in | Join


Article Topics

All Articles
Peformance Tuning
Audit
Business Intelligence
Clustering
Reporting Services
Developer
General DBA
ASP.NET / ADO.NET

SQL Server 2008 - Worth the Wait

SQL Server’s first significant upgrade in three years features a number of envelope-pushing enhancements and improvements. Which will have the greatest impact on SQL administration and development? More...
Latest Articles

Slowly Changing Dimensions in SQL Server 2005
Audit Data Modifications
SQL Server 2008’s Management Data Warehouse
Same Report but Different Methods in SQL Server Reporting Services ...

More     
 
Latest FAQ's

How to Integrate Performance Monitor and SQL Profiler
SSIS Lookups are Case Sensitive
Convert Number to Words in SSRS
After installing SP2 on SQL Server 2005 x64, when trying to ...

More     
   
Latest Software Reviews

SQL Server DBA Dashboard
SwisSQL DBChangeManager
SQLMesh - SQL Server Search Tool
SoftTreeTech SQL Assistant

More     

articles >> peformance tuning >> Clustered Indexes in SQL Server: Things You ...

Clustered Indexes in SQL Server: Things You Need to Know

By : Geert Vanhove
Dec 08, 2005
Printer friendly

This article covers a few, more advanced topics about the usage of clustered indexes in SQL Server. Not only will I try to convince you of the absolute necessity of using clustered indexes, I'll also give you some tips on how to use them in a not so obvious context.



Why You Need Clustered Indexes!

As you know, the SQL Server engine treats a table without a clustered index, also called a heap, differently from a clustered table. In contrast with a heap table, a clustered table has its data pages ordered in the sequence of its index structure and has its data pages linked in a linked list. A heap table uses the IAM and PFS pages to find a page with enough space to hold a new row if no space is available in the current page (more info in BOL).

To illustrate this difference in behavior, I created a small script you can run for yourself.

-- Init

Use tempdb

SET NOCOUNT ON

IF OBJECT_ID('dbo.TestTable', 'U') IS NOT NULL DROP TABLE dbo.TestTable
CREATE TABLE dbo.TestTable ( GUID_col_indexed uniqueidentifier, CHAR_col char(10), GUID_col uniqueidentifier)
CREATE CLUSTERED INDEX IX1 ON dbo.TestTable ( GUID_col_indexed )
GO
-- Step 1: Load TestTable with 100.000 records
declare @t int
set @t=1
while @t <= 100000
begin
insert TestTable values (newid(), char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65)+ char(rand()*26 + 65), newid())
select @t=@t + 1
end
go
-- Step 2: Select 100 random values
select top 100 * from TestTable order by GUID_col
go
-- Step 3: Delete 1000 random values
delete TestTable where GUID_col in (select top 1000 GUID_col from TestTable order by GUID_col desc)
go
-- Step 4: Insert 1000 more values
declare @t int
set @t=1
while @t <= 1000
begin
insert TestTable values (newid(), char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65) + char(rand()*26 + 65)+ char(rand()*26 + 65), newid())
select @t=@t + 1
end

go

This script is split into 4 steps:

  1. Load 100,000 records into a table using randomly generated values for all fields.
  2. Select 100 random values.
  3. Delete 1,000 random values.
  4. Load 1,000 more records.

To generate random values, I used uniqueidentifiers. I did not use the indexes column as SARG's in the queries. I did this to avoid any advantages related to the physical structure of "random" values.

You should run this script twice, first by using the script exactly as it appears above (creating the index as clustered) and then by leaving out the [CLUSTERED] keyword. You will get results —–I hope— comparable to the results I got after a few test runs:

 

CPU

Reads

Duration

 

Heap

Cluster

Heap

Cluster

Heap

Cluster

-- Step 1: Load Table with 100.000 records

5094

4063

374627

281336

5640

4926

-- Step 2: Select 100 random values

78

78

671

942

97

84

-- Step 3: Delete 1000 random values

641

719

9551

7698

811

834

-- Step 4: Insert 1000 more values

218

47

5717

3076

210

45

Even without addressing the specific characteristics of the clustered index, these results clearly show the internal overhead related to heap structures.



Clustered Indexes on uniqueidentifier Columns: Out of the Question! Or Not?

A lot of whitepapers don't encourage you to use clustered indexes on a uniqueidentifier column. Because of its random generation, records will be inserted randomly over all data pages that are part of the file group your table belongs to. This is absolutely an overhead because a lot more extends will be swapped into and out of memory, page splits are more likely to occur, etc.

But in some VLDB environments with high transactional throughput, this can be a solution when IO throughput is a potential bottleneck. Suppose you have a large (>100 GB) database table using a file group spread over 10 physical disks with a clustered index on a [date-inserted] column. When many inserts occur in combination with lots of updates of recently added records, all data throughput will be focused on a set of physically grouped data pages producing a hot spot. The volumes of dirty pages a checkpoint process wants to flush to disk can get too high for one disk to handle. This is where a clustered index on a uniqueidentifier column can help you. Instead of flushing all dirty pages to one disk, IO's will be spread over all physical disks.

Pay special attention to fragmentation using this solution. Because of its random inserts, page splits are more likely to occur causing more fragmentation. When your data volumes are too high for a daily rebuild, perform a general index rebuild every week. I personally prefer the CREATE INDEX … WITH DROP_EXISTING statement to rebuild indexes.



Conclusion

  1. Always use a clustered index on an OLTP database table.
  2. If you consider using a clustered index on a uniqueidentifier column:
    1. Test your solution in a stress test environment.
    2. Rebuild indexes regularly.


Comments:
Your Name  
Email    
(Emails will not be displayed on the site or used for promotional purposes)
Comment  


Type characters in the image
 
 (case sensitive)

 
 
 

        








Home | Peformance Articles | Audit Articles | Business Intelligence Articles | Clustering Articles | Developer Articles | Reporting Services Articles | DBA Articles | ASP.NET / ADO.NET Articles | DBA FAQ's | Developer Peformance FAQ's | DBA Peformance FAQ's | Developer FAQ's | Clustering FAQ's | Error Messages | Audit Tool Reviews | Backup Tool Reviews | Coding Tool Reviews | Compare Tool Reviews | Documentation Tool Reviews | Design Tool Reviews | Monitoring Tool Reviews | Log Tool Reviews | Reporting Tool Reviews | Clustering Tool Reviews | Security Tool Reviews | Change Management Tool Reviews | Remote Access Tool Reviews | Book Reviews | Security Tool Reviews | QDPMA Performance Tuning | ADO.NET / ASP.NET | Administration | Analysis/OLAP Services | Application Development | Configuration | Components | ETL | Hardware | High Availability | Hints | Index | Misc | Operating Systems | Performance Tuning | Replication | T-SQL | Views