Finding Duplicate Indexes in Large SQL Server Databases

One important component of tuning a large, heavily used database, is to ensure that the tables are indexed optimally: enough indexing, but not too much indexing for the application you are running. There are rules of thumb about index tuning, but the entire issue is so complex that there’s no “silver bullet” solution that will work for every case.

However, in tuning indexes we can generally say that it’s not a good idea to maintain duplicate indexes on the same data. SQL Server does not provide checks to prevent duplicate indexes from being created, as long as the names are different (see http://www.sql-server-performance.com/sql_server_performance_audit.asp, page 7).

So it’s perfectly legal in SQL Server, if not recommended, to

CREATE INDEX ind_name ON myTable (LastName)
GO
CREATE INDEX ind_name_1 ON myTable (LastName)
GO
CREATE INDEX ind_name_2 ON myTable (LastName, FirstName)
GO

SQL Server will dutifully create all three indexes.

Duplicate indexes in a simple case like this might seem obvious or even silly, but as Mr. McGehee points out in the article at the link above, they often creep into a bigger system as a result of various people tuning it at different times, or because of upgrades applied to third-party or vendor-created databases, or as a result of accidentally, manually creating indexes that duplicate those that are auto-generated by SQL Server.

In a small, simple database, one could simply poke around to locate these redundant indexes by using:

sp_helpindex 'myTable' 

which, from our example, might return:

index_name index_description index_keys
ind_name nonclustered located on PRIMARY LastName
ind_name_1 nonclustered located on PRIMARY LastName
ind_name_2 nonclustered located on PRIMARY LastName, FirstName
PK_myTable clustered, unique, primary key located on PRIMARY IDNum

The index_keys column can be used to identify indexes on the same column(s), which would be candidates for removal. If a column, or sequence of columns, appears in the same order in the index_keys list, such as LastName, and LastName, FirstName, then the indexes are probably redundant. An index on LastName, FirstName would fulfill the same function as an index on LastName alone, so it’s not necessary to maintain both. Removing the duplicates should speed inserts without compromising the performance of selects.

This problem is considerably more difficult when tuning a large database, which might have been worked on over a period of years by various people, and could have hundreds of tables and indexes. Picking through such a large collection of tables one at a time would be quite time consuming. But there is a way to get SQL Server to help out with that task. The following statement will show all the indexes defined in a database, by pulling information from the SYSINDEXES table:

SELECT tbl.[name] AS TableName,
	idx.[name] AS IndexName,
	INDEXPROPERTY( tbl.[id], idx.[name], 'IsStatistics') AS IsStats,
	INDEXPROPERTY( tbl.[id], idx.[name], 'IsAutoStatistics') AS IsAutoStats,
	INDEXPROPERTY( tbl.[id], idx.[name], 'IsHypothetical') AS IsHypothetical,
	INDEXPROPERTY( tbl.[id], idx.[name], 'IsClustered') AS IsClustered,
	INDEX_COL( tbl.[name], idx.indid, 1 ) AS col1,
	INDEX_COL( tbl.[name], idx.indid, 2 ) AS col2,
	INDEX_COL( tbl.[name], idx.indid, 3 ) AS col3,
	INDEX_COL( tbl.[name], idx.indid, 4 ) AS col4,
	INDEX_COL( tbl.[name], idx.indid, 5 ) AS col5,
	INDEX_COL( tbl.[name], idx.indid, 6 ) AS col6,
	INDEX_COL( tbl.[name], idx.indid, 7 ) AS col7,
	INDEX_COL( tbl.[name], idx.indid, 8 ) AS col8,
	INDEX_COL( tbl.[name], idx.indid, 9 ) AS col9,
	INDEX_COL( tbl.[name], idx.indid, 10 ) AS col10,
	INDEX_COL( tbl.[name], idx.indid, 11 ) AS col11,
	INDEX_COL( tbl.[name], idx.indid, 12 ) AS col12,
	INDEX_COL( tbl.[name], idx.indid, 13 ) AS col13,
	INDEX_COL( tbl.[name], idx.indid, 14 ) AS col14,
	INDEX_COL( tbl.[name], idx.indid, 15 ) AS col15,
	INDEX_COL( tbl.[name], idx.indid, 16 ) AS col16,
	dpages,
	used,
	rowcnt
FROM SYSINDEXES idx
INNER JOIN SYSOBJECTS tbl ON idx.[id] = tbl.[id]
WHERE indid > 0

This query uses the INDEX_COL() function to determine what columns are being indexed, and to separate that list into discrete columns. If you run the query, you will probably find that there are several types of entries. “Normal” indexes will appear in the list, with the columns they index listed in fields col1 – colx. In addition, you’ll find that any statistics maintained in the database are also listed, generally with names starting “_WA_Sys_”. Those statistics entries are also flagged by the function INDEXPROPERTY(), which will fill the column “IsStats” as 1. Statistics are important, but fall outside the scope of our task here; for this exercise we’d like to filter them out. (See the Extra Credit section at the end of this article.)

Continues…

Leave a comment

Your email address will not be published.