SQL Server 2000 offers a function called CHECKSUM. The main purpose for this function is to create what are called hash indices. A hash indice is an index built on a column that stores the checksum of the data found in another column in the table. The CHECKSUM function takes data from another column and creates a checksum value. In other words, the CHECKSUM function is used to create a mostly unique value that represents other data in your table. In most cases, the CHECKSUM value will be much smaller than the actual value. For the most part, checksum values are unique, but this is not guaranteed. It is possible that two slightly different values may produce the same identical CHECKSUM value.
Here’s how this works using a music database example. Say we have a song with the title “My Best Friend is a Mule from Missouri”. As you can see, this is a rather long value, and adding an index to the song title column would make for a very wide index. But in this same table, we can add a CHECKSUM column that takes the title of the song and creates a checksum based on it. In this case, the checksum would be 1866876339. The CHECKSUM function always works the same, so if you perform the CHECKSUM function on the same value many different times, you would always get the same result.
So how does the CHECKSUM help us? The advantage of the CHECKSUM function is that instead of creating a wide index by using the song title column, we create an index on the CHECKSUM column instead. “That’s fine and dandy, but I thought you wanted to search by the song’s title? How can anybody ever hope to remember a checksum value in order to perform a search?”
Here’s how. Take a moment to review this code:
SELECT title, artist, composer
WHERE title = ‘My Best Friend is a Mule from Missouri’
AND checksum_title = CHECKSUM(‘My Best Friend is a Mule from Missouri’)
In this example, it appears that we are asking the same question twice, and in a sense, we are. The reason we have to do this is because there may be checksum values that are identical, even though the names of the songs are different. Remember, unique checksum values are not guaranteed.
Here’s how the query works. When the Query Optimizer examines the WHERE clause, it determines that there is an index on the checksum_title column. And because the checksum_title column is highly selective (minimal duplicate values) the Query Optimizer decides to use the index. In addition, the Query Optimizer is able to perform the CHECKSUM function, converting the song’s title into a checksum value and using it to locate the matching records in the index. Because an index is used, SQL Server can quickly locate the rows that match the second part of the WHERE clause. Once the rows have been narrowed down by the index, then all that has to be done is to compare these matching rows to the first part of the WHERE clause, which will take very little time.
This may seem a lot of work to shorten the width of an index, but in some cases, this extra work will pay off in better performance in the long run.
Because of the nature of this tip, I suggest you experiment using this method, along with trying the more conventional method of creating an index on the title column itself. Since there are so many variables to consider, it is tough to know which method is better in your particular situation unless you give them both a try. [2000, 2005] Updated 10-4-2004
Some queries can be very complex, involving many tables, joins, and other conditions. I have seen some queries run over 1000 lines of code (I didn’t write them). This can make them difficult to analyze in order to identify what indexes might be used to help the query perform better.
For example, perhaps you want to create a covering index for the query and you need to identify the columns to include in the covering index. Or, perhaps you want to identify those columns that are used in joins in order to check to see that you have indexes on those columns used in the joins in order to maximize performance.
To make complex queries easier to analyze, consider breaking them down into their smaller constituent parts. One way to do this is to simply create lists of the key components of the query, such as:
- List all of the columns that are to be returned
- List all of the columns that are used in the WHERE clause
- List all of the columns used in the JOINs (if applicable)
- List all the tables used in JOINs (if applicable)
Once you have the above information organized in this easy-to-comprehend form, it is must easier to identify those columns that could potentially make use of indexes when executed. [6.5, 7.0, 2000, 2005] Updated 10-4-2004
Queries that include either the DISTINCT or the GROUP BY clauses can be optimized by including appropriate indexes. Any of the following indexing strategies can be used:
- Include a covering, non-clustered index (covering the appropriate columns) of the DISTINCT or GROUP BY clauses.
- Include a clustered index on the columns in the GROUP BY clause.
- Include a clustered index on the columns found in the SELECT clause.
Adding appropriate indexes to queries that include DISTINCT or GROUP BY is most important for those queries that run often. If a query is rarely run, then adding an index may cause more performance problems than it helps. [7.0, 2000, 2005] Updated 10-4-2004
Computed columns in SQL Server 2000 can be indexed if they meet all of the following criteria:
- The computed column’s expression is deterministic. This means that the computed value must always be the same given the same inputs.
- The ANSI_NULL connection-level object was on when the table was created.
- TEXT, NTEXT, or IMAGE data types are not used in the computed column.
- The physical connection used to create the index, and all connections used to INSERT, UPDATE, or DELETE rows in the table must have these six SET options properly configured: ANSI_NULLS = ON, ANSI_PADDINGS = ON, ANSI_WARNINGS = ON, ARITHABORT = ON, CONCAT_NULL_YIELDS_NULL = ON, QUOTED_IDENTIFIER = ON, NUMERIC_ROUNDABORT = OFF.
If you create a clustered index on a computed column, the computed values are stored in the table, just like with any clustered index. If you create a non-clustered index, the computed value is stored in the index, not in the actual table.
While adding an index to a computed column is possible, it is rarely advisable. The biggest problem with doing so is that if the computed column changes, then the index (clustered or non-clustered) has to also be updated, which contributes to overhead. If there are many computed values changing, this overhead can significantly hurt performance.
The most common reason you might consider adding an index to a computed column is if you are using the CHECKSUM() function on a large character column in order to reduce the size of an index. By using the CHECKSUM() of a large character column, and indexing it instead of the large character column itself, the size of the index can be reduced, helping to save space and boost overall performance.  Updated 10-4-2004
Many databases experience both OLTP and OLAP queries. As you probably already know, it is nearly impossible to optimize the indexing of a database that has both type of queries. This is because in order for OLTP queries to be fast, there should not be too many indexes as to hinder INSERT, UPDATE, or DELETE operations. And for OLAP queries to be fast, there should be as many indexes as needed to speed SELECT queries.
While there are many options for dealing with this dilemma, one option that may work for some people is a strategy where OLAP queries are mostly (if not all) are run during off hours (assuming the database has any off hours), and take advantage of indexes that are added each night before the OLAP queries begin, and then are dropped once the DSS queries are complete. This way, those indexes needed for fast performing OLAP queries will minimally interfere with OLTP transactions (especially during busy times).
As you can imagine, this strategy can take a lot of planning and work, but in some cases, it can offer the best performance for databases that experience both OLTP and OLAP queries. Because it is hard to guess if this strategy will work for you, you will want to test it before putting it into production. [6.5, 7.0, 2000, 2005] Updated 10-4-2004
Be aware that the MIN() or MAX() functions can take advantage of appropriate indexes. If you find that you are using these functions often, and your current query is not taking advantage of current indexes to speed up these functions, consider adding appropriate indexes. [6.5, 7.0, 2000] Updated 10-4-2004
If you know that a particular column will be subject to many sorts, consider adding a unique index to that column. This is because unique columns generally sort faster in SQL Server than if there are duplicate column data present. [6.5, 7.0, 2000, 2005] Updated 10-4-2004
Whenever you upgrade software that affects SQL Server, get a new Profiler trace and run it against the Index Wizard or Database Storage Engine Advisor to catch any obvious missing indexes that may be needed as a result of the upgrade. Application software that is updated often changes the Transact-SQL code (and SPs, if used) that accesses SQL Server. In many cases, the vendor supplying the upgraded code may not have taken into account how index use might have changed after the upgrade was made. You may be surprised what you find. [6.5, 7.0, 2000, 2005] Added 4-19-2005
DELETE operations can sometimes be time- and space-consuming. In some environments you might be able to increase the performance of this operation by using TRUNCATE instead of DELETE. TRUNCATE will almost instantly be executed. However, TRUNCATE will not work when there are Foreign Key references present for that table. A workaround is to DROP the constraints before firing the TRUNCATE. Here’s a generic script that will drop all existing Foreign Key constraints on a specific table:
CREATE TABLE dropping_constraints
INSERT INTO dropping_constraints
‘ALTER TABLE [‘ +
‘] DROP CONSTRAINT ‘ +
t1.CONSTRAINT_NAME = t2.CONSTRAINT_NAME
DECLARE @stmt VARCHAR(8000)
DECLARE @rowcnt INT
SELECT TOP 1 @stmt=cmd FROM dropping_constraints
SET @stmt = ‘DELETE FROM dropping_constraints WHERE cmd =’+ QUOTENAME(@stmt,””)
SELECT TOP 1 @stmt=cmd FROM dropping_constraints
DROP TABLE dropping_constraints
This can also be extended to drop all FK constraints in the current database. To achieve this, just comment out the WHERE clause. [6.5, 7.0, 2000] Added 4-19-2005