SQL Server Transact-SQL WHERE
If you currently have a query that uses NOT IN, which offers poor performance because the SQL Server optimizer has to use a scan to perform this activity, instead try to use one of the following options, all of which offer better performance:
- Use EXISTS or NOT EXISTS.
- Use IN.
- Perform a LEFT OUTER JOIN and check for a NULL condition.
[2000, 2005, 2008] Updated 2-4-2009
When you have a choice of using the IN or the EXISTS clause in your Transact-SQL, you will generally want to use the EXISTS clause, as it is usually more efficient and performs faster. [2000, 2005, 2008] Updated 2-4-2009
If you run into a situation where a WHERE clause is not sargable because of the use of a function on the right side of an equality sign (and there is no other way to rewrite the WHERE clause), consider creating an index on a computed column instead. This way, you avoid the non-sargable WHERE clause altogether, using the results of the function in your WHERE clause instead.
Because of the additional overhead required for indexes on computed columns, you will only want to do this if you need to run this same query over and over in your application, thereby justifying the overhead of the indexed computed column. [2000, 2005, 2008] Updated 2-4-2009
If you find that SQL Server uses a SCAN instead of an SEEK when you use an IN or OR clause as part of your WHERE clause, even when those columns are covered by an index, consider using an index hint to force the Query Optimizer to use the index.
SELECT * FROM tblTaskProcesses WHERE nextprocess = 1 AND processid IN (8,32,45)
takes about 3 seconds, while:
SELECT * FROM tblTaskProcesses (INDEX = IX_ProcessID) WHERE nextprocess = 1 AND processid IN (8,32,45)
returns in under a second. Contributed by David Ames. [2000, 2005, 2008] Updated 2-4-2009
If you use LIKE in your WHERE clause, try to use one or more leading characters in the clause, if possible. For example, use:
If you use a leading character in your LIKE clause, then the Query Optimizer has the ability to potentially use an available index to perform the query, speeding performance and reducing the load on SQL Server.
But if the leading character in a LIKE clause is a wildcard, the Query Optimizer will not be able to use an index, and a scan must be run, reducing performance and taking more time.
The more leading characters you can use in the LIKE clause, the more likely the Query Optimizer will find and use a suitable index. [2000, 2005, 2008] Updated 2-4-2009
If your application needs to retrieve summary data often, but you don’t want to have the overhead of calculating it on the fly every time it is needed, consider using a trigger that updates summary values after each transaction into a summary table.
While the trigger has some overhead, overall, it may be less that having to calculate the data every time the summary data is needed. You may have to experiment to see which method is fastest for your environment. [2000, 2005, 2008] Updated 2-4-2009
When you have a choice of using the IN or the BETWEEN clauses in your Transact-SQL, you will generally want to use the BETWEEN clause, as it is much more efficient. For example:
SELECT customer_number, customer_name
WHERE customer_number in (1000, 1001, 1002, 1003, 1004)
Is much less efficient than this:
SELECT customer_number, customer_name
WHERE customer_number BETWEEN 1000 and 1004
Assuming there is a useful index on customer_number, the Query Optimizer can locate a range of numbers much faster (using BETWEEN) than it can find a series of numbers using the IN clause (which is really just another form of the OR clause). [2000, 2005, 2008] Updated 2-4-2009
If possible, try to avoid using the SUBSTRING function in your WHERE clauses. Depending on how it is constructed, using the SUBSTRING function can force a scan instead of allowing the optimizer to use an index (assuming there is one). If the substring you are searching for does not include the first character of the column you are searching for, then a scan is performed.
If possible, you should avoid using the SUBSTRING function and use the LIKE condition instead, for better performance.
Instead of doing this:
WHERE SUBSTRING(column_name,1,1) = ‘b’
Try using this instead:
WHERE column_name LIKE ‘b%’
If you decide to make this choice, keep in mind that you will want your LIKE condition to be sargable, which means that you cannot place a wildcard in the first position. [2000, 2005, 2008] Updated 2-4-2009
Where possible, avoid string concatenation in your Transact-SQL code, as it is not a fast process, contributing to overall slower performance of your application. If you need to use heavy string processing, consider creating a CLR function, which potentially will provide greater performance. [2000, 2005, 2008] Updated 2-4-2009
Generally, avoid using optimizer hints in your WHERE clauses. This is because it is generally very hard to outguess the Query Optimizer. Optimizer hints are special keywords that you include with your query to force how a query is executed. If you decide to include a hint in a query, this forces the Query Optimizer to become static, preventing the Query Optimizer from dynamically adapting to the current environment for the given query. More often than not, this hurts, not helps performance.
If you think that a hint might be necessary to optimize your query, be sure you do all of the following first:
- Update the statistics on the relevant tables.
- If the problem query is inside a stored procedure, recompile it.
- Review the search arguments to see if they are sargable, and if not, try to rewrite them so that they are sargable.
- Review the current indexes, and make changes if necessary.
If you have done all of the above, and the query is not running as you expect, then you may want to consider using an appropriate optimizer hint.
If you haven’t heeded my advice and have decided to use some hints, keep in mind that as your data changes, and as the Query Optimizer changes (through service packs and new releases of SQL Server), your hard-coded hints may no longer offer the benefits they once did. So if you use hints, you need to periodically review them to see if they are still performing as expected. [2000, 2005, 2008] Updated 2-4-2009
If you have a WHERE clause that includes expressions connected by two or more AND operators, SQL Server will evaluate them from left to right in the order they are written. This assumes that no parenthesis have been used to change the order of execution. Because of this, you may want to consider one of the following when using AND:
Locate the least likely true AND expression first. This way, if the AND expression is false, the clause will end immediately, saving time.
If both parts of an AND expression are equally likely being false, put the least complex AND expression first. This way, if it is false, less work will have to be done to evaluate the expression.
You may want to consider using Query Analyzer or Management Studio to look at the execution plans of your queries to see which is best for your situation. [2000, 2005, 2008] Updated 2-4-2009
If you want to boost the performance of a query that includes an AND operator in the WHERE clause, consider the following:
- Of the search criterions in the WHERE clause, at least one of them should be based on a highly selective column that has an index.
- If at least one of the search criterions in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause (assuming they will be selective).
- If none of the columns in the WHERE clause are selective enough to use an index on their own, consider creating a covering/included index for the query.
[2000, 2005, 2008] Updated 2-4-2009
The Query Optimizer will perform a scan on a table if the WHERE clause in the query contains an OR operator and if any of the referenced columns in the OR clause are not indexed (or does not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each referenced column in the WHERE clause has a useful index. [2000, 2005, 2008] Updated 2-4-2009
A query with one or more OR clauses can sometimes be rewritten as a series of queries combined with a UNION ALL statement in order to boost the performance of the query. For example, let’s take a look at the following query:
SELECT employeeID, firstname, lastname
WHERE dept = ‘prod’ or city = ‘Orlando’ or division = ‘food’
This query has three separate conditions in the WHERE clause. In order for this query to use an index, then there must be an index on all three columns found in the WHERE clause.
This same query can be written using UNION ALL instead of OR, like this example:
SELECT employeeID, firstname, lastname FROM names WHERE dept = ‘prod’
SELECT employeeID, firstname, lastname FROM names WHERE city = ‘Orlando’
SELECT employeeID, firstname, lastname FROM names WHERE division = ‘food’
Each of these queries will produce the same results. If there is only an index on dept, but not the other columns in the WHERE clause, then the first version will not use any index and a scan must be performed. But the second version of the query will use an available index for part of the query, not for all of the query.
Admittedly, this is a very simple example, but even so, it does demonstrate how rewriting a query can affect whether or not an index is used or not. If this query was much more complex, then the approach of using UNION ALL might be more efficient, as it allows you to tune each part of the index individually, something that cannot be done if you use only ORs in your query.
Note that I am using UNION ALL instead of UNION. The reason for this is to prevent the UNION statement from trying to sort the data and remove duplicates, which hurts performance. Of course, if there is the possibility of duplicates, and you want to remove them, then of course you can use UNION.
If you have a query that uses ORs and it not making the best use of indexes, consider rewriting it as a UNION ALL, and then testing performance. Only through testing can you ensure one version of your query will be faster than another. [2000, 2005, 2008] Updated 2-4-2009
Don’t use ORDER BY in your SELECT statements unless you really need to, as it can add a lot of extra overhead. For example, perhaps it may be more efficient to sort the data at the client than at the server. In other cases, perhaps the client doesn’t even need sorted data to achieve its goal. The key here is to remember that you shouldn’t automatically sort data, unless you know it is necessary. [2000, 2005, 2008] Updated 2-4-2009
Whenever SQL Server has to perform a sorting operation, additional resources have to be used to perform this task. Sorting often occurs when any of the following Transact-SQL statements are executed:
- ORDER BY
- GROUP BY
- SELECT DISTINCT
- CREATE INDEX, ALTER INDEX (generally not as critical as this happens much less often)
In many cases, these commands cannot be avoided. On the other hand, there are few ways that sorting overhead can be reduced. These include:
- Keep the number of rows to be sorted to a minimum. Do this by only returning those rows that absolutely need to be sorted.
- Keep the number of columns to be sorted to the minimum. In other words, don’t sort more columns that required.
- Keep the width (physical size) of the columns to be sorted to a minimum.
- Sort column with number datatypes instead of character datatypes.
When using any of the above Transact-SQL commands, try to keep the above performance-boosting suggestions in mind. [2000, 2005, 2008] Updated 2-4-2009