SQL Server Transact-SQL WHERE

*****

If you have to sort by a particular column often, consider making that column a clustered index. This is because the data is already presorted for you and SQL Server is smart enough not to resort the data. [2000, 2005, 2008] Updated 2-4-2009

*****

If your WHERE clause includes an IN operator along with a list of values to be tested in the query, order the list of values so that the most frequently found values are placed at the first of the list, and the less frequently found values are placed at the end of the list. This can speed performance because the IN option returns true as soon as any of the values in the list produce a match. The sooner the match is made, the faster the query completes. [2000, 2005, 2008] Updated 2-4-2009

*****

If your SELECT statement contains a HAVING clause, write your query so that the WHERE clause does most of the work (removing undesired rows) instead of the HAVING clause do the work of removing undesired rows. Using the WHERE clause appropriately can eliminate unnecessary rows before they get to the GROUP BY and HAVING clause, saving some unnecessary work, and boosting performance.

For example, in a SELECT statement with WHERE, GROUP BY, and HAVING clauses, here’s what happens. First, the WHERE clause is used to select the appropriate rows that need to be grouped. Next, the GROUP BY clause divides the rows into sets of grouped rows, and then aggregates their values. And last, the HAVING clause then eliminates undesired aggregated groups. If the WHERE clause is used to eliminate as many of the undesired rows as possible, this means the GROUP BY and the HAVING clauses will have less work to do, boosting the overall performance of the query. [2000, 2005, 2008] Updated 2-4-2009

*****

If your application performs many wildcard (LIKE %) text searches on CHAR or VARCHAR columns, consider using SQL Server’s full-text search option. It can significantly speed up wildcard searches of text stored in a database. [2000, 2005, 2008] Updated 2-4-2009

*****

The GROUP BY clause can be used with or without an aggregate function. But if you want optimum performance, don’t use the GROUP BY clause without an aggregate function. This is because you can accomplish the same end result by using the DISTINCT option instead, and it is faster.

For example, you could write your query two different ways:

USE Northwind

SELECT OrderID

FROM [Order Details]

WHERE UnitPrice > 10

GROUP BY OrderID

or

USE Northwind

SELECT DISTINCT OrderID

FROM [Order Details]

WHERE UnitPrice > 10

Both of the above queries produce the same results, but the second one will use less resources and perform faster. [2000, 2005, 2008] Updated 2-4-2009

*****

The GROUP BY clause can be sped up if you follow these suggestions:

  • Keep the number of rows returned by the query as small as possible.
  • Keep the number of groupings as few as possible.
  • Don’t group redundant columns.
  • If there is a JOIN in the same SELECT statement that has a GROUP BY, try to rewrite the query to use a subquery instead of using a JOIN. If this is possible, performance will be faster. If you have to use a JOIN, try to make the GROUP BY column from the same table as the column or columns on which the set function is used.
  • Consider adding an ORDER BY clause to the SELECT statement that orders by the same column as the GROUP BY. This may cause the GROUP BY to perform faster. Test this to see if is true in your particular situation.

[2000, 2005, 2008] Updated 2-4-2009

*****

Sometimes perception is more important that reality. For example, which of the following two queries is the fastest:

  • A query that takes 30 seconds to run, and then displays all of the required results.
  • A query that takes 60 seconds to run, but displays the first screen full of records in less than 1 second.

Most DBAs would choose the first option as it takes less server resources and performs faster. But from many user’s point-of-view, the second one may be more palatable. By getting immediate feedback, the user gets the impression that the application is fast, even though in the background, it is not.

If you run into situations where perception is more important than raw performance, consider using the FAST query hint. The FAST query hint is used with the SELECT statement using this form:

OPTION(FAST number_of_rows)

where number_of_rows is the number of rows that are to be displayed as fast as possible.

When this hint is added to a SELECT statement, it tells the Query Optimizer to return the specified number of rows as fast as possible, without regard to how long it will take to perform the overall query. Before rolling out an application using this hint, I would suggest you test it thoroughly to see that it performs as you expect. You may find out that the query may take about the same amount of time whether the hint is used or not. If this is the case, then don’t use the hint. [2000, 2005, 2008] Updated 2-4-2009

*****

It is fairly common request to write a Transact-SQL query to compare a parent table and a child table and find out if there are any parent records that don’t have a match in the child table. Generally, there are three ways this can be done:

Using a NOT EXISTS

SELECT a.hdr_key

FROM hdr_tbl a

WHERE NOT EXISTS (SELECT * FROM dtl_tbl b WHERE a.hdr_key = b.hdr_key)

Using a LEFT JOIN

SELECT a.hdr_key

FROM hdr_tbl a

LEFT JOIN dtl_tbl b ON a.hdr_key = b.hdr_key

WHERE b.hdr_key IS NULL

Using a NOT IN

SELECT hdr_key

FROM hdr_tbl

WHERE hdr_key NOT IN (SELECT hdr_key FROM dtl_tbl)

In each case, the above query will return identical results. But, which of these three variations of the same query produces the best performance? Assuming everything else is equal, the best performing version through the worst performing version will be from top to bottom, as displayed above. In other words, the NOT EXISTS variation of this query is generally the most efficient.

I say generally, because the indexes found on the tables, along with the number of rows in each table, can influence the results. If you are not sure which variation to try yourself, you can try them all and see which produces the best results in your particular circumstances. [2000, 2005, 2008] Updated 2-4-2009

*****

Be careful when using OR in your WHERE clause, as it is fairly simple to accidentally retrieve much more data than you need, which hurts performance. For example, take a look at the query below:

SELECT companyid, plantid, formulaid

FROM batchrecords

WHERE companyid = ‘0001’ and plantid = ‘0202’ and formulaid = ‘39988773’

OR

companyid = ‘0001’ and plantid = ‘0202’

As you can see from this query, the WHERE clause is redundant, as:

companyid = ‘0001’ and plantid = ‘0202’ and formulaid = ‘39988773’

is a subset of:

companyid = ‘0001’ and plantid = ‘0202’

In other words, this query is redundant. Unfortunately, the SQL Server Query Optimizer isn’t smart enough to know this, and will do exactly what you tell it to. What will happen is that SQL Server will have to retrieve all the data you have requested, then in effect do a SELECT DISTINCT to remove redundant rows it unnecessarily finds.

In this case, if you drop this code from the query:

OR

companyid = ‘0001’ and plantid = ‘0202’

then run the query, you will receive the same results, but with much faster performance. [2000, 2005, 2008] Updated 2-4-2009

*****

Avoid using variables in the WHERE clause of a query located in a batch file. Let’s find out why this may not be a good idea.

First, let’s look at the following code:

SELECT employee_id

FROM employees

WHERE age = 30 and service_years = 10

Assuming that both the age and the service_years columns have useful indexes, and the table has many thousands of records, then SQL Server’s Query Optimizer will select the indexes to perform the query and return results very quickly.

Now, let’s look at the same query, but written to be more generic, one that you might find in a generic batch file:

DECLARE @age int

SET @age = “30”

DECLARE @service_years int

SET @service_years = “10”

SELECT employee_id

FROM employees

WHERE age = @age and service_years = @service_years

When the above code is run, even though both the age and the service_years columns may have useful indexes, they may not be used, and a scan may be used instead, potentially increasing the amount of time for the query to run.

The reason the indexes may not be used is because the Query Analyzer does not know the value of the variables when it selects an access method to perform the query. Because this is a batch file, only one pass is made of the Transact-SQL code, preventing the Query Optimizer from knowing what it needs to know in order to select an access method that uses the indexes.

If you cannot avoid using variables in the WHERE clauses of batch scripts, consider using an INDEX query hint to tell the Query Optimizer to use the available indexes instead of ignoring them and performing a table scan. This of course assumes that the indexes are highly selective. If the indexes are not highly selective, then a scan will most likely be more efficient than using the available indexes. [2000, 2005, 2008] Updated 2-4-2009

]]>

Leave a comment

Your email address will not be published.