Performance Tuning SQL Server Joins – Page 2

If you have to regularly join four or more tables to get the recordset you need, consider denormalizing the tables so that the number of joined tables is reduced. Often, by adding one or two columns from one table to another, the number of joins can be reduced, boosting performance. [6.5, 7.0, 2000, 2005] Updated 11-1-2005

*****

If your join is slow, and currently includes hints, remove the hints to see if the optimizer can do a better job on the join optimization than you can. This is especially important if your application has been upgraded from previous versions of SQL Server. [6.5, 7.0, 2000, 2005] Updated 11-1-2005

*****

One of the best ways to boost JOIN performance is to ensure that the JOINed tables include an appropriate WHERE clause to minimize the number of rows that need to be JOINed.

For example, I have seen many developers perform a simple JOIN on two tables, which is not all that unusual. The problem is that each table may contain over a million rows each. Instead of just JOINing the tables, appropriate restrictive clauses needed to be added to the WHERE clause of each table in order to reduce the total number of rows to be JOINed. This simple step can really boost the performance of a JOIN of two large tables. [6.5, 7.0, 2000, 2005] Updated 11-1-2005

*****

In the SELECT statement that creates your JOIN, don’t use an * (asterisk) to return all of the columns in both tables. This is bad form for a couple of reasons. First, you should only return those columns you need, as the less data you return, the faster your query will run. It would be rare that you would need all of the columns in all of the tables you have joined. Second, you will be returning two of each column used in your JOIN condition, which ends up returning way more data that you need, and hurting performance.

Take a look at these two queries:

USE Northwind
SELECT *
FROM Orders
INNER JOIN [Order Details]
ON Orders.OrderID = [Order Details].OrderID

and

USE Northwind
SELECT Orders.OrderID, Orders.OrderDate,
[Order Details].UnitPrice, [Order Details].Quantity,
[Order Details].Discount
FROM Orders
INNER JOIN [Order Details]
ON Orders.OrderID = [Order Details].OrderID

Both of these queries perform essentially the same function. The problem with the first one is that it returns not only too many columns (they aren’t all needed by the application), but the OrderID column is returned twice, which doesn’t provide any useful benefits. Both of these problems contribute to unnecessary server overhead, hurting performance. The moral of this story is never to use the * in your joins. [6.5, 7.0, 2000, 2005] Updated 7-24-2006

*****

While high index selectivity is generally an important factor that the Query Optimizer uses to determine whether or not to use an index, there is one special case where indexes with low selectivity can be useful speeding up SQL Server. This is in the case of indexes on foreign keys. Whether an index on a foreign key has either high or low selectivity, an index on a foreign key can be used by the Query Optimizer to perform a merge join on the tables in question. A merge join occurs when a row from each table is taken and then they are compared to see if they match the specified join criteria. If the tables being joined have the appropriate indexes (no matter the selectivity), a merge join can be performed, which is often much faster than a join to a table with a foreign key that does not have an index. [7.0, 2000, 2005] Updated 7-24-2006

*****

For very large joins, consider placing the tables to be joined in separate physical files in the same filegroup. This allows SQL Server to spawn a separate thread for each file being accessed, boosting performance. [6.5, 7.0, 2000, 2005] Updated 7-24-2006

*****

Don’t use CROSS JOINS, unless this is the only way to accomplish your goal. What some inexperienced developers do is to join two tables using a CROSS JOIN, and then they use either the DISTINCT or the GROUP BY clauses to “clean up” the mess they have created. This, as you might imagine, can be a huge waste of SQL Server resources. [6.5, 7.0, 2000, 2005] Updated 7-24-2006

*****

If you have the choice of using a JOIN or a subquery to perform the same task, generally the JOIN (often an OUTER JOIN) is faster. But this is not always the case. For example, if the returned data is going to be small, or if there are no indexes on the joined columns, then a subquery may indeed be faster.

The only way to really know for sure is to try both methods and then look at their query plans. If this operation is run often, you should seriously consider writing the code both ways and test it, and then select the most efficient code. [6.5, 7.0, 2000, 2005] Updated 8-21-2006

Continues…

Leave a comment Cancel reply