Performance Tuning SQL Server Joins
One of the best ways to boost JOIN performance is to limit how many rows need to be JOINed. This is especially beneficial for the outer table in a JOIN. Only return absolutely only those rows needed to be JOINed, and no more. [6.5, 7.0, 2000, 2005] Updated 7-25-2005
If you perform regular joins between two or more tables in your queries, performance will be optimized if each of the joined columns have their own indexes. This includes adding indexes to the columns in each table used to join the tables. Generally speaking, a clustered key is better than a non-clustered key for optimum JOIN performance. [6.5, 7.0, 2000, 2005] Updated 7-25-2005
If you have two or more tables that are frequently joined together, then the columns used for the joins on all tables should have an appropriate index. If the columns used for the joins are not naturally compact, then considering adding surrogate keys to the tables that are compact in order to reduce the size of the keys, thus decreasing read I/O during the join process, increasing overall performance. [6.5, 7.0, 2000, 2005] Updated 7-25-2005
JOIN performance has a lot to do with how many rows you can stuff in a data page. For example, let’s say you want to JOIN two tables. Most likely, one of these two tables will be smaller than the other, and SQL Server will most likely select the smaller of the two tables to be the inner table of the JOIN. When this happens, SQL Server tries to put the relevant contents of this table into the buffer cache for faster performance. If there is not enough room to put all the relevant data into cache, then SQL Server will have to use additional resources in order to get data into and out of the cache as the JOIN is performed.
If all of the data can be cached, the performance of the JOIN will be faster than if it is not. This comes back to the original statement, that the number of rows in a table can affect JOIN performance. In other words, if a table has no wasted space, it is much more likely to get all of the relevant inner table data into cache, boosting speed. The moral to this story is to try to get as much data stuffed into a data page as possible. This can be done through the use of a high fillfactor, rebuilding indexes often to get rid of empty space, and to optimize datatypes and widths when creating columns in tables. [6.5, 7.0, 2000, 2005] Updated 7-25-2005
Keep in mind that when you create foreign keys, an index is not automatically created at the same time. If you ever plan to join a table to the table with the foreign key, using the foreign key as the linking column, then you should consider adding an index to the foreign key column. An index on a foreign key column can substantially boost the performance of many joins. [6.5, 7.0, 2000, 2005] Updated 7-25-2005
Avoid joining tables based on columns with few unique values. If columns used for joining aren’t mostly unique, then the SQL Server optimizer may not be able to use an existing index in order to speed up the join. Ideally, for best performance, joins should be done on columns that have unique indexes. [6.5, 7.0, 2000, 2005] Updated 7-25-2005
For best join performance, the indexes on the columns being joined should ideally be numeric data types, not CHAR or VARCHAR, or other non-numeric data types. The overhead is lower and join performance is faster. [6.5, 7.0, 2000, 2005] Updated 7-25-2005
For maximum performance when joining two or more tables, the indexes on the columns to be joined should have the same data type, and ideally, the same width.
This also means that you shouldn’t mix non-Unicode and Unicode datatypes. (e.g. VARCHAR and NVARCHAR). If SQL Server has to implicitly convert the data types to perform the join, this not only slows the joining process, but it also could mean that SQL Server may not use available indexes, performing a table scan instead. [6.5, 7.0, 2000, 2005] Updated 7-25-2005
When you create joins using Transact-SQL, you can choose between two different types of syntax: either ANSI or Microsoft. ANSI refers to the ANSI standard for writing joins, and Microsoft refers to the old Microsoft style of writing joins. For example:
ANSI JOIN Syntax
SELECT fname, lname, department
FROM names INNER JOIN departments ON names.employeeid = departments.employeeid
Former Microsoft JOIN Syntax
SELECT fname, lname, department
FROM names, departments
WHERE names.employeeid = departments.employeeid
If written correctly, either format will produce identical results. But that is a big if. The older Microsoft join syntax lends itself to mistakes because the syntax is a little less obvious. On the other hand, the ANSI syntax is very explicit and there is little chance you can make a mistake.
For example, I ran across a slow-performing query from an ERP program. After reviewing the code, which used the Microsoft JOIN syntax, I noticed that instead of creating a LEFT JOIN, the developer had accidentally created a CROSS JOIN instead. In this particular example, less than 10,000 rows should have resulted from the LEFT JOIN, but because a CROSS JOIN was used, over 11 million rows were returned instead. Then the developer used a SELECT DISTINCT to get rid of all the unnecessary rows created by the CROSS JOIN. As you can guess, this made for a very lengthy query. I notified the vendor’s support department about it, and they fixed their code.
The moral of this story is that you probably should be using the ANSI syntax, not the old Microsoft syntax. Besides reducing the odds of making silly mistakes, this code is more portable between database, and eventually, I imagine Microsoft will eventually stop supporting the old format, making the ANSI syntax the only option. [6.5, 7.0, 2000, 2005] Updated 11-1-2005