SQL Server Join Nastiness Tamed: Finding Joy In The INNER LOOP

Example 3

In the final example, the query is of the form:
SELECT Table2.Column0, Table3.Column1
FROM Table1 JOIN Table2 JOIN Table3
WHERE Table1.Column2 >= @param1 AND Table1.Column2 <= @param2

This is a nicely bound query where Table1 generates quite a few rows, but not enough to justify the two full index scans and expensive sort that power the merge join.

With the hint, the execution path cleans up nicely, with Table1’s row lookup sharing the cost with looping hits into Table2’s index.
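
For reference, here’s a sketch of the hinted query. The join columns in the ON clauses are my assumptions, since only the general form of the query is shown above:

<hinted query for example 3>

-- Force both joins to nested loops. Note that with a join hint in
-- play, SQL also processes the tables in the order they're listed.
SELECT Table2.Column0, Table3.Column1
FROM Table1
    INNER LOOP JOIN Table2 ON Table2.Column3 = Table1.Column3
    INNER LOOP JOIN Table3 ON Table3.Column4 = Table2.Column4
WHERE Table1.Column2 >= @param1 AND Table1.Column2 <= @param2

</hinted query>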

Caveats

Keep the following in mind when using the INNER LOOP join hint:

1. Only Use With Low Row Count Queries

If your query returns a lot of rows (>1000), or legitimately needs to process a lot of rows to work correctly, and you know this going in, then this technique probably won’t help you much for that query. Loops are simple and efficient when the row counts are low, and you should force this issue via this join hint when you know the row counts are low but SQL does not. But when the row counts are high, deep looping structures start to multiply the amount of work that SQL has to do. In terms of “code”, what SQL is trading is looping code like this:

<looping pseudo-code for original example>

for each cur_accountid in accounts.where(name = "Michael").accountids
    for each file in files.where(accountid = cur_accountid)
        output file.filename
    next
next

</looping pseudo-code>

for hashing code like this:

<hashing pseudo-code for original example>

files_hashtable = build_hashtable(files, key = accountid)
for each cur_accountid in accounts.where(name = "Michael").accountids
    for each file in files_hashtable.find(cur_accountid)
        output file.filename
    next
next

</hashing pseudo-code>

The hashing mechanism is fine if there are lots of accounts and lots of files because “files_hashtable.find(cur_accountid)” is much faster than “files.where(accountid = cur_accountid)” from the looping method. It’s just building the hash table that’s the killer. For merge joins it’s the sorting and the amount of data that has to get pumped through the merge to satisfy the join and/or query criteria that are the killers.

2. Scratch And Sniff, Rinse And Repeat

Take a query that is causing hash and/or merge joins and ruining your day with bad performance, get the table order right for your joins (see below), add the INNER LOOP join hint, and then see what smokes in the Query Analyzer in terms of execution plan and execution time, all without affecting your application. Don’t take my word for the appropriateness of this technique to your problem: give it a try.
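
If you’re working in the Query Analyzer, the statistics settings make the before-and-after comparison concrete. Here’s a minimal sketch using the Accounts/Files tables implied by the pseudo-code above; the column names are assumptions:

<before-and-after test sketch>

-- Report I/O and timing for each statement.
SET STATISTICS IO ON
SET STATISTICS TIME ON

-- Before: let SQL pick the join strategy on its own.
SELECT Files.FileName
FROM Accounts
    JOIN Files ON Files.AccountID = Accounts.AccountID
WHERE Accounts.Name = 'Michael'

-- After: force the loop join and compare plan shape and elapsed time.
SELECT Files.FileName
FROM Accounts
    INNER LOOP JOIN Files ON Files.AccountID = Accounts.AccountID
WHERE Accounts.Name = 'Michael'

</before-and-after test sketch>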

This technique will not change the correctness of your query, but it is not a “one size fits all” solution. You may find that you want to use it selectively, applying it in some areas but not in others, to avoid too much deep looping over lots of data. It’s perfectly reasonable to use this technique for only some of the joins in a query and not for others, if good hashes or merges fall out elsewhere in your query. Not all hashes and merges are bad, just the ones that access lots of data when only a little data needs to be accessed.
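
For instance, here’s a sketch that forces only the first join and leaves the second to the optimizer. The Folders table and FolderID column are hypothetical, added just to give the query a second join:

<mixed joins sketch>

SELECT Files.FileName
FROM Accounts
    -- Forced: loop from the few matching accounts into Files.
    INNER LOOP JOIN Files ON Files.AccountID = Accounts.AccountID
    -- Not forced: SQL still picks the join strategy here, though the
    -- table order is locked in for the whole query (see below).
    JOIN Folders ON Folders.FolderID = Files.FolderID
WHERE Accounts.Name = 'Michael'

</mixed joins sketch>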

3. Table Order Becomes Very Important

The biggest caveat with using this technique is that the order in which you specify the tables in the join becomes the order in which the tables are processed by SQL, even if you only use this technique for a subset of the joins in your query. This means that you need to order the tables such that you have criteria — and indices that support the criteria — that quickly prune down the number of rows involved so that SQL doesn’t have to deeply loop over large amounts of data.

In the first example above (“List all files in Michael’s account”), it’s assumed that I have an index on the Name column for Accounts (hopefully one that includes the AccountID column as well, to avoid a costly bookmark lookup), and that only one AccountID will be generated, which will then be used to query into Files, where I presumably have an AccountID index as well. If I had reversed the order of the tables in the join, the query would have performed exceptionally poorly: SQL would have walked all files, accessing their not-very-unique account IDs, then looped all of that against the Accounts table, checking account names against the WHERE condition. All bad. Basically, you need to know what you’re doing with this technique in terms of the execution path and the relationship between your query criteria and the indices you have available, because it’s all in your hands.
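
The good ordering is the hinted query sketched earlier: Accounts first, so the Name predicate prunes everything down before any looping starts. The reversed ordering, again with my assumed column names, would look like this:

<reversed table order sketch>

-- Bad: with the hint forcing this table order, every row in Files
-- gets walked, and each one loops into Accounts just to test the
-- Name predicate.
SELECT Files.FileName
FROM Files
    INNER LOOP JOIN Accounts ON Accounts.AccountID = Files.AccountID
WHERE Accounts.Name = 'Michael'

</reversed table order sketch>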

Continues…
