Whether to use UNION or OR in SQL Server Queries

Recently I came across with an article on DB2 about using Union instead of OR. So I thought of carrying out a research on SQL Server on what scenarios UNION is optimal in and which scenarios OR would be best. I will analyze this with a few scenarios using samples taken  from the AdventureWorks database Sales.SalesOrderDetail table.

Scenario 1: Selecting all columns

So we are going to select all columns and you have a non-clustered index on the ProductID column.

--Query 1 : OR
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 714 OR ProductID =709
     OR ProductID =998 OR ProductID =875
     OR ProductID =976 OR ProductID =874
--Query 2 : UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 714
UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 709
UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 998
UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 875
UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 976
UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 874

So query 1 is using OR and the later is using UNION. Let us analyze the execution plans for these queries.

Query 1

Query 2

As expected Query 1 will use Clustered Index Scan but Query 2, uses all sorts of things. In this case, since it is using multiple CPUs you might have CX_PACKET waits as well.

Let’s look at the profiler results for these two queries:

CPU

Reads

Duration

Row Counts

OR

78

1252

389

3854

UNION

250

7495

660

3854

You can see from the above table the UNION query is not performing well as the  OR query though both are retuning same no of rows (3854).These results indicate that, for the above scenario UNION should be used.

Scenario 2: Non-Clustered and Clustered Index Columns only



--Query 1 : OR
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 714 OR ProductID =709
     OR ProductID =998 OR ProductID =875
     OR ProductID =976 OR ProductID =874
GO
--Query 2 : UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 714
UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 709
UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 998
UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 875
UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 976
UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 874
GO

So this time, we will be selecting only index columns, which means these queries will avoid a data page lookup. As in the previous case we will analyze the execution plans:

Query 1

Query 2

Again, Query 2 is more complex than Query 1. Let us look at the profile analysis:

CPU

Reads

Duration

Row

Counts

OR

0

24

208

3854

UNION

0

38

193

3854

In this analyzis, there is only slight difference between OR and UNION.

Scenario 3: Selecting all columns for different fields

Up to now, we were using only one column (ProductID) in the where clause.  What if we have two columns for where clauses and let us assume both are covered by non-clustered indexes?

--Query 1 : OR
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 714
      OR CarrierTrackingNumber LIKE 'D0B8%'
--Query 2 : UNION
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 714
UNION
SELECT *
FROM Sales.SalesOrderDetail
WHERE CarrierTrackingNumber
 LIKE 'D0B8%'

Query 1

Query 2:

As we can see, the query plan for the second query has improved. Let us see the profiler results.

CPU

Reads

Duration

Row

Counts

OR

47

1278

443

1228

UNION

31

1334

400

1228

So in this case too, there is little difference between OR and UNION.

Scenario 4: Selecting Clustered index columns for different fields

Now let us go only with clustered indexes:

--Query 1 : OR
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 714
      OR CarrierTrackingNumber LIKE 'D0B8%'
--Query 2 : UNION
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 714
UNION
SELECT *
FROM Sales.SalesOrderDetail
WHERE CarrierTrackingNumber
 LIKE 'D0B8%'

Query 1

Query 2

Now both execution plans are almost identical except is an additional Stream Aggregate is used in the first query. This means UNION has advantage over OR in this scenario. Let us see profiler results for these queries again.

CPU

Reads

Duration

Row

Counts

OR

0

319

366

1228

UNION

0

50

193

1228

Now see the differences, in this scenario UNION has somewhat of an advantage over OR.

Conclusion

Using UNION or OR depends on the scenario you are faced with. So you need to do your analyzing before selecting the appropriate method. Also, above the four scenarios are not all an exhaustive list of scenarios, I selected those for the broad description purposes only.



Related Articles :

18 Responses to “Whether to use UNION or OR in SQL Server Queries”

  1. Dinesh,

    Thank you for the most interesting tests and results – very insightful.

    In addition to comparing OR versus UNION approaches, have you also considered comparing against a UNION ALL approach? I have heard that it can be better performing than UNION when you do not care if duplicate rows are returned (or know that the UNIONed results will not have duplicate rows).

    Thanks again for your post.

    Scott R.

  2. Scott,

    Doing a UNION ALL will most certainly reduce CPU as well as, to a lesser degree, the IO costs.

    However, if you want to ensure that the resultset matches what is returned by the “OR” query, then you would have to use UNION in order to eleminate duplicates.

    Having said that, there are scenarios, like DataWarehouse ETL, where using UNION ALL and then removing duplicates may be more performant than using UNION. So, I suppose, you would have to do some experimentation on what works better for your workload.

    HTH.

    • I would have to disagree. The OR option doesn’t detect duplicates and eliminate them. Yet the UNION operator is doing this. UNION ALL would be the equivalent to the OR statement.

      To make the first query the equivalent of a UNION, not a UNION ALL you would need to rewrite the first query to

      SELECT DISTINCT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
      WHERE ProductID = 714 OR ProductID =709
      OR ProductID =998 OR ProductID =875
      OR ProductID =976 OR ProductID =874

      Your comment is awaiting moderation.

  3. Hi Dinesh, thanks for this article! Just a comment to the last example – the query in the Scenario 4 is incorrect, it’s just copied from the Scenario 3. There should be “SELECT SalesOrderID, SalesOrderDetailID …” instead of “SELECT * …”

  4. You forgot the IN() syntax. It is defined as a chain of ORs in the Standards. However, Some SQL products treat it as special cases when the list is long enough. The two internal optimizat6ions are:
    1) Put the list is put into a hash table so access is done in one probe
    2) Put the list into a tree structure, like a heap sort tree, so it is searched in log2(n) probes.

  5. Nice.

    There is also using an INSERT which your original would have six queries instead of one set. I find this sometimes to be quicker.

  6. This is very informative, however what if we use IN clause instead of OR or UNION.. as in most of the cases IN or EXIST comes to mind if we have to compare to a list of values

  7. “You can see from the above table the UNION query is not performing well as the OR query though both are retuning same no of rows (3854).These results indicate that, for the above scenario UNION should be used.”

    Are you sure of your conclusion here? Surely in the above scenario UNION should NOT be used?

    Also beware of using UNION to remove duplicates, as it will remove also duplicates in each record set. e.g. if you have a table (tbl) with 1 field (myfield) and 2 rows (both rows contain ‘abc’) then

    select myfield from tbl
    UNION
    select myfield from tbl

    will return just one row (whereas UNION ALL would return 4).

  8. At the end of Scenario #1 you say “You can see from the above table the UNION query is not performing well as the OR query though both are retuning same no of rows (3854).These results indicate that, for the above scenario UNION should be used.”

    So you are recommending that even though the UNION uses more CPU, more reads, takes longer, that it is still the recommended operator ??? Perhaps I am missing something, or perhaps it is just a type-o.

  9. Eek. UNION is most certainly not a like for like replacement for an OR operator.

    UNION (minus the ALL) is the equivalent of a SELECT DISTINCT, so could end up with unexpected results, depending on what you are doing.

    If ever I see a DISTINCT in a query, alarm bells automatically start ringing.
    Not because it doesn’t have its uses, but that in the vast majority of cases when people use it to eliminate duplicates they’ve done so because they didn’t fully understand what they were trying to achieve or have some misunderstanding about the data model.

    So assuming this is about the UNION ALL vs OR, short of any huge gain in performance I’d say the OR is both aesthetically more pleasing (less code) and more obvious as to what the semantics of the query are about.

    As others have said an IN clause will usually be optimised out identically to a simple OR clause and is even neater.

    On a slight aside, in a partitioned view which is effectively a UNION ALL query against identical tables with different check constraints, the optimiser would automatically discount any values that weren’t in the OR clause.

  10. IN is a fancier way for OR. Although it reduces the amount of code you have to write, it stills works as OR internally. With that being said, you will see no or very little difference in performance while using (IN or OR).

  11. Regarding Scenario 1, you state that UNION should be used? Why is this, when it has higher CPU and Reads.

    Also the screenshots for the plans in Scenario 3 & 4 look like they need to be swapped around? nonclustered vs clustered index use

  12. Hi all,
    please consider that if you have many values for the same condition, you have to dinamically create a select string concatenating many UNION. this statement cannot be used with, i.e., sp_executesql or stored procedure because the command is every time generated at “runtime” (not compiled?).
    sometimes performance are less important than clarity and readness specially when you need to solve problems in production. (consider the effort to analyze many selct in union vs a more simple and clear IN )

  13. How did you generate the tables that compare CPU, Reads, Duration and Row Count? An article explaining how to would be extremely useful.

  14. thanks. Just curious how you got the Reads and Duration numbers?

  15. in this example, there would be no difference at all between UNION and UNION ALL, but the latter option would perform better. Choosing normal union (‘set union’) in this case, is wrong.

  16. The analysis is too simplistic to be useful in any real world scenario. Also, the missing index warning in your query plans (the green text) means the whole exercise was executed on a non-optimal design. Perhaps the speed differences between UNION and OR were entirely due to the missing index.

  17. Scenario 1:

    “the UNION query is not performing well as the OR query though both are retuning same no of rows (3854).These results indicate that, for the above scenario UNION should be used.”

    I think you mean OR should be used in this case.

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |