Improving 2D Range Query Performance in SQL Server

When using the BETWEEN operator on multiple columns, you are likely using a 2D range query.  Such queries perform very poorly in SQL Server.   This article will tell you how you can often use additional information to rewrite such queries for much better performance.  We’ll conclude by rewriting an actual BETWEEN query in a manner that increases performance to more than 100 times faster. (!)

What is a 2D range query?

A 2D range query is one where two independent columns are being tested against a range of values.  For instance, the query:

SELECT*FROM MyTable WHERE @constant_date BETWEEN DateStart AND DateEnd

Has two degrees of freedom: both DateStart and DateEnd are being range tested.  DateStart  must lie between {min_date} and @constant_date, and DateEnd between @constant_date and {max_date}.  This is a 2D range query.

Note: the very similar-looking query:

SELECT*FROM MyTable WHERE DateColumn BETWEEN @constant_date1 AND @constant_date2

Is not a 2D range query.  It is one-dimensional– only one column is being queried.

You can have 2D range queries without using the BETWEEN operator.  For instance, the query:

SELECT*FROM MyTable WHERE Column1 > @SomeValue AND Column2 > @SomeOtherValue

Is also a 2D range query, and one that cannot be easily handled by a B-Tree Index.

There can be higher dimensional queries as well. For instance, 3D range queries are common for geospatial data.

The Problem

SQL Server uses B-Tree indexes for everything but spatial data.  A B-Tree is single-dimensional and cannot be properly utilized in 2D range queries.  Taking a closer look at our sample query:

…WHERE @date BETWEEN DateStartANDDateEnd

You can create an index onDateStart or DateEnd, or the composite index (DateStart, DateEnd).  An index onDateStart alone allows the QO (Query Optimizer) to narrow the range of values only to the range {min_date} to @date.   The rows must then be scanned to find those that satisfy DateEnds’s portion of the query.  Statistically, this averages out to only eliminating half the rows.   Unless @date happens to be very close to what the index stats says is the highest value, the engine is therefore better off doing a full table scan.  The same problem applies to an index on DateEnd.

What about the composite index?  That doesn’t help either—values for DateEnd are sorted only within each specific value of DateStart.  Every time the index moves to a new value of DtStart, an entire new B-Tree is built for DateEnd values that correspond to that specific start date.  The composite index gains you a small amount because it’s coveringthe query– it contains both search arguments and therefore saves you some RIDs (row ID) lookups.  But, on average, you still have to scan half the entire index to answer the query.  If you have other columns in the query, the QO mayfind its cheaper to do a full table scan anyway.

Note: quadtree indexes solve this problem for the 2D case, just as octrees work well for 3D.   In the second half of this article, I will tell you how you can utilize such indexes in SQL Server for non-spatial data types, but this article will focus on improving the performance of higher-dimension range queries with a standard B-Tree index.

Continues…

Leave a comment

Your email address will not be published.