Improving 2D Range Query Performance in SQL Server

When using the BETWEEN operator on multiple
columns, you are likely using a 2D range query.  Such queries perform very
poorly in SQL Server.   This article will tell you how you can often use
additional information to rewrite such queries for much better performance. 
We’ll conclude by rewriting an actual BETWEEN query in a manner that increases
performance to more than 100 times faster. (!)

What is a 2D range query?

A 2D range query is one where two
independent columns are being tested against a range of values.  For instance,
the query:

SELECT*FROM MyTable WHERE @constant_date BETWEEN
DateStart AND DateEnd

Has two degrees of freedom: both DateStart
and DateEnd are being range tested.  DateStart  must lie between {min_date} and
@constant_date, and DateEnd between @constant_date and {max_date}.  This is a
2D range query.

Note: the very similar-looking query:

SELECT*FROM MyTable WHERE DateColumn BETWEEN
@constant_date1 AND @constant_date2

Is not a 2D range query.  It
is one-dimensional– only one column is being queried.

You can have 2D range queries without using
the BETWEEN operator.  For instance, the query:

SELECT*FROM MyTable WHERE
Column1 > @SomeValue AND Column2 > @SomeOtherValue

Is also a 2D range query, and one that
cannot be easily handled by a B-Tree Index.

There can be higher dimensional queries as
well. For instance, 3D range queries are common for geospatial data.

The Problem

SQL Server uses B-Tree indexes for
everything but spatial data.  A B-Tree is single-dimensional and cannot be
properly utilized in 2D range queries.  Taking a closer look at our sample
query:

…WHERE @date BETWEEN DateStartANDDateEnd

You can create an index onDateStart or
DateEnd, or the composite index (DateStart, DateEnd).  An index onDateStart
alone allows the QO (Query Optimizer) to narrow the range of values only to the
range {min_date} to @date.   The rows must then be scanned to find those that
satisfy DateEnds’s portion of the query.  Statistically, this averages out to
only eliminating half the rows.   Unless @date happens to be very close to what
the index stats says is the highest value, the engine is therefore better off
doing a full table scan.  The same problem applies to an index on DateEnd.

What about the composite index?  That
doesn’t help either—values for DateEnd are sorted only within each specific
value of DateStart.  Every time the index moves to a new value of DtStart, an
entire new B-Tree is built for DateEnd values that correspond to that specific
start date.  The composite index gains you a small amount because it’s coveringthe
query– it contains both search arguments and therefore saves you some RIDs
(row ID) lookups.  But, on average, you still have to scan half the entire
index to answer the query.  If you have other columns in the query, the QO mayfind
its cheaper to do a full table scan anyway.

Note: quadtree indexes solve this problem
for the 2D case, just as octrees work well for 3D.   In the second half of this
article, I will tell you how you can utilize such indexes in SQL Server for
non-spatial data types, but this article will focus on improving the
performance of higher-dimension range queries with a standard B-Tree index.

Continues…

Pages: 1 2




Array

No comments yet... Be the first to leave a reply!