Techniques for Indexing Low-Selectivity Columns in SQL Server

Indexed Views

The best solution would be to isolate the rows from the table that we want from those we do not — in a way, to “pre-select” the rows matching the criteria, before the query is even issued. What if we create a view containing the keys of only the rows we want, and index it, instead of the base table? This way we have a subset of the keys that match our criteria, persisted so it’s always at the ready, which could be joined to the base table to produce the right set of results. It also seems that the view should be much smaller than a covering index, because it persists only some of the keys.

If you’ve never used an indexed view, it’s an interesting construct: it is like a view, in that it contains a stored select statement that references data in base tables, but SQL Server will persist the data that the view returns, as if it were actually a table itself, instead of fetching the data from the base tables on demand. Any changes to the underlying data are automatically applied to the stored data composing the view. Indexed Views are often applied to “pre-resolve” complicated select statements that would otherwise take significant time to construct, especially those that are run frequently. See Books Online for more details, and be sure to read about the restrictions on indexed views related to connection settings and different editions of SQL Server.

create view keysOnly with schemabinding

     as ( select akey

      from dbo.baseTable

      where validAddr = 1 and getsMail = 1 )

GO

create unique clustered index IX_keysOnly on keysOnly( aKey )

Using this setup there is good news and bad. The following query now has query plan cost 2.87 and reads 2019, which is a significant improvement over the baseline, because the keys are in effect already selected by being persisted in the view. But it fetches only the keys:

select aKey from keysOnly with( noexpand )

The problem is that we need the data from the base table, not just the keys. The real information we need is something like this:

select fname, lname from basetable t where validAddr = 1 and getsMail = 1

In order to get that using the indexed view, we need to join the view to the base table, which, it turns out, is expensive:

select fname, lname

from keysOnly v with( noexpand )

inner join basetable t on v.akey = t.akey

We now see a query plan cost = 36. This is because SQL Server has to scan / merge-join both objects, and performing the join is actually more expensive than simply scanning the base table. The reason is that the join itself demands a table scan to locate the matching rows. Any advantage we gain by storing the correct keys is cancelled by the work needed to fetch the rest of the data from the base table.

From this result, it’s clear that the indexed view needs to contain the results for the query and not just the keys. So, I’m going to create an indexed view that essentially duplicates all the required data from the base table, for rows that match the criteria I need to select. This is something like a covering index, except that it stores only the matching rows, not all the rows, from the base table. I’m also going to include the State column from the base table, for additional tests below. The price for this, obviously, is the additional storage needed to persist the view:

create view completeSubset with schemabinding

     as ( select akey, fname, lname, state

      from dbo.baseTable

           where validAddr = 1 and getsMail = 1 )

GO

create unique clustered index IX_completeSubset on completeSubset( aKey )

GO

select fname, lname from completeSubset with( noexpand )

This select statement now has query plan cost = 4.9 and reads = 4843, which is about 1/5 the original cost. We’ve hit on something that will make a meaningful improvement. Adding other criteria to the query still shows similar benefit. Let’s look for all the people in the sample data with state = Montana:

Selecting from the base table:

select fname, lname, state

from dbo.baseTable

where validAddr = 1 and getsMail = 1

     and state = ‘MT’

Query plan cost: 21.57

And from the indexed view:

select fname, lname, state

from completeSubset with( noexpand )

where state = ‘MT’

Query plan cost: 5.05

Continues…

Leave a comment

Your email address will not be published.