Create an index on a field that has a lot of null value | SQL Server Performance Forums
SQL Server Performance Forum – Threads Archive
I would like to know what impact it may have if I create an index on a field that has a lot of null values (say 30% are null). Would it be better for me to default them to zero (like 1/1/1900)? To be specific, let’s say both queries below return 3,000 records and I have an index on the datecreate field. From the optimizer’s standpoint, would the optimizer work more efficiently if datecreate had no null values than if it had them? Also, does the optimizer in SQL 2008 handle null values better than SQL 2005, so that I don’t even need to ask this question? What does the index tree look like for null values?
SELECT ID FROM tblemployee WHERE datecreate BETWEEN '1/1/2009' AND '8/1/2009'
SELECT ID FROM tblemployee WHERE datecreate IS NULL
One thing that immediately comes to mind is that in SQL Server 2008 you can take advantage of filtered indexes, which can leave NULLs out of the index altogether. The main advantage is that you reduce the amount of data stored in the index. Another advantage is that the index maintenance caused by DML statements is reduced.
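A minimal sketch of what such a filtered index might look like for the range query in the original post, assuming SQL Server 2008 or later and a table in the dbo schema. The index name is made up for illustration, and INCLUDE (ID) is only there so the query can be answered from the index alone:

```sql
-- Hypothetical index name; the dbo schema is an assumption.
-- Only rows with a non-NULL datecreate are stored in this index.
CREATE NONCLUSTERED INDEX IX_tblemployee_datecreate_notnull
    ON dbo.tblemployee (datecreate)
    INCLUDE (ID)
    WHERE datecreate IS NOT NULL;

-- A range predicate can only match non-NULL values, so this query
-- should be eligible to use the filtered index.
SELECT ID
FROM dbo.tblemployee
WHERE datecreate BETWEEN '20090101' AND '20090801';
```

The dates are written as 'yyyymmdd' here so they do not depend on the session's language or date format settings.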
My experience in SQL 2005 is that the null values are treated like any other as far as index lookups are concerned.
Note that any time one value (in this case null) accounts for a large share of the rows, an index on that column can lead to poor performance. The reason is that a query plan may be generated that performs lookups through that index when a typical, uncommon value is used in the query. That same plan, if it is cached and reused, will perform poorly when the common value (be it null or zero) is supplied. As another post describes, this problem can be resolved in SQL 2008 by using filtered indexes.
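To make the plan-reuse problem concrete, here is a rough sketch, assuming the NULLs were replaced with a 1/1/1900 sentinel and the query is parameterized. OPTION (RECOMPILE) is one common way to get a fresh plan for each value. It is not something suggested in this thread, and it trades extra compile time on every execution for a plan that fits the supplied value:

```sql
-- Assumption: dbo.tblemployee has a datetime column datecreate,
-- with 1900-01-01 used as a sentinel instead of NULL.
DECLARE @sql nvarchar(200);
SET @sql = N'SELECT ID FROM dbo.tblemployee
             WHERE datecreate = @d
             OPTION (RECOMPILE);';

-- Uncommon value: few matching rows, index lookups are cheap.
EXEC sp_executesql @sql, N'@d datetime', @d = '20090615';

-- Very common value: without the hint, the plan cached for the
-- call above could be reused here and perform many lookups.
EXEC sp_executesql @sql, N'@d datetime', @d = '19000101';
```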
Hi, I tried filtered indexes and they can be useful. Here are my tests: http://community.ugiss.org/blogs/sgovoni/archive/2009/09/12/filtered-indexes.aspx
Sgovoni’s reference is helpful if what you are really after is saving storage space and seeing how far the performance optimization can go.
A filtered index is an optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes.
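One way to check the storage claim on your own table is to compare the page counts of a full index and a filtered index side by side. This is only a sketch, assuming SQL Server 2008 or later, a table named dbo.tblemployee, and permission to read the dynamic management view:

```sql
-- Lists every index on the table with its filter (if any) and
-- its approximate size. Pages are 8 KB each.
SELECT i.name                      AS index_name,
       i.has_filter,
       i.filter_definition,
       SUM(ps.used_page_count) * 8 AS used_kb
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON ps.object_id = i.object_id
 AND ps.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.tblemployee')
GROUP BY i.name, i.has_filter, i.filter_definition
ORDER BY used_kb DESC;
```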
BOL clarifies that: In order to design effective filtered indexes, it is important to understand what queries your application uses and how they relate to subsets of your data. Some examples of data that have well-defined subsets are columns with mostly NULL values, columns with heterogeneous categories of values and columns with distinct ranges of values. The following design considerations give a variety of scenarios for when a filtered index can provide advantages over full-table indexes.
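Applied to the second query in the original post, the NULL rows are themselves a well-defined subset, so a filtered index could target them directly. A sketch under the same assumptions as before (SQL Server 2008 or later, dbo schema, hypothetical index name):

```sql
-- Hypothetical index name. Only the rows where datecreate IS NULL
-- are stored. datecreate is added as an included column as a
-- cautious choice, so the filtered column is present in the index.
CREATE NONCLUSTERED INDEX IX_tblemployee_datecreate_null
    ON dbo.tblemployee (ID)
    INCLUDE (datecreate)
    WHERE datecreate IS NULL;

SELECT ID
FROM dbo.tblemployee
WHERE datecreate IS NULL;
```

With about 30% of the rows being NULL, this index would hold roughly that share of the table, so whether it pays off depends on how often the IS NULL query actually runs.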