SQL Server Performance

Use of Nulls discouraged?

Discussion in 'Performance Tuning for DBAs' started by x002548, Jan 2, 2003.

  1. x002548 New Member

    I found this in the FAQ section and found it hard to believe that the use of Nulls id descouraged? What are you to do? Fill a char field with space? Int with 0? And what about Date? 01/01/1900? These are all valid values. I took the 20 question quiz and it had a question on this as well, mostly in reference to access data. I Thought an Index on the column would work well, but the quiz said to make the column not null with a default value. Huh? Isn't there more overhead associated with that?

    Here is the FAQ, but does anyone else have any insight in to nulls? Doesn't Nulls, being attributive nonexistance, have a place? I always thought so.

    SQL Server Performance Questions & Answers


    Question

    Can the use of NULLS in a database affect performance?




    Answer

    Yes, SQL Server's performance can be affected by using NULLS in your database. There are several reasons for this.

    First, NULLS that appear in fixed length columns (CHAR) take up the entire size of the column.
    So if you have a column that is 25 characters wide, and a NULL is stored in it, then SQL Server must store
    25 characters to represent the NULL value. This added space increases the size of your database, which in turn
    means that it takes more I/O overhead to find the data you are looking for. Of course, one way around this is
    to use variable length fields instead. When NULLs are added to a variable length column, space is not
    unnecessarily wasted as it is with fixed length columns.

    Second, use of the ISNULL clause in your WHERE clause means that an index cannot be used for the query,
    and a table scan will be performed. This can greatly reduce performance.

    Third, the use of NULLS can lead to convoluted Transact-SQL code, which can mean code that doesn't run
    efficiently or that is buggy.

    Ideally, NULLs should be avoided in your SQL Server databases.
  2. bradmcgehee New Member

    Thanks for posting this question in the forum, and I hope we get some feedback from users with practical experience in this area.

    I want to start my feedback first saying that I don't recommend that you should never use NULLS, but that if you have a choice, that they should be avoided.

    While the use of NULLS can cause a variety of potential problems, the one that bothers me the most is the use of ISNULL to identify them, which is a common practice. The probem is that ISNULL can't take advantage of an available indexes, and a table scan has to be performed. This is because each row of the table has to be examined, row by row, to see if the column is NULL or not (NULLS aren't indexed). This can greatly slow down some applications and put undue pressure on SQL Server's resources.

    If you don't use ISNULL in your code, and you don't mind that they take up extra space (in CHAR columns), and if you feel comfortable enough with them so that you write correct and logical T-SQL code, then using NULLS is OK, if you don't have any better solutions.


    ------------------
    Brad M. McGehee
    Webmaster
    SQL-Server-Performance.Com
  3. x002548 New Member

    Brad,

    Thanks for the Reply. Yes Using IsNull (or any function in the predicate) becomes a stage 2 predicate (non sargable?) and a scan is required. But in most cases in a predicate, if I'm going after data that I need to interogate, then more than likely I'm not looking for the non existance of a thing, and nulls are immeditaley eliminated from the result set.

    Just curious about Nulls and indexes. After statistics in an index (that contains nulls) are updated, don't all of the nulls sort to the "top", and hence make a very efficient index when trying to find rows with hulls in that column.

    Also too, doesn't it make for more efficient code when trying to find relational non-existance instead of doing NOT EXISTS, which will scan anyway.

    For example woudln't trying find all rows in table1 that don't exist in table 2 be more efficient like This:

    Select col1
    from table1 l
    left join table2 r
    where l.key = r.key
    and r.key Is Null

    Instead of:

    Select col1
    from table1 l
    where Not Exists (Select 1 From Table2 r
    Where l.key = r.key)

    Just like anyones thoughts?

    Thanks

    Brett
  4. tkelley New Member

    Well, here are my comments...

    1. I typically always use variable length fields, mainly due to my old Oracle prgramming background.
    2. I avoid nulls as well due to programming problems I've experienced in the past mainly in ASP.
    3. In the select above research the use of inner and outer joins. I believe if you use a join properly you may get your result. Or possibly use where the r.key = "".

    Just some ideas and thoughts..

    ----------
    T Kelley
    MS, MCDBA, OCA, CIW

  5. bradmcgehee New Member

    As tkelley has suggested, the use of joins would be a much more efficient way of doing this. From my understanding, NULLS do not float to the top of an index, and that is why indexes aren't useful when trying to access them. Can anyone explain exactly how NULLS are stored in indexes? I have never seen this information.


    ------------------
    Brad M. McGehee
    Webmaster
    SQL-Server-Performance.Com
  6. tkelley New Member

    I found an interesting article at<a target="_blank" href=http://msdn.microsoft.com/msdnmag/issues/02/07/datapoints/default.aspx>http://msdn.microsoft.com/msdnmag/issues/02/07/datapoints/default.aspx</a><br />that may be worth looking at regarding using Inner Joins and Nulls in regards to performance. Not real detailed, but informative.<br /><br /><img src='/community/emoticons/emotion-1.gif' alt=':)' /><br /><br />----------<br />T Kelley<br />MS, MCDBA, OCA, CIW<br /><br />
  7. alzdba Member

    I guess the best way is to consider NULLs as an option and not as a default. Tell your rdbms what you know. If you know it's makes no diference if the column is blanc/zero/predefined-date or it is null, don't us the null, but use the "default"-value instead. This has the advantage that the "is null"-scanning may be avoided. If it makes a diference to know you don't know the columns value, then use the null-option. Its of no use to make you column nullable if it always has a value. Your db-users that write queries, have to know and understand your datamodel. If they mix-up e.g. nulls and zeroes, then where does is go ? If you use default values, everyone using you db, would have to know they are default values. This is certanly true for dates. If I would have a shipment-date to be null and this would tipicaly be in 2 % of the rows, finding the "to-be-shipped" would result in a tablescan. Having a default value (04/01/5555) would avoid this , but would mean everyone should know and use it when querying. Development-time vs performance-cost. Consider it to be like using the ADO-storedprocedure-REFRESH-methode in all your applications vs only on those which realy need this funcionality.

Share This Page