I found this in the FAQ section and find it hard to believe that the use of NULLs is discouraged. What are you supposed to do instead? Fill a char field with spaces? An int with 0? And what about a date? 01/01/1900? These are all valid values.

I took the 20-question quiz, and it had a question on this as well, mostly in reference to data access. I thought an index on the column would work well, but the quiz said to make the column NOT NULL with a default value. Huh? Isn't there more overhead associated with that?

Here is the FAQ, but does anyone else have any insight into NULLs? Don't NULLs, being attributive nonexistence, have a place? I always thought so.

SQL Server Performance Questions & Answers

Question
Can the use of NULLS in a database affect performance?

Answer
Yes, SQL Server's performance can be affected by using NULLS in your database. There are several reasons for this.

First, NULLS that appear in fixed length columns (CHAR) take up the entire size of the column. So if you have a column that is 25 characters wide, and a NULL is stored in it, then SQL Server must store 25 characters to represent the NULL value. This added space increases the size of your database, which in turn means that it takes more I/O overhead to find the data you are looking for. Of course, one way around this is to use variable length fields instead. When NULLs are added to a variable length column, space is not unnecessarily wasted as it is with fixed length columns.

Second, use of the ISNULL clause in your WHERE clause means that an index cannot be used for the query, and a table scan will be performed. This can greatly reduce performance.

Third, the use of NULLS can lead to convoluted Transact-SQL code, which can mean code that doesn't run efficiently or that is buggy. Ideally, NULLs should be avoided in your SQL Server databases.
Thanks for posting this question in the forum. I hope we get some feedback from users with practical experience in this area.

Let me start by saying that I am not recommending that you never use NULLs, only that they should be avoided if you have a choice. While NULLs can cause a variety of problems, the one that bothers me most is the common practice of using ISNULL to identify them. The problem is that ISNULL can't take advantage of any available indexes, so a table scan has to be performed. This is because each row of the table has to be examined, one by one, to see whether the column is NULL or not (NULLs aren't indexed). This can greatly slow down some applications and put undue pressure on SQL Server's resources.

If you don't use ISNULL in your code, don't mind the extra space NULLs take up (in CHAR columns), and feel comfortable enough with them to write correct and logical T-SQL code, then using NULLs is OK if you don't have a better solution.

------------------
Brad M. McGehee
Webmaster
SQL-Server-Performance.Com
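It may help to separate two things that both get called "ISNULL": the ISNULL() function and the IS NULL predicate. The sketch below uses a hypothetical Orders table and index (both names are assumptions made up for illustration). As far as I know, wrapping the column in a function is what generally prevents an index seek, whereas a plain IS NULL test leaves the column bare; compare the query plans on your own server before drawing conclusions.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE Orders (
    OrderID  INT      NOT NULL PRIMARY KEY,
    ShipDate DATETIME NULL
);
CREATE INDEX IX_Orders_ShipDate ON Orders (ShipDate);

-- ISNULL() function wrapped around the column: the value has to be
-- computed for each row before it can be compared.
-- Note this also matches rows whose ShipDate really is 1900-01-01.
SELECT OrderID
FROM Orders
WHERE ISNULL(ShipDate, '19000101') = '19000101';

-- IS NULL predicate: the column is tested directly, with no function
-- applied to it.
SELECT OrderID
FROM Orders
WHERE ShipDate IS NULL;
```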
Brad,

Thanks for the reply. Yes, using IsNull (or any function in the predicate) makes it a stage 2 predicate (non-sargable?), and a scan is required. But in most cases, if I'm going after data that I need to interrogate, then more than likely I'm not looking for the nonexistence of a thing, and NULLs are immediately eliminated from the result set.

I'm just curious about NULLs and indexes. After the statistics on an index that contains NULLs are updated, don't all of the NULLs sort to the "top", and hence make for a very efficient index when trying to find rows with NULLs in that column?

Also, doesn't it make for more efficient code when trying to find relational nonexistence, instead of using NOT EXISTS, which will scan anyway? For example, wouldn't finding all rows in table1 that don't exist in table2 be more efficient like this:

    SELECT l.col1
    FROM table1 l
    LEFT JOIN table2 r ON l.[key] = r.[key]
    WHERE r.[key] IS NULL

instead of:

    SELECT l.col1
    FROM table1 l
    WHERE NOT EXISTS (SELECT 1 FROM table2 r WHERE l.[key] = r.[key])

I'd just like anyone's thoughts.

Thanks
Brett
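For anyone who wants to try the two forms side by side, here is a minimal sketch. The table definitions and sample rows are assumptions added for illustration, and [key] is bracketed because KEY is a reserved word in T-SQL. Note that the join condition belongs in the ON clause: if it is moved into the WHERE clause alongside the IS NULL test, no row can satisfy both conditions at once.

```sql
-- Hypothetical tables and sample rows, for illustration only.
CREATE TABLE table1 ([key] INT NOT NULL PRIMARY KEY, col1 VARCHAR(20) NOT NULL);
CREATE TABLE table2 ([key] INT NOT NULL PRIMARY KEY);

INSERT INTO table1 ([key], col1) VALUES (1, 'in both');
INSERT INTO table1 ([key], col1) VALUES (2, 'only in table1');
INSERT INTO table2 ([key]) VALUES (1);

-- Form 1: outer join, then keep only the rows that found no match.
SELECT l.col1
FROM table1 l
LEFT JOIN table2 r ON l.[key] = r.[key]
WHERE r.[key] IS NULL;

-- Form 2: correlated NOT EXISTS.
SELECT l.col1
FROM table1 l
WHERE NOT EXISTS (SELECT 1 FROM table2 r WHERE r.[key] = l.[key]);
```

With this sample data both queries return the single row 'only in table1'. Which one is cheaper is a question for the query plans on real data, not something this sketch settles.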
Well, here are my comments...

1. I almost always use variable length fields, mainly due to my old Oracle programming background.
2. I avoid NULLs as well, due to programming problems I've experienced in the past, mainly in ASP.
3. For the select above, research the use of inner and outer joins. I believe that if you use a join properly you may get your result. Or possibly use where r.key = "".

Just some ideas and thoughts.

----------
T Kelley
MS, MCDBA, OCA, CIW
As tkelley has suggested, the use of joins would be a much more efficient way of doing this.

From my understanding, NULLs do not float to the top of an index, and that is why indexes aren't useful when trying to access them. Can anyone explain exactly how NULLs are stored in indexes? I have never seen this information.

------------------
Brad M. McGehee
Webmaster
SQL-Server-Performance.Com
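One way to look into how NULLs behave in an index is simply to test it. The sketch below uses a hypothetical table and index (NullIndexTest and IX_NullIndexTest_Val are made-up names), and GO is the batch separator understood by Query Analyzer and similar tools. As far as I am aware, SQL Server does keep NULL keys in its indexes and sorts them ahead of all other values, but the plan output on your own server is the better authority.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE NullIndexTest (
    ID  INT NOT NULL PRIMARY KEY,
    Val INT NULL
);
CREATE INDEX IX_NullIndexTest_Val ON NullIndexTest (Val);

INSERT INTO NullIndexTest (ID, Val) VALUES (1, 10);
INSERT INTO NullIndexTest (ID, Val) VALUES (2, NULL);
INSERT INTO NullIndexTest (ID, Val) VALUES (3, 5);
INSERT INTO NullIndexTest (ID, Val) VALUES (4, NULL);
GO

-- Ascending sort: check where the NULL rows land relative to 5 and 10.
SELECT Val FROM NullIndexTest ORDER BY Val;
GO

-- SET SHOWPLAN_TEXT must be alone in its batch. While it is ON, queries
-- are not executed; their estimated plan is returned instead.
SET SHOWPLAN_TEXT ON;
GO

-- Look in the plan for an Index Seek versus a scan on IX_NullIndexTest_Val.
SELECT Val FROM NullIndexTest WHERE Val IS NULL;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

On a four-row table the optimizer may pick whatever is cheapest, so it is worth repeating the test with a realistic row count and a realistic share of NULLs.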
I found an interesting article at http://msdn.microsoft.com/msdnmag/issues/02/07/datapoints/default.aspx that may be worth looking at regarding Inner Joins and NULLs with respect to performance. It's not very detailed, but it is informative.

----------
T Kelley
MS, MCDBA, OCA, CIW
I guess the best way is to consider NULLs as an option and not as a default. Tell your RDBMS what you know.

If you know it makes no difference whether the column is blank/zero/a predefined date or NULL, don't use NULL; use the "default" value instead. This has the advantage that the "is null" scanning may be avoided. If it does make a difference to know that you don't know the column's value, then use the NULL option. There is no point in making your column nullable if it always has a value.

Your db users who write queries have to know and understand your data model. If they mix up, e.g., NULLs and zeroes, then where does it end? If you use default values, everyone using your db has to know that they are default values. This is certainly true for dates.

Suppose a shipment date is NULL until the order ships, and this is typically the case in 2% of the rows. Finding the "to-be-shipped" rows would result in a table scan. Having a default value (04/01/5555) would avoid this, but would mean everyone has to know about it and use it when querying.

It's development time vs. performance cost. Consider it to be like using the ADO Refresh method for stored procedures in all your applications vs. only in those that really need this functionality.
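The shipment-date trade-off above can be sketched as two table designs. Everything here is hypothetical: the table names, the sample rows, and the reading of 04/01/5555 as 1 April 5555 (the month/day order is an assumption). It is only meant to show what querying looks like under each choice, not to measure performance.

```sql
-- Design A: NULL means "not shipped yet". Hypothetical table.
CREATE TABLE Shipments_Nullable (
    ShipmentID INT      NOT NULL PRIMARY KEY,
    ShipDate   DATETIME NULL
);

-- Design B: a far-future default stands in for "not shipped yet".
CREATE TABLE Shipments_Sentinel (
    ShipmentID INT      NOT NULL PRIMARY KEY,
    ShipDate   DATETIME NOT NULL DEFAULT '55550401'
);

INSERT INTO Shipments_Nullable (ShipmentID, ShipDate) VALUES (1, '20020715');
INSERT INTO Shipments_Nullable (ShipmentID, ShipDate) VALUES (2, NULL);

INSERT INTO Shipments_Sentinel (ShipmentID, ShipDate) VALUES (1, '20020715');
INSERT INTO Shipments_Sentinel (ShipmentID) VALUES (2);  -- picks up the default

-- Finding the "to-be-shipped" rows under each design.
SELECT ShipmentID FROM Shipments_Nullable WHERE ShipDate IS NULL;
SELECT ShipmentID FROM Shipments_Sentinel WHERE ShipDate = '55550401';

-- The cost of the sentinel: every other query has to know about it.
SELECT MAX(ShipDate) FROM Shipments_Nullable;   -- aggregates skip NULLs
SELECT MAX(ShipDate) FROM Shipments_Sentinel;   -- returns the sentinel date
SELECT MAX(ShipDate) FROM Shipments_Sentinel
WHERE ShipDate <> '55550401';                   -- sentinel excluded by hand
```

The last three queries illustrate the "everyone has to know" point: with the sentinel design, a forgotten filter yields a plausible-looking but wrong latest shipment date.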