
SQL Server Performance Forum – Threads Archive

Data Types and NULL values

Guys, I'm trying to find specific information on data types and NULL values, performance-wise. I can find plenty of information on the web saying to use narrow column definitions to improve speed because of the reduced time to read from disk (disk being the slowest part of a DB), but no specifics on percentage speed improvements for any scenario.

Similarly, I can't find any specific measures for the performance gain from not using NULLs. I guess this is a harder one to quantify, because it mainly comes down to the query being non-SARGable if the NULL column is involved in the WHERE clause, so it is potentially very much slower due to the inability to use an index, but still, I would have thought there might have been some studies performed.

Does anyone have any documents, any links, anything, to quantify the difference these items make, performance-wise? I understand I should run tests myself on my data, but I want to have some independent information as well. Everyone just seems to say it's faster, but not by how much.

Dave Hilditch.
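If you do end up timing this on your own data, one simple way to get hard numbers is SQL Server's own statistics output. The sketch below is only illustrative: dbo.Orders, dbo.OrderStatus and the two status-key columns are hypothetical names, standing in for the same join measured once on a wide key and once on a narrow one.

-- Hypothetical ad-hoc comparison: the same join measured on an int key and
-- on a tinyint key, with elapsed time and logical reads reported per query.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT(*)
FROM   dbo.Orders AS o
JOIN   dbo.OrderStatus AS s
       ON o.StatusIdInt = s.StatusIdInt;      -- 4-byte join column

SELECT COUNT(*)
FROM   dbo.Orders AS o
JOIN   dbo.OrderStatus AS s
       ON o.StatusIdTiny = s.StatusIdTiny;    -- 1-byte join column

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

The logical-read counts are usually the more telling figure, since a narrower key mainly pays off through more rows per page and therefore fewer pages read.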
Hi ya, while this is only my personal opinion, I don't think that nullable columns present a measurable performance difference. A nullable column will be considered for index usage, the NULL value is a valid value in the index, and it will link intermediate pages and data pages in exactly the same way. The only difference is that a nullable column has an extra byte or two to store the length of the field.

Cheers
Twan
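To see the point about index usage, here is a minimal sketch; the table, index, and column names are made up, and the only claim is that an index on a nullable column can still be used, with NULL stored as an ordinary key value.

-- Hypothetical table with a nullable, indexed column.
CREATE TABLE dbo.Demo
(
    Id         int IDENTITY(1,1) PRIMARY KEY,
    MiddleName varchar(50) NULL
);

CREATE INDEX IX_Demo_MiddleName ON dbo.Demo (MiddleName);

-- Both predicates can be satisfied from IX_Demo_MiddleName; the NULL rows
-- sit in the index alongside every other key value.
SELECT Id FROM dbo.Demo WHERE MiddleName = 'Smith';
SELECT Id FROM dbo.Demo WHERE MiddleName IS NULL;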
Here's an interesting posting from SQLServerCentral.com on the use of NULL in general:

quote:

First off, I will state that I disagree with CJ Date and the others who think NULL is a waste in SQL. NULL does not apply only to SQL but to almost all languages, and in fact the concept came from the others. NULL is the abstract expression of the unknown. Without the abstract value of NULL, how then do you define a completely unknown value? Some suggest defaults, but give no clear idea as to how these defaults solve the NULL problem, as they will still represent the unknown.

Now consider this: when a datatype is fixed length, such as an int or char, how is the data expressed in the database when the value does not exist? Here is the DBCC PAGE output from a table I created, tx, with columns d int, c char(10), and a int. First off, ignore the 00160010 at the beginning of each hex set; it has special meaning but is not important here. Note: int is 4 bytes, char(10) is 10, and int is 4, for a total of 18 bytes. In Slot 0 (record 1 on the page) the space between the [ and ] is 20 bytes, but the 0003 in the last set can be ignored because it has to do with the end of data, so subtract 2 (hex is based on 00 through FF as 1 byte and we are removing 2 sets). Thus we have 18 total bytes. Also note the order of bytes is funny, in that 00000001 is actually read 00010000, and so on, to read out the rows in this view.

436B4060: 00160010 [00000001 00000000 00000000 ................
436B4070: 00000000 00030000] 06 .........

Now none of that is key; what is key is the 06 at the very end, separate from the rest. It matters because columns 2 and 3 are NULL in the first example. As you see it is 6, but the key is that it is a bitmask: column 1 = 1, column 2 = 2, column 3 = 4, column 4 = 8, and so on, up to 1,024 columns. So 2 + 4 = 6; the bitmask tells me columns 2 and 3 are NULL, as this is the NULL bitmask.

Slot 0 Offset 0x60
------------------
Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP
436B4060: 00160010 00000001 00000000 00000000 ................
436B4070: 00000000 00030000 06 .........
d = 1
c = [NULL]
a = [NULL]

Slot 1 Offset 0x79
------------------
Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP
436B4079: 00160010 00000002 00000000 00000000 ................
436B4089: 00000000 00030000 06 .........
d = 2
c = [NULL]
a = [NULL]

Slot 2 Offset 0x92
------------------
Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP
436B4092: 00160010 00000003 2020207a 20202020 ........z
436B40A2: 00002020 00030000 04 .......
d = 3
c = z
a = [NULL]

Slot 3 Offset 0xab
------------------
Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP
436B40AB: 00160010 00000004 2020207a 20202020 ........z
436B40BB: 00012020 00030000 00 .......
d = 4
c = z
a = 1

OK, so under the hood the data is still stored, but with zeros and a bitmask to tell the engine which columns are NULL. Now, from there, what happens with varchars? It is a bit different. The table was defined as ty with columns a int, b varchar(5), c tinyint, and d char(8). Here is the DBCC PAGE output:

Slot 0 Offset 0x60
------------------
Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP
43808060: 00110010 00000001 00000000 00000000 ................
43808070: 0e000400 ....
a = 1
b = [NULL]
c = [NULL]
d = [NULL]

Slot 1 Offset 0x74
------------------
Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP
43808074: 00110010 00000001 00000002 00000000 ................
43808084: 0a000400 ....
a = 1
b = [NULL]
c = 2
d = [NULL]

Slot 2 Offset 0x88
------------------
Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP
43808088: 00110010 00000001 61616102 20202061 .........aaaa
43808098: 02000420 ...
a = 1
b = [NULL]
c = 2
d = aaaa

Slot 3 Offset 0x9c
------------------
Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
4380809C: 00110030 00000001 61616102 20202061 0........aaaa
438080AC: 00000420 00190001 63 .......c
a = 1
b = c
c = 2
d = aaaa

Now there are a few differences here that make using NULLable varchars good and tell us a bit about the engine. In Slot 0 we notice we are 5 bytes short of what a usual record looks like when everything is fixed length; this is because nothing is written for the varchar field when it is not filled. But we see the first bytes on the second row are 0e00, which is what we are looking for here: the engine bitmasks the fact that columns 2, 3, and 4 are NULL (in other words 2 + 4 + 8 = 14, or 0E in hex), so the engine uses the NULL bitmask even for varchar fields. Proceed to Slot 1 and Slot 2 (the second and third records) and you see the respective values for the NULL bitmask: 0a (columns 2 and 4) and 02 (column 2 only).

Now the fun comes when we add data to the varchar: our row grows by several bytes, but not necessarily by the 5 for the varchar. What happens here is that a marker is added to tell the engine where the varchar data for the first non-fixed-length column begins, and that marker takes up 2 bytes itself (it is the offset from the beginning of the record). A new one is added for each varchar column that is put into play, and all of them come before the varchar data for any column is written (the same applies to any other non-fixed-length datatype). The key, though, is that the column bitmask is used under all circumstances to note when there are NULL values.

So in the case of a non-fixed-length column that may have NULLs, using a default instead is a waste of space, which is a commodity best served otherwise. Also, if you are going to use a non-NULL sentinel value, how do you deal with it more effectively (or at least as effectively) compared to the NULL bitmask? You still have to dream up something that adds the overhead of handling the unknown, and you rely on the developer coding a NULL or UNKNOWN value process, which opens the door to many errors even with a strict guideline. My opinion is to use NULL when possible, but in some situations it may make more sense to use a default for "no data", such as '' (the empty string), because it may be easier to deal with user input in char values. Just be careful that good records do not have an overlap issue, such as $0 != $0 because one is unknown and one represents no value. As for non-fixed-length datatypes, though, it is still up to you, but keep in mind you save space with NULL.

I hope all that helps a bit, or leads you to some understanding of NULL and its handling. As for the whole "NULLs are not valuable" argument, I am afraid that is not the case, because you will always run into the need to express an unknown value as opposed to a non-existent value.

[end quote]

I think using NULLs when you have to implement this three-valued logic is a great thing. For everything else, try to avoid them.

FWIW

-----------------------
--Frank
http://www.insidesql.de
-----------------------
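For what it's worth, the NULL bitmap discussion above can be checked on any test database. The following is only a rough sketch of how the quoted output might be reproduced: DBCC IND and DBCC PAGE are undocumented commands, and the database name and page id shown are placeholders you would substitute with your own values.

-- Recreate the first example table and rows from the quoted post.
CREATE TABLE dbo.tx (d int, c char(10), a int);
INSERT INTO dbo.tx (d, c, a) VALUES (1, NULL, NULL);
INSERT INTO dbo.tx (d, c, a) VALUES (2, NULL, NULL);
INSERT INTO dbo.tx (d, c, a) VALUES (3, 'z', NULL);
INSERT INTO dbo.tx (d, c, a) VALUES (4, 'z', 1);

DBCC TRACEON (3604);               -- send DBCC output to the client instead of the error log
DBCC IND ('MyTestDb', 'tx', -1);   -- list the pages allocated to the table; note the data page id

-- Substitute the data page id reported above; print option 3 shows each
-- slot with its NULL bitmap and decoded column values, as in the quote.
DBCC PAGE ('MyTestDb', 1, 12345, 3);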
I would have thought a good way of representing a null value would be to have no value at all, i.e. normalise the tables properly. E.g. you could have an authors table, with perhaps a column that lists each author's first book:

authorid int not null
authorname varchar(50) not null
firstbook varchar(50) null

Or you could have two tables, authors and books, and just not have an entry in the books table for a particular author if he has no first book (both designs are sketched after this post). Anyway, I disagree with myself above in certain situations, as there are times where this approach is unusable due to the resulting 10-table joins. Still, does anyone have any specifics about speed issues, or was it all just an urban myth? I was taught this on the MCDBA course, and through all my reading on the internet, newsgroups etc., but I have never seen any stats to prove it.

Also, the original question was really a two-part question, and the other part, I believe, could be more important: the width of columns. For example, we have a table that currently has a tinyint, a smallint, an int and a bigint, then multiple reals (nullable). Joins *frequently* (i.e. always) occur on the tinyint, smallint, and bigint columns, in that order. I haven't got round to performing any test scenarios yet to determine what difference it would make to our environment if we changed our tinyint and smallint columns to int columns, but I expect it would make a noticeable difference. Does anyone know where there are any independent tests that show the difference joining on these data types would make?

Dave Hilditch
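Here is a minimal sketch of the two designs described above; the names follow the example in the post, but the exact types and constraints are only illustrative.

-- Option 1: keep the optional fact as a nullable column.
CREATE TABLE dbo.authors
(
    authorid   int         NOT NULL PRIMARY KEY,
    authorname varchar(50) NOT NULL,
    firstbook  varchar(50) NULL
);

-- Option 2: normalise the optional fact into its own table and simply omit
-- the row when an author has no first book.
CREATE TABLE dbo.authors2
(
    authorid   int         NOT NULL PRIMARY KEY,
    authorname varchar(50) NOT NULL
);

CREATE TABLE dbo.firstbooks
(
    authorid  int         NOT NULL PRIMARY KEY
              REFERENCES dbo.authors2 (authorid),
    firstbook varchar(50) NOT NULL
);

-- Option 2 removes the NULL from the row, but reading the data back needs
-- an outer join, which is where the extra-table cost shows up.
SELECT a.authorname, b.firstbook
FROM   dbo.authors2 AS a
LEFT JOIN dbo.firstbooks AS b
       ON b.authorid = a.authorid;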