How Using GUIDs in SQL Server Affect Index Performance

Testing Select Performance

The CPU time for all single row selects took 0ms. However, the average CPU and elapsed times for the 70,000 selected rows varied for each test. On average, Test1 performed better on CPU time than the other two tests, but Test2 faired better on logical reads and elapsed time than the other two tests. This was because the size of a Test2 row is 9 bytes, while the size of the Test1 and Test3 rows was 17 bytes.
Test1 caused a tremendous amount of page splits, and had a scan density around 12% when I ran a DBCC SHOWCONTIG after the inserts had completed. The Test2 table had a scan density around 98%, and the Test3 table had a scan density around 86% after the inserts (I could have rebuilt the indexes after the data load, but I didn’t).

All of the CPU times were close enough in range to be circumstantial, but the elapsed time varied among them all. I think the variance is because of the nature of the data being returned to Query Analyzer. For a more accurate test of elapsed time I could have used osql, but I didn’t. The tests seem to prove that the binary comparison of the GUID performs quite well with the other alternatives.

Conclusion

It would seem, as a result of this testing, that the uniqueidentifier data type performs about the same as an integer data type when filtering the data through the WHERE clause.

There are only a couple of downsides I can see in using the uniqueidentifier. First, and probably the most important, is the size of the column (which can be an issue). It adds overhead to a system to have to return more data pages than possibly necessary (I/O’s are the most expensive of system costs). The large size could in turn cause more page splits during inserts, depending on whether the data is static or not (but so can adding a non-necessary column to a table). The GUID being part of the index as well as being used as a pointer to leaf level pages is also very cumbersome and requires more space than it really needs to.

Second, with the use of the NEWID() function, inserts will be random and therefore will by nature cause page splits, while using an integer identity column will place the records sequentially into the table. “Ah” you say, “but the use of the function that parses the NEWID() and adds the converted getdate() value to the end does this as well.” Yes, it does, and still it remains that I don’t often load 4 million rows into a table, and so for me, that is not as big a deal (just don’t use the NEWID() when doing a mass load into a table).
The third issue is having to type a uniqueidentifier value into a query manually (more potential for user error).

So in short, it looks as though if you plan carefully and design wisely, then the use of the uniqueidentifier can perform as well in SELECT statements as an integer. Inserts are a different story, and we already discussed, that as long as they are sequential values, it really shouldn’t matter.

Copyright 2005 Zach Nichter

]]>

Leave a comment

Your email address will not be published.