SQL Server Performance Forum – Threads Archive

Join Help

For best join performance, the indexes on the columns being joined should be numeric data types, not CHAR or VARCHAR. The overhead is lower and join performance is faster. [6.5, 7.0, 2000] Is this true if I have CHAR(3) data
which is 1 byte smaller the the int data size and I’m using a table of about 100 rows any thoughts?

I can’t say without testing, but I feel a numeric data type will still be faster. This is because it is faster for SQL Server to deal with numbers than characters. Any other opinions, or anyone willing to perform some testing? —————————–
Brad M. McGehee, MVP
Webmaster
SQL-Server-Performance.Com

Already on the case <img src=’/community/emoticons/emotion-1.gif’ alt=’

‘ /> Here are my findings: Following is a script to set up and populate 4 tables. 2 tables with a char(3) field, clustered, and 2 tables with an int field clustered. Data is populated randomly into it, and then the two char tables are joined, and likewise for the int tables. To be as fair a test as possible, the char(3) data is designed to be such that there is the same distribution and selectivity as the ints (the int is random 1000, the char field also has 1000 possible combinations (10*10*10)) I also decided to add two int fields not in the clustered index to each tables, to be a real world test. <pre> drop table intjoin drop table charjoin create table intjoin (intcol integer, somedata int, somedata2 int) create clustered index idx_int ON intjoin (intcol) create table charjoin (charcol char(3), somedata int, somedata2 int) create clustered index idx_char ON charjoin (charcol) drop table intjoin2 drop table charjoin2 create table intjoin2 (intcol integer, somedata int, somedata2 int) create clustered index idx_int ON intjoin2 (intcol) create table charjoin2 (charcol char(3), somedata int, somedata2 int) create clustered index idx_char ON charjoin2 (charcol) SET NOCOUNT ON DECLARE @cnt int DECLARE @tmpstr char(3) SELECT @cnt = 50000 while (@cnt > 0) begin insert into intjoin (intcol, somedata, somedata2) values (FLOOR(RAND()*1000), FLOOR(RAND()*1000), FLOOR(RAND()*1000)) SELECT @tmpstr = CHAR(FLOOR(RAND()*10)+65)+CHAR(FLOOR(RAND()*10)+65)+CHAR(FLOOR(RAND()*10)+65) insert into charjoin (charcol, somedata, somedata2) values (@tmpstr, FLOOR(RAND()*1000), FLOOR(RAND()*1000)) insert into intjoin2 (intcol, somedata, somedata2) values (FLOOR(RAND()*1000), FLOOR(RAND()*1000), FLOOR(RAND()*1000)) SELECT @tmpstr = CHAR(FLOOR(RAND()*10)+65)+CHAR(FLOOR(RAND()*10)+65)+CHAR(FLOOR(RAND()*10)+65) insert into charjoin2 (charcol, somedata, somedata2) values (@tmpstr, FLOOR(RAND()*1000), FLOOR(RAND()*1000)) SELECT @[email protected] end SET NOCOUNT OFF </pre> We can see from the execution plan, that as expected, there are no discrepancies so far. Now run a profile on the following <pre> select * from charjoin A LEFT JOIN charjoin2 B ON (A.charcol = B.charcol) select * from intjoin A LEFT JOIN intjoin2 B ON (A.intcol = B.intcol) </pre> My results…. <pre> STMT READS CPU DURTN COUNT select * from charjoin A LEFT JOIN charjoin2 B ON (A.charcol = B.charcol) 1080 8221 129296 2499446 select * from intjoin A LEFT JOIN intjoin2 B ON (A.intcol = B.intcol) 1160 6800 128575 2501848 </pre> So, although the intjoin required marginally more reads (we can probably attribute this to a slight variation in the data distribution, because notice the record count is understandably different also), it actually completed ever so slightly quicker. Personally I would explain this by the fact a 4 byte integer fits nicely into a 32 bit register, and a XOR of two cpu registers to test equality is computationally less costly than a string comparison (as we can see from the difference in cpu cost). [edit] I suspect the above argument is flawed <img src=’/community/emoticons/emotion-1.gif’ alt=’

‘ /> I have since realised the execution plan uses a hash join and so no such comparison will take place, it will be hash values which are compared instead, which should be fairly similar for both data types. [/edit] Either way, there seems to be not a significant difference between int and char(3), and in your case it is almost negligible for a table of 100 records.

You have not mentioned the size of both tables but the thing to bear in mind is that providing there is no other limiting factor, just make sure that any column in a join (and a where clause) is indexed.
One factor may be load performance that could be impacted by too many indexes. If the tables are both small, you will notice no difference and other factors such as other processes are much more likely to affect your performance.

]]>