
SQL Server Performance Forum – Threads Archive

Some Advice Required…..

I have been given the task of optimizing a SQL Server 2000 reporting database and have come up with some ideas as to how I can achieve this, but wanted some input before I plow in.

1) Is there any benefit in converting datetime columns to integers (I don't need the time portion)?

2) While essentially the same operation, is there a difference in performance between the BETWEEN comparison and a pair of >= and <= operators? It always seems to me that BETWEEN is slower.

3) The dataset was originally created as one wide denormalized table in order to speed up reporting, but I feel it is now too "wide" to be effective. I was planning to remove the 12 varchar columns we currently search on and place them into a lookup table. Each value taken out would be assigned a "mask" value, which would populate a pre-defined number of bits within a search column. The varchar fields would then be removed from the original table and a search value created for each record as a BIGINT column. Search queries would then be performed using a logical AND (&) against this search value, hence allowing multiple column comparisons in one single WHERE clause.

I don't feel I have explained this very well (but in my head I know exactly what I intend to do), so I have outlined an example below. In this example, Brand has been assigned 2 bits of the search field (bits 0-1) and Product has been assigned 4 bits (bits 2-5).

Lookup Table:
Parameter Value Mask
Brand BA 1
Brand BB 2
Brand BC 3
Product PA 4
Product PB 8
Product PC 12
Product PD 16
Original Data:
Record Brand Product
1 BA PA
2 BA PD
3 BC PB
4 BB PC
etc.

New Data:
Record SearchValue
1 5
2 17
3 11
4 14
etc.

Sample query to extract Brand BA and Product PD:

SELECT *
FROM <Table>
WHERE SearchValue & 63 = 17

Hmmmm……..
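As a minimal sketch of how such a SearchValue column might be populated from the lookup table, assuming hypothetical table and column names (Facts, Lookup, Brand, Product) that are not taken from the thread:

-- Hypothetical sketch: combine the brand and product masks into one search value.
-- Bitwise OR (|) sets the brand bits (0-1) and the product bits (2-5) of the BIGINT column.
UPDATE f
SET SearchValue = CAST(b.Mask | p.Mask AS bigint)
FROM Facts f
JOIN Lookup b ON b.Parameter = 'Brand' AND b.Value = f.Brand
JOIN Lookup p ON p.Parameter = 'Product' AND p.Value = f.Product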
1) I don't think you need to convert the dates to integer format, because whenever you select them you would always have to use a convert function for presentation anyway.
2) BETWEEN is not slower but much better than the >=, <= operators. It makes the query more sargable.
3) Needs more analysis.
The date fields are never displayed; they are only used for filtering. With this in mind, is there any benefit in using integer comparison over datetime comparison? Hmmmm……..
Is this purely a reporting database?
How is it filled with data? At regular intervals?
How many rows are we talking about?

DATETIME calculations are essentially nothing other than INTEGER operations, since the DATETIME data type consists of two integers. However, even if setting the time portion to midnight can be done very quickly, it should be faster when you do not have to do it at all. So, if you can easily get rid of the time portion in your import routine, it might be worth trying out. But this depends on how you get the data from the production database to this reporting database.

Personally, I don't think there is a significant difference between >=, <= and BETWEEN. When you look at the execution plans, you'll observe that BETWEEN is resolved to a >=, <= combination anyway.

Another advantage I see is that a DATETIME takes up 8 bytes, while an INTEGER occupies only 4 bytes. Thus you can store more data on each page, which results in fewer I/O operations to fetch the data. This should result in an increase in performance.

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
I support PASS Deutschland e.V. (http://www.sqlpass.de)
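Picking up the point above about stripping the time portion in the import routine, here is a minimal sketch of the usual SQL Server 2000 idioms; the @LoadDate variable is just an illustrative stand-in for whatever datetime value the DTS packages supply:

-- Floor a datetime to midnight, or reduce it to an integer day number.
DECLARE @LoadDate datetime
SET @LoadDate = GETDATE()

SELECT DATEADD(dd, DATEDIFF(dd, 0, @LoadDate), 0) AS DateOnly       -- same day, time set to 00:00:00
SELECT CAST(DATEDIFF(dd, 0, @LoadDate) AS int) AS DaysSince1900     -- 4-byte integer (days since 1900-01-01)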

This is purely a reporting database, which is populated overnight by a set of DTS packages. The final data set is around 1.5 million records which, due to the large amount of denormalised data, is currently sized at around 1.2 GB. Hmmmm……..
Any chance you build up a test scenario on a dev machine to see how both approaches perform?

FWIW, SQL Server 2005 will introduce a new data type "date". If you're planning to upgrade soon… :)

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
I support PASS Deutschland e.V. (http://www.sqlpass.de)
quote:Originally posted by FrankKalis
Any chance you build up a test scenario on a dev machine to see how both approaches perform?

We do not currently have a test environment which is anywhere near the live environment, so comparisons would not be indicative of the actual gains made by using the new method.

Also, it would be a very long process to set up this whole scenario in test, which is why I have come here to see if anybody knew off the top of their head whether or not this would be a viable solution.

Hmmmm……..
Hmmmm…….. :)
It *looks* like a viable solution. But what do you expect? Would you implement such a solution (or not) just because someone in an online community confirmed that it is a good idea (or not)?

Without actual extensive testing in your environment, it also looks like heart surgery without any anaesthesia to me. Why can't you test it? After all, 1,500,000 rows with 1.2 GB of data really isn't that much.

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
I support PASS Deutschland e.V. (http://www.sqlpass.de)
quote:1) Is there any benefit in converting datetime columns to integers (I don't need the time portion)?
Better to use SMALLDATETIME. A smaller column size means more data fits in the buffer, which helps performance.
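A minimal sketch of that change, assuming a hypothetical Facts table with a ReportDate column (the names are illustrative only):

-- SMALLDATETIME is 4 bytes (vs. 8 for DATETIME), accurate to the minute, range 1900-01-01 to 2079-06-06.
ALTER TABLE Facts ALTER COLUMN ReportDate smalldatetime NULL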
quote:2) While essentially the same operation, is there a difference in performance between the BETWEEN comparison and a pair of >= and <= operators? It always seems to me that BETWEEN is slower.
As Frank said, BETWEEN is always converted to a >= and <= combination, so there is no difference.
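As an illustration, a sketch against an assumed Facts table with a ReportDate column; both statements below produce the same execution plan, since the optimizer expands BETWEEN into the >= / <= pair:

SELECT COUNT(*) FROM Facts WHERE ReportDate BETWEEN '20050101' AND '20050131'
SELECT COUNT(*) FROM Facts WHERE ReportDate >= '20050101' AND ReportDate <= '20050131'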
quote:3) The dataset was originally created as one wide denormalized table in order to speed up reporting, but I feel it is now too "wide" to be effective. I was planning to remove the 12 varchar columns we currently search on and place them into a lookup table. Each value taken out would be assigned a "mask" value, which would populate a pre-defined number of bits within a search column. The varchar fields would then be removed from the original table and a search value created for each record as a BIGINT column. Search queries would then be performed using a logical AND (&) against this search value, hence allowing multiple column comparisons in one single WHERE clause.
I don't think it is a good idea. First, denormalization should be used very selectively, as an exception. Second, lookup tables make your code more straightforward. Third, you can't use indexes really effectively with your approach. Since you are not really experienced in the area we are talking about and you can't or don't want to test, better to play safe. Normalize your db and use lookup tables. 1.5 million records is almost lookup-table size, so no special fancy solution is necessary to achieve good performance.
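A minimal sketch of the normalised alternative being suggested, with assumed names (Facts, Brand, BrandID, BrandCode) used purely for illustration; the same pattern would be repeated for each of the 12 search columns:

-- A narrow lookup table plus an indexed integer foreign-key column on the fact table.
CREATE TABLE Brand
(
    BrandID int NOT NULL PRIMARY KEY,
    BrandCode varchar(10) NOT NULL
)

ALTER TABLE Facts ADD BrandID int NULL
ALTER TABLE Facts ADD CONSTRAINT FK_Facts_Brand FOREIGN KEY (BrandID) REFERENCES Brand (BrandID)
CREATE INDEX IX_Facts_BrandID ON Facts (BrandID)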
quote:2) BETWEEN is not slower but much better than the >=, <= operators. It makes the query more sargable.
The claim that it is much better and makes the query more sargable is not true.
Firstly, I take offence at the "not really experienced" comment. I have created a test environment and tested the new implementation, with results coming in around 25% quicker than the original.

Surely if you are using 12 lookup tables joined to the main table in a single query (which is quite possible) this would create a large performance hit, as SQL Server can only physically join 2 tables at any one time? This is the reason why the previous developer denormalised the table in the first place, I believe.

The main point I was looking for an answer on was the use of a bigint column and applying a bit mask for searching rather than searching multiple text fields.

Hmmmm……..
quote:Firstly, I take offence at the "not really experienced" comment.
I don't know why; it is clear from your questions that you are not an MS SQL Server expert yet.
quote:I have created a test environment and tested the new implementation, with results coming in around 25% quicker than the original.
Ok, but what is "the original"?
quote:Surely if you are using 12 lookup tables joined to the main table in a single query (which is quite possible) this would create a large performance hit, as SQL Server can only physically join 2 tables at any one time? This is the reason why the previous developer denormalised the table in the first place, I believe.
It depends on the frequency of such queries. Even in that case a normalized structure may perform better, unless you select all or the majority of rows.
quote:The main point I was looking for an answer on was the use of a bigint column and applying a bit mask for searching rather than searching multiple text fields.
I gave you the answer. I’ll be glad to give you more explanations if you are willing to discuss without feeling offended.
There is a big difference between expert and inexperienced, and I never claimed to be an expert, but you are entitled to your opinion and so we shall say no more on the subject. More explanations would be most appreciated, especially as that is the reason why I logged this item on here in the first place. I will attempt to set up a normalised version of the data for comparison. Will post results back here. Feel free to comment in the meantime.
Hmmmm……..
quote:Originally posted by marlong
I have created a test environment and tested the new implementation, with results coming in around 25% quicker than the original.

Surely if you are using 12 lookup tables joined to the main table in a single query (which is quite possible) this would create a large performance hit, as SQL Server can only physically join 2 tables at any one time? This is the reason why the previous developer denormalised the table in the first place, I believe.

The main point I was looking for an answer on was the use of a bigint column and applying a bit mask for searching rather than searching multiple text fields.
First question must be whether your production server is not simply 25% busier than your test server. The criterion for performance that you give is: how long does it take for results to show up on my screen. That isn't a true measure of server performance, as your test environment could be running on your own workstation, and the production environment on a remote server where the network may well be responsible for part of the perceived performance loss.

Yes, building joins can increase processing time, especially if you don't have proper indexes to rely on.

Also, your definition of "denormalized" seems to include using alpha values for foreign keys, instead of numbers. Normalization doesn't say you have to use substitute keys.

Finally, while the use of bitmasks for filtering is a very clever idea, you must understand that SQL Server cannot use any type of index effectively if you force it to do a bitmask comparison: it cannot do an index seek, only a table scan, or an index scan at best.
Nobody is here to offend anyone! Period!

It is common practice in OLAP and/or reporting environments to flatten a table as much as possible to avoid JOIN operations in order to increase performance. I am very curious why you think that SQL Server (or any other mature RDBMS, for that matter) should only be able to join 2 tables at a time. The query processors and query optimizers of database systems are quite sophisticated pieces of software that are, to a certain extent, able to figure out how to do things best to achieve maximum performance. And as long as one cannot get hands on the source code, you can't really tell what's going on during that phase of a query. But it is also obvious that speed should automatically increase when you don't need to figure out how to "best" join at all and only need to access one table.

If your main point was an answer to your bitmask question, I'm sorry, but you didn't make this clear. At least not to me, anyway. Okay, here are some thoughts regarding bitmasked columns in SQL Server.
SQL Server is not optimized for such bit-wise operations. It is optimized for set-based operations. Queries like
SELECT *
FROM master..sysdatabases
WHERE status & 8 = 8

are not able to make use of an index, AFAIK. So you will always face table scans. Not a nice outlook on 1,500,000 rows with 1.2 GB of data. I believe what mmarovic meant is that such questions regarding bitmasks are frequently asked by people coming from a client-side development background. While it might be a valid approach there, it isn't so in a database. You might also want to search the Google groups for related questions. I think the consensus is to avoid such constructs altogether.

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
I support PASS Deutschland e.V. (http://www.sqlpass.de)
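To see the difference being described, a small sketch using SHOWPLAN against assumed names (Facts, SearchValue, BrandID); the bitwise predicate cannot seek an index on SearchValue, while a plain equality predicate on an indexed BrandID can:

SET SHOWPLAN_TEXT ON
GO
SELECT * FROM Facts WHERE SearchValue & 63 = 17   -- not sargable: table scan, or index scan at best
SELECT * FROM Facts WHERE BrandID = 1             -- index seek, given an index on BrandID
GO
SET SHOWPLAN_TEXT OFF
GO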

quote:Originally posted by Adriaan
First question must be whether your production server is not simply 25% busier than your test server.

I have created the test database on the same server as the production environment, so the 25% increase is an actual increase :)

There are no foreign keys in my denormalised data. It is one flat rowset.

Point taken on the lack of indexes on bitmask comparison.

Thx :)

Hmmmm……..
quote:There is a big difference between expert and inexperienced and I never claimed to be an expert, but you are entitled to your opinion and so we shall say no more on the subject.
I agree.
quote:I will attempt to set up a normalised version of the data for comparison. Will post results back here. Feel free to comment in the meantime.
The right thing to do. Put indexes on the foreign key columns. Selective denormalization may help, but this really has to be tested against a realistic query sample, as it is in the production environment. Even "experts" can't know for sure if and how much it will help without proper testing.
quote:Finally, while the use of bitmasks for filtering is a very clever idea, you must understand that SQL Server cannot use any type of index effectively if you force it to do a bitmask comparison: it cannot do an index seek, only a table scan, or an index scan at best.
True. The 25% performance improvement may come from an index scan instead of a table scan. However, if an index seek is used, as when you have a normal one-to-many relationship between the lookup and 'fact' tables, an index on the FK column, and a condition on the lookup table that significantly restricts the number of rows returned, performance will be much better. Just adding more clarification: denormalization also repeats long 'description' columns in each fact table row, which means fewer rows fit in the buffers, so more I/O has to be performed.
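To make that concrete, a sketch of the kind of query where the seek kicks in, again with assumed names (Facts, Brand, BrandID, BrandCode):

-- A selective filter on the small lookup table drives an index seek on Facts.BrandID,
-- rather than a scan of the whole fact table or of a bitmasked search column.
SELECT f.*
FROM Facts f
JOIN Brand b ON b.BrandID = f.BrandID
WHERE b.BrandCode = 'BA'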