Discussion in 'T-SQL Performance Tuning for Developers' started by joker197cinque, Jun 5, 2003.
WHERE Grafico IS NOT NULL AND RIGHT(Grafico, 3) IN ('jpg', 'jpe', 'gif')
how about creating a calulated column on RIGHT(Grafico, 3) with an index?
You don't even need the IS NOT NULL part of the WHERE clause. The IN('jpg','jpe','gif') precludes the selection of any other value apart of those 3 file types in your Grafico column.
I think this will work as
WHERE Grafico like 'jpg%'
OR Grafico like 'jpe%'
OR Grafico like 'gif%'
Not very sure if this is the optimal solution. Try executing the same and see the execution plna if this makes sense.
I thought this is appropriate forum, so moved the thread from Perf.Tuning for DBA's forum.
Gaurav, I think Joker197 is trying to pick only records with a Grafico (read Graphic filename extension - My Italian is obviuosly poor) value of either jpg , gif or jpe. You put the '%' after the desired file extension instead of at the beginning. Re-writing your solution, It would then read like this:
WHERE Grafico like '%jpg'
OR Grafico like '%jpe'
OR Grafico like '%gif'
This will work fine because the IN clause will not support the wildcard character.
Sorry about that I confused RIGHT() with LEFT(). See the solution suggested by vbkenya is alternative to what was initially posted in forum. But I would like to mention that the solution suggested will not have any SARG Conditions as for a string comparison to be SARG, the % operatoe should come at end. Let me explain it this way- if the query optimizer has to llok for record like '%jpg', it has to scan through all the records and hence the condition is non-sarg.
I tested it on a table of 240,000 rows, on a column of varchar(20) and here are the results:
LIKE '%jpg' or Grafico like '%jpe' runs in 8000-8200 ms.
RIGHT(Grafico, 3) IN ('jpg', 'jpe', 'gif') runs in 1700 - 1800 ms
calculated column + INDEX on it
calc_column IN ('jpg', 'jpe', 'gif') runs in 40-60 ms !
Maintaining this calculated index is also an overhead. This may be a concern if there are too many inserts in the table.
....however, in Jokers case the logic involves looking at the last three characters (file extension) in that field. The other solution would be to break up the field into 2 columns, GraficoFilename and GraficoFileExtension and use the extension field for the query
SELECT......WHERE GraficoExtension IN ('jpg','jpe','gif')
or he could achieve almost the same results with
SELECT.....WHERE GraficoExtension ='jpg'
OR GraficoExtension ='jpe'
OR GraficoExtension ='gif'
OR even this shorter SARGable wildcard mask (Assuming that there are no other extensions that start with the letters 'jp'
SELECT.....WHERE GraficoExtension ='gif'
OR GraficoExtension LIKE 'jp%'
Try this...But if you do, an index on the Graphicoextension field would benefit all the three queries.
It's a non-clustered index on char/vharchar(3). Doesn't seem too hard to maintain. In the time it takes to run 3 LIKE '%xxx' searched, I could run 3 searches and a DBCC DBREINDEX()... <img src='/community/emoticons/emotion-1.gif' alt='' /><br /><br />Bambola.
Haven't tried those statistics yet but I believe that Joker should take that path or at least try it and tell us.
Great testing bambola. This is the kind of data I love to see.
Brad M. McGehee, MVP
Thanks Brad [:I]
Sorry I mentioned calculated index - I meant maintaining computed column. This means whenever I am doing a insert, I have to insert at an additional place (of course after preprocessing the data) and also the non-clustered index (which is not a big deal). Anyway, I think computed column is great idea if there are lot of queries running on the system.
If a computed column is implemented then the queries I posted above could still be used. The queries will require an index on this computed column.
Please bear in mind the side-effects of creating an index on this type of column - like the ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER and NUMERIC_ROUNDABORT settings for connections that will need to utilize that index.
I don't think that we are dealing with a calculated column here. It appears that the values are all filenames and you are trying to get the extension of each filename, is this correct?
If it is then what you need to do is properly normalise your data. If this column was normalised then you would have columns: (i) filename; (ii) extension. This would resolve the issue and make searching on either column much easier.
All you need to do to implement this is change the original insert into the table to insert into the 2 columns instead of one.
Right. That's what I thought and suggested. Could Joker197cinque respond and clarify the usefulness of this line of thinking?
The idea with a calculated column was to find a solution that will not break the application logic. If changes to the database are possible, there are of course other solution to condiser like seasider suggested.
In both cases, added column or calculated column, I'll stick with
WHERE Grafico IN ('jpg','jpe','gif')
without % before or after.
Separate names with a comma.