Optimising LIKE queries… | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Optimising LIKE queries…

We have search capabilities in our web application which I am trying to improve the performance of. One aspect of the search (which we call "simple") allows the user to enter one or more words and only resources containing ALL of the words are returned. The requirements state that the search must be a wild-card search, returning results containing words starting with each of the entered search terms in any order. ie, If the criteria entered is ‘John McDonald’, then valid matches would be ‘John McDonald’, ‘John Q McDonald’, ‘McDonald, John’, ‘McDonald, Smith and Johnson’, but ‘LongJohn McDonald’ would not be a valid match. The snippit of code being used at the moment is something along the lines of:
WHERE (R.Description LIKE ‘%[^a-zA-Z]John%’ OR R.Description LIKE ‘John%’)
AND (R.Description LIKE ‘%[^a-zA-Z]McDonald%’ OR R.Description LIKE ‘McDonald%’) The [^a-zA-Z] is needed to ensure that the search term is at the start of a word (ignoring for the moment that the search is broken, and ‘123John *&^McDonald’ would be a valid match), but that also means that the OR part is needed for each term. The performance of this is barely acceptable. Until recently other slow, nasty bits in the search code ‘hid’ this as a performance issue, but I have been tidying up the code and optimising the cr*p out of it – this is now the slowest part of the whole deal. Any ideas on how to improve the performance here? Please bear in mind that the behaviour now cannot change – the existing code has been in production use for over two years (up to 500 simultaneous users), and changing the search behaviour is going to cause massive re-training and support headaches.

Anytime you use the LIKE clause and begin the search criteria with a wild card, SQL Server cannot use available indexes and a table scan has to be performed. In addition, the use of the OR clause also makes the query perform less than ideal. See these URLs for more information on both of these items: http://www.sql-server-performance.com/transact_sql.asp
http://www.sql-server-performance.com/optimizing_indexes.asp To prove this to yourself, run query in Query Analyzer and view the Execution Plan. You will see that no INDEX SEEKs are being used, only INDEX SCANS. Let me think about this some more and add more thoughts later on how you might be able to overcome this performance issue. In the meantime, please check out the two above URLs for some more background information.
——————
Brad M. McGehee
Webmaster
SQL-Server-Performance.Com
I’m not to sure on this but it might be an alternative.
Given that you’ve split the words up would it not be better to use something along these lines (I’m not sure, just a suggestion which might be worth checking up) : WHERE (R.Description LIKE ‘% John%’ OR R.Description LIKE ‘John%’)
AND (R.Description LIKE ‘% McDonald%’ OR R.Description LIKE ‘McDonald%’) (Just adding a space prior to the search term)
I’m assuming that no character would be valid before the term, even numeric so it might not be viable if you want , or . before the search term and also I’m not sure if it is faster but I would think so as it doesn’t have to negate the every character prior to the search term.. Just an idea, might not be feasible. Cheers
Shaun
World Domination Through Superior Software
quote:Originally posted by trifunk I’m not to sure on this but it might be an alternative.
Given that you’ve split the words up would it not be better to use something along these lines (I’m not sure, just a suggestion which might be worth checking up) : WHERE (R.Description LIKE ‘% John%’ OR R.Description LIKE ‘John%’)
AND (R.Description LIKE ‘% McDonald%’ OR R.Description LIKE ‘McDonald%’) (Just adding a space prior to the search term)
I’m assuming that no character would be valid before the term, even numeric so it might not be viable if you want , or . before the search term and also I’m not sure if it is faster but I would think so as it doesn’t have to negate the every character prior to the search term.. Just an idea, might not be feasible.

Unfortunately the behaviour must be identical to the existing code as this is a production system that is in pretty heavy use by a large number of users. Searches must find matches for ‘123abc’, ‘123-abc’, ‘xyz,abc’, ‘xyz abc’, ‘123/abc’, ‘(abc)’, etc…
Have you considered using a third-party program, such as index engine fromhttp://www.imptechnology.com/? This is a cool add-on program that can make wildcard searches much faster.
——————
Brad M. McGehee
Webmaster
SQL-Server-Performance.Com
]]>