SQL Server Performance Forum – Threads Archive
Full-Text VS LikeI’m looking at the difference in functionality between the full-text search and using the LIKE in my WHERE clause. My problem is that a statement like the one below doesn’t seem to have an equivalent when using the full-text search: SELECT * FROM Property WHERE Description like ‘%Nintendo%’
I know that there is a record with Nintendo mis-spelled, but this is in 2.4 million record table and performance is really bad with the like statement. Anybody have any suggestions?
Fundamental difference is Using a Like Operator is using a % and using FTS uses *. Like is pattern Matching and fulltext is word based I believe you are trying to compare the LIKE operator and using FTS , if you are using LIKE operator you would start seeing performance hits for table with more than 1M rows. Satya SKJ
This posting is provided â€œAS ISâ€ with no rights for the sake of knowledge sharing.
I did run some test and I definitely get better performance with FTS, but my problem is that functionality is not equivalent. If I am to execute the sample SELECT above, but with a CONTAINS instead of LIKE (and replace the % with a *), it will return less rows than my original sample query. So, if I use SELECT * FROM Property WHERE CONTAINS(Description, ‘*Nintendo*’
This will not pickup the record where the description has the word Nintendo mis-spelled NINITENDO I’m wondering if anybody knows of a way around this problem? Thanks, Francois PS: Now that I’ve re-re-read Satya’s reply, I do understand that the functionality difference is due to the fact that the LIKE clause is pattern matching, whereas the the CONTAINS and FTS in general is word based.
You seem to be searching for a single unknown character at a specific position in a word, which you can handle with LIKE, using the underscore:
SELECT * FROM Property WHERE Description LIKE ‘%nin_tendo%’ If you look at CONTAINS in BOL, it seems to be you can only use an * for the first part of the word you’re looking for.
Neither SQL Server’s T-SQL LIKE and "Full-text Search" (FTS) are designed to properly search for all (or most) possible mis-spelled words, such as the mis-spelled NINITENDO for Nintendo as there can be many other variations to the mis-spellings. You might want to consider using phonetic matching techniques, such as SOUNDEX and DIFFERENCE, see SQL Server 2000 BOL title "Comparing SOUNDEX and DIFFERENCE" for more info. There are also other techniques to do this type of searching that does not use either T-SQL LIKE or FTS, you can search Google for N-grams or trigrams for more info on techniques to properly search and find mis-spelled words in a SQL Server database.
….or you could resign yourself to T-SQL’s LIKE operator and commit this very heinous crime: SELECT * FROM Property WHERE Description LIKE ‘%nintendo%’
OR Description LIKE ‘%n_ntendo%’
OR Description LIKE ‘%ni_tendo%’
OR Description LIKE ‘%nin_endo%’
OR Description LIKE ‘%nint_ndo%’
OR Description LIKE ‘%ninte_do%’
OR Description LIKE ‘%ninten_o%’
OR Description LIKE ‘%nintend%’ Of course, this will require you to generate dynamic SQL since you don’t know beforehand what the user wants to search for. A daunting task but it will solve your problem for now. Some repetitive underscore inserting function. Ah! The price of being careful with a tool like T-SQL is sometimes high. I would recommend the previous post as a good beginning but if you are in a hurry and unlucky to have a cost-sensitive-over-the-shoulder-always-peeping-always-worried boss/client this option doesn’t look that bad. May God forgive me.
Actually, I’m not really trying to search for mis-spelled words. It just happens that the example I was using was mis-spelled. So, the use of the ‘_’ wildcard doesn’t help me much. The implementation of SOUNDEX in SQL Server is using a rather weak algorithm from what I can tell. Thanks for all your help on this topic!
Re-reading your orginal posting, it’s a bit un-clear as to exactly what you are looking for… However, if I can speculate a bit, it seems to me that you’re looking for the leading wildcard (%) function in T-SQL LIKE, but to used in the FTS query, such as: ‘search*word’. As this would find both the correct spelling of Nintendo and the mis-spelled NINITENDO, but would perform faster than using T-SQL LIKE. Correct? If so, then you’re out of luck with SQL 2000 Full-text Search as the functionality of a leading wildcard or really "word suffix", such as ‘*fish’ returning rows that contain catfish, goldfish, etc. is not implemented and most likely will not be implemented in SQL Server 2005 (Yukon) as well. However, combining SOUNDEX and N-grams *might* your best long term solution and you should review the following paper "Improving Precision and Recall for SOUNDEX Retrieval" at:
http://www.ir.iit.edu/publications/downloads/IEEESoundexV5.pdf as it is helpful in improving the SOUNDEX algorithm.
Yes John, that’s exactly what I’m trying to accomplish. Based on the posts here and on my own research, I’ve come to the same conclusion about FTS, it just won’t do what I’m looking for. I’ve been giving some thoughts about using something like SOUNDEX, but as I stated in an earlier post, SQL Server’s implementation is rather week. Thanks!