SQL Server Full Text Search Optimization

Full text indexing is a great feature that solves a long running database problem, the searching of textual data columns for specific words and phrases in SQL Server databases.

However, as the full-text search engine is a separate component of SQL Server (it uses the Microsoft Search Service) this can be a potential performance problem due to the way full-text search interacts with SQL Server.

Full-text search works really well when you are searching a text column and you are interested in all the rows that match your highly selective search criteria. However, full-text search is commonly used with other SQL Server predicates, such as those included in the WHERE clause.

For example, let’s say you wish to search the description of properties for sale for the word “bath”. However, you are only interested in flats with baths. The statistics of the table you will be searching are:

Total Properties Properties with bath’s
Houses 800,000 330,000
Flats 20,000 2,000
Apartments 15,000 1,200

To perform this query you could write it as follows:

FROM properties p
INNER JOIN containstable(properties, description, ‘bath’) t
ON p.PropertyID = t.[key]
WHERE p.type = ‘flat’

Now this is where full-text starts to under-perform. As I said above, this is due to the interaction between the Microsoft Search Service and SQL Server. Essentially what happens is the the Search Service is asked to search the catalog for rows which contain the word ‘bath’. The matching key rows are returned to SQL Server, and then this result set is filtered down to the only show the top ten when the type is “flat”.

The performance hit in the above example would be when the Search Service is returning the key rows to the Query Optimizer. In the example we are only interested in the top 10 results, but the search service returns 333,200 key rows to the Query Optimizer. The Optimizer filters these results to find those which are the type ‘flat’.

There is currently no ability for the Search Service to restrict the number of rows it returns to SQL Server, other than the textual search condition and the top_n_by_rank parameter.

The top_n_by_rank parameter of the containstable function limits the number of rows returned to the number you specify in order of descending rank. For example:

FROM properties p
INNER JOIN containstable( properties, description, ‘bath’,10) t
ON p.PropertyID = t.[key]
WHERE p.type = ‘flat’

While at first this appears to meet our needs, in fact it may result in no rows been returned. This is because the top_n_by_rank is evaluated by the Search Service before being returning the key rows to SQL Server. The top 10 rows that are returned to the Query Optimizer from the Search Service may not be of type=’flat’ and therefore be filtered out of the result set.


Pages: 1 2


No comments yet... Be the first to leave a reply!