SQL Server Performance

How to index or not index 'not selective enough'-columns

Discussion in 'Performance Tuning for DBAs' started by brimba, Aug 30, 2007.

  1. brimba New Member

    Hello
    I am trying to denormalize multiple tables and create an index to increase the performance of a search function that we have.
    The problem when I denormalize is that its hard to find what columns to index and what to not.
    Alot of the columns that I am quering against are not selective enough, but if I dont put an index on them then it will a burden since the bookmark lookup will be heavy.
    I am then thinking of creating an index over the selective columns, and then also add a bitmask column.
    The bitmask column should include all my non selective columns.
    Let me give you an example of how I am thinking.
    This could be a couple of non selective columns. Gender, LanguageId, HasPicture
    if I then create a bitmask and say
    Male = 1, Female = 2, England = 4, France = 8, HasPicture = 16, HasNoPicture = 32
    Then if I want to query for every male in france that has a picture (1+8+16) which gives me 25. Which most probably will be selective enough and also simple for sql-server to search and sort.

    I know that this kind of database design is disgusting, so if you might have other ideas please step forward and give me suggestions!
    Otherwise you could just comment the idea, or point to an article describing this method? I am sure that this must have been discussed before?
  2. Adriaan New Member

    Have you considered creating a view instead? If you need a composite index on columns from different tables, an indexed view is probably a better option than a denormalized table (which I hope you're only using for reporting).
  3. brimba New Member

    This solution is not only for reporting. I was afraid that an indexed view would cause alot of extra writes (more than the extra writes that is needed for the denormalized table), but I will of course try it out and see what happens.
    Could you please explain your idea with indexed views, as a solution against the problem with indexing multiple non selective columns?
  4. Adriaan New Member

    I don't think there is much sense in indexing a bitmask (integer?) column, a couple of appropriate indexes on the actual fields should be better. Try and see if you see many repeating values on a combination of two or three columns, and add a composite index for that combination.
    You might also consider moving the related columns to a lookup table, with an identity column that you then use as a FK on the main table, and indexes as described before. This should work if one or two columns have a fair bit of repeating data.
  5. brimba New Member

    Could you please describe the technique with a lookup table a bit more? Maybe an example?
  6. satya Moderator

  7. Adriaan New Member

    Sorry for the confusion - the kind of lookup table I meant was a table that holds a normalized (set of) attribute(s) that you assign to the main table through an FK column - like tblSales and tblClient, where tblSales has a ClientID column referring to the "lookup table" tblClient.
    If you do a SELECT DISTINCT on the columns that you want to cover with a bitmap, does it give you a much lower row count than against the whole table? In that case, you could normalize those columns into a lookup table, and index the lookup table.
  8. brimba New Member

    ahh! Thanks for explaining.
    About the indexed views, which I have been reading about some time now. As far as I understand it, the cost of an update of a base table will only take place if you update a column that is part of the indexed view. So (if this is correct) I am thinking that even OLTP databases actually can benefit from an indexed view, of course only when a couple of "requirements" are set. For example that the included columns are less updated, and that the base tables have few inserts/deletes.
    Maybe I am wrong here? Maybe an indexed view in an OLTP database is always bad? I may have missed some other issue, like some overhead of indexed views or something?
    As an example we can take a table with users. if we create an indexed view with SSN, FirstName, LastName, DateOfBirth. And we know that these data is almost never updated, then I guess that it actually can be a good thing?

  9. brimba New Member

    What if I want some values to be optional in the lookup table?
    When a user is searching there are some criterias that are required, but some are optional.
    For example, language is required while picture is optional. You may want to search for everyone with or without a picture.
    How is this possible with a lookup table?
    I tried to create a lookup table and when my query runs select all rows from the lookup table that fits my critera and then perform an IN() on those, but I guess that it can become a quite large number of options and therefore result in bad performance.
    Any ideas?


  10. Adriaan New Member

    You would have to run tests to see if there is any difference in performance for IN (subquery) or EXISTS (subquery) or INNER JOIN.
    If IN (subquery) and EXISTS (subquery) have similar performance, always use EXISTS.
    When it comes to optional filter criteria, try using sp_ExecuteSQL with proper parameters, and compile both the query statement and the parameter list as dynamic SQL. [edit: To make that work properly, you also need to put the actual parameters in dynamic SQL - so compile the whole call to sp_executeSQL as dynamic SQL.]
  11. brimba New Member

    ok. My query against the lookup table can return 15-20 rows and when I do a EXISTS or IN() the query will perform bad since there are so many options.
    Any other suggestions on how to implement a lookup table with optional lookup attributes? :)
  12. Adriaan New Member

    You could try using a table variable, inserting the matching key values from the lookup table in there, then using the table variable in your subquery.
    Did you try building a proper sp_ExecuteSQL with parameters, in dynamic SQL? Just one test run does not tell you everything: once the statements start to get fired against the database, you might get some query plan re-usage going.
  13. brimba New Member

    Thanks, I have tried all that, booth table variable and a temp table, no big difference. I also tried sp_executesql without any luck. The big problem seems to really be that there are som many options and therefore the exists() or in() will have problems with using the index in a good way?
  14. Adriaan New Member

    You mentioned that the lookup table would have something like 60 rows. That should evaluate pretty quickly for any combination of criteria.
    Give us the script that you're now using.
    By the way, you did add an identity column on the lookup table as the clustered index? And you added an INT column on the main table for the FK referring to the lookup table, with a non-clustered index? And the main table has a clustered index as well?

Share This Page