SQL Server Performance

Index Creation

Discussion in 'General DBA Questions' started by golden1237, Jan 29, 2010.

  1. golden1237 New Member

    Hi all,
    I have a archive table with 27+ millions records, recently we would like to make an archive records report.
    Since there are drop down list that will be displayed to end user,
    hence the queries used for the drop down list are "select distinct....."
    for example:select distinct column_1
    from table_1
    where column_1 <>''
    order by column_1
    Do I need to create non-clustered index for each column_x in table_1 ?
    It did improve the performance after the creation of index, however it is costly since the no of records are 27 millions, there is a total of 10 GB size consumed.
    Please kindly advice is there any better design of sql query or index, thanks.
  2. FrankKalis Moderator

    If the index consumes 10 GB for "only" 27 million rows, then I would guess the data type of column_1 is a fairly wide character string.
    I also guess that normalization is not an option? Looks like there may be room for some design optimizations.
    As an alternative it may be worth experimenting with putting your query into a view definition and index that view. However, this would still have the "disadvantage" that it is an online disk structure that consumes disk space.
    Do you really need the ORDER BY in your query or could potentially your client handle the sorting?
  3. golden1237 New Member

    Hi Frank,
    There are around 20 columns in the table. Due to the drop down list, we created non-clustered index on around 8 to 9 columns , appreciate if you can advice on my problem above without consuming too much disk space.
  4. preethi Member

    [quote user="golden1237"]
    Hi Frank,
    There are around 20 columns in the table. Due to the drop down list, we created non-clustered index on around 8 to 9 columns , appreciate if you can advice on my problem above without consuming too much disk space.
    [/quote]
    Are you trying to say that you display those 8 to 9 columns in the drop down list?
    If so, you are trying to solve a wrong problem.
    If not why do you need all the columns in the index.

  5. golden1237 New Member

    Hi,
    I think there are some misunderstanding there..
    What i trying to say is..
    For eg..
    There are a lot of drop down list such as
    a)serial no
    b)model name
    ,etc
    But those columns are not unique, hence i need to select distinct from 27 millions record for eg serial no (Which maybe return distinct records of just 1000 records)
    In that case, should i create non-clustered index on each of those drop down list column?
    Please suggest or advice if there is a better idea.
  6. preethi Member

    For your situation, I believe there is a fundamental problem:
    Those columns should come from a "look up table" where only distinct values are stored. But it needs quite a lot of application change. Any quick fix to this problem can affect the performance adversely.
    At this time you have no other choice than going for separate (non clustered) indexes for each of those columns.
    Otherwise, think of this quick and dirty fix: Create a lookup table and insert all distinct values. Add a trigger to check the lookup table and insert if it is not there. Now use the lookup table for your drop down. However, this will not take care of updates and deletes to those look up values. You need to do some house keeping task to resolve those issues.
    Another bad point of this solution is tThis will affect the performance of insert statements.
    Note: If any of those columns come from same table clustered index will not help you. Additionally, Clustered index will not solve your problem as in your scenario, you will move from (non clustered) index scan to clustered index scan which is like jumping from the frying pan to the fire.. Additionally, you will change the physical organization of the data which will affect all other indexes, and ALL operations
  7. golden1237 New Member

    Thanks for the suggestion.
    If I do not create the look up table, I still need to stick with those non-clustered index in the original table?
    Also, I have some doubts here:
    1) Apart from non-clustered index, will the creation of clustered index helpful?
    2) How can I calculate the individual size of the index of the table?
    sp_spaceused <table_name> seemed to produce the total size of the index size.
    I did refer to some reference in Microsoft website but I fail to understand. Hopefully somebody can explain to me.


  8. preethi Member

    [quote user="golden1237"]
    If I do not create the look up table, I still need to stick with those non-clustered index in the original table?
    [/quote] I am afraid so.
    [quote user="golden1237"]Apart from non-clustered index, will the creation of clustered index helpful?[/quote]
    Generally, having a clustered index helps many queries execute faster against the table. In your case,IT DEPENDS. If you don't have a clustered index, the table will be called a heap, where all the non clustered indexes will be forced to use an internal mechanism to identify the rows using file id, page id and row id (6 bytes in total) If you can get a key with lesser byte, it will reduce the index size and the query will execute faster. Additionally, maintaining the index will take less resources when you have a clustered index.
    As a rule of thumb, Have a clustered index on a key which does not have duplicate entries, small and not changes.
    [quote user="golden1237"]How can I calculate the individual size of the index of the table?[/quote]
    You can use this query:
    SELECT I.name, ips.page_count * 8 AS [Index size in KB]
    FROM sys.indexes i JOIN sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Sales.SalesOrderDetail'),null, null, default) ips
    ON i.index_id = ips.Index_id
    WHERE i.object_id = OBJECT_ID('Sales.SalesOrderDetail')
    Hope this helps

  9. golden1237 New Member

    Thanks for the script.
    I did execute below scripts in one of the test environment with SQL 2000
    1) select dpages*8 as INDEX_SIZE from sysindexes where name='<index_name>'
    2) sp_spaceused <table_name>
    on a table which consists only of clustered index.
    The result of first script and second script is different.
  10. preethi Member

    Clustered index will not be added to index page in sp_SpaceUsed. As Clustered index points to the data pages at the leaf level, the clustered index size in the query will be almost equal to data column in Sp_SpaceUsed. So Index size will be somewhere close to total of Index size in query.
    I have seen slight differences at times, but they are related to statistics. If you see major differences, it is good time to update statistics.
    Hope this helps.
  11. golden1237 New Member

    [quote user="preethi"]
    As Clustered index points to the data pages at the leaf level, the clustered index size in the query will be almost equal to data column in Sp_SpaceUsed. So Index size will be somewhere close to total of Index size in query.
    [/quote]
    Then the creation of the clustered index will result a big increment for the table size [data and index]?
  12. preethi Member

    [quote user="golden1237"]
    [quote user="preethi"]
    As Clustered index points to the data pages at the leaf level, the clustered index size in the query will be almost equal to data column in Sp_SpaceUsed. So Index size will be somewhere close to total of Index size in query.
    [/quote]
    Then the creation of the clustered index will result a big increment for the table size [data and index]?
    [/quote]
    Not exactly. The key word here is "Points." When you have a table with clustered index, it will have the root page and intermediate pages separately. But it does not have a separate leaf page. instead of that the pages before the leaf pages points to the data pages. So data pages are the clustered index pages.
    You can read more about table and index architecture from this link:
    http://msdn.microsoft.com/en-us/library/aa174541(SQL.80).aspx
    Hope this helps:
    Note: I changed the subject to reflect the version
  13. golden1237 New Member

    Hi all,
    Could anybody advice on my second part of the stored procedure?
    As described in my previous post, this table consists of 27 millions records.
    I need your all help on the index creation.
    Really need your suggestion and advice.
    Thanksset @str = 'select distinct isnull(a.productcode,'''') ''productcode'' ,isnull(a.dlot,'''') ''dlot'' ,isnull(a.dlotsplitindicator,'''') ''dlotsplitindicator'',isnull(a.serialno,'''') ''serialno'',isnull(a.klot,'''') ''WIP'',isnull(a.productionlineno,'''') .................
    set @str = @str + N'from table1 a '
    set @str = @str + N'LEFT OUTER JOIN table2 b '
    set @str = @str + N'ON a.productcode=b.productcode '
    set @str = @str + N'and a.dlot=b.dlot '
    set @str = @str + N'and a.serialno=b.serialno '
    set @str = @str + N'LEFT OUTER JOIN table c '
    set @str = @str + N'ON b.userid=c.UserId '
    Set @stre = 'where (a.productcode like rtrim('''+@ProductCode+''') + ''%'' or rtrim('''+@ProductCode+''') = '''' )'
    if len(ltrim(rtrim(@Klot))) > 0
    begin
    set @stre = @stre + N' and (a.klot like rtrim('''+@Klot+''') + ''%'' or rtrim('''+@Klot+''') = '''' )'
    end
    if len(ltrim(rtrim(@Dlot))) > 0
    begin
    set @stre = @stre + N' and (a.dlot like rtrim('''+@Dlot+''') + ''%'' or rtrim('''+@Dlot+''') = '''' )'
    end
    if len(ltrim(rtrim(@SerialNo))) > 0
    begin
    set @stre = @stre + N' and (a.serialno like rtrim('''+@SerialNo+''') + ''%'' or rtrim('''+@SerialNo+''') = '''' )'
    end
    if len(ltrim(rtrim(@LineNo))) > 0
    begin
    set @stre = @stre + N' and (a.productionlineno like rtrim('''+@LineNo+''') + ''%'' or rtrim('''+@LineNo+''') = '''' )'
    end
    if len(ltrim(rtrim(@ModelName))) > 0
    begin
    set @stre = @stre + N' and (a.modelname like rtrim('''+@ModelName+''') + ''%'' or rtrim('''+@ModelName+''') = '''' )'
    endif len(ltrim(rtrim(@QAJudgeMean))) > 0
    begin
    set @stre = @stre + N' and (b.judgecode like rtrim('''+@QAJudgeMean+''') + ''%'' or rtrim('''+@QAJudgeMean+''') = '''' )'
    end
  14. preethi Member

    I am sorry, But I don't think adding an index will help you in this case.

    • The query you have specified will return the distinct combination of all columns. Unless you have additional processing of eliminating the duplicates at the application level, you will still get duplicate entries.
    If you have column A and Column B and each having 3 distinct values 1,2,3 and 7,8, 9 but you can get up to 9 different distinct value combination.

    • Why do you want to use dynamic SQL?
    • Remove all the trim, ISNULL logic from the SQL code, Move them to the application layer.
    • I still prefer different SQL statements to get each columns distinct values.
    • In this case, I don't think indexes will help you
    When comparing the work involved here, I prefer the use of triggers to insert those values into lookup tables. You may have to write a job to do the cleanup of tasks

    If you still want to stick to the query, and you need some indexes, create an index on 2-3 smaller columns (columns that take few bytes). Keep the columns which have few repeating values as the first column. Based on the output you need to decide whether you need to add additional columns.

  15. preethi Member

    [quote user="preethi"]Based on the output you need to decide whether you need to add additional columns. [/quote]
    ... or drop the non clustered index and live with what you have. [:(]
  16. golden1237 New Member

    Hi Preethi,
    Really appreciate the advice and knowledge sharing.
    If we create non-clustered index for those smaller column, will there be a performance issue since the "where" clause column is not index.
  17. preethi Member

    It depends on the amount of data filtered through DISTINCT values and where clause.
    You can try this: create an additional index on the columns in where clause. If you think the number of columns involved are wider you need to use the same principles we discussed before. AFter create both indexes, you execute the query and gt the execution plan. Based on the execution plan you may be able to determine what method to proceed.
    As mentioned in another(?) post, all depends on the data distribution. outsiders, like me may be able to guide on some directions. You need to make a decision on what will be the best method for you.
  18. golden1237 New Member

    [quote user="preethi"]
    I still prefer different SQL statements to get each columns distinct values.
    [/quote]
    Hi Preethi,
    May I know what do you mean by different SQL Statemets?

Share This Page