SQL Server Performance

PK Constraint and covering indexes

Discussion in 'Performance Tuning for DBAs' started by pmackay, Sep 20, 2005.

  1. pmackay New Member

    Hi again,

    I'm checking the indexes in our database, and I've found in several places the case that I'll explain here. The problem arises because I can't find a way to solve them.

    Suppose this sceneario:

    Table 1, columns 1 to 10.
    A primary key PK over Col1, Col2 and Col3.
    By desing or default, this PK is a clustered index.

    After a long time, someone (maybe me), created a new nonclustered index (NCI) in Col1, Col2, Col3, Col4, because we needed to filter by all of this columns.

    Today, I want to remove the NCI or the PK because they have a very similar structure, but I dont know how to do this.

    If I remove the NCI, I lost the filtering capacity, but If I remove the PK, well you know what will happen. Another choice is to add the Col4 to the PK, but that idea contradicts the essence of the PK.

    Thanks,

    Patrick MacKAY
    ASP.NET MVP
  2. satya Moderator

    How often you're performing reindexing and update statistics job on the database?
    For intermittent performance you can create a clustered and drop it, as it will take care of other non-clustered indexes in form.

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  3. FrankKalis Moderator

    Drop the nonclustered index and create a new one only on Col4. Since the clustered index keys are stored as part of the bookmark in the leaf level of the nonclustered index you have now a covering index on your four columns.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    Ich unterstütze PASS Deutschland e.V. http://www.sqlpass.de)
  4. pmackay New Member

    Frank,

    I'm really surprised. I'll do that and find how the queries work. Satya, I rebuild the indexes and update the statistics once in a week, on sunday morning.

    Thanks

    Patrick MacKAY
    ASP.NET MVP
  5. mmarovic Active Member

    Have you tested performance with and without new non-clustered index? It may happen that using pk gives you good enough performance, so there is no need to add similar non-clustered index.

    Frank, the question I don't know answer is: Would values of col1-col3 in non-clustered index on col4 be ordered by 3 columns mentioned after they are already ordered by col4? I am not sure I expressed myself clear, I hope you understand.
    My concern is that index will be ordered by col4 but values of clustered index column would be in random order for the same col4 value.
  6. pmackay New Member

    MMarovic,

    I haven't tested the difference. It's a good point, but I guess that if I create the second index is because the performace was not good. I'll check it.

    Thanks.


    Patrick MacKAY
    ASP.NET MVP
  7. FrankKalis Moderator

    Mirko, I think I understand you. I think, the order of col4 isn't the same as the clustered index. So when you order by col4 the clustered index keys are not in the order as they are in the clustered index.
    Here's a small sample I've played with


    DROP TABLE #test
    CREATE TABLE #test
    (
    c1 INT
    , c2 INT
    , c3 INT
    , c4 INT
    CONSTRAINT pk_test PRIMARY KEY(c1, c2, c3)
    )

    CREATE NONCLUSTERED INDEX ix_test ON #test(c4)

    DECLARE @i INT
    SET @i = 0
    WHILE @i < 251
    BEGIN
    INSERT INTO #test (c1, c2, c3, c4) VALUES(@i*1, @i*2, @i*3, @i*4)
    SET @i = @i + 1
    END

    Now when you do a query like


    SELECT *
    FROM #test
    WHERE
    c1 = 100 AND
    c2 = 200 AND
    c3 = 300 AND
    c4 = 400

    you'll see that the nonclustered index is used. A query like



    SELECT *
    FROM #test
    WHERE
    -- c1 = 100 AND
    -- c2 = 200 AND
    c3 = 300 --AND
    -- c4 = 400

    also uses the nonclustered index. While this


    SELECT *
    FROM #test
    WHERE
    c1 = 100 AND
    -- c2 = 200 AND
    c3 = 300 --AND
    -- c4 = 400

    scans the clustered index. My point was that one can take advantage of the fact that the clustered index keys are stored as bookmarks in the nonclustered indexes. That allows you have more covering indexes as one might think. But it is also the drawback here. The composite clustered index blows up every nonclustered index in size. Thus, fewer rows fit on a page. I guess in the end it boils down again, to one's specific environment and one needs to test and experiment. But lately I've also come to the conclusion that clustered indexes are overestimated (or nonclustered indexes are underestimated) in term of usefulness and performance.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    Ich unterstütze PASS Deutschland e.V. http://www.sqlpass.de)
  8. mmarovic Active Member

    Frank, my point is - if you ran next code:


    if exists(select * from sybobject where type = 'u' and name = 'test')
    DROP TABLE test
    go
    CREATE TABLE test( c1 INT , c2 INT , c3 INT , c4 INT)
    go
    create unique clustered index ix_clustered on test(c1, c2, c3)
    go
    CREATE NONCLUSTERED INDEX ix_test ON test(c4)
    go
    DECLARE @i INT
    SET @i = 0
    WHILE @i <= 100000
    BEGIN
    INSERT INTO test (c1, c2, c3, c4) VALUES(@i*(-1), @i*(-1), @i*(-1), @i%1000)
    SET @i = @i + 1
    END
    go
    and then run the code:

    select top 100 *
    from test (index=ix_test)

    I don't think output will be:
    col1 col2 col3 col4
    ------- ------- -------- -------
    -100000 -100000 -100000 0
    -99900 -99900 -99900 0

    .....
    as if it would be in case clustered index column values are ordered in leaf level of ix_test. If that's true (I can't test it here, I don't have sql server installed on my home pc), it means that query plan for:

    Select *
    from test (index=ix_test)
    where col1 = 0
    and col2 = 0
    and col3 = 0
    and col4 = 0
    would show range scan, not index seek.
    I'll test it today when I go to work, actually when I finish tasks from my today's to do list.
    [Edited] I redifined clustered index as unique, that may be important for our discussion. Also fixed the output.
  9. FrankKalis Moderator

    Mirko, just ran your code and it shows<br /><pre id="code"><font face="courier" size="2" id="code"><br />select top 100 *<br />from test (index=ix_test)<br /><br />c1 c2 c3 c4 <br />----------- ----------- ----------- ----------- <br />-100000 -100000 -100000 0<br />-99000 -99000 -99000 0<br />-98000 -98000 -98000 0<br />-97000 -97000 -97000 0<br />-96000 -96000 -96000 0<br />-95000 -95000 -95000 0<br />-94000 -94000 -94000 0<br />-93000 -93000 -93000 0<br />-92000 -92000 -92000 0<br />-91000 -91000 -91000 0<br />-90000 -90000 -90000 0<br />-89000 -89000 -89000 0<br />-88000 -88000 -88000 0<br />-87000 -87000 -87000 0<br />-86000 -86000 -86000 0<br />...<br /></font id="code"></pre id="code"><br />And when I run<br /><pre id="code"><font face="courier" size="2" id="code"><br />Select *<br /> from test (index=ix_test)<br /> where c1 = 0 and c2 = 0 and c3 = 0 and c4 = 0<br /></font id="code"></pre id="code"><br />this is my execution plan<br /><pre id="code"><font face="courier" size="2" id="code"><br />StmtText <br />------------------------------------------------------------------------------------------- <br />Select *<br /> from test (index=ix_test)<br /> where c1 = 0 and c2 = 0 and c3 = 0 and c4 = 0<br /><br />(1 row(s) affected)<br /><br />StmtText <br />------------------------------------------------------------------------------------------------------------------------------------------------------ <br /> |--Index Seek(OBJECT<img src='/community/emoticons/emotion-6.gif' alt=':(' />[tempdb].[dbo].[test].[ix_test]), SEEK<img src='/community/emoticons/emotion-6.gif' alt=':(' />[test].[c4]=0 AND [test].[c1]=0 AND [test].[c2]=0 AND [test].[c3]=0) ORDERED FORWARD)<br /></font id="code"></pre id="code"><br />But now I am not sure what your point here is. A nonclustered index determines also a physical order. Pretty much like a clustered index.<br /><br />--<br />Frank Kalis<br />Microsoft SQL Server MVP<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />Ich unterstütze PASS Deutschland e.V. <a target="_blank" href=http://www.sqlpass.de>http://www.sqlpass.de</a>) <br />
  10. mmarovic Active Member

    quote:But now I am not sure what your point here is. A nonclustered index determines also a physical order. Pretty much like a clustered index.
    First, results you posted showed I was wrong.

    I new that nonclustered index items are ordered by columns from clustered index definition. If there is an clustered index on the table than bookmarks are actually values of clustered index columns instead of pointer to the place in heap where row is stored.

    What I didn't know is if leaf level entries are additionaly ordered by clustered index column values. My theory was they are not, and result you posted showed I was wrong about that. So non-clustered index on col4 is almost equivalent to non-clustered index on col4, col1, col2 and col3 in that order. Not totally equivalent because col1, col2 and col3 values are not stored in non-leaf nodes of the index and that impacts how many rows are in non-leaf nodes and index seek is performed a bit differently.

    With non-clustered index only on col4 in non-leaf nodes there is only info about where is the first row where col4=0, not the info about where is the entry where col1 = 0, ..., col4=0. So after first row where col4 = 0 is found (col1=col2=col3=-100000) in the leaf of index, rows are scanned until row where col1=col2=col3=0 is found. That's at least how I think it could be done.
  11. FrankKalis Moderator

  12. pmackay New Member

    Frank,

    I did some tests with the index without the columns contained in the constraint, but the results was not what I expected. The query took a lot of time to finish, and the reads rise up too. Then I included the constraints columns in the NCI, and the query ran again very fast.

    I don't know what to think.

    Thanks.

    Patrick MacKAY
    ASP.NET MVP
  13. mmarovic Active Member

    This supports my theory I described in previous post. It means if you don't have too much duplicates non-clustered index on col4 may have similar, in some cases even better performance then non-clustered index on col4, col1, col2 and col3. If there are a lot of duplicates you will see cost rise and performance decrease.

    Have you compared performance using just primary key? If they are unacceptable then I would add non-clustered index on col4, col1, col2 and col3 in that order.
  14. pmackay New Member

    Nope, because it wasn't really one column (col4). There were more than one column (four columns). I thought that the number of columns was not important to the discussion. I believe that because if the NCI include the cluster key (C1,C2,C3) within, the number of columns of the NCI were not important...or I am wrong?

    Patrick MacKAY
    ASP.NET MVP
  15. pmackay New Member

    There is something I forgot to tell since the first post. In the queries, I need to filter over the four columns (c4,c5,c6 and c7) and 2 of the 3 PK columns (c1 and c2).

    Resuming:

    PK Clustered C1, C2 and C3
    The new NCI C4, C5 C6 and C7. (The first index was C1, C2, C3, C4, C5, C6, C7)

    I need to filter over C1, C2, C4, C5, C6 and C7.

    Without C1, C2 and C3, the query ran too slow.

    I'm sorry for the confusion created.

    Patrick MacKAY
    ASP.NET MVP
  16. FrankKalis Moderator

    Can you post some typical queries?
    How many rows are we talking about?
    Is this OLAP or OLTP?

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    Ich unterstütze PASS Deutschland e.V. http://www.sqlpass.de)
  17. mmarovic Active Member

    Ok, in that case you should test one of indexes bellow and see which one gives you best performance:

    1. just pk
    2. nci on c4, c5, c6 and c7
    3. nci on c4, c5, c6, c7 and c1
    4. nci on c4, c5, c6, c7, c1 and c2

    All columns should be in order mentioned.

    Depending on data distribution other combinations are possible, e.g. a column or two may not be too selective so it may be omitted from nci definition. Without knowing your queries, type of process, table definition and data distribution this is the best recommendation I am able to provide. You have to test all possibilities yourself and find the best.
  18. pmackay New Member

    Mirko,

    the number 4 works great. I tried first the number 2, but it doesn't work. That makes the query run very slow.

    I used "dbcc show_statistics" to get the best selectivity in the columns with the index 4.

    Now, I'm wondering about the first Frank's post. I understand that the NCI includes the key of the cluster index. I'm not sure about the searching capabilities over these columns (C1-C3)

    Frank, it's an OLTP, and the table has 250K rows.

    Thanks.



    Patrick MacKAY
    ASP.NET MVP
  19. mmarovic Active Member

    quote:Now, I'm wondering about the first Frank's post. I understand that the NCI includes the key of the cluster index. I'm not sure about the searching capabilities over these columns (C1-C3)
    The problem is that index suggested by Frank contains clustered index column only in leaf b-tree nodes. That can be advantage when non-clustered index columns combination doesn't contain too many duplicates, but it looks like is not the case with your index. I have my theory how it works I expressed in this thread. Although I am not 100% sure it works the way I described, your results and some testing I did supports the theory.
  20. mmarovic Active Member

    I forgot, new index helped performance for queries you are interested in right now, but be aware that each new index slows down data manintenance operations (inserts/updates/deletes). The art of index design is finding the right balance.
  21. FrankKalis Moderator

    Well, this looks like a trial and error case. 250k row are not that much at all. I mean, dropping and creating indexes in various combinations should take too long in your dev environment. I would build up a test scenario and try to find out, what combination gives the "best" performance. <br /><br />Anyway, I think this would make up a great topic for some further investigations. I'll dig into this. Mirko, do you mind me contacting you via PM when I have further questions regarding your theory? [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br /><br />--<br />Frank Kalis<br />Microsoft SQL Server MVP<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />Ich unterstütze PASS Deutschland e.V. <a target="_blank" href=http://www.sqlpass.de>http://www.sqlpass.de</a>) <br />
  22. mmarovic Active Member

    Frank, you are most definitly wellcome. [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]
  23. FrankKalis Moderator

  24. pmackay New Member

    Frank,

    I'll check out this links when I return to my office. Now, I'm at the MVP summit.

    Before I realize that the best combination was the index number 4, I had to do a lot of trial and error. And Mirko guessed it well. The columns aren´t too much different. C1 has 70 or 80 different values over the 250K, and C2 like 2000 (the value of C2 depends obver the value of C1). C3 is an Identity column. Some of the other columns (C4 to C7) are bits and other dates.

    Thanks.



    Patrick MacKAY
    ASP.NET MVP
  25. FrankKalis Moderator

    Okay, I've bookmarked this and put it on my internal to do list. I love this site: Learn every day something new. [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br /><br />--<br />Frank Kalis<br />Microsoft SQL Server MVP<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />Ich unterstütze PASS Deutschland e.V. <a target="_blank" href=http://www.sqlpass.de>http://www.sqlpass.de</a>) <br />
  26. mmarovic Active Member

    Frank, thank you for the links.

    Patrick, not knowing business logic and almost nothing about your app and db model, I would still suggest to consider restricting clustered pk on your identity column and to use other two columns in non-clustered indexes. Based on litle information you provided, I think your app might benefit from db model improvements.
  27. pmackay New Member

    Thanks Mirko, I'll add that tests to determine which indexes works better. I'm sorry for not providing more information...

    The db model is the next one.

    Company Information
    ID1 (identity, company id, also called C1)

    Organization
    ID1 (company id, C1)
    ID2 (identity, org. id, C2)

    The 250K rows table
    ID1, company ID, C1
    ID2, Org id, C2
    ID3, document id, identity column, C3

    more columns, C4 to C7 with some information we need to search, and more columns, not indexed in any way.

    One query needs to search over the C4 to C7 columns, but it needs the info about the C1,C2 and C3 columns to call a user-defined function....

    Another queries searchs over C1 and C2 (documents of some organization of some company), and of course, some of them need to find the exact document (c1,c2,c3), which can be identified only with the value of C3, because it's identity.

    When I tried the C4-C7 index, without C1 and C2, it ran very slow.

    Hope that this can help a little.



    Patrick MacKAY
    ASP.NET MVP
  28. mmarovic Active Member

    First rule of index design: Define pk and put indexes on each fk column. The only exception is composite primary key with first column beeing fk column. In that case you don't need additional index on that column.

    Rule #2: If there is identity column make it clustered primary key.

    Rule #3: Column order matters. Index can be effectively used if you have sargable condition on index prefix. To visualise this, consider index item a string made by concatenating index column values in order of definition.

    Rule #4: Indexes speed-up reading but slow down inserts/updates/deletes. The consequence is: If app is write intensive don't create too many indexes. You have to find right balance. It means you create indexes to be good enough for each query, not optimal for any specific one except for really critical queries.

    Rule #5: The more columns are in index, the wider it is. It means (as Frank explained) that less rows will fit the buffer, so more i/o will happen. So make composite index wide, only to cover the most critical queries.

    and so on.

    In your case it means:

    1. Pks should be identity columns. Make them clustered.
    2. Fks should be non-clustered indexes. If there is a critical query that may benefit from it you may add additional columns as a second, maybe third column of that index.
    3. If query you try to tune is really critical one and performance is not otherwise acceptable then make non-clustered index on c4-c7, c1, c2 in that order.
  29. pmackay New Member

    thank you Mirko for taking the time. I have one question.

    How many indexes are the maximum that you think that a table can have? We don't have more than four for table, but most of the tables have only one (pk), or a pk and one NCI.

    I think the answer is composed of two parts : "it depends on your business needs", and "always try to have the minimum indexes on tables, with less columns".

    Thanks.

    Patrick MacKAY
    ASP.NET MVP
  30. mmarovic Active Member

    quote:How many indexes are the maximum that you think that a table can have? We don't have more than four for table, but most of the tables have only one (pk), or a pk and one NCI.

    I think the answer is composed of two parts : "it depends on your business needs", and "always try to have the minimum indexes on tables, with less columns".
    You pretty much gave the part of answer. It depends both on your business needs, but also on the nature of db process. In previous company they had huge databases, 1000s of hits per second and 8+ indexes on big tables, some of them composite consisting of 6+ columns. Beleive or not that was good design. However, they had more then 80% reads and less then 20% writes.

    But since you are not experience enough, apply rules I mentioned and you are on a safe side.

    So in your case just use indexes on pk and fk columns and then identify critical queries. Based on them you can add a few more indexes or expand some existing adding additional columns.
  31. mmarovic Active Member

    ...and yes, 4 indexes per table is usually not too much.
  32. pmackay New Member

    Thanks again.

    I'll follow your advices and Frank's advices too.

    You folks help me a lot.



    Patrick MacKAY
    ASP.NET MVP

Share This Page