SQL Server Performance

"Paging" using a Composite Index

Discussion in 'T-SQL Performance Tuning for Developers' started by Hartmut5, Dec 9, 2004.

  1. Hartmut5 New Member

    I have a database table set up with a composite index on 2 fields. The first field is a data field that is being searched, and the second is a unique field. This setup is used for "paging" through the data, retrieving only a small set at a time, but still allowing the user to scroll through records.

    My problem is that I can't seem to get SQL Server to seek to a specified record and then return the next 100 records in the index using both columns. It always seeks on the first column only, then processes all of those records until it meets the other criteria. The first column contains numeric data descending, and the second contains character data ascending. The paging is basically done with the following SQL:



    CREATE TABLE Table1 (
    Table1_ID int identity(1,1)
    , DataField varchar(10)
    , Column1 int
    , Column2 varchar(10)
    )

    CREATE UNIQUE INDEX ix_Table1_Col1Col2 ON Table1 (
    Column1 desc
    , Column2 asc
    )

    SELECT TOP 100 *
    FROM Table1
    WHERE Column1 = @SearchValue AND Column2 >= @CurrentRecord
    OR Column1 > @SearchValue

    The paging runs very well until I hit the 0's (which there are a lot of). When I analyzed the query plan, I saw that it was seeking using Column1 only. This works fairly well, since most of the numeric data is fairly unique. There are, however, a lot of zeroes. When I hit the zeroes, it takes several seconds for each page to be returned.

    I've tried rewriting the query several different ways, but I never get the query plan I'm looking for, which would seek to @SearchValue + @CurrentRecord using both columns of the index and then return the next 100 records. Is there any way I can do this, other than splitting it into two separate queries ((A = ? AND B >= ?), (A > ?)) and joining the results?


    -Hartmut5
  2. Twan New Member

    Hi ya,

    You have to put the order by on the select statement

    SELECT TOP 100 *
    FROM Table1
    WHERE Column1 = @SearchValue AND Column2 >= @CurrentRecord
    OR Column1 > @SearchValue
    order by
    column1,
    column2

    Cheers
    Twan
  3. Hartmut5 New Member

    Whoops,

    I was in a hurry when I posted and forgot to put that part in, but yes, the actual SELECT statement does have the ORDER BY clause included ( ORDER BY Column1 DESC, Column2 ASC ). Are there any hints I could use, or is this perhaps a case where the default statistics sample isn't large enough? ( I can't imagine any situation where seeking and reading the index in-order would be less efficient, unless seeking to the second field in a composite index is very costly. )


    -Hartmut5
  4. brimba New Member

    I am thinking of your Wherestatement, which seems a bit odd.

    WHERE Column1 = @SearchValue AND Column2 >= @CurrentRecord OR Column1 > @SearchValue

    which would be the same as

    WHERE Column1 >= @SearchValue AND Column2 >= @CurrentRecord

    But do you mean this maybe?

    WHERE (Column1 = @SearchValue AND Column2 >= @CurrentRecord) OR (Column1 > @SearchValue)
  5. Hartmut5 New Member

    The "AND" logical operator has higher precedence than "OR", which makes the following two conditions equivalent:

    WHERE Column1 = @SearchValue AND Column2 >= @CurrentRecord OR Column1 > @SearchValue

    WHERE (Column1 = @SearchValue AND Column2 >= @CurrentRecord) OR (Column1 > @SearchValue)



    -Hartmut5
  6. Twan New Member

    Hi Harmut,

    and you're saying that the results are not coming back in a predictable way? i.e. is it selecting the first 100 matching records and then sorting them?

    if so then that would be unexpected by me as well, but then try

    SELECT TOP 100 *
    FROM (select * from Table1
    WHERE Column1 = @SearchValue AND Column2 >= @CurrentRecord
    OR Column1 > @SearchValue ) mytbl1
    order by
    column1,
    column2

    Cheers
    Twan
  7. Hartmut5 New Member

    I am getting the expected results returned. However, I am not getting them back in the expected amount of time. The execution plan is not seeking to the first record and then returning the next 100 records, but instead doing an index scan. When running the query, it takes 7 seconds to return the 100 records, which is unacceptable for the application. The result set needs to be returned in less than 1 second, which is completely reasonable considering the index is built specifically for this query. What I can't figure out is how to make SQL Server seek to a specific record and return the next 100 records using a composite index.

    I apologize for several of the mistakes in my original post, I was in a bit of a rush. I'll recap here what the setup is (with a little more detail and less errors/omissions):

    CREATE TABLE Table1 (
    Table1_ID int identity(1,1)
    , DataField varchar(10)
    , Column1 int
    , Column2 varchar(10)
    )


    CREATE UNIQUE INDEX ix_Table1_Col1Col2 ON Table1 (
    Column1 asc
    , Column2 desc
    )

    Column2 is unique within the table and has a unique index created on it. Column1 has a wide range of values, but about 3/4 of the rows have a value of zero. The application that is accessing the database needs to retrieve records in sets of 100. Here is the original query:


    SELECT TOP 100 *
    FROM Table1
    WHERE Column1 = cast( @SearchValue as int )
    AND Column2 <= @CurrentRecord
    OR Column1 > cast( @SearchValue as int )
    ORDER BY Column1 asc, Column2 desc

    Logically, this statement should find the current record or the next closest match, then return the next 99 records in the index. However, the execution plan produced by this query is an index scan, which is pretty darn slow, considering that Table1 has approximately 5M records. To try and take advantage of the index, I rewrote the query to the following:


    SELECT TOP 100 *
    FROM Table1
    WHERE Column1 >= cast( @SearchValue as int)
    AND ( Column2 <= @CurrentRecord
    OR Column1 > cast( @SearchValue as int )
    ORDER BY Column1 asc, Column2 desc

    The execution plan produced by this query does an index seek using "Column1 >= @SearchValue", which works great, as long as the @SearchValue is not zero. When the @SearchValue is zero, SQL Server scans through all records where Column1 is zero, which comes out to about 3.25 million records. As the @CurrentRecord gets closer to the end of the zeroes, the query gets slower and slower, as it needs to read more and more records to satisfy the "Column2 <= @CurrentRecord" criteria.

    Is there some way to get SQL Server to seek to the first record meeting the "Column1 = @SearchValue AND Column2 <= @CurrentRecord" criteria and then return the next 100 records in the index? Or will I have to resort to special handling of "zero" as the @SearchValue? I guess this might work:


    SELECT TOP 100 *
    FROM (
    SELECT TOP 100 *
    FROM Table1
    WHERE Column1 = cast( @SearchValue as int )
    AND Column2 <= @CurrentRecord
    ORDER BY Column1 asc, Column2 desc
    UNION
    SELECT TOP 100 *
    FROM Table1
    WHERE Column1 > cast( @SearchValue as int )
    ORDER BY Column1 asc, Column2 desc
    ) T1
    ORDER BY T1.Column1 asc, T1.Column2 desc

    However, this seems overly complex and inefficient for the given task, and would make things much more difficult when we need to include other selection criteria.

    Is there any way to select the next n records from a composite index, with a defined starting point?

    Thanks already for your help and consideration.


    -Hartmut5
  8. Chappy New Member

    Be careful with your logic here!
    a AND b OR c --> a AND (b or c)

    It might be worth trying to eliminate the OR altogether, using a UNION with appropriate indices...

    SELECT TOP 100 *
    FROM
    (
    SELECT f1, f2, f3 from Table WHERE Column1 = cast( @SearchValue as int ) AND Column2 <= @CurrentRecord
    UNION ALL
    SELECT f1, f2, f3 from Table WHERE Column1 = cast( @SearchValue as int ) AND Column1 > cast( @SearchValue as int )
    ) a
    ORDER BY a.f1 asc, a.f2 desc


  9. Hartmut5 New Member

    Hello again,

    I hope everyone had a great Christmas. Thank goodness for vacation time!

    Well, it looks like the short answer to my original question is "No." There appears to be no way to use the native order of a composite index to return n rows from a specific point in the index (within a single SELECT statement). The only way I have been able to get near it is by using the UNION approach. The union approach works great as long as I don't add additional criteria. The following query works almost exactly as needed:

    SELECT TOP 100 * FROM
    ( SELECT Column1, Column2
    FROM Table1
    WHERE Column1 = cast( @SearchValue as int )
    AND Column2 <= @CurrentRecord
    UNION ALL
    SELECT Column1, Column2
    FROM Table1
    WHERE Column1 > cast( @SearchValue as int )
    ) T1
    ORDER BY T1.Column1 asc, T1.Column2 desc
    It seeks to the current record and then retrieves up to 100 records where Column1 matches the search value exactly. If there are less than 100 exact matches, it then retrieves enough records from the second half of the UNION statement. It even works great when the sort order is reversed. If only that solved the whole problem...

    When I add additional criteria based on the clustered index fields, I would expect the execution plan to simply add a "WHERE" restriction during the fetching of records from the index, since the clustered index fields are inherently part of the index. Here is a sample statement:

    SELECT TOP 100 * FROM
    ( SELECT Column1, Column2
    FROM Table1
    WHERE Column1 = cast( @SearchValue as int )
    AND Column2 >= @CurrentRecord
    AND ClstrCol1 = @x
    AND ClstrCol2 = @y
    UNION ALL
    SELECT Column1, Column2
    FROM Table1
    WHERE Column1 < cast( @SearchValue as int )
    AND ClstrCol1 = @x
    AND ClstrCol2 = @y
    ) T1
    ORDER BY T1.Column1 desc, T1.Column2 asc
    When this statement is run, however, the optimizer chooses to resolve the first half of the UNION in a way that seems very wrong. It scans an index on Column2, doing a bookmark lookup for each row before checking the WHERE clause criteria. It continues to process rows from the index in this manner until it reaches either 100 records or it reaches the end of the index (which means it will have done a bookmark lookup for each of the 5M records). This seems absolutely ridiculous, considering that all the fields in the query are covered by the index I'm trying to get SQL Server to use. Ironically, the second half of the UNION is executed exactly as expected, using the intended index (ordered forward or backward depending on the sort order) and incorporating the additional criteria as it seeks records in order.

    If I force the index for the first half of the UNION, it seeks the correct records, but then sorts all matching records by Column2 before taking the top 100, instead of simply seeking and reading the top 100 records using the index order. ARGH!!! This is driving me insane!!!

    I think I'll just have to settle for this:

    SELECT TOP 100 * FROM
    ( SELECT TOP 100 Column1, Column2
    FROM Table1
    WHERE Column1 = cast( @SearchValue as int )
    AND Column2 >= @CurrentRecord
    AND ClstrCol1 = @x
    AND ClstrCol2 = @y
    ORDER BY Column1 desc, Column2 asc
    UNION ALL
    SELECT TOP 100 Column1, Column2
    FROM Table1
    WHERE Column1 < cast( @SearchValue as int )
    AND ClstrCol1 = @x
    AND ClstrCol2 = @y
    ORDER BY Column1 desc, Column2 asc
    ) T1
    ORDER BY T1.Column1 desc, T1.Column2 asc
    After testing the earlier queries with different parameters and getting wildly different execution plans depending on the parameters, the above query seems to provide the most consistent, though not necessarily the fastest, performance.

    Also, someone please correct me if I'm wrong, but I believe that the "AND" logical operator has higher precedence than the "OR" logical operator in SQL and all common programming languages. Two replies so far have told me that it's the other way around. Or is my browser just reversing logical operators on me?



    -Hartmut5
  10. mmarovic Active Member

    Only alternative you haven't tried is to use temp table, insert top 100 values using first query from union, save rowCount (set @rowCount = @@rowCount) and then use dynamic sql to to insert top 100 - @rowCount rows from second union branch. This way you can expect recompilations but you can gain better execution plan. Dynamic sql can be executed using sp_executeSQL and sql injection is not possible if only parameter you pass is @RowCount.
    quote:It scans an index on Column2, doing a bookmark lookup for each row before checking the WHERE clause criteria. It continues to process rows from the index in this manner until it reaches either 100 records or it reaches the end of the index (which means it will have done a bookmark lookup for each of the 5M records). This seems absolutely ridiculous, considering that all the fields in the query are covered by the index I'm trying to get SQL Server to use.
    You can try to add clustered index columns to your "paging" index. That should not increase index row size of index leafs, but it will increase it on non-leaf level. That overhead is acceptable compared with performance gain.

    Anyway, this discussion is actualy academic about how we can force specific execution path. I think your solution is good compromise.

Share This Page