SQL Server Performance Forum – Threads Archive

Forcing a clustered index SEEK

This is related to a previous thread http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=18926). I thought the problem was solved, but it’s reappeared. I have a query that should be using a clustered index SEEK, but the Query Execution Plan wants to use a clustered index SCAN. I want to force a SEEK. I’ve tried using the hint " with (index (indexName))", but it still doesn’t work. Here are the details — There are two tables. The first (tempTable) has a single column [ID int not null] and about 20,000 rows. It has a clustered unique index on [ID]. The second table has two columns [ID int not null] and [word varchar(20) null], and *100 million* rows. There is a clustered unique index on ID. The SQL statement is: select top 100 count(*),b.word
from tempTable as a, words as b where
a.ID+1 = b.ID
group by b.word
order by count(*) desc The purpose of this query is to find the most frequent words in a 100 million word corpus, which appear immediately after a given word. Suppose that [tempTable] contains the ID values (offset values) for 20,000 occurrences of "white" in the 100 million word corpus. The SQL statement above will find the 100 most frequent words immediately after "white" (i.e. on the next row of the table). BTW, a huge self-join doesn’t work — too slow (>30 seconds) — I’ve already tried this in several different forms. Anyway, I can’t think of any reason that it wouldn’t use a clustered index SEEK in this case. I can’t force it away from a clustered index SCAN, and the performance is horrible. Thanks in advance. Mark D.

I believe you can force optimizer to use index but you can’t foce to use it as SEEK OR SCAN… Try the following… select b.* into #temp
from words as b
where exists (select 1 from tempTable as a where a.ID+1 = b.ID) Try with and without Creating the index on word column on #temp table
select top 100 count(*),b.word
from #temp as b
group by b.word
order by count(*) desc
/*************************************/
http://www.sql-server-performance.com/transact_sql.asp
The GROUP BY clause can be sped up if you follow these suggestion: Keep the number of rows returned by the query as small as possible.
Keep the number of groupings as few as possible.
Don’t group redundant columns.
If there is a JOIN in the same SELECT statement that has a GROUP BY, try to rewrite the query to use a subquery instead of using a JOIN. If this is possible, performance will be faster. If you have to use a JOIN, try to make the GROUP BY column from the same table as the column or columns on which the set function is used.
Consider adding an ORDER BY clause to the SELECT statement that orders by the same column as the GROUP BY. This may cause the GROUP BY to perform faster. Test this to see if is true in your particular situation.
[7.0, 2000] Added 6-6-2003 MohammedU.
Moderator
SQL-Server-Performance.com

quote:Originally posted by MohammedU
Try the following… select b.* into #temp
from words as b
where exists (select 1 from tempTable as a where a.ID+1 = b.ID) Try with and without Creating the index on word column on #temp table
select top 100 count(*),b.word
from #temp as b
group by b.word
order by count(*) desc

Turns out this is too slow as well (>30 seconds). BTW, this is on a fairly fast machine — five 15k SCSI drives in a RAID 5.

quote:
I believe you can force optimizer to use index but you can’t foce to use it as SEEK OR SCAN…

If true, this is really a problem. On other queries with this table where a SEEK is used, it makes all of the difference in the world.

Did you create index on WORD column on #temp table? You said it took 30 seconds… how much time each statement taken?
MohammedU.
Moderator
SQL-Server-Performance.com

While planning the data model and the indexes, which fields should I consider to have index on them….one in the select part or one in where clause.
Rohit

You’re doing a calculation to join the tables in the WHERE clause, so you’re actually lucky that the index is being used at all.
Any chance of adding a second column with the vaule of ID+1 on tempTable, index that, and join on it? Of course, this will take up time.

]]>