SQL Server Performance Forum – Threads Archive

Execution Plan Wrong?

I have this table with just over 100million rows: <pre id="code"> CREATE TABLE [MyTable] ( [EntryDate] [datetime] NOT NULL , [IPID] [int] NOT NULL , [SomeText] [varchar] (800) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [MD5] [char] (32) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [Views] [int] NOT NULL , [AnInt] [int] NOT NULL , [AnInt2] [int] NOT NULL , CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED ( [EntryDate], [IPID], [MD5] ) WITH FILLFACTOR = 80 ON [PRIMARY] ) ON [PRIMARY] </pre id="code"> I then run this query: <pre id="code"> SELECT avg(Views) FROM MyTable(nolock) WHERE EntryDate BETWEEN ’01 Nov 2005′ AND ‘8 Nov 2005′ AND IPID = 858 and MD5 = ‘592ff071652e9f3f44b28bd78b8946a9′ </pre id="code"> The query takes 20mins+ to return despite the WHERE clause being in the order of the Primary Key. Now I run this query: <pre id="code"> SELECT avg(Views) FROM MyTable(nolock) WHERE EntryDate = ’01 Nov 2005′ AND IPID = 858 and MD5 = ‘592ff071652e9f3f44b28bd78b8946a9′ </pre id="code"> It finishes in milli seconds. As it does for every date between the dates shown in the first where clause. I’ve also run the query with every in a IN clause instead of teh between. This also runs in milli seconds. So why does the first query take so long? The only difference in the esitamed query plan is that the BETWEEN version will use parallelism. Setting it to USE MAXDOP 1 is still 20 mins or so. I have updated stats and run DBREINDEX on the table. Also I have this table on 2 servers – both do the same. On is Win 2000 SP4 with SQL 2000 SP3 (81<img src=’/community/emoticons/emotion-11.gif’ alt=’8)’ />. The Other is Win 2003 SP1 with SQL 2000 SP3 (81<img src=’/community/emoticons/emotion-11.gif’ alt=’8)’ />. Any ideas? Thanks Simon

What is the response time if you write: where entryda >= ‘date’ and entrydate <= ‘date’? Luis Martin
Moderator
SQL-Server-Performance.com One of the symptoms of an approaching nervous breakdown is the belief that one’s work is terribly important
Bertrand Russell
All postings are provided â€œAS ISâ€ with no warranties for accuracy.

You answered your own question, I think, "I’ve also run the query with every in a IN clause instead of the between. This also runs in milli seconds." The between clause itself is not the problem, it’s the combination of a between range with the other two criteria that doesn’t go well with your clustered index. You might drop the PK constraint, then add an identity column with a clustered index, and next recreate the PK as non-clustered – perhaps with EntryDate as the last column.

Considering just that query clustered index with different column order:
1. IPID
2. MD5
3. EntryDate would be much faster. However, in that case you can expect extreme clustered index fragmentation (read: data fragmentation) and a lot of page splits during insert. The conclusion: Adriaan gave you excellant recommendation. Another solution you can try is to have non-clustered pk as recommended by Adriaan and clustered index on EntryDate column only. If data are inserted in EntryDate order then this solution is better then one with surrogate key. In that case you can keep default fill factor for clustered index instead of 80% as you have now.

Thanks for your suggestions – I was aware my primary key was far from ideal, I was just confused by the query times I was seeing. The plan is to change the indexing to a unique contraint against the 3 columns and a cluster index against entry date. Because of the size of this table it will be a while until I can do this, hopefully it will solve my problem. Thanks again.

Order in future 3 column non-clustered pk is important. Also you should test different fill-factors for that pk (50, 60, 70 and 80). On clustered index on EntryDate fill-factor should stay zero which is default for each database unless you change it.

]]>