Is it the hard drives, or 2005 vs 2000 ?? | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Is it the hard drives, or 2005 vs 2000 ??

This is kind of a follow-on to a query I posted a couple of weeks ago, with some additional info that I’ve gotten in the meantime. It’s getting more mysterious as time goes on, so I’m very interested in your feedback. ————- Here’s what I’m trying to do: Imagine you have a 100 million record table, with two columns — ID (word offset) and word: ID word
— —–
1 <beginning>

27304281 the
27304282 old
27304283 man
27304284 saw
27304285 the
27304286 rich
27304287 man
….
100000000 <end> Suppose I want to find the 20 most frequent words preceding ‘man’ in the 100,000,000 word database — from these seven words they would be ‘old’ and ‘rich’. Here’s what I’m doing: Step 1. Create a table [words] with the structure shown above, with a clustered index on [word]. Step 2. Create an identical table [wordsID] with the structure shown above, but this time with a clustered index on [ID]. Step 3. Insert into a temp table [temp1] all of the ID values where [word] = ‘man’. The temp table has a clustered index on [ID]. insert into temp1(ID)
select [ID]
from [words] where
word = ‘man’ ## Step 4. (Problem) Run a join to find the 20 most frequent [word] in [wordsID], where the ID value is one less than the [ID] for ‘man’, which are now stored as [ID] in [temp1]: select top 20 count(*),b.word
from [temp1] as a, [wordsID] as b where
b.ID + 1 = a.ID ‘ i.e. the ID is one less than the ID in [temp1]
group by b.word
order by count(*) desc ———— Here’s the mystery. I have identical tables and indexes on two machines — an older machine [older] running SQL Server 2000, where the database is on a RAID 5 with 3 SCSI HDs at 15,000 rpm each. The newer machine [newer] has SQL Server 2005 with in a RAID 0 with 4 SATA HDs at 7200 rpm each. Step 4 (the problem part) runs about 3-4x as fast on my [older] machine. Fine, you say, it’s the hard drives == three 15,000 SCSI drives on the older/faster machine as opposed to the four 7,200 SATA drives on the newer/slower machine. The problem is, EVERYTHING ELSE is faster on the [newer] machines — full table scans, etc etc. And the real mystery is, the Query Execution Plan for Step 4 above on both machines is identical. It’s the "Clustered Index Seek" of [ID] on the table [wordsID], which in both cases has an estimated operator cost of 99% for the whole query. Also, when I get the stats for both machines, they are quite similar: [older/2000/15,000rpm SCSI] = 3-4x faster
Table ‘wordsID’. Scan count 23446, logical reads 164340, physical reads 156, read-ahead reads 16035 [newer/2005/7,200rpm SATA] = slower
Table ‘wordsID’. Scan count 23446, logical reads 206352, physical reads 133, read-ahead reads 17492, Any ideas on what might be going on? Again, everything ELSE is faster on the [newer] machine, so I have a hard time believing it’s all due to hardware differences between the two machines. I’d say it’s 2000 vs. 2005 (for some reason 2005 is slower for this join), but again, the Query Execution Plan at the IO stats are the same for both machines. I have to migrate the databases to the newer / 2005 / 4 SATA RAID machine soon, but I can’t afford to take this performance hit. Any insights on what might be going on would be very much appreciated. ** Alternatively, if anyone can think of an alternative architecture that would still give me the desired results, I’d be happy to go that way instead. Thanks in advance. Mark D.
Ever heard of a self-join? You need just a single table Words (ID, Word). If ID is the PK of the table, make this the clustered index. Add a non-clustered index on (Word). SELECT TOP 20 A.Word, COUNT(*)
FROM Words A INNER JOIN Words B
ON A.ID = B.ID – 1
WHERE B.Word = ‘Man’
GROUP BY A.Word
ORDER BY COUNT(*) DESC

do your above test again on each machine, then provide the cpu & durations
this time, in addition to SET STATISTICS IO ON
add: SET STATISTICS TIME ON the 2 SET commands may need to be done a certain sequence
note any cpu differences between the 2 systems in pure table scan performance, SQL 2005 is much better than 2000, but you would need more disks to prove this (300MB/sec or more)
quote:Originally posted by Adriaan Ever heard of a self-join? You need just a single table Words (ID, Word). If ID is the PK of the table, make this the clustered index. Add a non-clustered index on (Word).

Yes, I use self-joins a lot. But with this 100 million row table, it takes too long — about 25-30 seconds for a word that occurs 25000-30000 times in the 100 million word corpus. I’ve had clustered indexes on [ID], on [word], on [word and ID] — everything possible. Still too slow.
quote:Originally posted by joechang do your above test again on each machine, then provide the cpu & durations
this time, in addition to SET STATISTICS IO ON
add: SET STATISTICS TIME ON the 2 SET commands may need to be done a certain sequence
note any cpu differences between the 2 systems in pure table scan performance, SQL 2005 is much better than 2000, but you would need more disks to prove this (300MB/sec or more)

I posted this info a couple of weeks back. CPU time is negligible, and about the same on both machines. It’s the disk access that is the bottleneck. As far as duration, as mentioned previously, the older/2000/3 x 15k rpm machine is 3-5 times as fast

then it definitely is the disks
see my post on storage config: do not use SATA drives where performance is important
quote:Originally posted by joechang then it definitely is the disks
see my post on storage config: do not use SATA drives where performance is important
I guess I would still come back to the question of why * everything else * is faster on the SATA machine. In addition, the query that is giving me problems is performing something like a full table scan, in spite of the clustered index (to retrieve the 25,000+ ID rows that are scattered sequentially throughout the 100,000,000 row table). But when I do a * real * full table scan (e.g. all [word] ending in ‘%tion’), then this "slower" SATA machine is the faster of the two. There’s something about the combination of this particular query, SQL Server 2005, and the hardware that’s not working. Any other suggestions?
my guess is the following: your newer machine has much faster cpu, anything in memory will be faster SATA drives can do large block sequential IO about as well as SCSI, especially a current gen SATA vs an older SCSI a simple table scan does issue large block IO
a more complex query, like yours, may actually issue many small IO, which favors 15K run perfmon, getting Disk Reads/sec, Avg Bytes/Read
also monitor write just to be sure why is it you have a hard understanding that a 7200RPM drive is going to be slower than a 15K drive
quote:Originally posted by joechang my guess is the following: why is it you have a hard understanding that a 7200RPM drive is going to be slower than a 15K drive

That’s just my point — the four 7200 rpm SATA drives (RAID 0) are *not* slower than the three 15000 rpm SCSI drives (RAID 5). For almost every task, they’re faster. It’s just with this one crucial type of query that they’re 3-5x as slow.
I sympathize with your frustration!! I spent a couple hours a day for four weeks trying to benchmark performance differences between two different machines: Data warehouse application with heavy read access, often sequential; typical reporting joins one large main table with a few . Both using SQL Server Standard edition. OLD SQL Server 2000:
Dual 1.6 ghz Xeon processors
3 gb memory
Hardware RAID controller; 15k U160 drives
RAID 1 pair – Log + 1 filegroup with most heavily accessed core table on it.
RAID 1E (IBM variant) with 5 drive array (not RAID 5)
Single large drive for DB backups and Tempdb NEW SQL Server 2005:
Dual Processor/Dual Core DL 585 AMD processors
16 gb memory
Hardware RAID controller; 15k U320 drives
RAID 1 – internal drives for LOG/Tempdb/??
2 – MSA30 drive enclosures (8 drives each); RAID 1+0
For DB (options for single large array; or smaller arrays with additional filegroups with selected tables/indexes on them). After restoring the 2000 database on 2005 and running the recommended update stats; update usage and rebuilding indexes on all tables, I started testing various scenarios (single, multiple Tempdb files/sizes; single filegroup; splitting off a separate filegroup for the main ‘core#%92 table like the old system was configured; comparing differences between having the single core table share drives with Tempdb, etc.). The new box is not in production and I am the only user with 100% control of all DB and system parameters. All testing was done in SSMS executing queries or calling SPs. Drop cleanbuffers and free proccache done before each execution. In general, the jobs ran 3-10 times faster on the new box. One, particularly intensive query however, runs about 2 times LONGER on the NEW box. Unfortunately this was one of a few problem queries the new box was planned to deliver significant improvement on. In many, many hours of testing, I was never able to make much consistent improve in execution time. Numerous executions on several configurations yielded little insight aside from keeping the physical arrays to a small number of many spindles (versus more arrays made up of fewer spindles) but not necessary making it a single array with all of the spindles in it. Out of dozens of combinations, very little of the results I obtained were repeatable to the degree I had expected given the controlled conditions. Twice in all of this testing SQL started up in ‘magic#%92 mode – all jobs on the new box ran in fractions of the time they ran in otherwise. Completely out of proportion to whatever small change in configuration I#%92d made. These two instances occurred with different configurations and were not repeatable regardless of duplicating every prior configuration up to the ‘magic#%92 one. I stopped testing due to a crunch in other areas and haven#%92t gotten back to it yet – having found it a truly frustrating experience and a deep belief that there is something flaky whose seeming random occurrence far overshadows any thought out configuration I am able to come up with. Hours reviewing and trying SQL, hardware and OS parameters have not gotten me any closer to understanding the ‘magic#%92 phenomenon……

it really is a simple matter of collecting the proper performance counters & profiler traces while paying attention to the execution plan if the execution plan is different, don’t bother trying to explain the results.
if the plans are the same, then interpret the perf counters to determine where the bottleneck is in each case blindly rebuilding index and updating stats does not guarantee consistency, it is the execution plan that is important.
comparing run time is worthless, you need the cpu as well.

joechang: To follow on your comments regarding my ‘rant’ – the particular proc I’ve been troubleshooting generates intermediate temp tables and is very complex making the creation and examination of execution plans somewhat daunting. Anyway, with further examination along the lines of your suggestion I did make progress; and did find some bizarre behavior in SQL 2005. 1) Base time on old SQL 2000 machine for entire process: 2.5 mins. Base time on new SQL 2005 machine: 10 mins. Both machines with no other activity and clear buffers and cache peformed. Also indexes rebuilt and statistics updated on SQL 2005.
OLD: 2.5 mins
NEW: 10 mins 2) I broke down the proc to the initial selection of target data, eliminating temp tables and about 60% of the code. As expected, the job times didn’t change – the bulk of the time filtering data to identify the targets based on a variety of selection criteria (supplied as parameters). However, the execution plan was much more manageable.
OLD: 2.5 mins
NEW: 10 mins 3) At this point, the proc was unioning two very large tables which were joined to four smaller tables. I removed the four smaller tables from the process producing an even simpler execution plan. The resulting run times were noteworthy.
OLD: 2.5 mins (no change)
NEW: 26 mins – Obviously something really odd happening since simplifying the process made it run substantially longer on SQL 2005. 4) I didn’t bother trying to figure out why the odd result occurred between #2 and #3 on the NEW machine. I did compare the index usage between the OLD and NEW execution plans. I used hints to force the execution plan on the NEW machine to match the execution plan on the OLD machine with a nice result:
OLD: 2.5 mins
NEW: 20 seconds (HINT forcing index usage changed execution from 26 mins to 20 secs)!!!! The total cost of the proc with the HINT was 470 (running in 20 seconds) while the cost of the original plan without HINT was 69 (running in 26 minutes). So there appears to be flaws in the cost determination and consequently the resulting execution plan.

let me say again
LOUDLY for everyone to hear
comparing run time is worthless, you need the cpu as well. use SET STATISTICS TIME ON
to get more accurate duration and cpu times
throw in SET STATISTICS IO ON just to make sure data sizes are close since this is SQL 2005, save the plans in XML (.sqlplan), open with notepad,
paste in here
then we can see the plans
the next level of plan analysis is to
1. generate the estimated plans, note the row counts (save this plan)
2. run the query with plan, note the actual row counts
are there large discrepancies? note there is a difference in row and number of execute definitions you were expecting no flaws in the optimizer cost determination?
does any one read my articles on this site?
http://www.sql-server-performance.com/joe_chang.asp
Don’t burst your aorta! I’m not trying to debate you on technical intricacies of SQL Server internals!! Given that I’ve seen the 2000 vs 2005 question a few times, I was just trying to illustrate some extreme examples of counter-intuitive results. From a practical standpoint, there is no question that the new platform should deliver imporved performance. I have absolutely no doubt that I could install SQL 2000 on the new platform and EVERY SINGLE process would run 3+ times faster. This is a performance forum – I have a 100+ processes and several hundred views – if some percentage of these run 4-10 times slower on SQL 2005 on a platform where they should run at least 3 times faster, that is an item to be aware of. For me and possibly others. I disagree to the suggestion that there is no value to the observation that:
OLD: 2.5 minutes
NEW: 26 minutes (native SQL 2005 execution plan)
NEW: 20 seconds (index hints used to override SQL 2005 execution plan) I have never used hints in query development. I spent hours pouring over server and SQL parameters looking for something that was configured incorrectly. If I’d seen a post like this, I wouldn’t have wasted as much time in those areas…
ugh!!! with just duration, you don’t know if it is the plan or the hw
with cpu & duration, you should know if it is one or the other why some people like living in the dark, i do not know
ignorance is bliss?

Insults aside, I apparently gave you more credit and respect than I should have. It would be obvious from comparison of the respective hardware, there is no way the new platform would deliver slower performance. That is also obvious in that most tasks on the new platform are much faster. I never asked about hardware – I knew it wasn’t the problem. The best the old server can do in physical disk bytes/sec is 30 million; the new one regularly hits upwards of 300 million/channel. Since we’re talking single-user scenarios and I/O intensive proc, most other items (ie. queue length) are immaterial unless you’re considering altering array configuration – the channel/physical drive aggregate bandwith is always saturated. The two logical disk devices are now on separate channels and more spindles than the old server; tempdb is now on a mult-spindle array. All are 15k U320; the old was all U160. Memory is 16gb vs 3gb on the old. On sequential scans the new server smokes – the additional spindles and bandwidth run a multi-gig table scan in a few seconds vs. several minutes on the old server. A complex join involving several tables with millions of rows runs in 1.5 minutes on the new versus 1.5 hours on the old. I have seen that a lot of the situations presented on ‘old’ vs ‘new’ wind up having offsetting configurations that blur the actual net performance result. That’s why I was posting an unquestionable example of a situation where every element between the two servers was either comparatively identical (ie. single-user, no other apps) or a plus for the new one (memory, i/o channels, etc.) – and it still had slower overall performance in some instances. I’m sure you’re extremely knowledgeable on SQL and system configuration. That said, no one knows everything; some think they do; I think there was an opportunity for intellectual discourse and mutual information gain had a constructive approach been taken.
]]>