Data Warehouse — Poor Performance | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Data Warehouse — Poor Performance

We’re running SQL 2000 SP4 on quad dual-core 3GHz Xeon processors and 16GB of RAM; my company basically bought an HP DL580 and loaded it up. We’ve got approximately 1.2TB of data on a directly connected NetApp. Despite all of that, performance is not impressive. The following query takes 10 minutes to return 2 records:
SELECT
d.ReportYearMonthInt,
d.ActualDate,
d.DayType,
p.level1,
count(1)
FROM
IUBProductMaster_Facts f
inner join ConformedDimension_JHATimeDim d on (d.JHATimeDim_Key = f.PostDateDim_JHATimeDim_Key)
inner join IUB_Custom.dbo.viewProduct_Hierarchy p on (f.ProductDim_ProductDim_Key = p.dimension_key)
WHERE d.ActualDate = '5/31/2006'
GROUP BY
d.ReportYearMonthInt,
d.ActualDate,
d.DayType,
p.level1

The fact table has 32 million records with a clustered index on the PK.
The time dimension has 230,000 records, a covering index on ActualDate and a clustered index on the PK.
The Product view has 300 records, no view indexes, and a clustered index on the underlying table’s PK. I don’t know what further info to give except this edited version of the statistics profile:

NodeId Rows Executes EstimateRows EstimateExecutions PhysicalOp LogicalOp EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost
———– ———– ———– ———————— ———————— —————————— —————————— ———————— ———————— ———– ————————
1 2 1 1.0 NULL NULL NULL NULL NULL NULL 1750.5166
2 2 1 1.0 1.0 Compute Scalar Compute Scalar 0.0 0.0000001 47 1750.5166
3 2 1 1.0 1.0 Stream Aggregate Aggregate 0.0 5.0585001E-4 51 1750.5166
4 302 1 179.00002 1.0 Sort Sort 1.1261261E-2 2.1898318E-3 51 1750.5161
5 302 1 179.00002 1.0 Parallelism Gather Streams 0.0 2.9675225E-2 59 1750.5027
6 302 8 179.00002 1.0 Merge Join Inner Join 0.0 6.3651716E-3 59 1750.473
7 302 8 321.80188 1.0 Sort Sort 2.8153153E-3 1.070448E-3 26 0.20894766
8 302 8 321.80188 1.0 Parallelism Repartition Streams 0.0 2.9783506E-2 745 0.2050619
9 322 8 321.80188 1.0 Table Scan Table Scan 0.17313406 1.4988499E-3 745 0.17463291
10 302 8 179.00002 1.0 Sort Sort 2.8153153E-3 5.4745574E-4 42 1750.2577
11 302 8 179.00002 1.0 Bitmap Bitmap Create 0.0 2.9670928E-2 42 1750.2543
12 302 8 179.00002 1.0 Parallelism Repartition Streams 0.0 2.9670928E-2 42 1750.2543
13 302 8 179.00002 1.0 Hash Match Aggregate 0.0 0.39432949 42 1750.2246
14 230292 8 229553.31 1.0 Parallelism Repartition Streams 0.0 1.4529932 42 1749.8303
15 230292 8 229553.31 1.0 Hash Match Inner Join 0.0 36.950817 42 1748.3773
16 8 8 1.0 1.0 Parallelism Broadcast 0.0 0.02850491 152 0.03183784
17 1 1 1.0 1.0 Bookmark Lookup Bookmark Lookup 4.882658E-5 0.0000011 152 3.3329306E-3
18 1 1 1.0 1.0 Index Seek Index Seek 3.2034011E-3 7.9603E-5 37 3.283004E-3
19 32137465 8 32137465 1.0 Clustered Index Scan Clustered Index Scan 1702.5569 8.8378229 124 1711.3947

One thing that alarmed me was that the actual number of executes for many of the steps was 8. Is this something to be concerned about? What is it actually telling me? Ten minutes for such a simple query seems horribly slow; I’ve worked with larger fact tables before with much faster response. I should add that the initial clustered index scan seems inefficient. What can be done to change that? Thanks for any help.
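For reference, output like the profile above can be captured with SET STATISTICS PROFILE; a minimal sketch, using the query from this post:

```sql
-- Emit per-operator plan rows (Rows, Executes, EstimateRows, ...) after execution
SET STATISTICS PROFILE ON;
GO
SELECT d.ReportYearMonthInt, d.ActualDate, d.DayType, p.level1, COUNT(1)
FROM IUBProductMaster_Facts f
INNER JOIN ConformedDimension_JHATimeDim d
    ON d.JHATimeDim_Key = f.PostDateDim_JHATimeDim_Key
INNER JOIN IUB_Custom.dbo.viewProduct_Hierarchy p
    ON f.ProductDim_ProductDim_Key = p.dimension_key
WHERE d.ActualDate = '5/31/2006'
GROUP BY d.ReportYearMonthInt, d.ActualDate, d.DayType, p.level1;
GO
SET STATISTICS PROFILE OFF;
GO
```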
Try this minor change…

SELECT
d.ReportYearMonthInt,
d.ActualDate,
d.DayType,
p.level1,
count(1)
FROM
IUBProductMaster_Facts f
inner join ConformedDimension_JHATimeDim d on (d.JHATimeDim_Key = f.PostDateDim_JHATimeDim_Key AND d.ActualDate = '5/31/2006')
inner join IUB_Custom.dbo.viewProduct_Hierarchy p on (f.ProductDim_ProductDim_Key = p.dimension_key)
GROUP BY
d.ReportYearMonthInt,
d.ActualDate,
d.DayType,
p.level1
Cheers
Twan
quote:Originally posted by Twan Try this minor change…

Thanks, Twan, but I got the same query plan and the same performance. What was your thinking behind that change?
Blindly trying to run SQL Server, or any database, on a system configured without any supporting analysis almost always yields horrible results.

1. Try an index on the fact table with the columns in the following order:
PostDateDim_JHATimeDim_Key, ProductDim_ProductDim_Key, level1

2. Try the query with various degrees of parallelism, i.e. tack on to the end of the query:
OPTION (MAXDOP x)
with x = 1, 2, 4 (8 is the default for you).

3. Test the table-scan performance of your storage system.
First drop clean buffers so you are running from disk,
then write a query like:
SELECT COUNT(level1) FROM IUBProductMaster_Facts WITH (NOLOCK)
Check the execution plan to make sure it shows a clustered index scan;
otherwise write it as
SELECT COUNT(level1) FROM IUBProductMaster_Facts WITH (NOLOCK,INDEX(0))
to force a table/clustered index scan.

A properly configured storage system should generate 700MB/sec of disk activity with SQL 2000; SQL 2005 can do 2GB/sec or more.
If you are not close to 700MB/sec,
then ask why you configured a storage system that cannot move mountains of data quickly when that is the purpose of a data warehouse.
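A sketch of the index in item 1, assuming the column list as given (the index name is illustrative, and level1 is assumed to exist on the fact table as the suggestion implies):

```sql
-- Compound index matching the join keys, in the suggested order
CREATE INDEX IX_Facts_TimeKey_ProductKey_Level1
ON IUBProductMaster_Facts (PostDateDim_JHATimeDim_Key, ProductDim_ProductDim_Key, level1);
```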

Hi ya, I’ve had cases before where putting the where clause in the ON part changed the plan, significantly improving performance. The other thing you could try is a derived table, something like
inner join (select … from Conformed… WHERE d.ActualDate = '5/31/2006') d on d…
assuming that this sub-select results in only a couple of rows. The covering index is probably not covering, since we’re selecting more columns from the Conformed… table; this may be causing a merge or hash join? Cheers
Twan
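Spelling out the derived-table idea with the names used earlier in the thread (a sketch; the column list in the sub-select is assumed from the outer SELECT):

```sql
SELECT d.ReportYearMonthInt, d.ActualDate, d.DayType, p.level1, COUNT(1)
FROM IUBProductMaster_Facts f
INNER JOIN (
    -- Filter the time dimension down to the target day before the join
    SELECT JHATimeDim_Key, ReportYearMonthInt, ActualDate, DayType
    FROM ConformedDimension_JHATimeDim
    WHERE ActualDate = '5/31/2006'
) d ON d.JHATimeDim_Key = f.PostDateDim_JHATimeDim_Key
INNER JOIN IUB_Custom.dbo.viewProduct_Hierarchy p
    ON f.ProductDim_ProductDim_Key = p.dimension_key
GROUP BY d.ReportYearMonthInt, d.ActualDate, d.DayType, p.level1;
```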
quote:Originally posted by joechang blindly trying to run SQL Server or any database on a system configured without any supporting analysis almost always yields horrible results

I’m not a DBA, nor do I pretend to be. The one thing our company can’t seem to shell out for is staff to support the hardware it buys (staff can’t be depreciated as a capital expense, or some such thing). Could you direct me to a good resource for "configuring a system with supporting analysis"? I’m the guy who’s got to make this stuff work, and I need all the help I can get.
quote:
1. try an index on the fact table with the columns in the following order:
PostDateDim_JHATimeDim_Key, ProductDim_ProductDim_Key , level1
This completely fixed the problem! Results returned in under 3 seconds. Thanks! I previously tried placing individual indexes on each column in the fact PK, as I’d seen this improve performance in other situations, but the optimizer just ignored them. I hate to think about placing compound indexes on every combination of columns; is there some rule of thumb or best practice?
quote:
2. try the query with various degrees of parallelism, ie tack on to the end of the query
OPTION (MAXDOP x)
with x = 1, 2, 4 (8 is default for you)

This had no impact.
quote:
3. test the table scan performance of your storage system,
first drop clean buffers so you are running from disk
then write query like:
SELECT COUNT(level1) FROM IUBProductMaster_Facts WITH (NOLOCK)
check the execution plan to make sure it shows a clustered index scan;
otherwise write it as
SELECT COUNT(level1) FROM IUBProductMaster_Facts WITH (NOLOCK,INDEX(0))
to force a table/clustered index scan. A properly configured storage system should generate 700MB/sec of disk activity with SQL 2000; SQL 2005 can do 2GB/sec or more.
If you are not close to 700MB/sec,
then ask why you configured a storage system that cannot move mountains of data quickly when that is the purpose of a data warehouse.

"Drop clean buffers" — how would I do this? And then how would I measure disk activity? You’ll have to excuse me; my experience is with data modeling and development. At my last company we had DBAs on staff to help with performance. Thanks again for the fast response!

I suppose I should have said to test item 2 before applying item 1. Just test item 3, but first:

SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
SELECT COUNT(level1) FROM IUBProductMaster_Facts WITH (NOLOCK,INDEX(0))
GO
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO

After running the SELECT query, get the output of the messages pane.
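With the buffer-flush step made explicit (a sketch: DBCC DROPCLEANBUFFERS requires sysadmin, and issuing CHECKPOINT first writes out dirty pages so the scan really comes from disk):

```sql
CHECKPOINT;               -- flush dirty pages to disk first
DBCC DROPCLEANBUFFERS;    -- empty the buffer cache so reads hit the storage
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
SELECT COUNT(level1) FROM IUBProductMaster_Facts WITH (NOLOCK, INDEX(0))
GO
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO
```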
quote:Originally posted by joechang
after running the SELECT query get the output of the messages pane

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 2 ms.

(1 row(s) affected)

Table 'IUBProductMaster_Facts'. Scan count 8, logical reads 2349914, physical reads 2, read-ahead reads 2350123.

SQL Server Execution Times:
CPU time = 32860 ms, elapsed time = 481323 ms. (or 8 minutes)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
18GB / 481 sec = 37.5MB/sec
So what it comes down to is that your storage system sucks. Someone bought a $30K+ server (base + CPU + memory) and paired it with a storage system whose sequential performance is less than that of a single $200 disk. You need to ask who spec’d and configured the storage system. NetApp is supposed to have decent write performance but potentially very poor sequential read performance. If this had been properly configured direct-attached storage,
4 x Smart Array Px00 SAS RAID controllers
4 x MSA50
40 x 10K SFF SAS drives
you’d be doing 700MB/sec+, and the original query would have run in 27 sec even without the matched index.
So this is entirely not SQL Server’s fault but a poor job of configuring the storage.
Where did you get the 18GB figure? Is that related to the logical reads/physical reads? Thanks!
Notice the estimated IO cost of 1702 on the last line of the statistics profile, for the fact table’s clustered index scan?
Each unit of scan IO cost is approximately 10.5MB, so 10.5 x 1702 ≈ 18,000 MB. Also note the read-ahead reads: 2350123, at 8K per read.
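The arithmetic checks out roughly (constants from this thread; the ~10.5MB per unit of scan IO cost is Joe’s figure):

```sql
SELECT
    10.5 * 1702           AS est_scan_mb,    -- ~17,871 MB, i.e. ~18 GB
    2350123 * 8 / 1024.0  AS read_ahead_mb,  -- 2,350,123 read-ahead reads x 8 KB ~ 18,360 MB
    18000 / 481.0         AS mb_per_sec;     -- ~37.4 MB/sec observed scan throughput
```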

That read is taking place over a CIFS share to the NetApp over a crossover cable (we’re running a 64-bit OS, and at the time of installation iSCSI for 64-bit wasn’t out of beta). Do you think that would impact the read problems? I’m contacting our NetApp admin right now to get some further help. Thanks for all your time, Joe. You’ve been a great help!
I’ve used a NetApp 980C over LVD before ISCSI was ratified and I’ll never do it again. Horrible piece of junk for database servers, esp. the reads. It writes very fast because it uses a Raid-4 internal storage. Write good, read bad. Make it a fileserver and get yourself a proper small SAN (MSA1000) or toss the filer totally and get a real SAN in-house. Also, if you’re sharing q-trees with any other systems you’re paying a heavy I/O penalty in that sharing. Our Informix guys would regularly kill the SQL servers by deleting tons of 2GB chunks (yes, Informix still only has a 2GB chunk limit) and the filer processor would go 100% on the deletes and starve the other systems for cpu time. I saw write-delay failures a lot and the poor Exchange guys had even worse problems than I did because they were over CIFS as well….
quote:Originally posted by joechang notice the Estimated IO cost of 1702 for the last line of the statistics profile for the facts clustered index scan?
each unit of a scan IO cost is approx 10.5MB, so 10.5 x 1702 = 18,000 MB also, read-ahead reads 2350123 at 8K per read

How did you arrive at the 10.5MB calculation? Thank you.
read my papers on the CBO formulas