SQL Server Performance

RAID config for Performance system

Discussion in 'Performance Tuning for Hardware Configurations' started by mrwaus, Sep 24, 2003.

  1. mrwaus New Member

    Can you recommend a RAID config with 6 drives? I am building a standalone
    workstation with an SQL DB. I download about a million records daily and
    aggregate them down to about 200,000. I query from these 200,000 records and
    actually save only a small data set from them. I can always re-download the data, as it remains available for up to 1 year via an FTP site. The final data set I will back up daily at an online offsite location.

    I am getting different advice in my research. I am more interested in
    performance than anything else. Eventually I will be downloading several times a day.


    My current config. is:

    1) OS, MS SQL executables, swap file & tempdb on RAID 0 (2 x 36 GB Raptor SATA drives)
    2) System & user DBs on RAID 0 (2 x 36 GB Raptor SATA drives)
    3) Log files on RAID 0 (2 x 36 GB Raptor SATA drives)

    Maybe the Tlog should be on a RAID 1.

    Also, if I could & should do this on RAID 10, how would I set it all up physically?

    Thanks.

    MW
  2. satya Moderator

    Keep SQL binaries on RAID 1 or 5
    Data files on RAID 5 or 10 and Tlog on RAID 1.

    _________
    Satya SKJ
    Moderator
    SQL-Server-Performance.Com
  3. Twan New Member

    With only 6 drives you are quite limited... you will have to put the OS/SQL on two drives (RAID 1) and the log on another two drives (RAID 1), which leaves you two drives. These can't be used for RAID 5, 10 or 0+1, so there is no choice but to use RAID 1.

    You could also mirror two drives (RAID 1) and partition them to place the logs and OS/SQL on those drives. This leaves you 4 drives, which could be RAID 5 if you need the space, or RAID 10/0+1 if you need the extra performance.

    It is probably going to come down to testing, I think... OS/SQL and logs on the same physical drives (separate volumes) works OK for me here, but if you had more drives, splitting it off COULD give you better performance.

    Cheers
    Twan

    imho a workstation may not be the best type of device to do this work...
  4. Luis Martin Moderator

    Agree with Satya except I suggest data on RAID 10



    Luis Martin

    ...Thus mathematics may be defined as the subject in which we never know what we are talking about, nor whether what we are saying is true.
    Bertrand Russell
  5. vbkenya New Member

    The bulk of your work on this workstation is inclined towards reading rather than writing (aggregation and filtering queries). The write operations, which involve use of the log, are minimal.

    Your current RAID 0 configuration is OK, in my view, and since you are actually shipping off the data to an offsite backup, the RAID 1, 10 and even RAID 5 configs are unnecessary (though not entirely unwise).






    Nathan H.O.
    Moderator
    SQL-Server-Performance.com
  6. satya Moderator

    RAID 10 is expensive in terms of drive cost, so I proposed RAID 5.

    _________
    Satya SKJ
    Moderator
    SQL-Server-Performance.Com
  7. Muhammed New Member

  8. gcoakes New Member

    I am not sure this is a RAID problem, but here goes..

    System configuration is as follows:
    C: 2 x 36GB RAID1 on dedicated HW RAID controller
    SAN (Clariion with Brocade switches and four storage processors(SP))
    (40 x 36GB disk drives in four trays of 10,
    SP1,SP2 for tray 1 & 2, SP3 for tray 3 & 4, SP4 not used)
    LUN1 7 x 36GB RAID0 on SP1
    LUN2 7 x 36GB RAID0 on SP2
    LUN3 2 x 36GB RAID0 on SP1
    LUN4 2 x 36GB RAID0 on SP1
    LUN5 7 x 36GB RAID0 on SP3
    LUN6 7 x 36GB RAID0 on SP3
    LUN7 2 x 36GB RAID0 on SP3
    LUN8 2 x 36GB RAID0 on SP3
    Some other smaller LUNS are on remaining drives to total to 40.
    Logical Drives
    F: Striped across LUN1 and LUN5 (db data file abc.mdf 135GB) (~450GB total space)
    G: Striped across LUN2 and LUN6 (db data file abc.ndf 159GB) (~450GB total space)
    L: Striped across LUN3 and LUN7 (db log file abc.ldf 5GB) (~100GB total space)
    T: Striped across LUN4 and LUN8 (temp database - minimal use)(~100GB total space)

    When the application is running its largest batch processing, it can run up to 16 copies of the same program to achieve the desired performance. Each copy has its own section of the database to work on, based on a range of numbers important to the database - this ensures no lock conflicts between copies, since they are all working on separate portions of data.
    - System has 16 x 1.6Ghz CPUS and 16GB memory and four 2Gb fiber channel controllers connected to the SAN
    - Using CPU affinity, SQL Server has CPUs 0,1,2,3 and application has CPUs 4 to 15.
    - SQL Server has been allocated 5.5GB of memory
    - System CPU utilization is 30-50%
    - Controllers and SPs show low utilization on Navisphere screens
    - Performance monitor shows logical and physical disks 60-80% idle on the F & G drives
    - Occasional IO queuing for very brief durations, but mostly zero on F & G drives
    - WaitStats indicate that 'PAGEIOLATCH_SH' is where we are waiting the most
    - Have observed some indication of 'IO Stall' on access to pages in files, but are not sure if this is valid or if we understand this
    - Earlier in the batch processing, there is an indication that some tasks are consuming all the performance of the F & G drives, since % idle can drop to 10-20% and even 0%. This specific task seems very intense, but has a short duration (10-15 mins)
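
    For reference, a minimal sketch of how such wait statistics can be sampled (SQL Server 2000-era syntax; run the batch between the two commands):

    ```sql
    -- Reset the server's cumulative wait statistics before the batch run.
    DBCC SQLPERF(WAITSTATS, CLEAR)

    -- ... run the batch workload here ...

    -- Sample the accumulated waits; a large PAGEIOLATCH_SH total means
    -- sessions are waiting for data pages to be read in from disk.
    DBCC SQLPERF(WAITSTATS)
    ```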

    The problem is that it seems like something is keeping the application from consuming the CPUs and IO subsystem, but what?

    During a database restore, we see the IO subsystem able to handle ten times the apparent workload that we see during the batch processing. I know this is a different type of activity, but it should indicate the IO subsystem can handle much more work and should not be limiting the batch process. Database restore of 300GB database takes about 1 hour from multiple disk files on F & G.

    Any ideas or suggestions would be appreciated.

    regards
    Greg
  9. satya Moderator

  10. gcoakes New Member

    Satya,

    I don't understand the 'database health checks' reference?
    The link provided doesn't seem to refer directly to this.

    The database is a mature schema with indexes on all tables that are
    very tuned to the application and the application makes heavy use of
    cursors. The application is in Microfocus Cobol and we have them helping
    in the tuning as well as Microsoft.

    Greg
  11. satya Moderator

    As you say it uses cursors heavily, multiple executions of the code do affect performance, even though the necessary indexes are in place.

    You can take the help of Profiler for slow-running queries and submit the trace to the Index Tuning Wizard for any index recommendations; that may help.

    The health checks reference is about DBCC DBREINDEX and maintenance plans on the database.
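
    As a sketch of the commands involved (SQL Server 2000 syntax; the table name is a placeholder):

    ```sql
    -- Report fragmentation first, to decide whether a rebuild is worthwhile.
    DBCC SHOWCONTIG ('dbo.MyTable')

    -- Rebuild all indexes on the table (also refreshes their statistics).
    DBCC DBREINDEX ('dbo.MyTable')
    ```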

    _________
    Satya SKJ
    Moderator
    SQL-Server-Performance.Com
  12. gcoakes New Member

    Satya,

    Multiple executions each use their own server-side cursors, and the cursors appear to use indexes efficiently.

    Much Profiler analysis has been done and there seems to be little potential for improvement in that area.

    How does DBCC DBREINDEX impact or improve things?
    How do maintenance plans impact or improve things?

    Database is 300GB and contains a few tables that have up to 220 million rows.
    The database was recently built for our testing and we have not considered
    if the pages/extents in the database files are well organized.

    Is DBREINDEX a fast tool, or will it require a long time to run and to see whether it had any
    impact?

    Greg
  13. satya Moderator

    For such a big database, running DBREINDEX will slow down performance, so it's better to run it during periods of less traffic on the database.

    If you get slow response, even intermittently, run UPDATE STATISTICS to take care of performance.
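
    A sketch of the statistics commands (SQL Server 2000 syntax; the table name is a placeholder):

    ```sql
    -- Refresh optimizer statistics for one table using a full scan...
    UPDATE STATISTICS dbo.MyTable WITH FULLSCAN

    -- ...or refresh out-of-date statistics across the whole database.
    EXEC sp_updatestats
    ```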

    _________
    Satya SKJ
    Moderator
    SQL-Server-Performance.Com
  14. Twan New Member


    One comment is that CPU 0 tends to have special tasks, so if SQL tries to consume all of CPUs 0-3 then it will starve those tasks. For example, some interrupts and some OS tasks only run on CPU 0.

    Cheers
    Twan
  15. gcoakes New Member

    Twan,

    This is a 16 CPU system and we are using CPU Affinity to lock SQL to CPUs 0,1,2,3 and
    to lock Application to CPUs 4 to 15. The OS has access to all CPUs.
    CPUs 0-3 have 30-50% utilization and CPUs 4-15 have 10-40% utilization.
    It doesn't seem like a CPU resource limitation?

    Greg
  16. Twan New Member


    Hi Greg,

    OK, I thought that the following statement was referring to CPU... "- Earlier in the batch processing, there is indication that some tasks are consuming all the performance on the F & G drives since %IDLE can go below 10-20% and even 0%. This specific task seems very intense, but has a short duration (10-15 mins)"

    I realise that the OS has access to all CPUs; just beware that some tasks must run on CPU 0, so leaving that CPU for only the OS might be worthwhile anyway...?

    Have you checked for locking/blocking issues? Even though the data is using different ranges of ids, there can still be locking/blocking issues due to index locking for updates/inserts/deletes and/or lock escalation...?

    Cheers
    Twan
  17. gcoakes New Member

    Twan,

    Sorry about the confusion between CPU and IO utilization.

    Yes, the one program early in the batch cycle is very intense. It reads flat input files and uses the data to update the database - almost 100% updates/inserts. I think we can reconfigure the SAN into one drive F with all the physical disk drives and resolve the performance concern.

    I have wondered whether this reconfig would have a positive, neutral or negative impact on the later processes that we would like to run faster. If the later processes are experiencing contention at the page level in the tables and indexes, I would not expect the reconfig to have any impact at all, since it does not address the real problem. It is possible the higher overall performance might help minimize the contention impact somewhat.

    The reconfig I would propose might be:
    LUN 0 - 10 x 36GB RAID0 drives on SP1
    LUN 1 - 10 x 36GB RAID0 drives on SP2
    LUN 2 - 10 x 36GB RAID0 drives on SP3
    LUN 3 - 10 x 36GB RAID0 drives on SP4 (if SP4 not available then SP3)
    Logical Drive
    Drive F: Striped across LUN 0, 1, 2, 3 (with db data and log files)(~1000GB total space)

    Any comments on SAN config would be appreciated.

    Would splitting the database into more 'MDF' files help?

    I have mentioned the CPU 0 issue to the team and they are considering testing with
    this change, but are not optimistic it will have significant impact.

    Performance monitor stats indicate very little contention for locks and little or no wait time for locks. I have been thinking about your idea that the contention could be on the index pages, but I am not sure how to determine this? Lock escalation is another issue that I cannot find good information about - how to monitor and analyze it?
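
    One way to observe locking and blocking directly is with the standard SQL Server 2000 system procedures (a sketch, not a full escalation monitor):

    ```sql
    -- Snapshot all current locks; a burst of many page/key locks on one
    -- object id, later replaced by a single TAB (table) lock, suggests
    -- lock escalation has occurred.
    EXEC sp_lock

    -- List sessions; the BlkBy column shows who is blocking whom.
    EXEC sp_who2
    ```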

    'Blocking issues' - what did you have in mind here? Do you mean blocking of records in the tables and records in the indexes, or something else?

    thanks
    Greg
  18. Luis Martin Moderator

    I was consulting a friend who knows about SANs.
    Here are his comments:

    1) Assuming there is no more tuning work to do.
    2) The performance problem is because of I/O and not CPU.
    3) Analysis of the drive queues shows some LUNs are fully working and some LUNs are not.

    Then the new configuration may help. The idea is: in the first configuration you made a correct analysis of where to put data, indexes and so on.
    In the new configuration you leave it all in the hardware's hands.

    So, to the question "Would splitting the database into more 'MDF' files help?" the answer is no, because it is now a hardware problem.

    And he agrees with you about whether the new configuration will help or not.

    My point of view (common sense only) is: if 3) is true, then go on.

    Hope this helps.





    Luis Martin

    ...Thus mathematics may be defined as the subject in which we never know what we are talking about, nor whether what we are saying is true.
    Bertrand Russell
  19. Twan New Member

    Hi Greg,

    If you've been monitoring lock contention, then that will cover index pages and blocking problems too... I'm not sure what else to suggest...

    In terms of SAN config, I'd say the more separate disks/controllers the better. These should translate through to separate logical volumes. I'd have a 1:1 map between LUNs and NT volumes. I'd also consider breaking the SAN up into multiple LUNs per controller more like you had originally, but using HW RAID instead of software RAID.

    One volume for logs, one volume for tempdb (if it has minimal use, consider placing it on the volume with the logs). The rest of the volumes are used for data files, with the data split by placing heavily used tables into separate files and/or splitting indexes and data for heavily used tables into separate files.

    Cheers
    Twan
  20. georgel New Member

    Hi Greg,

    The solution you've recommended is referred to by Juan Loaiza of Oracle as SAME (Stripe And Mirror Everything). It's shown to be a very simple way of getting performance with very little configuration hassle. It's compared to various manual methods of laying out a database on an array in an IBM research paper called 'Storage Layout and I/O Performance Tuning for IBM Red Brick Data Warehouse'. Check out those papers if you want to read more about what you're proposing. They also deal with differing stripe sizes (8K -> 1MB).

    Anyway, back to your issue. I think it lies with your transaction log (L drive) and your recovery mode. If you're at 'Full' recovery, then all your inserts and updates are hitting the log first. For example, I currently have my transaction log placed on a (4) 15k disk RAID 10 array (write cache disabled), and I still see '% disk time' above 80% during periods of activity. All this, while the data file disk is as calm as a summer's day. So, don't forget about that pesky log!
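
    A sketch of checking and changing the recovery model (SQL Server 2000 syntax; the database name 'abc' is inferred from the file names earlier in the thread and may differ):

    ```sql
    -- Check the current recovery model.
    SELECT DATABASEPROPERTYEX('abc', 'Recovery')

    -- Bulk-logged recovery reduces log volume for bulk operations
    -- while still allowing log backups.
    ALTER DATABASE abc SET RECOVERY BULK_LOGGED
    ```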

    hth,
    George
  21. gcoakes New Member

    Twan,

    I assume 1:1 LUNs and NT volumes means that each 10 x 36GB LUN should be linked to a separate drive letter and I would have F,G,H,I drives with ~350GB each. This would require me to have four database files to locate on these drives in order to spread the workload across them.

    Log and Tempdb activity is currently very small as I have them on smaller portions of the existing SAN and the performance monitor indicates all the activity is on F & G.

    The database is currently in two database files (135GB and 159GB).
    Do you know an easy way to spread the database across four files?
    I am aware of the redirected restore feature in DB2 that allows a backup to be restored on a different file layout than it was originally backed up from. I cannot find this feature in SQL Server, or any third-party utility that offers it for SQL Server.
    Unloading the database and rebuilding it will take days otherwise.

    How do you specify in SQL Server that certain tables and/or indexes should go in separate files and how do you move them once the database is created?

    regards
    Greg
  22. Luis Martin Moderator


    Using EM, under Database Properties, create a new filegroup (say, IndexMove). Then go to each table's design, where you can change each non-clustered index from the primary filegroup to the IndexMove filegroup.
    For a clustered index, you must drop and recreate it in the IndexMove filegroup.
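
    The same steps can be done in T-SQL instead of EM (a sketch; the database, filegroup, file path, table, index and column names are placeholders):

    ```sql
    -- Create the new filegroup and add a file to it.
    ALTER DATABASE abc ADD FILEGROUP IndexMove
    ALTER DATABASE abc ADD FILE
        (NAME = abc_ix1, FILENAME = 'G:\abc_ix1.ndf', SIZE = 1GB)
    TO FILEGROUP IndexMove

    -- Rebuild a non-clustered index on the new filegroup.
    CREATE INDEX ix_MyTable_Col1 ON dbo.MyTable (Col1)
        WITH DROP_EXISTING ON IndexMove

    -- Rebuilding a clustered index moves the table's data pages with it.
    CREATE UNIQUE CLUSTERED INDEX pk_MyTable ON dbo.MyTable (Id)
        WITH DROP_EXISTING ON IndexMove
    ```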



    Luis Martin

    ...Thus mathematics may be defined as the subject in which we never know what we are talking about, nor whether what we are saying is true.
    Bertrand Russell
  23. gcoakes New Member

    George,

    I have looked for this 'IBM research paper' and cannot locate it. Do you have a link you could send me for it?

    Tran log and temp db are currently on small drives and appear to have very small activity according to performance monitor. All the activity is on the data drives F & G.

    Does my idea of one large high-performance logical drive, consisting of as many physical drives as possible, support having all the data/log/tempdb files on the same drive (F)?
    Is there a requirement for extra drive letters for some reason, or would this work?

    Note that with this design, each LUN containing multiple physical disk drives can be controlled by one storage processor connected to one fiber channel controller on the server. Thus, several of these LUNs striped together would ensure plenty of controller bandwidth and plenty of drives involved in most IOs. Does this make sense? Or would separate drives for each LUN, with a database file on each drive, be better?

    I think I am beginning to understand this, but I am looking for some experienced advice.
    The various options seem confusing, and the pros/cons are not clear.
    Have you ever seen this documented and explained clearly anywhere?

    regards
    Greg

  24. satya Moderator

  25. Twan New Member

    Hi Greg,

    Very hard to know... My personal setup is to separate logs and databases onto different physical arrays, and that's it. In the past I've found that the overhead of managing a fully separated disk IO layout far outweighed the benefit in performance. I'd agree with the paper when it says that you need to look at the data model and application code first, since this is where the problems almost always are. (And for those developers among you, yes, I am a developer myself... ;-))

    Cheers
    Twan
  26. georgel New Member

    quote:Originally posted by gcoakes

    George,

    I have looked for this 'IBM research paper' and cannot locate it. Do you have a link you could send me for it?

    Satya linked you to the correct document. thanks!


    quote:
    Tran log and temp db are currently on small drives and appear to have very small activity according to performance monitor. All the activity is on the data drives F & G.

    OK. Could you describe some more of your batch processing workload? For example, do you see the transactions per second on your database remain constant? Does it reach a maximal number and stay there? Can you run a Profiler trace and see what procs or tables are being hit? Have you separated your indexes onto different filegroups (arrays) to allow parallel queries?


    quote:
    Does my idea of one large high performance logical drive consisting of as many physical drives as possible support having all the data/log/tempdb files on the same drive(F).
    Is there a requirement for extra drive letters for some reason or would this work?

    I am a relatively new transplant to SQL Server. Other vendors (Oracle) seem to be recommending the one-big-drive solution. I have not seen an iron-clad pro or con recommendation about multiple files per filegroup on the same array for SQL Server. Right now, I am building my databases with multiple files per filegroup staggered on different arrays.


    quote:
    Have you every seen this documented and explained clearly anywhere?

    regards
    Greg

    The IBM doco was the best I've seen, and I've been looking for a while.

    Please let us know what you find out.

    hth,
    George
