SQL Server Performance

backup/restore take way too long in sql server 2012 cluster

Discussion in 'ALL SQL SERVER QUESTIONS' started by Thanh Nguyen, Sep 17, 2012.

  1. Thanh Nguyen New Member

    Hi experts,

    I need some directions on what sort of things should I look into why backup and restore take way too long, e.g. 130GB db take 4 hrs to backup and 4hrs to restore.

    We have 2 brand new physical servers with identital hw (12-core, 288GB ram). We use VMware to create 1 virtual machine on each server and give 8-core and 260GB ram to each machine. From these 2 virtual machines, I just setup 2-node active/active failover cluster on windows 2008R2 and SQL Server 2012 Enterprise. Everything goes smoothly with installation and setup. I test out failover and stuff and things work great. I configure the windows/sql server:
    - enable "Instant File Inititalzation"
    - set sql server 100GB max mem
    - 4 tempdb files

    Our plan is to replace the current single sql 2005 with this 2012 failover cluster.

    When I start poking around with backup/restore on this cluster, i notice it take way too long, e.g. 130GB db take 4 hrs to backup and 4hrs restore to the SAN. First I blame the SAN for IO latency, but when I try to back up to the 'nul' device, it take around the same time too. These 2 new servers are on the same VLAN as the old server. The old server is 4-core 32GB ram only takes 10ms per read/write. These 2 new servers take around 80s per read/write. These 2 new servers connect to SAN via fibre channel with 1Gbs connection. I also poke around with backup/restore options such as BufferCount, MaxTransferSize, BlockSize, compression, etc... but it does not make any big difference. Of course from sql error log, i got lots of warning about 'IO take more than 15s'. I even try to backup/restore to/from multiple files and still does not help. I notice the restore speed only 8MB/sec (e.g. RESTORE DATABASE successfully processed 14701610 pages in 14676.626 seconds (7.825 MB/sec))

    Just a note, when I copy the 130GB file(s) from SAN to the server, take me around 20 min. BTW, all disks storage from server are allocated from the SAN, including C: drive. I've applied CU3 to sql server 2012 cluster, but still does not help.
    Our SAN administrator monitors the NetApp SAN while I'm doing the db restore and he sees the IO latency is around 10-20ms which is normal, but he sees very little activity hitting the SAN and only sees the write speed around 8-9MB/s which is consistent with what I see from the sql server side. I've used the following tools to measure IO performance from the windows/sql side:
    - Windows perfmon: Avg disk sec/write, Avg disk sec/read, CPU, mem
    - sqliosim
    - sql server error log

    all those 3 gives me very similar IO result which is around 80-100 sec per read/write.

    Yes. I understand this can be million things can go wrong with this issue. I just need some general guideline/direction on where else should i look into.

    Thanks very much,
  2. Thanh Nguyen New Member

    I got the issue resolved. We found out the problem was at the VMWare. Let me share my resolution here just in case anyone out there run into same issue i got. Here's the comments from VMWare consultant guy who helped us resolve the issue:
    The configuration of the VMware / NetApp MPIO had to be modified to work with Microsoft Cluster Services. The default is ALUA with Round Robbin PSP on the vSphere hosts. MSCS places permanent scsi reservations on LUNs and requires only a single active path per device per host. All IO on the "other" active path in RR would result in Blocked IO errors in the /var/log/messages.

Share This Page