Characterizing I/O Workload
Calculating Number of Disks Required
In calculating the number of disks required to support a given workload, two values must be known: the required disk I/Os per second, which is the sum of the reads and writes that we looked at in the previous section, and the I/O per second capacity, or IOPS, of the individual disks involved.
The IOPS value of a given disk depends on many factors, including the type of disk (SCSI, SAS, SATA, or Fibre Channel), the spin speed (e.g., 10,000 RPM or 15,000 RPM), and the I/O type (random vs. sequential). Tools such as SQLIO can be used to measure a disk’s IOPS capacity. The sidebar “Disk Drive Technologies” covers the different types of disk storage available and the attributes of each that affect the calculations presented below.
The process of selecting RAID levels and calculating the required number of disks is significantly different in a SAN (and between different SAN vendors) compared to a traditional Direct Attached Storage (DAS) solution. Configuring and monitoring virtualized SAN storage is a specialist skill, and DBAs should insist on SAN vendor involvement in the setup and configuration of storage for SQL Server deployments. The big four SAN vendors (EMC, Hitachi, HP, and IBM) are all capable of providing their own consultants, usually well versed in SQL Server storage requirements, to set up and configure storage and related backup solutions to maximize SAN investment.
For the purposes of calculating required disk numbers, an often-used average is 125 IOPS per disk for random I/O. Whilst commonly used server-class 15,000 RPM SCSI disks are capable of higher speeds, particularly for sequential I/O, the 125 IOPS figure is a reasonable average for the purposes of estimation, and it enables the calculated disk number to include a comfortable margin for error in handling peak, or higher than expected, loads.
Let’s look at a commonly used formula for calculating disk numbers:
Required # Disks = (Reads/Sec + (Writes/Sec * RAID adjuster)) / Disk IOPS
As the formula shows, dividing the sum of the disk reads and the RAID-adjusted writes per second by the disk’s IOPS yields the number of disks required to support the workload.
The RAID adjuster takes into account the additional writes incurred by a RAID system in providing fault tolerance at the disk level. RAID 0, which provides no fault tolerance, has no write overhead, hence a RAID adjuster of 1. RAID 1 and 10 incur two physical writes to mirror each requested write, hence they have a RAID adjuster of 2. RAID 5, in maintaining parity, must read the existing data and parity and then write both back for each requested write, hence it has a RAID adjuster of 4.
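To make the formula concrete, the following is a minimal Python sketch of the calculation. The function name `required_disks`, the `RAID_ADJUSTER` table, and the sample workload of 1,200 reads and 400 writes per second are all hypothetical and chosen purely for illustration; the 125 IOPS default is the estimation average discussed earlier. Rounding the result up to a whole disk is an assumption on our part, since the formula itself produces a fractional value.

```python
import math

# RAID write adjusters as described in the text:
# RAID 0 = 1, RAID 1 and 10 = 2, RAID 5 = 4.
RAID_ADJUSTER = {"RAID 0": 1, "RAID 1": 2, "RAID 10": 2, "RAID 5": 4}


def required_disks(reads_per_sec, writes_per_sec, raid_level, disk_iops=125):
    """Estimate the number of disks needed to support a workload.

    Applies: (Reads/Sec + (Writes/Sec * RAID adjuster)) / Disk IOPS,
    rounded up to a whole disk (the rounding is an assumption, not
    part of the original formula).
    """
    adjuster = RAID_ADJUSTER[raid_level]
    physical_iops = reads_per_sec + writes_per_sec * adjuster
    return math.ceil(physical_iops / disk_iops)


# Hypothetical workload: 1,200 reads/sec and 400 writes/sec.
for level in RAID_ADJUSTER:
    print(level, required_disks(1200, 400, level))
```

For this sample workload, the sketch suggests 13 disks for RAID 0, 16 for RAID 1 or 10, and 23 for RAID 5, which illustrates how quickly the RAID 5 write penalty increases the disk count for write-heavy systems. It does not enforce per-level constraints, such as RAID 10 requiring an even number of disks, so treat the output as a starting estimate only.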
Disk Drive Technologies
Using a parallel interface, ATA is one of the original implementations of disk drive technologies for the personal computer. Also known as IDE or Parallel ATA, it integrates the disk controller on the disk itself and uses ribbon style cables for connection to the host.
In widespread use today, SATA, or Serial ATA, drives are an evolution of the older Parallel ATA drives, offering numerous improvements such as faster data transfer, thinner cables for better air flow, and a feature known as Native Command Queuing (NCQ), whereby queued disk requests are reordered to maximize throughput. Compared to SCSI drives, discussed next, SATA drives offer much higher capacity per disk, with multi-terabyte drives available today. The downside of very large SATA disk sizes is the increased latency of disk requests, partially offset by Native Command Queuing.
Generally offering higher performance than SATA drives, albeit at a higher cost, SCSI drives are commonly found in server-based RAID implementations and high-end workstations. Paired with a SCSI controller card, up to 15 disks can be connected to a server for each channel on the controller card. Dual-channel cards enable 30 disks to be connected per card, and multiple controller cards can be installed in a server, allowing a large number of disks to be directly attached to a server. It’s increasingly common for organizations to use a mixture of SCSI drives, for performance-sensitive applications, and SATA drives, for applications requiring large amounts of storage. An example of this for a database application is to use SCSI drives for storing the database and SATA drives for storing online disk backups.
SAS, or Serial Attached SCSI, disks connect directly to a SAS port, unlike traditional SCSI disks, which share a common bus. Borrowing from aspects of Fibre Channel technology, SAS was designed to break past the current performance barrier of the existing Ultra320 SCSI technology, and offers numerous advantages owing to its smaller form factor and backwards compatibility with SATA disks. As a result, SAS drives are growing in popularity as an alternative to SCSI.
Fibre Channel allows high-speed, serial duplex communications between storage systems and server hosts. Typically found on Storage Area Networks, Fibre Channel offers more flexibility than a SCSI bus, with support for more physical disks, more connected servers, and longer cable lengths.
Solid State Disks
Used today primarily in laptops and SAN cache, Solid State Disks (SSDs) are gaining momentum in the desktop and server space. As the name suggests, SSDs use solid-state memory to persist data, in contrast to the rotating platters of a conventional hard disk. With no moving parts, SSDs are more robust and promise (near) zero seek time, high performance, and low power consumption. They’re an exciting future prospect for SQL Server storage.