How to Attain SQL Server High Availability at Minimal Cost

You may find it hard to believe, but it is possible to achieve SQL Server high availability without spending a fortune. In fact, you can spend a fortune and still not get it. Sound like a paradox? Not really. SQL Server high availability is not a direct function of how much you spend; it depends far more on what you do right and what you do wrong.

Before we get too far into our discussion of how to achieve high availability at minimal cost, let’s first take a brief look at what high availability really is. Let’s assume we live in a perfect world where nothing ever goes wrong. In this case, our SQL Servers would be available 100% of the time users need them. This might be 24/7, or 8/5, depending on your user base and how your SQL Servers are used. In other words, there is a time frame during which SQL Server needs to be up and running efficiently, and as long as it is up and running efficiently throughout that time frame, you have 100% availability.

But what does “up” mean? Does it mean that the SQL Server service is running, even if it is running so slowly that users can’t get their work done on a timely basis? Or does it mean that the SQL Server service is running and that users can access it on a timely basis? I think most DBAs would assume the second option. If users can’t access SQL Server on a timely basis, then SQL Server is not really “up”.
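One way to make this definition concrete is to count a health check as “up” only when SQL Server answers within an acceptable response time, so a server that is running but too slow counts as down. The sketch below is a minimal Python illustration under that assumption, not a monitoring tool: `ProbeResult`, `is_up`, `availability`, the 2-second threshold, and the sample probes are all hypothetical, and how the probes are actually collected (for example, by timing a simple query from a client machine during the required time frame) is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProbeResult:
    """One hypothetical health check taken during the required time frame."""
    responded: bool              # did SQL Server answer at all?
    latency_s: Optional[float]   # response time in seconds; None if no answer


def is_up(probe: ProbeResult, max_latency_s: float) -> bool:
    # "Up" means the service answered AND answered quickly enough for users.
    return (
        probe.responded
        and probe.latency_s is not None
        and probe.latency_s <= max_latency_s
    )


def availability(probes: Sequence[ProbeResult], max_latency_s: float = 2.0) -> float:
    """Fraction of probes that count as up under the timeliness rule."""
    if not probes:
        raise ValueError("no probes recorded for the service window")
    up = sum(1 for p in probes if is_up(p, max_latency_s))
    return up / len(probes)


# Hypothetical samples: the 5-second response is "running" but not "up".
samples = [
    ProbeResult(True, 0.3),
    ProbeResult(True, 5.0),
    ProbeResult(False, None),
    ProbeResult(True, 0.8),
]
print(availability(samples))  # 0.5
```

Where to set the latency threshold is a business decision, not a technical one; the point is that a service-is-running check alone would have reported 75% here instead of 50%.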

Now, let’s get back to reality. Stuff happens. Things break. People make mistakes. It is impossible to attain 100% SQL Server availability. Because of this, our goal as DBAs is to attain as high a level of availability as we can, given our limited resources. This may mean 99.999%, or it may mean 90%; each organization is different and has its own standard of what high availability means. Sometimes our resources include redundant data centers, SQL Server clusters, or even disk mirroring. In other cases, they are very limited. In any event, we must make the best use of what we have available.
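It can help to translate an availability target into the downtime it actually permits, because the same percentage means very different things for a 24/7 server and an 8/5 server. The Python sketch below does that arithmetic: allowed downtime is simply (100 − availability%) / 100 times the hours in the service window. The window sizes are assumptions for illustration (a 365-day year for 24/7; 8-hour days, 5 days a week, 52 weeks, with no holiday adjustment for 8/5), and `allowed_downtime_minutes` and `WINDOW_HOURS_PER_YEAR` are hypothetical names.

```python
# Assumed service windows, in hours per year (hypothetical simplifications).
WINDOW_HOURS_PER_YEAR = {
    "24/7": 24 * 365,     # every hour of a 365-day year
    "8/5": 8 * 5 * 52,    # 8-hour days, 5 days a week, 52 weeks, no holidays
}


def allowed_downtime_minutes(availability_pct: float, window: str) -> float:
    """Minutes per year SQL Server may be unavailable inside the service window."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    hours = WINDOW_HOURS_PER_YEAR[window]
    return hours * 60 * (100.0 - availability_pct) / 100.0


for window in ("24/7", "8/5"):
    for pct in (99.999, 99.9, 90.0):
        minutes = allowed_downtime_minutes(pct, window)
        print(f"{window:>4} at {pct}%: {minutes:,.1f} min/year")
```

Under these assumptions, 99.999% on a 24/7 window allows only about 5.3 minutes of downtime a year, while 90% on an 8/5 window allows 12,480 minutes (208 hours) within business hours.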

What Prevents SQL Server High Availability?

Before we can learn what steps we can take to help ensure high availability, we need to understand what can go wrong with our SQL Servers. Once we understand this, then we can prescribe what we need to do to prevent them from going down in the first place. While the following list is long, it is not comprehensive. It would take a book to cover every potential thing that can prevent high availability. The focus here is on key factors that can negatively affect high availability.

  • Outside Forces

  • Server Hardware

  • Network Infrastructure

  • Electrical Power

  • Scheduled Hardware/Software Upgrades/Patches

  • Operating System

  • SQL Server

  • Human Errors

  • Application Software

  • Poor Tuning

  • Database Maintenance

  • Database Jobs

  • Third-Party Software

  • Poor Documentation

Let’s take a look at each one of these potential problem areas, examining what can go wrong. Note that potential causes of SQL Server unavailability are numerous, and include much more than just hardware failure. Anything that prevents users from accessing SQL Server when they need to is a potential problem that needs to be addressed.

Outside Forces

  • Fire

  • Flood

  • Earthquake

  • Tornado/Hurricane

  • Chemical spill

  • Terrorists

  • Labor strike

  • Hackers

  • Viruses

  • Etc.

When people think of major disasters, these are the events that usually come to mind. Fortunately, most of these outside forces rarely affect us, but when they do, the impact is usually severe, and most of them are very difficult to predict. Hackers and viruses, on the other hand, are a constant threat and must be assumed to be active all the time.

Server Hardware

  • CPU fails

  • Memory fails

  • Disks/Interface card fails

  • Network card fails

  • Power supply fails

  • System board fails

  • Hardware driver fails

  • Etc.

When DBAs and managers think about high availability, this category usually comes to mind first, probably because everyone knows that all physical hardware eventually fails. While it is a very important category to keep in mind, hardware failure is only one cause of SQL Server availability problems, and a relatively small one. Other causes are more common.

