A Practical Path Towards the Ultimate Microsoft SQL Server Scalability and Availability
It has become a moot point for database vendors to compete for the highest transaction processing capability in the world. Modern database systems have become so well programmed; even the cheapest system can claim very high performance, given the hardware.
The next challenge is high availability. It is interesting to observe that database vendors are much less aggressive in their design and advertising campaigns. In practice, in addition to apologizing for customer complaints caused by SQL Server down times, we still need to answer calls when the SQL Server performance monitor shows high CPU usage and the website is sluggish. To administrators, these two issues are somehow related.
Addressing both problems does seem overwhelming. Since there is no lack of potential beneficiaries, many vendors offer database high availability solutions. Looking closely though you will find the only deliverable is a costly warm-standby.
While warm is better than cold, the value of a standby SQL Server is only realized at disaster times – an expense justifiable perhaps only to deep pocket customers. For any “hot standby” SQL offers, if real time updates are not allowed, it is simply not hot enough.
This is the state of the art for Microsoft SQL Server 2000 scalability and availability.
A careful look reveals that these two problems are actually related to a single question:
Can we or can we not replicate concurrent transactions synchronously?
If the answer is yes, then both problems can be resolved. Otherwise, we might as well quit trying.
The trouble is that this issue has been thought over and debated for many years without a clear answer. Like other unsolved mysteries, it has been tossed to an obscure corner of academic research.
Achieving SQL Server Scalability and Availability
A single SQL Server was designed to scale only within the running hardware’s capabilities. Beefing up the hardware is often called scaling up (re. Doug Kerwin’s online article at http://www.sql-server-performance.com/dk_massive_scalability.asp) that helps with limited scalability, but not at all for availability.
Clustering multiple SQL Servers (scale out) in order to distribute the workload on a single SQL Server is a powerful idea that can potentially achieve both high scalability and high availability. The success, however, hinges on the answer to the earlier question.
If it is possible for all updates to be committed on all servers at exactly the same time and in exactly the same order, then it is possible to distribute the read-only loads to achieve high scalability, if all these can be done very fast.
To see the difficulties, assuming the initial identity value equals zero(0) for the table Customers, two new arriving clients will run the following queries:
Insert into Customers values ( ‘C1’, NewID)
Insert into Customers values ( ‘C2’, NewID)
After replicating to two SQL Servers, say SQL1 and SQL2, it is then possible to have the follow results: SQL1’s Customers table can .have C1’s NewID=1, and C2’s NewID=2. But SQL2’s Customers table will have C1’s NewID=2 and C2’s NewID=1. This is because SQL Servers cannot guarantee the execution order of concurrent queries. Experiments show that approximately one in every 1000 tests will result in such inconsistent data. This is not acceptable. Similarly, deadlocks can occur in one server, but not the other if concurrent transactions are as follows:
These are only two instances of the notorious Data Consistency Problem (DCP) when replicating concurrent transactions.
There has been no lack of attempts for providing scalability and availability. As we will see, almost all attempts try to avoid DCP.