One of the more mysterious features of SQL Server is isolation levels. Whenever a statement is executed, or a data modification is made, it runs under the influence of an isolation level. The isolation level directly impacts the performance of SQL Server, along with the integrity of the data. So what exactly is this mysterious feature all about?
Essentially, an isolation level determines the degree to which a data modification is isolated from another data modification, or a database modification is isolated from a statement (such as a SELECT statement), or how a statement is isolated from another statement. Let’s look at some examples of this.
First, let’s say that User A wants to update row number 25 in a particular table. Let’s also say that User B also wants to update row number 25 in the same table at the same time. This presents a problem, as we don’t want a situation where different users are trying to modify the same row at the same time. This is an example where we want to isolate one data modification from another data modification in some controlled manner in order to prevent a situation where there is bad data stored in the row. Storing bad data is always a bad thing.
Second, let’s say that User C wants to update row number 50 in a particular table. Let’s also say that User D wants to run a SELECT statement against row number 50 in the same table at the same time. So do you allow the SELECT to run against the row while the modification is taking place, or not? This is an example of where we might or might not want to isolate a data modification from a SQL Server statement in order to prevent the possibility of returning bad data. For example, let’s say that User C has started a transaction that has updated the values stored in the columns in row 50, but that the transaction has not yet completed. Now, User D wants to run a SELECT against row 50 in order to return the columns in row 50, and does so, returning the values that were just modified by the on-going transaction. But what if User C’s transaction is now rolled back and the original data are reverted to in row 50? This now means that the SELECT statement run by User D has returned bad data, as the data returned is not the actual data currently stored in row 50. In some cases, returning bad data like this is a very bad thing. But in some cases, returning some occasional bad data like this is not much of problem (or is so unlikely that the odds of it happening are slim).
In other words, isolation levels allow us to control exactly the way SQL Server statements and transactions can work so as to minimize or eliminate bad data. In some cases, a little bad data is OK, but in other cases, there can never be any bad data.
More specifically, isolation levels are used to control the following:
- If any locks are taken when data is read, and what kind of locks are being requested.
- How long read locks are kept.
- And last, they determine if a read operation acting on a row that is experiencing a data modification, (1) blocks the data modification until the row is unlocked, (2) retrieves the committed version of the row before the start of the data modification, or (3) reads the uncommitted data modified by the data modification.
Traditionally, SQL Server has supported four isolation levels:
- Read Uncommitted: This is the lowest isolation level. It only isolates transactions and activities to ensure that physically corrupt data is never read. It allows dirty reads, nonrepeatable reads, and phantom reads.
- Read Committed: This isolation level does not permit dirty reads, but does allow nonrepeatable reads and phantom reads. This is the default isolation level for SQL Server, and is used for each connection to SQL Server unless one of the other isolation levels has manually been set for a connection.
- Repeatable Read: This isolation level does not permit dirty reads or nonrepeatable reads, but does allow phantom reads.
- Serializable Read: This is the highest isolation level and ensures that all transactions and statements are completely isolated from each other. It does not allow dirty reads, nonrepeatable reads, or phantom reads.
When we look at the above, there is a relationship between data integrity and concurrency. For example, Read Uncommitted provides the best concurrency, but the least data integrity. On the other extreme, Serializable Read provides the worst concurrency, but the highest level of data integrity. As you know, the higher the concurrency, the better SQL Server performs, and the lower the concurrency, the worse SQL Server performs. As a DBA, it is your job to balance the needs between data integrity and concurrency by selecting an appropriate isolation level. Of course, in the real world, most DBAs or developers don’t bother setting a specific isolation level for a connection, so the default isolation level of Read Committed is most commonly used.
In SQL Server 2005, two new isolation levels are introduced, both of which use row versioning. They include:
- READ_COMMITTED_SNAPSHOT (statement level)
- ALLOW_SNAPSHOT_ISOLATION (transaction level)
The purpose of these new isolation levels is to give DBAs or developers more granularity in their choice of isolation levels, with the goal of improving read concurrency. When read concurrency is increased, there are fewer locks to be managed by SQL Server. This results in less locking resources required by SQL Server, along with less blocking. This in turn boosts SQL Server’s performance.
Before we talk more about these two new isolation levels, we first must introduce the concept of row versioning, which is an integral part of how these two isolation levels are implemented. Row versioning is not used by the four older isolation options.
The concept of row versioning is not new to SQL Server, as SQL Server has been using it for years with triggers. For example, when a DELETE trigger is executed for a row, a copy of that row is stored in the “deleted table” just in case the trigger is rolled back and the deleted row needs to be “undeleted.” In a sense, the row is versioned, and if need be, can be reused.
Row versioning for isolation levels is very similar, though not identical to row versioning for triggers. When a row versioning-based isolation level (which includes the two new ones we are now discussing) is enabled at the database level, the database engine maintains versions of each row that is modified (for an entire database). Whenever a transaction modifies any row, an image of the row before the modification is copied into a page of what is called the version store. The version store is located in the tempdb database and is used for temporary storage of versioned rows for all of the databases on a single SQL Server instance.
When a transaction or SQL Server statement specifies the use of a row versioning isolation level, it views the data as it existed at the start of the statement or transaction, instead of protecting all reads with locks. Using row versioning reduces the chance that a read operation will block other transactions, boosting performance. Row versioning also prevents users from reading uncommitted data and prevents multiple users from attempting to change the same data at the same time.
Now that you know a little bit about row versioning, let’s look at each of the new isolation levels, one at a time.
READ_COMMITTED_SNAPSHOT is actually a variation of the default READ_COMMITTED isolation level. It uses row versioning, instead of locking, to provide read consistency at the SQL Server statement level. When a statement runs that specifies the READ_COMMITTED isolation level (the default isolation level), and the READ_COMMITTED_SNAPSHOT option is turned on at the database level, all statements see a snapshot of the data as it existed at the start of any current transaction. It uses the row-versioned snapshot of the row to return data, and no locking is needed by the statement, which is normally the case. The biggest benefit of this isolation level is that reads do not block writes and writes do not block reads. Writes can still block writes, but this is necessary to prevent data corruption.
Let’s look at an example of this. First, we assume that READ_COMMITTED_SNAPSHOT is turned on for the database. I’ll show you later how to do this. Second, let’s assume that User A has started a transaction that will modify row 25 of a table. When the transaction starts, the database engine takes a snapshot of row 25 and stores it in the data store. Third, let’s assume that User A’s transaction is still running and that User B runs a SELECT statement to return the columns of row 25. Assuming that User B’s SELECT statement is running under the READ_COMMITTED isolation level (the default), then the SELECT statement will return the snapshot of row 25 that was taken before User A’s transaction began. The SELECT statement does not need to place any locks on the row, nor does it have to wait until User A’s transaction is complete.
READ_COMMITTED_SNAPSHOT works well with all databases because when turned on at the database level, any connection that is made to SQL Server that does not specify a specific isolation level uses the default READ_COMITTED isolation level, and will automatically take advantage of row versioning.