Relational databases, like Microsoft’s SQL Server, use locks to prevent multiple users from making conflicting modifications to a set of data. When a set of data is locked by a user, no other users can modify that same set of data until the first user finishes modifying the data and relinquishes the lock. There are exceptions, but let’s not go there. Some databases, SQL Server included, use locks to prevent users from seeing uncommitted modifications. In these systems if UserA is modifying some set of data, then UserB and all the rest of the users must wait until UserA is done modifying that data before they can even get a shot at reading the data, let alone modifying it. Databases place locks at all levels of their physical hierarchies: rows, pages (typically a few KB of rows), extents (typically a few pages), entire tables, and entire databases. Some databases (Oracle, others?) only use fine-grained row locks, others don’t do row locks at all and only allow rough-grained page, extent, table, and database locks. Most databases, SQL Server included, support row locking, but often use rough-grained locks. This is because lock management is a royal pain. Locks aren’t small or simple entities, so if you only do row-level locking, you can get yourself into a world of pain: a million-row update can easily swamp memory and be a bear to manage.
Lock Contention Described
Databases that don’t do just row-level locking often use a technique called lock escalation to achieve better performance. Unless its clear from the outset that a whole table will be modified, these databases start off using row locks, and they make plans to trade these locks in for rough-grained locks later if too many rows are modified. Unfortunately, lock escalation introduces and amplifies a whole new problem: deadlocks. If two users try to modify semantically-unrelated but physically-near data in two separate tables in reverse order, both users will start off with row locks, then try to upgrade them to page locks, and the situation will be that each user wants something the other user has, so they’re stuck. This is called a deadlock. For example:
UserA modifies some rows in TableA, causing a page lock affecting not just the rows UserA modified, but many others
UserB modifies some rows in TableB, causing a page lock affecting not just the rows UserA modified, but many others
UserA wants to modify some rows that UserB has locked (but not modified) in TableB
UserB wants to modify, or maybe just access, some rows that UserA has locked (but not modified) in TableA
Something’s gotta give. To deal with this problem, the database occasionally looks for deadlocks, and kills off one of the transactions so the other can finish. It usually kills the one that’s made the least modifications so that it minimizes the cost of rolling back changes. Databases that use only row-level locking almost never have this problem because two users rarely want to modify the exact same row, and even more rarely do they attain locks in the perfectly poor order needed to cause a deadlock. Also, databases like this use lock timeouts to prevent users from waiting too long for a lock. Query timeouts also factor in here. You can write code to retry queries that time out, but this only automates database congestion. Any timeout that is often reached will only serve to worsen the user experience. Things simply should not take that long. In practice and under high load, SQL Server’s locking system, which is based on lock escalation, does not perform well. Why? Lock contention. Lock contention is the problems of deadlocks and waiting for locks. In a system in which many users are modifying the database at once, and many more users are trying to access the database concurrently, the locks are flying, users spend a lot of time waiting to attain locks, deadlocks are frequent, and users are far from happy. Granted, if you’ve only got a few occasional users, you won’t have much trouble with SQL Server’s out-of-the-box behavior. You’ll be hard pressed to see these problems with simple in-the-office tests or deployments involving just a few users. But throw a couple hundred concurrent users at your database and a constant stream of INSERTS and UPDATES with quite a few DELETEs sprinkled in, and you’ll start reading Oracle literature and eyeing your war chest. However, I’ve got a solution for you that will only cost you a code review, a few minor tweaks, and a system test. You do have a system test procedure in place, right?