How and why to use locking with relational databases
- Your application lets users manage items. Each item has a long description. One of your users starts editing the description of item A at 10:00 and saves his changes at 10:02. Meanwhile, another user starts editing the same item’s description at 10:01 and saves it at 10:03. Because the original description that the second user started from did not yet contain the changes made by the first user, the changes made by the first user are lost without either user being notified about it.
- Your application communicates with the database through an ORM using the common approach where you retrieve the current version of an object, make some changes to it and then save the resulting object to the database. Some user or process changes the same object right after you retrieved it, so the object you are saving does not contain those changes. This leads to your application overwriting the changes with stale data.
- Your application allows linking items to a group, but only if the group has status “Active”. This restriction is not enforced at the database-level. Before linking an item to a group, you verify that the group has the correct status. However, just before you save the item, another user or process changes the status of the group to “Inactive”. Once you commit, the item is now linked to an inactive group.
Optimistic locking
- Perform most of the operation under the assumption that no conflicting operations have occurred
- Just before saving/committing, verify that no conflicts have occurred and abort otherwise
In SQL, two approaches:
- Compare the actual data in the database to the data your operation was based on
- Use a version number or timestamp that changes every time the data changes
Can be used to retrieve an object, make some changes to it and then verify on save that no other changes have been made in the meantime. The same mechanism can also be used to verify that a certain object you depended on simply didn't change.
- Flexibility: don't have to care about where or when the "base version" was retrieved. Could have been in different transaction, could have been 15 minutes ago when a user started editing, ...
- Deadlocks less likely and straightforward to prevent them by always saving object in same order (DB-level locks only acquired when saving)
- If conflict does occur, you need to deal with operation being aborted
- Retrying can make sense in some scenarios (e.g. ORM retrieves object, makes changes and immediately saves it). If the operation fails, we can retrieve the most recent version and then try making our changes on that one. Note: retry attempts should likely be limited, so the operation can still ultimately fail!
- Retrying does not make sense in first scenario (users concurrently editing descriptions), because it would just let users unknowingly overwrite each other's changes again. Here, we need user input (e.g. manual merging of changes)!
Pessimistic locking
- Assume conflicting operations will occur
- Actively block anything that might lead to a conflict
Works by actively locking database rows (or even entire tables) that you need in your operation.
Types of locks:
- Shared locks (read locks):
- use on data you just need to stay the same until your operation completes
- allow other shared locks on the same data
- Exclusive locks (write locks):
- use for reading and writing data you want to update
- do not allow any other lock on the same data before transaction is committed or rolled back
Attempts to obtain a lock when it's not allowed will typically block until the conflicting locks are released -> deadlock potential!
- Completely prevents conflicts from occurring!
- Could actually be the best-performing strategy in high-concurrency environments, since it avoids work wasted on operations that get aborted and retried
- Misses the flexibility of optimistic locking: everything needs to happen inside same DB transaction
- Impractical for scenarios that need to wait for user input (and what if input never comes?)
- Can limit options for the way codebase is organized
- Could unnecessarily limit concurrency
- Special care needed to prevent deadlocks!
- Locks acquired at start of operation -> more locking
- Less flexibility regarding order in which locks are obtained (e.g. if you need to retrieve some data based on earlier-retrieved data)
- It helps to immediately acquire the most restrictive lock level you will need during the operation
Combining different strategies
It is possible for different locking strategies to exist at different levels in your application!
Example for scenario where users can edit item descriptions:
- Require users to "check out" item before description can be changed (pessimistic locking)
- At level of DB communication, we could still use optimistic locking to deal with conflicts caused by users concurrently trying to check out the same item
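As a sketch of that second point: the check-out itself can be guarded by an optimistic version check. This assumes hypothetical `checked_out_by` and `version` columns on an `items` table (none of these names come from the examples above):

```sql
-- Succeeds (1 updated row) only if nobody changed the item since we
-- read version 3 - in particular, if nobody else checked it out first.
UPDATE items
SET checked_out_by = 'alice',
    version = version + 1
WHERE id = 1 AND version = 3 AND checked_out_by IS NULL;
```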
Optimistic locking implementation using SQL
(examples assume isolation level Read Committed, which is the default level in several popular relational databases and even the lowest available level for some)
- Version number is used to check for conflicts
- Application verifies number of updated rows (returned from DB), if that is 0 we know that a conflict has occurred and we can roll back the transaction
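A minimal sketch of such an update, assuming a hypothetical `items` table with `id`, `description` and `version` columns:

```sql
-- The application previously read the item and saw version = 3.
-- The WHERE clause makes the update match 0 rows if anyone changed
-- the row (and thereby incremented its version) in the meantime.
UPDATE items
SET description = 'new description',
    version = version + 1
WHERE id = 1 AND version = 3;
```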
Note: DB-level locks are still acquired when performing the actual update! This can lead to DB deadlocks if you're not careful.
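An illustration of how two such transactions can deadlock, again assuming a hypothetical `items` table with a `version` column:

```sql
-- Transaction A:
BEGIN;
UPDATE items SET description = 'A1', version = version + 1 WHERE id = 1 AND version = 3;
-- Transaction B:
BEGIN;
UPDATE items SET description = 'B2', version = version + 1 WHERE id = 2 AND version = 5;
-- Transaction A (blocks: row 2 is still locked by transaction B):
UPDATE items SET description = 'A2', version = version + 1 WHERE id = 2 AND version = 5;
-- Transaction B (blocks: row 1 is still locked by transaction A -> deadlock):
UPDATE items SET description = 'B1', version = version + 1 WHERE id = 1 AND version = 3;
```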
The problem: the second update statement in transaction A depends on transaction B being committed or rolled back, while the second update statement in transaction B depends on transaction A being committed or rolled back!
Solution: make sure to always lock rows in the same order (flexibility of optimistic locking makes this easy)
OUTPUT/RETURNING or a separate SELECT query
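A sketch of this variant in PostgreSQL-style syntax (SQL Server uses an OUTPUT clause instead), assuming the same kind of hypothetical `items` table with a `version` column:

```sql
-- Increment the version unconditionally and return the new value
-- so the application can compare it to the version it started from.
UPDATE items
SET description = 'new description',
    version = version + 1
WHERE id = 1
RETURNING version;
```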
Application verifies returned version and rolls back transaction if necessary (we know there was a conflict if version has increased by more than 1 relative to the one we had).
For databases not supporting the OUTPUT or RETURNING clause, a separate SELECT statement can be used.
Note that, even without the RETURNING clause, the above update locks the row until the end of the transaction.
Pessimistic locking implementation using SQL
Pessimistic locking is typically performed at row level, but similar locks can also be obtained on an entire table. If you are only interested in some specific rows, use row-level locking so you don't unnecessarily limit concurrency.
Shared lock on row (held until commit/rollback of the entire transaction):
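A sketch in PostgreSQL syntax (MySQL 8+ also supports FOR SHARE; SQL Server uses locking hints instead), assuming a hypothetical `items` table:

```sql
BEGIN;
-- Shared lock: others can also read-lock this row, but nobody can
-- change it until we COMMIT or ROLLBACK.
SELECT * FROM items WHERE id = 1 FOR SHARE;
-- perform the rest of the operation here, then:
COMMIT;
```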
Exclusive lock on row (held until commit/rollback of the entire transaction):
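The exclusive variant, in the same PostgreSQL-style syntax and against the same hypothetical `items` table:

```sql
BEGIN;
-- Exclusive lock: no other transaction can lock or change this row
-- until we COMMIT or ROLLBACK.
SELECT * FROM items WHERE id = 1 FOR UPDATE;
UPDATE items SET description = 'new description' WHERE id = 1;
COMMIT;
```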
Locking rows versus locking objects
A single object does not always correspond to a single row in the DB!
Example: order containing order lines, 1 table for orders and 1 for order lines, the row for an order contains some data based on data in order lines. What if users concurrently add order lines?
- Optimistic locking where order has version number that is increased whenever the order or its order lines are changed
- Pessimistic locking with lock on order row whenever we update order or its lines
- Pessimistic locking on order lines table also possible, but row-level locking does not prevent adding new rows!
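A hedged sketch of the first option, assuming hypothetical `orders` and `order_lines` tables where `orders` carries the `version` number and some derived data (here a `line_count`):

```sql
BEGIN;
-- Bump the order's version first; 0 updated rows means a concurrent
-- change to the order or its lines, so the application rolls back.
UPDATE orders SET version = version + 1 WHERE id = 1 AND version = 7;
-- Now it is safe to add the line and update the order's derived data.
INSERT INTO order_lines (order_id, product, quantity) VALUES (1, 'SKU-123', 2);
UPDATE orders SET line_count = line_count + 1 WHERE id = 1;
COMMIT;
```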
Atomic updates and database-level restrictions
Structure your data and application in such a way that all updates are made using atomic UPDATE statements that update exactly the relevant data and nothing more
- Could be useful for simple CRUD apps
- Tricky if value A depends on value B but they can be changed separately
- Tricky if relationships are involved
- Does not solve the problem of data being lost if multiple users concurrently edit the description for the same item!
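A sketch of such an atomic update, using a hypothetical `accounts` table (not part of the scenarios above); the WHERE clause makes the check and the update a single atomic statement:

```sql
-- Withdraw 100 only if the balance allows it; 0 updated rows means
-- the condition no longer held at the moment of the update.
UPDATE accounts
SET balance = balance - 100
WHERE id = 1 AND balance >= 100;
```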
Could also be useful when the validity of an update depends on the status of a related object, but only if the checks for that are at the DB level. For example, let's say we can link items to a group, but only if the group has status “Active”. We can enforce this at the DB level if we link items to a separate table holding IDs for active groups. This way, if we have a transaction adding an item to a group and a concurrent transaction making that group inactive (thus removing it from the active groups table), the DB will not allow both transactions to succeed.
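One possible schema for this, with a hypothetical `active_groups` table (all table and column names here are illustrative):

```sql
CREATE TABLE groups (id INT PRIMARY KEY, name TEXT NOT NULL);
-- A group has a row here only while it has status "Active".
CREATE TABLE active_groups (
    group_id INT PRIMARY KEY REFERENCES groups (id)
);
-- Items reference active_groups instead of groups directly.
CREATE TABLE items (
    id INT PRIMARY KEY,
    description TEXT,
    group_id INT REFERENCES active_groups (group_id)
);
-- Deactivating a group deletes its row from active_groups; the foreign
-- key then stops any concurrent transaction from linking an item to it.
```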
Higher transaction isolation levels
(higher than Read Committed)
- Lack of flexibility: for this to work, everything you do as part of an operation needs to happen inside the same database transaction (like pessimistic locking)
- Lose control over what exactly is locked and when
- Increases likelihood of deadlocks if database uses locking to implement transaction isolation
- Behavior varies widely between database vendors or even between different versions of the same database
- Probably not the best option for applications that need to support multiple databases
- Does not solve the problem of lost updates in the example with multiple users concurrently editing the same item’s description