- Danny Mican
Postgres offers a default transaction isolation level of READ COMMITTED. Even with transactions read committed can result in logical race conditions. Read-modify-write queries are subject to this, even if they leverage transactions. This post illustrates what READ COMMITTED isolation is, how it can result in concurrency errors during read-modify-write queries and then shows how to use SELECT FOR UPDATE to synchronize access to a sql result, thereby removing the possibility of race conditions.
Logical Race Condition
Logical race conditions are errors that produce logically incorrect results, but no overt programming error. Logical race conditions are triggered when multiple operations are happening concurrently. Logical race conditions are common in sql based web systems. Multiple web requests can occur simultaneously, each triggering a corresponding database request. The concurrent web requests can trigger database to access the same resources at the same time.
To illustrate this concept consider the simplified case of a sql based counting system. The following describes the table for this counting system:
create table counts ( counter integer not null );
postgres=# \d counts Table "public.counts" Column | Type | Collation | Nullable | Default ---------+---------+-----------+----------+--------- counter | integer | | not null |
The table has a single field
counter keeps track of the total number of web requests served. The counter starts at 0 and each reqquest increments it.
Imagine we have 2 separate http handlers, and each handler reads the value, increments the value and then inserts the new value:
START; curr_count = SELECT counter FROM counts; // READ new_count = curr_count + 1 // MODIFY INSERT INTO counts ($1), new_count; // WRITE COMMIT;
The pattern of
SELECT, modifying and writing is referred to as read-modify-write. read-modify-write operations are extremely common, and frequently results in logical race conditions. A resource is pulled from a database, a property is updated, and then that information is written back to the database.
The following diagram illustrates the logical race condition the counter example produces:
The diagram is read from the top to the bottom. It includes two workers, W1 and W2. W1 reads the initial
counter value of 0, increments it and writes it back as 1. While W1 is modifying the value, W2 also reads the
counter, currently at 0. W1 writes the incremented counter back to postgres, but W2 over writes that value. Two requests too place but the
counter value is only 1, resulting in a logical race condition. Even the use of transactions in W1 and W2 will not prevent the race condition. Transactions do not protect the writers because the default isolation level is READ COMMITTED.
Read committed isolation level means each reader will only read committed data. This means inflight data does not show up. Read committed is a relatively strict isolation level. It is a good balance between performance and correctness. A lower guarantee would allow transactions to see data that was not yet committed. If a transaction reads uncommitted data, performs a computation on that data, then updates a value, then the original transaction rolls back, it could result in incorrect data. SERIALIZABLE is a stricter form of isolation but incurs huge performance overhead since it forces transactions to never overlap.
SELECT FOR UPDATE synchronization
Adding FOR UPDATE to a query "locks" access to the resources returned by the select query, ensuring that only a single transaction can modify the resources at a single time.
START; curr_count = SELECT counter FROM counts FOR UPDATE; // READ counter but locks rows from concurrent modification new_count = curr_count + 1 // MODIFY INSERT INTO counts ($1), new_count; // WRITE COMMIT;
SELECT FOR UPDATE issues a lock on the rows returned by
counter. This blocks other accesses until the initial transaction is complete, ensuring correctness and removing the logical race condition!
The Price of Correctness
SELECT FOR UPDATE ensures correctness by limiting access to certain rows. This has serious performance implications in a high throughput system with 100's or 1000's of concurrent requests. Each request must wait for all requests in front of it to complete. Latency is often a good tradeoff for correctness in the majority of use cases since most assume correctness. Imagine if a banking app couldn't guarantee correctness! It's often good to start with correctness using these locking primitives until a performance issue is encountered, and once it is an alternative strategy can be found!
SELECT FOR UPDATE is one of many strategies to remove logical race conditions. Always be careful and watch out for read-modify-write loops....transactions are not enough!!