Concurrency Control — The Heart Of Transactions

Ani
6 min read · Mar 26, 2022

“It is far easier to design a class to be thread-safe than to retrofit it for thread safety later.” ― Brian Goetz, Java Concurrency in Practice

Optimistic or pessimistic concurrency control: how can you decide?

The Success Story of RDBMS

Handling transactions with ease has always been the USP of RDBMSs. We live in a world where every interaction is transactional in nature. A transaction is a single unit of work that either completes fully or does not complete at all, leaving the storage system in a consistent state. The best example of a transaction is when you withdraw money from your bank account. Either the money has left your bank account, or it has not — there cannot be an in-between state.

ACID is an acronym that stands for atomicity, consistency, isolation, and durability. Together, these ACID properties ensure that a set of database operations (grouped together in a transaction) leave the database in a valid state even in the event of unexpected errors.

Atomicity

Atomicity guarantees that all of the commands that make up a transaction are treated as a single unit and either succeed or fail together.

Consistency

Consistency guarantees that changes made within a transaction are consistent with database constraints.

Isolation

Isolation guarantees that concurrent transactions do not affect each other’s outcomes.

Durability

Durability guarantees that, once the database has told the client it has written the data, the data has in fact been written to a backing store. The data will persist even in the case of a system failure.
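
To make these properties concrete, here is a minimal sketch of the bank-withdrawal example using Python’s built-in sqlite3 module; the table layout and account names are invented for illustration. The with conn: block opens a transaction that commits only if every statement succeeds, and raising an exception rolls the whole unit back — atomicity and consistency in action.

```python
import sqlite3

# In-memory database purely for illustration; the table and account names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def withdraw(conn, account_id, amount):
    """Debit an account atomically: the transaction either commits fully or rolls back."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on exception
            cur = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (account_id,)
            )
            balance = cur.fetchone()[0]
            if balance < amount:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
    except ValueError:
        pass  # nothing was written; the account is left untouched

withdraw(conn, "alice", 30)   # commits: balance becomes 70
withdraw(conn, "alice", 500)  # rolls back: balance stays 70
print(conn.execute("SELECT balance FROM accounts WHERE id = 'alice'").fetchone())
```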

The Theory

In the data world, two simultaneous transactions on the same object are called concurrent operations. It seems so simple when we get a table-lock warning or a transaction-waiting message while writing to a table, yet there are tons of things happening in the background.


Concurrency control

In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.


When components that operate concurrently interact by messaging or by sharing accessed data (in memory or storage), one component’s consistency may be violated by another. The general area of concurrency control provides rules, methods, design methodologies, and theories to maintain the consistency of components operating concurrently while interacting, and thus the consistency and correctness of the whole system. Introducing concurrency control into a system means applying operation constraints, which typically result in some performance reduction. Operation consistency and correctness should be achieved with the best possible efficiency, without reducing performance below reasonable levels.

Why is concurrency control needed?

If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists. However, if concurrent transactions with interleaving operations are allowed in an uncontrolled manner, some unexpected, undesirable results may occur, such as:

  1. Lost update : A second transaction writes a second value of a data item on top of a first value written by a first concurrent transaction, and the first value is lost to other concurrently running transactions which need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results (see the sketch after this list).
  2. Dirty read : Transactions read a value written by a transaction that is later aborted. This value disappears from the database upon abort, and should not have been read by any transaction (hence “dirty read”). The reading transactions end with incorrect results.
  3. Incorrect summary : While one transaction takes a summary over the values of all the instances of a repeated data item, a second transaction updates some instances of that data item. The resulting summary does not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions (if one is executed before the other), but rather some random result, depending on the timing of the updates and on whether certain update results have been included in the summary or not.

Most high-performance transactional systems need to run transactions concurrently to meet their performance requirements. Thus, without concurrency control such systems can neither provide correct results nor maintain their databases consistently.
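
To see why, here is a minimal sketch of the lost-update anomaly from item 1 above: two read-modify-write “transactions” interleave on a shared value with no concurrency control at all, so one update silently overwrites the other. The variable names and the sleep-based interleaving are purely illustrative.

```python
import threading
import time

balance = 100  # shared data item, with no concurrency control at all

def add_interest(amount):
    """A 'transaction' that reads, computes, and writes back without locking."""
    global balance
    current = balance           # read
    time.sleep(0.01)            # simulate work; lets the other transaction interleave
    balance = current + amount  # write back, overwriting any concurrent update

t1 = threading.Thread(target=add_interest, args=(10,))
t2 = threading.Thread(target=add_interest, args=(20,))
t1.start(); t2.start()
t1.join(); t2.join()

# Expected 130 if the two ran serially; the interleaving typically leaves 110 or 120,
# because one transaction's write is lost.
print(balance)
```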

The Backbone

The glass is either half full or half empty.


Optimistic concurrency control

Optimistic concurrency control (OCC), also known as optimistic locking, is a concurrency control method applied to transactional systems such as relational database management systems and software transactional memory. OCC assumes that multiple transactions can frequently complete without interfering with each other. While running, transactions use data resources without acquiring locks on those resources. Before committing, each transaction verifies that no other transaction has modified the data it has read. If the check reveals conflicting modifications, the committing transaction rolls back and can be restarted. Optimistic concurrency control was first proposed by H. T. Kung and John T. Robinson.

OCC is generally used in environments with low data contention. When conflicts are rare, transactions can complete without the expense of managing locks and without having transactions wait for other transactions’ locks to clear, leading to higher throughput than other concurrency control methods. However, if contention for data resources is frequent, the cost of repeatedly restarting transactions hurts performance significantly, in which case other concurrency control methods may be better suited. That said, locking-based (“pessimistic”) methods can also deliver poor performance, because locking can drastically limit effective concurrency even when deadlocks are avoided.
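
A common way to realise OCC is to tag each record with a version number and validate it at commit time. The sketch below is a simplified, in-memory illustration of that idea — the record layout and helper names are my own, not any particular database’s API — and it restarts the transaction when validation fails.

```python
import threading

# A toy versioned record: OCC validates the version at commit time.
record = {"value": 100, "version": 1}
_commit_lock = threading.Lock()  # only serializes the commit-time check itself

def read():
    """Read the value along with the version it was read at."""
    return record["value"], record["version"]

def commit(new_value, read_version):
    """Apply the write only if nobody changed the record since we read it."""
    with _commit_lock:
        if record["version"] != read_version:
            return False  # conflict detected: the caller must restart
        record["value"] = new_value
        record["version"] += 1
        return True

def optimistic_transfer(delta, max_retries=5):
    """Run the read-modify-write optimistically, restarting on conflict."""
    for _ in range(max_retries):
        value, version = read()  # work without holding any row lock
        if commit(value + delta, version):
            return True
    return False

optimistic_transfer(25)
print(record)  # {'value': 125, 'version': 2}
```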

Pessimistic concurrency control

Pessimistic concurrency control (or pessimistic locking) is called “pessimistic” because the system assumes the worst — it assumes that two or more users will want to update the same record at the same time, and then prevents that possibility by locking the record, no matter how unlikely conflicts actually are.
The locks are placed as soon as any piece of the row is accessed, making it impossible for two or more users to update the row at the same time. Depending on the lock mode (shared, exclusive, or update), other users might be able to read the data even though a lock has been placed.
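
In SQL databases the pessimistic style usually shows up as explicit lock requests such as SELECT ... FOR UPDATE. The thread-level sketch below (names are illustrative, not a database API) captures the same idea: the lock is taken before the row is read and held until the write completes, so interleaving is impossible and no update is lost.

```python
import threading

balance = 100
row_lock = threading.Lock()  # stands in for the row lock a database would take

def pessimistic_update(amount):
    """Acquire the lock before touching the row, so nothing else can interleave."""
    global balance
    with row_lock:                   # comparable to SELECT ... FOR UPDATE
        current = balance            # read under the lock
        balance = current + amount   # write before releasing

threads = [threading.Thread(target=pessimistic_update, args=(10,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # always 150: no updates are lost, at the cost of serialized access
```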

Lake House : The Next Big Thing

The monopoly of data warehouses is coming to an end with the rise of lakehouse formats such as Delta Lake and Apache Iceberg, thanks to their ACID capabilities, and data engineers throughout the world are evangelising them.

To read more about Iceberg :

Apache Iceberg : A Primer

Table format : Wasn’t a file format enough?

Table Evolution in Apache Iceberg

For any kind of help with career counselling, resume building, or design discussions, or to know more about the latest data engineering trends and technologies, reach out to me at anigos.

P.S.: I don’t charge money.


Ani

Big Data Architect — Passionate about designing robust distributed systems