Concurrency Control and Locking

All transactions in a database are executed serially, which can easily become a performance bottleneck for the entire application. Although nodes that cannot be horizontally scaled will eventually become bottlenecks, databases that execute transactions serially accelerate this process; concurrency makes everything possible and can solve certain performance issues.

Pessimistic Concurrency Control

Database programs hold a pessimistic attitude toward data modification, locking data during processing to resolve contention issues. Before modifying any record, an exclusive lock is first attempted on that record.

select status from t_goods where id=1 for update;

Locks in databases are designed with two modes: shared locks and exclusive locks.

Obtaining a shared lock allows read-only access
Obtaining an exclusive lock allows both read and write operations
Transactions without locks fall into waiting states

Based on Timestamps

Every transaction has a globally unique timestamp, which can use system clock time or counters, as long as all timestamps are unique and increment over time.

Each data item has two timestamps: read timestamp and write timestamp, representing the transaction timestamps of successfully executed corresponding operations.

This protocol ensures that all conflicting read-write operations are executed serially according to timestamp size. During execution, only the timestamp values of data items need to be considered without concerning other transactions.

Both read and write operations compare read-write timestamp values sequentially from left to right. If less than current value, they are directly rejected and rolled back. The database system assigns a new timestamp to rolled-back transactions and re-executes them.

Two-Phase Locking Protocol (2PL)

During the growing phase, a transaction can acquire locks but cannot release them; during the shrinking phase, transactions can only release locks and cannot acquire new ones.

Strict 2PL: Exclusive locks held by transactions must be released after commit; Rigorous 2PL: All locks held by transactions must be released after commit;

Deadlock resolution:

Deadlock prevention using directed acyclic graphs
Use preemption plus transaction rollback to prevent deadlocks Transactions obtain a timestamp when starting

Optimistic Concurrency Control

As an optimistic concurrency control mechanism, it assumes all transactions will ultimately pass validation and execute successfully. During the read phase, the database executes all read and write operations in the transaction, storing all post-write values in temporary variables without actually updating database content. At this point, the next stage begins, where the database program checks if current modifications are valid, returning errors if invalid.

Optimistic locking assumes data generally won’t cause conflicts, so conflict detection occurs only when data is submitted for updates. If conflicts are detected, error information is returned to users for decision-making. When using version numbers, a version number can be specified during data initialization. Each data update operation increments the version number by +1 and checks if the current version number is the latest version number for that data.

Multi-Version Concurrency Control (MVCC)

Read-heavy, write-light scenarios mean every write operation creates a new version of data, while read operations select the most appropriate result from a limited number of versions and return directly. At this point, read-write operation conflicts no longer need attention, while managing and quickly selecting data versions becomes the main issue MVCC needs to solve.

MySQL MVCC

Each version of data rows has a unique timestamp. When read transaction requests occur, the database program directly returns the data item with the maximum timestamp from multiple versions.

Update operations are slightly more complex. Transactions first read the latest version of data to calculate update results, then create a new version of data. The new data’s timestamp is the current data row’s maximum version + 1. Combining pessimistic locks with MVCC

PostgreSQL MVCC

Multi-version timestamp ordering protocol All transactions are assigned unique timestamps before execution. Each data item has two timestamps for read and write operations. When PostgreSQL transactions issue read requests, the database directly returns the latest version of data without being blocked by any operations. Write operations require the transaction timestamp to be greater than or equal to the data row’s read timestamp, otherwise they will be rolled back.

Dirty read: When one transaction is accessing data and modifies it, and this modification hasn’t been committed to the database yet, another transaction accesses this data and uses it.

Non-repeatable read: Under the same conditions, data you’ve read previously returns different values when read again.

Phantom read: A phenomenon that occurs when transactions are not executed independently.

Transaction Isolation Levels

DEFAULT uses the database’s configured isolation level (default), determined by DBA settings to decide the isolation level READ_UNCOMMITTED query statements don’t add locks and may read uncommitted rows Will experience dirty reads, non-repeatable reads, phantom reads (lowest isolation level, high concurrent performance)

READ_COMMITTED only adds record locks to records, not gap locks between records, so allowing new record insertion near locked records means different results may be obtained when using query statements multiple times Will experience non-repeatable reads and phantom read issues (locks currently read rows)

REPEATABLE_READ may experience phantom reads (locks all read rows) SERIALIZABLE guarantees no situations will occur (table locking)

Big Data Special Topic Deep Dive into InfluxDB - Storage Engine