MySQL Database Architecture and Performance Optimization

Basic Architecture

Similar to a backend server, the database will pre-open threads and cache these threads. When a statement comes in, the database will start a thread to process it. First, it searches in the query cache, and there are corresponding caches in the parser as well. After parsing and optimization, it requests the storage engine through API.

Concurrency Control

Read-Write Locks

Shared Lock
Exclusive Lock
Read Lock
Write Lock

Table locks, row-level locks

Transactions

For query applications that do not require transactions, choosing non-transactional storage engines can achieve higher performance.

Isolation Levels

Dirty Read: Modifications within a transaction are visible to other transactions even if they have not been committed.

Repeatable Read

This level guarantees that multiple reads of the same record within the same transaction yield consistent results.

Deadlocks

Two or more transactions mutually occupying the same resource. Solution: Set lock wait timeout, rollback the transaction holding the fewest row-level exclusive locks.

AUTOCOMMIT

Default uses automatic commit; each query is treated as a transaction and executed with commit operations.

Storage Engines

MyISAM Storage Engine

Before version 5.1, it was the default storage engine, but it does not support transactions and row-level locks, and cannot be safely recovered after crashes. Uses table locks, which significantly affects performance. Tables that are not modified can use compressed tables to greatly save disk space. Stores tables separately in data files and index files. Read-only or mostly read-only tables can use this storage engine.

InnoDB

Uses filesystem directories and files to store database and table definitions. Uses clustered indexes, providing high performance for primary key queries. Uses predictive prefetching when reading from disk. Adaptive hash index accelerates read operations.

InnoDB VS MyISAM

The main difference between the two types is that InnoDB supports transaction processing, foreign keys, and row-level locks.

Normalization vs Denormalization

In normalized databases, each factual data appears only once, with no redundancy. In denormalized databases, information is redundant.

Advantages of normalization:

Fast update operations, modifications only need to change small amounts of data
Tables are usually smaller and can fit in memory
Less need for DISTINCT or GROUP BY statements Disadvantages: Requires joins, which may invalidate some indexes

Advantages of denormalization: No need to join tables, avoids random IO, even in worst cases requiring full table scans, it’s basically sequential IO. Disadvantages: Modifications are cumbersome, data redundancy exists.

Mixed Use of Normalization and Denormalization

Replication and caching are applications of denormalization, using triggers to update cached values. Using cache tables and summary tables for scheduled statistics is better than real-time queries, but must choose between real-time rebuilding or periodic rebuilding.

Use InnoDB for main tables, MyISAM for cache read-only tables to save space. Build additional indexes, add redundant columns.

Counter Tables

Updating counters may cause concurrency issues, so you can use rand for random writes to speed up concurrent updates, then SUM during reads.

Indexes

Benefits:

Reduce scanned data volume
Avoid sorting and temporary tables
Convert random IO to sequential IO If using an index for range sorting, another index cannot be used for sorting simultaneously.

Quick Index Creation

One technique is to disable indexes first, load data, then enable them. If disabling is not possible, deleting indexes achieves the same result.

B-Tree Index

Values and pointers are stored together. If an index contains multiple columns, the column order is very important. MySQL can only efficiently use the leftmost prefix columns of an index. Most current engines use B+ tree indexes.

B+ Tree Index

InnoDB uses B+ tree indexes. Differences from B-trees: Non-leaf nodes store only values, leaf nodes store values and row pointers, which reduces node size, allowing more nodes per page, meaning fewer disk I/O operations; leaf nodes are linked like a list, facilitating range queries; deletion operations are simpler; fast descent to find values.

Limitations:

Cannot use index if not starting from the leftmost column
Cannot skip columns in the index
If there’s a range query on a column, all columns to its right cannot use index optimization for lookups

Clustered Index

A data storage method used by InnoDB. All field values of data rows are actually stored in leaf pages, with data rows and adjacent key values placed together. It’s a B+ tree with index columns as the primary key, making data access faster. Data distribution: Index equals the entire table.

MyISAM Data Distribution

Uses B+ tree indexes, using primary key column values as indexes, with leaf nodes storing row pointers (physical addresses), then prefix compression of indexes allows more indexes to fit in memory.

Hash Index

Only supported by memory engine, calculates hash values for all indexed columns, with hash values as row pointers. Disadvantages: Cannot be used for sorting, queries must use all indexed columns to utilize the index, only supports equality comparisons. Long string indexes can alternatively use CRC32 as indexed columns, updated and maintained using triggers, with extremely high query efficiency.

Adaptive Hash Index: An index in InnoDB that creates a hash index in memory based on the B-tree index when the engine notices certain index values are being used very frequently, using hash values instead of keys to search in the B-tree.

LSM Tree MySQL Technical Internals Reading Notes