MySQL Database Architecture and Performance Optimization
Basic Architecture
Similar to a backend server, the database will pre-open threads and cache these threads. When a statement comes in, the database will start a thread to process it. First, it searches in the query cache, and there are corresponding caches in the parser as well. After parsing and optimization, it requests the storage engine through API.
Concurrency Control
Read-Write Locks
- Shared Lock
- Exclusive Lock
- Read Lock
- Write Lock
Table locks, row-level locks
Transactions
For query applications that do not require transactions, choosing non-transactional storage engines can achieve higher performance.
Isolation Levels
Dirty Read: Modifications within a transaction are visible to other transactions even if they have not been committed.
Repeatable Read
This level guarantees that multiple reads of the same record within the same transaction yield consistent results.
Deadlocks
Two or more transactions mutually occupying the same resource. Solution: Set lock wait timeout, rollback the transaction holding the fewest row-level exclusive locks.
AUTOCOMMIT
Default uses automatic commit; each query is treated as a transaction and executed with commit operations.
Storage Engines
MyISAM Storage Engine
Before version 5.1, it was the default storage engine, but it does not support transactions and row-level locks, and cannot be safely recovered after crashes. Uses table locks, which significantly affects performance. Tables that are not modified can use compressed tables to greatly save disk space. Stores tables separately in data files and index files. Read-only or mostly read-only tables can use this storage engine.
InnoDB
Uses filesystem directories and files to store database and table definitions. Uses clustered indexes, providing high performance for primary key queries. Uses predictive prefetching when reading from disk. Adaptive hash index accelerates read operations.
InnoDB VS MyISAM
The main difference between the two types is that InnoDB supports transaction processing, foreign keys, and row-level locks.
Normalization vs Denormalization
In normalized databases, each factual data appears only once, with no redundancy. In denormalized databases, information is redundant.
Advantages of normalization:
- Fast update operations, modifications only need to change small amounts of data
- Tables are usually smaller and can fit in memory
- Less need for DISTINCT or GROUP BY statements Disadvantages: Requires joins, which may invalidate some indexes
Advantages of denormalization: No need to join tables, avoids random IO, even in worst cases requiring full table scans, it’s basically sequential IO. Disadvantages: Modifications are cumbersome, data redundancy exists.
Mixed Use of Normalization and Denormalization
Replication and caching are applications of denormalization, using triggers to update cached values. Using cache tables and summary tables for scheduled statistics is better than real-time queries, but must choose between real-time rebuilding or periodic rebuilding.
Use InnoDB for main tables, MyISAM for cache read-only tables to save space. Build additional indexes, add redundant columns.
Counter Tables
Updating counters may cause concurrency issues, so you can use rand for random writes to speed up concurrent updates, then SUM during reads.
Indexes
Benefits:
- Reduce scanned data volume
- Avoid sorting and temporary tables
- Convert random IO to sequential IO If using an index for range sorting, another index cannot be used for sorting simultaneously.
Quick Index Creation
One technique is to disable indexes first, load data, then enable them. If disabling is not possible, deleting indexes achieves the same result.
B-Tree Index
Values and pointers are stored together. If an index contains multiple columns, the column order is very important. MySQL can only efficiently use the leftmost prefix columns of an index. Most current engines use B+ tree indexes.
B+ Tree Index
InnoDB uses B+ tree indexes. Differences from B-trees: Non-leaf nodes store only values, leaf nodes store values and row pointers, which reduces node size, allowing more nodes per page, meaning fewer disk I/O operations; leaf nodes are linked like a list, facilitating range queries; deletion operations are simpler; fast descent to find values.
Limitations:
- Cannot use index if not starting from the leftmost column
- Cannot skip columns in the index
- If there’s a range query on a column, all columns to its right cannot use index optimization for lookups
Clustered Index
A data storage method used by InnoDB. All field values of data rows are actually stored in leaf pages, with data rows and adjacent key values placed together. It’s a B+ tree with index columns as the primary key, making data access faster. Data distribution: Index equals the entire table.
MyISAM Data Distribution
Uses B+ tree indexes, using primary key column values as indexes, with leaf nodes storing row pointers (physical addresses), then prefix compression of indexes allows more indexes to fit in memory.
Hash Index
Only supported by memory engine, calculates hash values for all indexed columns, with hash values as row pointers. Disadvantages: Cannot be used for sorting, queries must use all indexed columns to utilize the index, only supports equality comparisons. Long string indexes can alternatively use CRC32 as indexed columns, updated and maintained using triggers, with extremely high query efficiency.
Adaptive Hash Index: An index in InnoDB that creates a hash index in memory based on the B-tree index when the engine notices certain index values are being used very frequently, using hash values instead of keys to search in the B-tree.