Consistent Hashing Algorithm

Consistent Hashing Algorithm

To address the problem of cache invalidation caused by server changes or various reasons in clusters, we use a hash ring to distribute incoming key values around a ring ranging from 0 to 2^32 by taking modulo 2^32. The unique identifiers of servers (IP addresses) are distributed across the entire ring, which divides the ring into many intervals.

When querying, the hash value of the data key is calculated to determine its position on the hash ring, then it traverses sequentially to find the corresponding server. If a server fails, we only need to set another replica server onto the traversal path, positioned counterclockwise before the failed server.

To avoid long traversal distances for query keys (data skew problem), virtual nodes can be set for the same physical node to reduce traversal steps.