CRDT stands for "Conflict-free Replicated Data Type". The past few days I've seen so many people get the abbreviation wrong, and a whole bunch of mentions about it by friends and colleagues. Neither of which is surprising, because one of the projects in The Innovation Lab works with CRDTs and their implementation in Valkey. And every mention of CRDT reminds me of this unfinished post that I began to write back from 27 May. So let's get on with it.
Why CRDTs?¶
To answer why we need CRDTs, we first need to look at the problem CRDTs are trying to solve. A lot of the data today is stored in distributed systems, primarily to ensure fault tolerance, better performance and scalability. The problem then changes from performance and scalability to having Consistency, Availability and Partition Tolerance. These metrics are governed by the CAP Theorem, but that's digressing from the point (and we will get back to this).
The CAP Theorem
It states that distributed data systems can simultaneously guarantee at most two of three properties: Consistency, Availability, and Partition tolerance- Consistency: All requests see the most recent version of the data (or absolutely nothing)
- Availability: Every request receives a response, without the guarantee that it contains the latest version of data
- Partition tolerance: The system continues to operate even if some of the requests are dropped, partly due to some of the nodes being unavailable
Relying on this theorem, we can try to guarantee any 2 of these properties. Disregarding Partition Tolerance for the other 2 basically reverts us to a single-node system, which is, by logic, not distributed (oof). Hence while designing these systems, engineers must decide between Consistency and Availability.
This comes with a caveat: An AP system can have an inherent property of being eventually consistent.When the same data is distributed across multiple nodes, conflicts can occur when you try to ensure consistency of data among the nodes; especially when multiple nodes have divergent changes from each other. Resolving these conflicts requires a lot of tedious effort if done manually. In fact, you could liken it to merge conflicts arising from version control using git itself!
A lot of distributed computing focuses on preventing concurrent updates to shared data across multiple nodes, but another approach that can be taken is called "Optimistic Replication". In this approach, all the concurrent updates are let through, and the inconsistencies that are created are resolved later. Eventual Consistency comes from merging different replicas of data. While this method would otherwise fail because of the conflicts, CRDTs can seemingly be a fix to this problem (and thus bringing Eventual Conisistency).
How do CRDTs not have conflicts?¶
The name for CRDTs is a little misleading, primarily because CRDTs have a built in conflict resolution method that is followed whenever a conflict arises, so while there are conflicts, they are resolved using a predetermined method.
There are 2 different methods that can be used to make a CRDT:-
- State-based CRDTs
- Operation-based CRDTs
State based CRDTs¶
State based CRDT is also called as Convergent Replicated Data Type (CvRDTs). These contain 5 main components (of sorts, for the lack of a better word):-
- A local state (Data Structure)
- Methods (to modify the state)
- Constructor (to initialise the state)
- Merge function (to resolve conflicts)
- A function to apply operations (like a write handler)
When a node has it's local state updated, it applies the changes to its local state and sends the entire state to the other nodes. The nodes recieving this state would then merge it with their own local state using the merge function (represented by $\sqcup$).
However, the merge function must guarantee convergence, even without having a central state/lock. In order to do so, the merge function must form a Bounded Join-Semilattice, and that means the merge must satisfy 3 properties:-
- Commutative ($A \sqcup B = B \sqcup A$): The order in which a replica receives states doesn't matter.
- Associative ($(A \sqcup B) \sqcup C = A \sqcup (B \sqcup C)$): Order in which multiple merges occur doesn't matter.
- Idempotent ($A \sqcup A = A$): Merging the exact same state doesn't change the state.
This also comes with the added caveat that these state changes must be monotonically increasing; the state can only progress forward according to a partial order defined by the lattice.
Bounded Join-Semilattice
A bounded join-semilattice is a partially ordered set where every finite subset has a least upper bound (supremum) and the set contains an absolute minimum element ($\perp$).Formally, it is a mathematical structure $(L, \sqcup, \perp)$ consisting of:
- $L$ (Carrier Set): A non-empty set.
- $\sqcup$ (Join): A binary operation (least upper bound) that is associative ($a \sqcup (b \sqcup c) = (a \sqcup b) \sqcup c$), commutative ($a \sqcup b = b \sqcup a$), and idempotent ($a \sqcup a = a$).
- $\perp$ (Bottom Element): The least element acting as the identity ($a \sqcup \perp = a$).
CvRDTs provide incredible fault tolerance and offline-first capabilities, but they have their own set of advantages and disadvantages:-
Advantages¶
- Network Agnostic: Because of idempotency and commutativity, messages can be dropped, duplicated, or reordered by the network.
- Low Coordination: Nodes don't have to lock resources or wait for a consensus round (like Raft or Paxos) before accepting a local write.
Disadvantages¶
- High Network Overhead: Sending the entire state across the wire can quickly become highly inefficient as the data structure grows.
- Memory Usage: Deleted data leave behind markers (called as "tombstones") to ensure that the deletion propagates to other nodes. This means that memory usage grows over time and requires complex pruning strategies.
Operation-based CRDTs¶
To mitigate the high network overhead, Operation-based CRDTs (aka Commutative Replicated Data Type (CmRDTs)) only send the operations performed on the local state, albeit these data types do not come with a merge function, but instead rely on the operations to be Commutative, and follow a Causal Delivery Order. Let's break it down to better understand how this works:-
- When an action is recieved on a node, it is first intercepted, then packaged with some metadata using the source function, containing a unique id, timestamp or causal vector clocks (or a combination of these).
- The action is then applied in the node, and then sent to other nodes.
- The other nodes recieve the packaged action, and call the downstream function (that which applies the operation on the local state), and the change is applied only based on the metadata.
Thus, the main components of a CmRDT are:-
- The local state (Data Structure)
- The Source function (which is used to package the action with metadata)
- The Downstream function (which is used to apply the action on the local state)
- The initial state (using a Constructor)
- Causal Delivery Constraint (to ensure that causally related operations are applied in the order they occurred)
One very important thing to note is that CmRDTs assume that the underlying network will only send the packet once (and successfully, all the time), and must also follow the Causal order. This means that CmRDTs push the requirement of Idempotency and Associativity to the network itself. This can raise some issues, as networks are not fault tolerant 100% of the time.
Summarising, the advantages and disadvantages of a CmRDT are:-
Advantages¶
- Network Efficiency: Tiny, constant-size mutation payloads ($O(1)$ space)
- High Performance: Replicas execute direct local updates instead of complex merging/diff-ing algorithms.
- Exact Operations: Preserves the exact operation as is when recieved and logs it.
Disadvantages¶
- Network Reliance: Relies on the network layer to guarantee causal and deduplicated message delivery to prevent permanent state divergence.
- Complex Re-syncing: Nodes that go offline for long periods must replay a massive backlog of individual operations sequentially.
- Metadata Storage: Requires storing extra metadata (like vector clocks, message IDs, or deletion timestamps) alongside the data to track history and handle deduplication.
Some known examples of CRDTs¶
Often, a lot of CRDTs end up being a combination/slight modification of simpler CRDTs. Here's some I found when reading up:-
Grow only counter (G-Counter)¶
The G-Counter is a state based CRDT that implements a monotonic counter among n nodes. Each node contains an array of n elements, each initialised to 0 at the start. Since each node has it's own index in every node, there are no write conflicts. When a node wants to increment the counter, it simply increments its own element in the array. When two nodes merge their states, they take the element-wise maximum of the two arrays. In order to get the current value of the counter, we simply sum up all the elements in the array.
$$ \begin{aligned} node_1 &= [1,0,0] \\ node_2 &= [0,0,1] \\ node_3 &= [0,0,0] \end{aligned} $$ $$ \begin{aligned} node_1 \sqcup node_2 \sqcup node_3 &= [1,0,1] = 2 \\ i_0 &= max(node_1[0],node_2[0],node_3[0]) \\ i_1 &= max(node_1[1],node_2[1],node_3[1]) \\ i_2 &= max(node_1[2],node_2[2],node_3[2]) \end{aligned} $$ $$ Counter = \sum_{i=0}^{n-1} i_n $$
Note that since there is no centralised node where the updates happen, the changes are reflected through a Gossip Protocol.
P-N Counter (Positive-Negative Counter)¶
Say you wanted to decrement the values too. The first problem with using the G-Counter as is, is that the max() function will just ignore the lower value. A better solution would be to have 2 monotonic counters, one for increments, and the other for decrements. Thus, building on the G-Counter, the array representation will now be a $2 \times n$ matrix, where the first row tracks increments, and the second row tracks decrements.
$$ \begin{aligned} node_1 &= [[1,0,0],[0,0,0]] \\ node_2 &= [[0,1,0],[0,0,0]] \\ node_3 &= [[0,0,0],[0,0,1]] \end{aligned} $$ $$ Counter = sum(pos) - sum(neg) $$ $$ \begin{aligned} node_1 \sqcup node_2 \sqcup node_3 &= [[1,1,0],[0,0,1]] = 1 \\ pos &= [max(1,0,0), max(0,1,0), max(0,0,0)] = [1,1,0] \\ neg &= [max(0,0,0), max(0,0,0), max(0,0,1)] = [0,0,1] \end{aligned} $$
Grow-only Set (G-Set)¶
Applying the Grow Counter but to set of items essentially gives you your Grow-only set. You apply union between the different sets that you maintain, and you can only add an item to the set.
payload set A
initial ∅
update add(element e)
A := A ∪ {e}
query lookup(element e) : boolean b
let b = (e ∈ A)
compare (S, T) : boolean b
let b = (S.A ⊆ T.A)
merge (S, T) : payload U
let U.A = S.A ∪ T.A
Example from Wikipedia
2 Phase Set (2P-Set)¶
Two G-Sets are combined to create a 2P-Set. The Remove set (also called the tombstone set) keeps track of removals. Once a certain item is added to the Remove set, it can never be added back into the set. This effectivey makes the 2P-Set based on "remove-wins" semantics.
payload set A, set R
initial ∅, ∅
query lookup(element e) : boolean b
let b = (e ∈ A ∧ e ∉ R)
update add(element e)
A := A ∪ {e}
update remove(element e)
pre lookup(e)
R := R ∪ {e}
compare (S, T) : boolean b
let b = (S.A ⊆ T.A ∧ S.R ⊆ T.R)
merge (S, T) : payload U
let U.A = S.A ∪ T.A
let U.R = S.R ∪ T.R
Example from Wikipedia
Last Write Wins (LWW Element Set)¶
Like the 2P-Set, the LWW-Element-Set consists of an add-set $A$ and a remove-set $R$, but unlike it, every element in these sets is stored as a tuple paired with a timestamp. This allows us to resolve conflicts by looking at which operation happened last. This essentially is the "Last write wins" semantics (and hence the name).
An element $e$ is considered a member of the set if it is in the add-set $A$, and either it is not in the remove-set $R$ or the latest timestamp of $e$ in $A$ is greater than the latest timestamp of $e$ in $R$. If the timestamps are exactly equal, a bias is used (usually "add-wins" or "remove-wins") to break the tie.
payload set A, set R
initial ∅, ∅
query lookup(element e) : boolean b
let b = (∃ t_a : (e, t_a) ∈ A ∧ (∀ t_r : (e, t_r) ∈ R ⇒ t_a > t_r))
update add(element e, timestamp t)
A := A ∪ {(e, t)}
update remove(element e, timestamp t)
R := R ∪ {(e, t)}
compare (S, T) : boolean b
let b = (S.A ⊆ T.A ∧ S.R ⊆ T.R)
merge (S, T) : payload U
let U.A = S.A ∪ T.A
let U.R = S.R ∪ T.R
Example from Wikipedia
Observed-Remove Set (OR-Set)¶
In a 2P-Set, once an element is removed, it can never be added back. The Observed-Remove Set (OR-Set) solves this limitation. Instead of relying on timestamps, the OR-Set assigns a unique tag (or identifier) to each add operation.
When an element is added, a unique tag is generated and stored with the element in the add-set $A$. When an element is removed, all the tags for that element currently observed (i.e. present in the add-set) are added to the remove-set $R$. An element is considered present in the set if there is at least one tag in the add-set that has not been matched in the remove-set.
payload set A, set R
initial ∅, ∅
query lookup(element e) : boolean b
let b = (∃ u : (e, u) ∈ A ∧ u ∉ R)
update add(element e)
let u = unique()
A := A ∪ {(e, u)}
update remove(element e)
pre lookup(e)
R := R ∪ {u | (e, u) ∈ A}
compare (S, T) : boolean b
let b = (S.A ⊆ T.A ∧ S.R ⊆ T.R)
merge (S, T) : payload U
let U.A = S.A ∪ T.A
let U.R = S.R ∪ T.R
Sequence CRDTs¶
Say you wanted to build something more complex, like a collaborative text editor or a shared document. We'd need a way to maintain an ordered list of elements (a sequence) where items can be inserted or deleted at arbitrary indices. Counters would get arbitrarily large or complex to manage/implement. This is where Sequence CRDTs come in.
The problem with simple arrays is that integer indices shift whenever something is inserted or deleted. If two users insert text at the same place concurrently, their local indices diverge, leading to a mess of conflicts. Sequence CRDTs bypass this by assigning a unique, globally ordered identifier to every single element in the sequence. Rather than a flat array, the sequence is represented as a tree or a linked list of elements with these stable identifiers.
Some common Sequence CRDT ideas are:-
- Replicated Growable Array (RGA): Uses a linked list where each element references the identifier of the element preceding it. Deletions are handled using tombstones.
- Logoot / LSEQ: Use a boundary-based allocation of identifiers. Identifiers are sequences of integers representing paths in a tree of infinite branching factor.
- Yjs / Automerge: Highly optimized, modern sequence CRDT frameworks that compress metadata and structure updates as continuous operations to achieve near-native performance for collaborative editing.
Closing notes¶
There are a lot of CRDTs out there, and I kinda lost steam trying to write them down when I reached LWW Set. You can see with the examples being lifted off straight from Wikipedia. I'll probably document few other ones that I found/find interesting and put them up at some other point!