Replication - keeping a copy of the same data on multiple machines that are connected via a network. And it’s good for

Keep data geographically close to your users
Allow the system to continue working even if some of its parts have failed
Scale out the number of machines that can serve read queries

If all we have is static data and don’t have to worry about changes, then it is pretty easy: just keep a copy on each machine. However, it’s the changes what make the process tricky, and that will be the focus of the chapter.

Leaders and Followers

Each node(a machine) that stores a copy of the database is called a replica. With multiple replicas, how can we ensure that all the data ends up on all of them?

The most common solution for this is called leader-based replication (also known as active/passive or master-slave replication).

One of the replicas is designated as the leader. When clients want to write to the database, they must send their requests to the leader, which first writes the new data to its local storage.
The other replicas are known as followers(read replicas, salves, secondaries, or hot standbys). Whenever the leader writes new data to its local storage, it also sends the data change to all of its followers as part of a replication log or change stream. Each follower takes the log from the leader and updates its local copy of the database accordingly.
When a client wants to read from the database, it can query either the leader or any of the followers. However, writes are ONLY accepts on the leader.

Leader-based replication is not restricted to databases: distributed message brokers such as Kafka also use it.

Synchronous vs Asynchronous Replication

The advantage of synchronous replication is that the follower is guaranteed to have an up-to-date copy of the data that is consistent with the leader. If leader suddenly fails, the data will still be available on the follower. The disadvantages, however, is that if the follower doesn’t respond(crashed, network fault or other reasons), the write cannot be processed.

Thus, most “synchronous” database only make one of its follower synchronous. If the synchronous follower becomes unavailable or slow, one of the async followers is made synchronous. This config is sometimes also called semi-synchronous

Often, leader-based replication is configured to be completely asynchronous. Although that means the data is not guaranteed to be durable, the advantage is that the leader can continue process writes even if all of its follower is fall behind.

Setting Up New Followers

If we need to set up a new follower, how do we ensure that the new follower has an accurate copy of the leader’s data?

Simply copying the data files from one to another is not sufficient since there might be new data add in during the process of copying. Thus, the process of setting up new follower would look like this:

Take a consistent snapshot of the leader’s database at some point in time.