Introduction: What is etcd?

Daein Park
5 min read · Nov 30, 2021

This article is an introduction to etcd. I will show you what etcd is at a glance.

What is etcd?

The name comes from the /etc directory on Linux, which usually stores all of the system's configuration, and the suffix “d” stands for “distributed”. etcd is a distributed key-value store, as this example shows: you simply put a key-value pair, then get the value back using the key.

$ etcdctl put KEY VALUE
OK
$ etcdctl get KEY
KEY
VALUE

The most famous use case is Kubernetes. etcd stores all manifests and the Kubernetes cluster state as key-value pairs. So if etcd crashes, the control plane stops functioning correctly because it can no longer sync the current cluster state. But even though new updates cannot be applied, the pods already running on the worker nodes are not affected.

As this diagram shows, when etcd crashes the API server cannot fetch or update the cluster state from etcd. This affects everything that depends on the API server, such as the controller manager, the scheduler, and the kubelet agents. But the pods that are already running keep working as they are.
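By the way, you can see these Kubernetes key-value pairs directly. This is a minimal sketch, assuming a cluster whose API server uses the default /registry prefix and an etcdctl already configured with the proper endpoints and certificates; the pod name in the output is illustrative.

# List the keys the API server stores for pods in the default namespace
$ etcdctl get /registry/pods/default --prefix --keys-only
/registry/pods/default/nginx-6799fc88d8-abcde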

What is Raft?

Now I will show you the most important part of etcd's internal behavior: Raft.
What is Raft? Raft is a distributed consensus protocol. It lets a cluster keep data consistent across multiple members. etcd relies on Raft, so understanding Raft is very helpful for understanding etcd as well.

Raft is leader-based. In other words, all changes go through the leader of the cluster. If there is no leader in the etcd cluster, the cluster does not work.

The leader is elected as follows; this process is called leader election.
All members start in the “follower” state while no leader has been elected yet. Once the followers notice that there is no leader, any of them can become a candidate. The candidate requests votes from the other members, and the members reply with their votes. The candidate becomes the leader once it has collected votes from a majority of the cluster members, including itself (for example, two votes in a three-member cluster). The picture below shows a simplified view of the leader election process.

Leader Election

This etcd log shows, based on real log messages, how a leader is elected in an etcd cluster. In this case, etcd1 has the ID 1111, etcd2 has 2222, and etcd3 has 3333. The etcd1 member became a candidate and voted for itself first, then sent vote requests to the other members. It reached a majority by receiving a vote from etcd2 in addition to its own vote. Eventually etcd1 was elected as the leader, as the red log messages show.
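If you do not want to dig through the logs, you can also ask the members directly which one is the leader. This is a minimal sketch assuming three members with illustrative endpoints; the IS LEADER column of the table output marks the current leader.

# Query the status of each member; the IS LEADER column marks the leader
$ etcdctl --endpoints=http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 \
    endpoint status -w table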

Additionally, it is good to memorize TWO timeout values to better understand leader election. In Raft, there are two timeout values that control leader elections.

They are ETCD_ELECTION_TIMEOUT and ETCD_HEARTBEAT_INTERVAL. ETCD_ELECTION_TIMEOUT has to be at least 5 times bigger than ETCD_HEARTBEAT_INTERVAL. This rule is evaluated when etcd starts:
if the timeouts are not valid, etcd fails to start.
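These two values correspond to the --heartbeat-interval and --election-timeout flags (in milliseconds), or to their environment-variable forms. A minimal sketch using the upstream defaults of 100 ms and 1000 ms, which already satisfy the rule above:

# Via command-line flags (values are in milliseconds)
$ etcd --heartbeat-interval=100 --election-timeout=1000

# Or equivalently via environment variables
$ ETCD_HEARTBEAT_INTERVAL=100 ETCD_ELECTION_TIMEOUT=1000 etcd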

Let's take a look at how these two timeouts work together during a leader election in the etcd cluster.

ETCD_ELECTION_TIMEOUT is the amount of time a follower waits before becoming a candidate. When the election timeout expires, the follower becomes a candidate and starts a new election term. However, a heartbeat from the leader or a vote request from a candidate resets this timeout. In other words, if your cluster is stable, the leader's heartbeats reset the election timeout regularly.

And ETCD_HEARTBEAT_INTERVAL is the frequency at which the leader notifies the followers that it is still the leader.

As this diagram shows, the current election term continues as long as the followers keep receiving heartbeats from the leader. If the heartbeats stop arriving, a new leader election is started, and during that election the cluster does not work.

A lost-leader event causes a new leader election

As long as the leader keeps sending heartbeats to the followers within the election timeout, there is no new leader election and the cluster keeps working without interruption.

No leader, no cluster

Next, let's look at the impact of the lost-leader event mentioned above. I said: no leader, no cluster. You can see what that means in the following examples. In the first example I killed a follower member (not the leader), and as expected it did not affect the working cluster.

BUT when I killed the leader member, getting a key from the cluster failed temporarily until a new leader was elected. An etcd client is usually implemented to retry failed requests, so in the end the value can still be retrieved. A temporary lost-leader event is therefore not a problem.
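Here is a minimal sketch of that experiment, assuming a local three-member cluster (etcd1, etcd2, etcd3) where etcd1 happens to be the leader; the member names, endpoints, and the way the processes are killed are all illustrative.

# Kill a follower: the quorum (2 of 3) is intact, so reads and writes still work
$ kill $(pgrep -f etcd2)
$ etcdctl --endpoints=http://etcd1:2379 get KEY
KEY
VALUE

# Kill the leader: requests can fail for a moment until a new leader is elected,
# so retry the request (most client libraries do this automatically)
$ kill $(pgrep -f etcd1)
$ until etcdctl --endpoints=http://etcd3:2379 get KEY; do sleep 1; done
KEY
VALUE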

Conclusion

We took a quick look at etcd from the Raft point of view. I hope this knowledge helps you understand etcd. Thank you for reading.

References

https://etcd.io/docs/v3.5/
https://github.com/ongardie/dissertation/blob/master/stanford.pdf

