CKA 6: Cluster Maintenance


Drain Cordon Uncordon  
If a worker node goes down, K8s waits for the "pod-eviction-timeout" period before evicting its pods. The default value is 5 min.
This value is passed to kube-controller-manager as a command line argument.

So if we can upgrade the worker node within 5 min, it is OK. Otherwise the safer way is:
k drain "worker node name"

It also means the worker node is cordoned, i.e. no new pod can be scheduled on it. So once it is back, we should uncordon it:
k uncordon "worker node name"
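
For example (node01 is a placeholder node name; the flags are standard kubectl drain options):

k cordon node01                       # only mark the node unschedulable; existing pods keep running
k drain node01 --ignore-daemonsets    # evict pods; DaemonSet pods cannot be evicted, so they are skipped
k get nodes                           # node01 now shows SchedulingDisabled
k uncordon node01                     # allow scheduling on the node again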

K8s Version
It applies to: (1) kube-apiserver (2) controller-manager (3) kube-scheduler (4) kubelet (5) kube-proxy (6) kubectl
It does not apply to (1) etcd cluster (2) coreDNS

Cluster Upgrade
If kube-apiserver has version X, then
(1) controller-manager and (2) kube-scheduler can be at version [X - 1, X]
(1) kubelet and (2) kube-proxy can be at version [X - 2, X - 1, X]
kubectl can be at version [X - 1, X, X + 1]

* We should always upgrade one minor release at a time
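
For example (hypothetical versions): if kube-apiserver is v1.28, then controller-manager and kube-scheduler may be v1.27 or v1.28; kubelet and kube-proxy may be v1.26, v1.27 or v1.28; kubectl may be v1.27, v1.28 or v1.29.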

kubeadm upgrade plan //Gives all the information about present version and latest available version
kubeadm upgrade apply "New version"

On master node

1. Upgrade kubeadm tool itself.
2. kubeadm upgrade plan
3. kubeadm upgrade apply "New version"

The k get nodes command shows the version of kubelet, not the version of kube-apiserver.

4. if kubelet is present on the master node then upgrade it
5. systemctl restart kubelet
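
A sketch of these steps on an apt-based master node (the version string 1.28.1 is only a placeholder, and the exact package version suffix depends on the configured repository):

# 1. upgrade kubeadm itself
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.28.1-00
apt-mark hold kubeadm

# 2. and 3. plan and apply the upgrade
kubeadm upgrade plan
kubeadm upgrade apply v1.28.1

# 4. and 5. upgrade kubelet (and kubectl) on this node, then restart it
apt-mark unhold kubelet kubectl
apt-get install -y kubelet=1.28.1-00 kubectl=1.28.1-00
apt-mark hold kubelet kubectl
systemctl daemon-reload
systemctl restart kubelet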

On worker node

1. k drain "worker node name"
2. upgrade kubeadm
3. upgrade kubelet
4. kubeadm upgrade node config --kubelet-version "new version"
5. systemctl restart kubelet
6. k uncordon "worker node name"

OR

1. k drain "worker node name"
2. kubeadm upgrade node
3. upgrade kubelet
4. k uncordon "worker node name"
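
A sketch of the second variant ("kubeadm upgrade node") on an apt-based worker node, assuming it is named node01 (the version string is again only a placeholder):

k drain node01 --ignore-daemonsets     # run from a machine that has kubectl access

# on the worker node itself:
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.28.1-00
apt-mark hold kubeadm
kubeadm upgrade node
apt-mark unhold kubelet kubectl
apt-get install -y kubelet=1.28.1-00 kubectl=1.28.1-00
apt-mark hold kubelet kubectl
systemctl daemon-reload && systemctl restart kubelet

# back on the kubectl machine:
k uncordon node01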

If we run the k drain command on the master node, it again applies only to user applications. The K8s components running on the master node as static pods (kube-apiserver, etcd, etc.) will not be evicted.

"apt-mark hold" is used to mark a package as held back, which will prevent the package from being automatically installed, upgraded or removed

Backup and Restore

1. YAML files


k get all -A -o yaml > all.yaml
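
Note that "get all" does not return every resource type (for example ConfigMaps and Secrets are not included), so specific types can also be exported explicitly; the types and namespace below are just an example:

k get deploy,svc,cm,secret -n default -o yaml > default-backup.yaml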

OR use tools

Ark by Heptio is now called Velero

Instead of taking a backup of YAML files, take a backup of the etcd cluster.

2. etcd cluster

2.1 etcd is started with the "--data-dir" option. Its default value is /var/lib/etcd

If the snapshot is copied directly from the data directory, it has no integrity hash and will only restore with --skip-hash-check.
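
A hedged example of that case (the db file path is the usual location under the default data directory; the target data-dir is arbitrary):

ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd/member/snap/db --skip-hash-check --data-dir=/var/lib/etcd-from-copy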

2.2 ETCDCTL_API=3 etcdctl snapshot save snapshot.db
To restore

A. service kube-apiserver stop
B. ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name="" --data-dir="" --initial-cluster="" --initial-cluster-token="" --initial-advertise-peer-urls=""
C. Preferably use a different (1) initial-cluster-token and (2) data-dir. Change the etcd configuration with these 2 new values
D. service etcd restart
E. service kube-apiserver start

When taking the snapshot, we must specify (1) endpoints (2) cacert (3) cert and (4) key
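
Putting it together, a sketch for a kubeadm cluster (the endpoint and certificate paths are the usual kubeadm defaults; adjust them to the actual cluster):

ETCDCTL_API=3 etcdctl snapshot save /opt/snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

ETCDCTL_API=3 etcdctl snapshot restore /opt/snapshot.db \
  --data-dir=/var/lib/etcd-from-backup
# then point the etcd static pod manifest (/etc/kubernetes/manifests/etcd.yaml) at the new data-dir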

In managed K8s, we may not have access to etcd, so backing up YAML files is the better approach.

3. If etcd is running on a storage volume that supports backup, such as Amazon Elastic Block Store, back up etcd data by taking a snapshot of the storage volume.
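
For example, with the AWS CLI (the volume id is a placeholder):

aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "etcd data volume backup"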

Reference
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/recovery.md
https://www.youtube.com/watch?v=qRPNuT080Hk
