CKA 14 : JSON
Posted by Manish Panchmatia on Friday, August 28, 2020 | Labels: CKA, DevOps, k8s
k command -o json
k command -o=jsonpath='{QUERY1}{QUERY2}'
Literals: {"\n"} and {"\t"}
Loop: '{range .items[*]}{QUERY}{end}' (see the example below)
k command -o=custom-columns=COL_NAME1:QUERY1,COL_NAME2:QUERY2
Here, each QUERYi is written without .items[*]; it can also be written without '{}'.
k command --sort-by=QUERY1
Here, QUERY1 is written without .items[*]; it can also be written without '{}'.
kubectl get pv --sort-by=.spec.capacity.storage -o=custom-columns=CAPACITY:.spec.capacity.storage
k get pv --sort-by='{.spec.capacity.storage}' -o=custom-columns=CAPACITY:.spec.capacity.storage
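A small worked example of the range loop and the equivalent custom-columns form (runs against any cluster; the CPU column assumes nodes report .status.capacity.cpu):
k get nodes -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity.cpu}{"\n"}{end}'
k get nodes -o=custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu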
kubectl config view --kubeconfig=my-kube-config
JSON query to list: pod vs. container image
k get po -A -o=custom-columns=Image:"spec.containers[*].image",Name:"metadata.name"
CKA 13: Troubleshooting
* Check the events of the pod
kubectl describe pods ${POD_NAME}
* Pod in Pending state
- not enough CPU/memory resources on any node
- taints and tolerations
- hostPort is occupied, if the pod uses the host network
* Pod in Waiting state
- the pod has been assigned to a node
- the pod is unable to pull the image; check the image name spelling and try it manually:
docker pull IMAGE
* To validate YAML syntax
kubectl apply --validate -f mypod.yaml
Note: without --validate, misspelled fields are silently ignored.
* To debug service,
- verify endpoints
kubectl get endpoints ${SERVICE_NAME}
- Verify that the pod's containerPort matches up with the Service's targetPort
- verify the DNS entry with the nslookup command (see the example below)
- If nslookup for short name fails then check /etc/resolv.conf "search" parameter
//kubelet is invoked with --cluster-dns (sets the DNS server) and --cluster-domain (default: cluster.local)
- port value should be numeric, not string
- Is kube-proxy running?
- Is conntrack installed?
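A minimal service-debug sketch (the service name and namespace are placeholders; busybox:1.28 is commonly used because its nslookup behaves well):
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup my-service.default.svc.cluster.local
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup my-service    # short name, relies on the search list in /etc/resolv.conf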
The kubeadm tool deploys all K8s control-plane components as pods.
They can be deployed as OS services too.
Master: kube-apiserver, kube-controller-manager, kube-scheduler
Worker: kubelet, kube-proxy
With kubeadm, logs can be checked with the "k logs POD_NAME" command. For services, logs can be checked with the "journalctl -u SERVICE_NAME" command.
For a static pod, its name is suffixed with the node name.
Static pod path
1. The /etc/systemd/system/kubelet.service.d/10-kubeadm.conf file has the env variable KUBELET_CONFIG_ARGS that points to the kubelet config file.
2. The kubelet config file is at /var/lib/kubelet/config.yaml.
3. In this file, we have staticPodPath.
4. Its default value is generally /etc/kubernetes/manifests (see the commands below).
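A quick way to confirm the static pod directory on a kubeadm node (paths as in the steps above, assuming the default layout):
grep staticPodPath /var/lib/kubelet/config.yaml
ls /etc/kubernetes/manifests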
* Node failure
- check CPU with top
- check disk with df -h
- kubelet status with systemctl status kubelet -l
- kubelet logs with journalctl -u kubelet
- check the kubelet certificate with
openssl x509 -in /var/lib/kubelet/worker-1.crt -text
Restart kubelet:
1. systemctl daemon-reload
2. systemctl restart kubelet
Label and Selector
Posted by Manish Panchmatia on Thursday, August 27, 2020 | Labels: DevOps, k8s
| K8s Object | Set-based | Equality-based |
|---|---|---|
| Job | Yes | Yes |
| Deployment | Yes | Yes |
| ReplicaSet | Yes | Yes |
| DaemonSet | Yes | Yes |
| NodeSelector | | Yes |
| NodeAffinity | Yes | Yes |
| NodeAntiAffinity | Yes | Yes |
| kubectl command | Yes | Yes |
| Service | | Yes |
Set-based label selector:
selector:
  matchLabels:
    component: redis
  matchExpressions:
  - {key: tier, operator: In, values: [cache]}
  - {key: environment, operator: NotIn, values: [dev]}
Equality-based selectors always AND their conditions; to support OR, use a set-based selector (see the kubectl examples below).
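For contrast, two kubectl selector examples (the label keys and values are placeholders): the equality-based form ANDs its conditions, while the set-based form can express OR across the values of one key:
k get pods -l environment=production,tier=frontend
k get pods -l 'environment in (production,qa)'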
CKA 10 : Install K8s the Hard Way
Posted by Manish Panchmatia on Sunday, August 23, 2020 | Labels: CKA, DevOps, k8s
Production-grade cluster (upper limits):
- 5,000 nodes
- 150,000 total pods
- 300,000 total containers
- 100 pods per node
A node can have up to 36 vCPU and 60 GB of memory.
Storage:
high-performance SSD
if we have multiple concurrent connections, then network-based storage
PV, PVC if we need shared access among multiple pods
Label nodes according to their storage type, then use a node selector to assign the application to specific nodes (see the sketch below).
* kubeadm adds a taint to the master node, so that application pods are not scheduled on it
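A sketch of the node-labelling approach (the node names node01/controlplane and the label disktype=ssd are placeholders); the last command shows the taint kubeadm typically adds on the master:
k label node node01 disktype=ssd
k get nodes -l disktype=ssd
k describe node controlplane | grep -i taint
and in the pod spec:
spec:
  nodeSelector:
    disktype: ssd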
Turnkey solutions
kops is a tool to install K8s on AWS
kubeadm for on-prem
OpenShift
BOSH is a tool for K8s on Cloud Foundry Container Runtime
VMware Cloud PKS
Vagrant provides a set of useful scripts
Hosted solutions (managed solution)
GKE for GCP
AKS for Azure
OpenShift Online
EKS on AWS
HA
* If the master fails
- a failed pod is not recreated, even if it is part of a ReplicaSet
- we cannot use kubectl
* API server : active-active mode with LB
* Controller Manager and Scheduler: active-standby mode, chosen through a leader election process with these parameters (see the manifest snippet below):
-- leader-elect is true
-- leader-elect-lease-duration is 15 seconds
-- leader-elect-renew-deadline is 10 seconds
-- leader-elect-retry-period is 2 seconds
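As these flags would appear in the kube-controller-manager static pod manifest (a sketch with the default values, not a complete manifest):
- command:
  - kube-controller-manager
  - --leader-elect=true
  - --leader-elect-lease-duration=15s
  - --leader-elect-renew-deadline=10s
  - --leader-elect-retry-period=2s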
* etcd
- etcd running on the master nodes is called the stacked control plane topology
- etcd running outside the master nodes is called the external etcd topology
Writes are handled only by the leader, which then updates all of its followers. A write is complete only when it is done on a majority of nodes. The majority (quorum) is N/2 + 1 (integer division), and N - quorum = fault tolerance. For N = 1 or N = 2 the fault tolerance is 0, so the recommended minimum is N = 3. Also choose N odd; with an even N the cluster may lose quorum during a network partition. N = 5 is usually sufficient (see the worked numbers below).
etcd achieves distributed consensus using the Raft protocol.
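Worked out for a few cluster sizes (quorum = floor(N/2) + 1, fault tolerance = N - quorum):
N = 1: quorum 1, fault tolerance 0
N = 2: quorum 2, fault tolerance 0
N = 3: quorum 2, fault tolerance 1
N = 5: quorum 3, fault tolerance 2
N = 6: quorum 4, fault tolerance 2 (no better than N = 5)
N = 7: quorum 4, fault tolerance 3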
https://github.com/mmumshad/kubernetes-the-hard-way
The ssh-keygen command generates the id_rsa and id_rsa.pub key pair. The content of id_rsa.pub should be appended to the peer host's ~/.ssh/authorized_keys file.
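A minimal sketch (the user and host names are placeholders):
ssh-keygen -t rsa
ssh-copy-id user@peer-host
# or, equivalently: cat ~/.ssh/id_rsa.pub | ssh user@peer-host 'cat >> ~/.ssh/authorized_keys'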
secure cluster communication
The Kubernetes Controller Manager leverages a key pair to generate and sign service account tokens
Components for automation around SA:
1. A ServiceAccount admission controller: it adds the token secret to the pod.
2. A Token controller (part of Controller Manager): it creates/deletes secrets as service accounts are created/deleted. It uses the option --service-account-private-key-file, and the API server uses the option --service-account-key-file. The secret type is ServiceAccountToken (kubernetes.io/service-account-token).
3. A ServiceAccount controller: it manages ServiceAccounts inside namespaces (e.g. ensures the default ServiceAccount exists).
ref: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/
etcd
etcd uses peer.key and peer.cert for multi-node HA peer communication on port 2380, and client.key and client.cert for the API server (client traffic) on port 2379. We shall set --client-cert-auth=true and also provide the CA with --trusted-ca-file=etcd.ca.
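A hedged health check with etcdctl v3 (the certificate paths shown are the typical kubeadm locations and may differ on your cluster):
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health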
Command to check the status of all control-plane components:
k get componentstatuses
Worker node
1. ca.crt and ca.key exist with master node
All the steps below can be automated with TLS bootstrapping:
2. create key
3. create CSR with unique CN for each kubelet
4. sign certificate
5. distribute certificate
1. kube-apiserver --enable-bootstrap-token-auth=true
2. kube-controller-manager signs the certificates. It is already configured with ca.crt and ca.key. We shall pass --controllers=*,bootstrapsigner,tokencleaner to its service.
3. Create a secret with type bootstrap.kubernetes.io/token. The token format is abcdef.0123456789abcdef; the public id is abcdef and the private part is 0123456789abcdef. Name the secret bootstrap-token-PUBLICID.
4. Create cluster-role-binding create-csrs-for-bootstrapping for group=system:bootstrappers
5. Create cluster-role-binding auto-approve-csrs-for-group for group=system:bootstrappers // approves the first CSR
6. Create cluster-role-binding auto-approve-csrs-for-nodes for group=system:nodes // approves renewals
7. Create a bootstrap-kubeconfig with 4 commands: set-cluster, set-credentials, set-context, use-context (see the sketch below)
8. Create a YAML of kind: KubeletConfiguration with the cert and private key for TLS
9. Run the kubelet service with --bootstrap-kubeconfig="/var/lib/kubelet/bootstrap-kubeconfig"
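A sketch of step 7 (the API server address and CA path are placeholders; the token matches the format above):
kubectl config set-cluster bootstrap --kubeconfig=bootstrap-kubeconfig --server=https://API_SERVER:6443 --certificate-authority=/var/lib/kubernetes/ca.crt
kubectl config set-credentials kubelet-bootstrap --kubeconfig=bootstrap-kubeconfig --token=abcdef.0123456789abcdef
kubectl config set-context bootstrap --kubeconfig=bootstrap-kubeconfig --user=kubelet-bootstrap --cluster=bootstrap
kubectl config use-context bootstrap --kubeconfig=bootstrap-kubeconfig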
Vagrant file: https://gist.github.com/surajssd/71892b7a9c5c2cb175fd050cee45d495
CKA 9 : Networking
Posted by Manish Panchmatia on Saturday, August 22, 2020 | Labels: CKA, DevOps, k8s
Command
ip link //to list out interfaces
ip addr add IPV4/24 dev eth0
route
ip route add IPV4_NW/24 via GW
ip route add default via GW
is the same as
ip route add 0.0.0.0/0 via GW
/proc/sys/net/ipv4/ip_forward
Files
/etc/sysctl.conf: set net.ipv4.ip_forward = 1 here to make the /proc/sys/net/ipv4/ip_forward setting persistent (see below)
For persistence of addresses and routes across reboots, modify
/etc/network/interfaces
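A sketch of enabling IP forwarding at runtime and persistently (standard Linux; file paths as above):
echo 1 > /proc/sys/net/ipv4/ip_forward
sysctl -w net.ipv4.ip_forward=1
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf && sysctl -p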
DNS
/etc/hosts
/etc/resolv.conf
nameserver IPV4
nameserver 8.8.8.8
search mycompany.com
Whatever is mentioned in search is appended to short (unqualified) names during lookup.
Preference is given to /etc/hosts first and then to the DNS server,
as per /etc/nsswitch.conf:
check the line that starts with the "hosts:" key;
in its value part,
"files" stands for /etc/hosts and
"dns" for the nameserver (example below).
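A typical /etc/nsswitch.conf entry (the exact order may differ per distro):
hosts: files dns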
CNAME records map one name to another name.
DNS tools: nslookup, dig, host
nslookup and dig do not consider entries in /etc/hosts.
CoreDNS
Corefile is the configuration file for CoreDNS. It is at /etc/coredns/Corefile inside the pod. CoreDNS has many plugins: https://coredns.io/plugins/
This is where the K8s cluster domain name is defined; the default is cluster.local.
This file is mounted into the pod from a ConfigMap.
kubelet adds the file /etc/resolv.conf to each pod. The content of the file is
nameserver "IP_ADDRESS"
where IP_ADDRESS is the clusterIP of the CoreDNS service.
The CoreDNS deployment generally runs with replica count = 2 for redundancy.
kubelet has the config file /var/lib/kubelet/config.yaml;
it has clusterDNS (where the clusterIP of the CoreDNS service is specified) and clusterDomain (where the K8s cluster domain "cluster.local" is specified). See the inspection commands below.
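A few hedged inspection commands (object names are the kubeadm defaults and may differ on other installs):
k -n kube-system get configmap coredns -o yaml
k -n kube-system get deploy coredns
k -n kube-system get svc kube-dns
grep -A2 clusterDNS /var/lib/kubelet/config.yaml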
PTR Record
if clusterIP is a.b.c.d for SERVICE
then PTR record is
d.c.b.a.in-addr.arpa. "ttl" IN PTR SERVICE.NS.svc.ZONE.
similar PTR record for IPv6. Note: PTR record with IPv6 shall have same sequence.
If we have a headless service HL with a pod whose hostname is POD, then there are 2 A records for each pod (2 AAAA records, if IPv6):
HL.NS.svc.cluster.local. 4 IN A a.b.c.d
and
POD.HL.NS.svc.cluster.local. 4 IN A a.b.c.d
If HL listens on port 80 (port name http, protocol tcp), then there is an SRV record for each pod:
_http._tcp.HL.NS.svc.cluster.local. 4 IN SRV "priority" "weight" 80 POD.HL.NS.svc.cluster.local.
PTR record for HL service
d.c.b.a.in-addr.arpa. 4 IN PTR POD.HL.NS.svc.cluster.local
similar PTR record for IPv6. Note: PTR record with IPv6 shall have same sequence.
CNAME record
SERVICE.NS.svc.cluster.local. 10 IN CNAME www.example.com
Then we can have A and/or AAAA record for www.example.com
References
https://github.com/kubernetes/dns/blob/master/docs/specification.md
Before K8s 1.12 the DNS server was kube-dns; from K8s 1.12 onwards it is CoreDNS. However, the K8s Service name is still kube-dns.
Namespace
Commands
1. Create network namespace
ip netns add "NW_NS_NAME"
ip netns // to list NW NS
ip netns exec "NW_NS_NAME" ip link
is the same as
ip -n "NW_NS_NAME" link
To connect two NS
ip link add VETH-1 type veth peer name VETH-2
ip link set VETH-1 netns NW_NS1
ip link set VETH-2 netns NW_NS2
ip -n NW_NS1 addr add IP1 dev VETH-1
ip -n NW_NS2 addr add IP2 dev VETH-2
Then set the link up
ip -n NW_NS1 link set dev VETH-1 up
ip -n NW_NS2 link set dev VETH-2 up
Cleanup
If we delete either end of the veth pair, the other end is deleted automatically.
ip -n NW_NS1 link del VETH-1
To connect many NS and build a virtual network within the host, we need a virtual switch: (1) Linux bridge (2) Open vSwitch (OVS).
We will use a Linux bridge.
2. Create bridge network/interface
ip link add v-net-0 type bridge
this bridge is added on the host; it is listed in the output of the ip link command
ip link set dev v-net-0 up
3. Create VETH pairs (pipe, virtual cable)
Now create virtual cable to connect to bridge
ip link add VETH-1 type veth peer name VETH-1-br
ip link add VETH-2 type veth peer name VETH-2-br
ip link add VETH-3 type veth peer name VETH-3-br
ip link add VETH-4 type veth peer name VETH-4-br
4. Attach one end of each veth pair to its NS
ip link set VETH-1 netns NW_NS1
ip link set VETH-2 netns NW_NS2
ip link set VETH-3 netns NW_NS3
ip link set VETH-4 netns NW_NS4
5. Attach other veth to bridge
ip link set VETH-1-br master v-net-0
ip link set VETH-2-br master v-net-0
ip link set VETH-3-br master v-net-0
ip link set VETH-4-br master v-net-0
6. Assign IP address
ip -n NW_NS1 addr add IP1 dev VETH-1
ip -n NW_NS2 addr add IP2 dev VETH-2
ip -n NW_NS3 addr add IP3 dev VETH-3
ip -n NW_NS4 addr add IP4 dev VETH-4
ip addr add IP5/24 dev v-net-0
7. Bring the interfaces up
ip -n NW_NS1 link set dev VETH-1 up
ip -n NW_NS2 link set dev VETH-2 up
ip -n NW_NS3 link set dev VETH-3 up
ip -n NW_NS4 link set dev VETH-4 up
8. In each namespace, add a route to the external network IP0/24 via the bridge IP on the host (IP5) as gateway
ip netns exec NS1 ip route add IP0/24 via IP5
ip netns exec NS2 ip route add IP0/24 via IP5
ip netns exec NS3 ip route add IP0/24 via IP5
ip netns exec NS4 ip route add IP0/24 via IP5
9. Then add firewall rule for NAT
iptables -t nat -A POSTROUTING -s IP0/24 -j MASQUERADE
Also add a default gateway:
ip netns exec NS1 ip route add default via IP5
ip netns exec NS2 ip route add default via IP5
ip netns exec NS3 ip route add default via IP5
ip netns exec NS4 ip route add default via IP5
* We can expose a port of any namespace to the outside using a firewall rule:
iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination IP2:80
Always set the netmask when assigning an IP address with the ip addr add command, e.g. by appending /24.
Docker Networking
- None: --network none. The container is not attached to any outside network, and such containers cannot talk to each other either.
- Host: --network host
- Bridge
docker network ls
ip link command list docker0
Docker internally runs:
ip link add docker0 type bridge
But this docker0 is down
ip netns
docker inspect NS // NS comes from the previous command; there is one namespace per running container
For each container there is a veth device VETH, and Docker runs:
ip link set VETH master docker0
If we run
ip -n NS link
We can see VETH. It also has an IP address, shown by:
ip -n NS addr
docker run -p "HOST_PORT":"CONTAINER_PORT"
It will add iptables rules like:
iptables -t nat -A PREROUTING -p tcp --dport HOST_PORT -j DNAT --to-destination CONTAINER_IP:CONTAINER_PORT
iptables -t nat -A DOCKER -p tcp --dport HOST_PORT -j DNAT --to-destination CONTAINER_IP:CONTAINER_PORT
This can be verified with
iptables -nvL -t nat
CNI
Docker follows the same steps as above for its bridge network, except that the naming convention is different. The same steps are performed by Docker, rocket (rkt), Mesos and K8s, so they are standardized in a program named "bridge". This bridge program is a CNI plugin.
bridge add "container id" "network namespace"
As per CNI, the container runtime (step 1):
* must create the n/w ns
* must identify the network that the container must attach to
* must invoke the CNI plugin when a container is added
* must invoke the CNI plugin when a container is deleted
* must pass the n/w configuration in JSON format
The CNI plugin (steps 2 onwards):
* must support command line arguments ADD/DEL/CHECK
* must support parameter container id, network namespace
* must manage IP address assignment
* must return result in specific format
Default CNI plugins : bridge, VLAN, IPVLAN, MACVLAN, Windows
Default IPAM CNI plugin: DHCP, host-local
Docker does not implement CNI; it has its own standard called CNM.
K8s creates the docker container with --network none
and then invokes the CNI plugins.
Cluster Networking
- kube-apiserver (master): 6443
- kube-scheduler (master): 10251
- kube-controller-manager (master): 10252
- etcd clients (master): 2379
- etcd peers (multiple masters): 2380
- kubelet (master and worker): 10250
- NodePort services (worker): 30000-32767
kubelet is invoked with
--cni-conf-dir=/etc/cni/net.d
--cni-bin-dir=/opt/cni/bin
--network-plugin=cni
The CNI binaries are located at /opt/cni/bin, and a binary is invoked with "add | del | check", the container id and the network namespace.
At the /etc/cni/net.d path, we have a JSON file for each CNI plugin (see the example below). It contains
- cniVersion
- name
- type // the name of the binary at /opt/cni/bin
- bridge
- isGateway // should the bridge within the node get its own IP address, so that it can act as a gateway?
- ipMasq // should a NAT (IP MASQUERADE) rule be added or not?
- ipam
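A hedged example of such a file (the network name, bridge name and subnet are placeholders; the field names follow the standard bridge plugin):
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/16"
  }
}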
IPAM
The weave CNI plugin's default IPAM range is 10.32.0.0/12, i.e. 10.32.0.1 to 10.47.255.254,
about 10^6 (1,048,574) IP addresses in total. They are divided equally among the worker nodes.
For the ClusterIP range:
kube-apiserver --service-cluster-ip-range <CIDR>
Default is 10.0.0.0/24
We can check the command line arguments of the control-plane components in the manifest files under /etc/kubernetes/manifests (see below).
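A quick check on a kubeadm cluster (a sketch, assuming static pod manifests at the path above):
grep service-cluster-ip-range /etc/kubernetes/manifests/kube-apiserver.yaml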
Ingress Controller
it consists of
- deployment or daemonset
- configmap
- service
- service account
- role
- clusterrole
- rolebinding
ingress.yaml has multiple rules for multiple hosts. For a single host with multiple paths, a single rule is enough.
spec:
  backend:
    serviceName:
    servicePort:
  rules:
  - host:        # optional
    http:
      paths:
      - path:
        backend:
          serviceName:
          servicePort:
An Ingress can also have tls at the same level as rules:
  tls:
  - hosts:
    - HOST_FQDN
    secretName: SECRET
The Ingress and the ingress controller can be in different namespaces. Generally the Ingress should be in the same namespace where the application resides. We can have multiple K8s Ingress objects.
To rewrite the URL path, we need to add the following annotation (a complete example follows):
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
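Putting the pieces together, a minimal sketch (the apiVersion matches the 2020-era networking.k8s.io/v1beta1 Ingress; the name, host, secret and service are placeholders):
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: my-tls-secret
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /app
        backend:
          serviceName: my-service
          servicePort: 80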
Reference:
Also refer
http://layers7.blogspot.com/2020/04/ckad-7exposing-applications.html
https://www.objectif-libre.com/en/blog/2018/07/05/k8s-network-solutions-comparison/
CKA 8: Storage
Posted by Manish Panchmatia on Tuesday, August 18, 2020 | Labels: CKA, DevOps, k8s
A Docker image consists of read-only layers; the topmost (container) layer is writable.
docker run -v VOLUME_NAME:/dir/path/inside/container IMAGE_NAME //volume mount
docker volume create VOLUME_NAME
It creates the volume at the path /var/lib/docker/volumes/VOLUME_NAME
docker run -v /host/path:/dir/path/inside/container IMAGE_NAME //bind mount
is same as:
docker run --mount type=bind,source=/host/path,target=/dir/path/inside/container IMAGE_NAME
Docker storage drivers:
AUFS, ZFS, BTRFS, Device Mapper, overlay, overlay2
Docker chooses an appropriate driver based on the OS.
The default volume driver plugin in Docker is local.
Another driver can be specified with the
docker run --volume-driver
option.
CSI is supported by K8s, Cloud Foundry and Mesos.
CSI =
- a set of RPCs
-- CreateVolume
-- DeleteVolume
-- ControllerPublishVolume
with defined parameters and error codes
in the YAML file:
volumes:
- name: data-volume
  hostPath:
    path: /host/path
    type: Directory
To bind a specific PV to a PVC (see the full sketch below), the PV shall have
  labels:
    name: my-pv
and the PVC shall have
  selector:
    matchLabels:
      name: my-pv
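A minimal sketch of such a pair (the names, sizes and hostPath are placeholders):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
  labels:
    name: my-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /host/path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
  selector:
    matchLabels:
      name: my-pv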
A PV's capacity can be greater than the PVC's request.
PV and PVC have a one-to-one relationship, so the unused capacity is wasted if PV > PVC.
The bound PVC's capacity equals the PV's capacity, even if PV > PVC request.
Criteria for binding
- Sufficient Capacity
- Access Modes
- Volume Modes
- Storage Class
- Selector
If a PVC is in use by a pod and we delete it, it gets stuck in the 'Terminating' state.
If a PVC is not in use by a pod and we delete it, the PV status becomes 'Released'.
Also refer:
https://layers7.blogspot.com/2020/04/ckad-5-deployment-configuration.html
http://layers7.blogspot.com/2019/03/docker-volume.html
K8s API
Posted by Manish Panchmatia on Monday, August 17, 2020 | Labels: API, DevOps, k8s
The K8s APIs are grouped as follows:
- api
- apis
- healthz
- logs
- metrics
- version
Out of these, the first two expose cluster functionality. "api" is also called the "core API" and "apis" are called the "named APIs"; the core group is referred to as "" in RBAC rules. The named APIs (apis) are further classified into the following API groups:
- admission
- admissionregistration.k8s.io
- apiextensions.k8s.io
- apiregistration.k8s.io
- apps
- auditregistration
- authentication.k8s.io
- authorization.k8s.io
- autoscaling
- batch
- certificates.k8s.io
- coordination.k8s.io
- core
- discovery
- events.k8s.io
- extensions
- flowcontrol
- imagepolicy
- monitoring.coreos.com
- networking.k8s.io
- node.k8s.io
- policy
- rbac.authorization.k8s.io
- scheduling.k8s.io
- settings
- storage.k8s.io
- testdata
One can perform the following actions (verbs) on resources in these groups, if the RBAC policy allows (see the Role example below):
- list
- get
- create
- delete
- update
- watch
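A hedged RBAC sketch tying groups and verbs together (the Role name and namespace are placeholders; "" is the core group, apps is a named group):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-and-deploy-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch"]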
As per K8s 1.17, the table below indicates the composition relationships among all K8s objects.
Reference:
https://kubernetes.io/docs/reference/
https://kubernetes.io/docs/reference/kubectl/overview/
https://kubernetes.io/docs/concepts/overview/working-with-objects/object-management/
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/strategic-merge-patch.md
https://www.cncf.io/blog/2020/03/17/how-to-setup-role-based-access-to-kubernetes-cluster/
https://github.com/kubernetes/kubernetes/issues/7856
https://kubernetes.io/docs/concepts/overview/kubernetes-api/
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md