Real-Time Kubernetes Debugging, Monitoring and Alerting with BotKube


Ref: https://www.meetup.com/k8s-cloudnative-online/events/269706847

My takeaway points

Real-Time Kubernetes Debugging

1. Describe
* describe a pod and see its events
* describe a service and see its endpoints
2. Events
* kubectl get events
gives events in the current namespace
* kubectl get events --all-namespaces
3. exec with -it (interactive terminal) into a pod. Use -c to run inside a specific container.
4. kubectl debug is a K8s 1.18 alpha feature (enable the alpha feature gate first). It lets us dynamically insert a container into a pod for debugging purposes.
5. Logs
* kubectl logs
The container should expose its logs on stdout.
* kubectl logs --previous
provides the logs from the previous run. It is useful if a pod keeps restarting.
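
A minimal shell sketch of this debugging flow (the pod name my-pod, namespace my-ns, and container name app are hypothetical):

# describe the pod and check the Events section at the bottom
kubectl -n my-ns describe pod my-pod

# list events in one namespace, or across all namespaces
kubectl -n my-ns get events
kubectl get events --all-namespaces

# open an interactive shell inside a specific container of the pod
kubectl -n my-ns exec -it my-pod -c app -- sh

# current and previous-run logs of the pod
kubectl -n my-ns logs my-pod
kubectl -n my-ns logs my-pod --previous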

Grafana Dashboard: https://grafana.com/grafana/dashboards?orderBy=name&direction=asc

For K8s + Prometheus setup, relevant dashboards are : https://grafana.com/grafana/dashboards?dataSource=prometheus&direction=asc&orderBy=name&search=kubernetes

Select a dashboard and import it into your Grafana using its ID, or download its JSON file and import that.

BotKube

BotKube is for debugging and monitoring.
It can also run kubectl commands.

kubectl get clusterroles
lists all the cluster roles.

BotKube can work with multiple clusters.

We can add custom filters, written in Go, to BotKube: https://www.botkube.io/filters/

BotKube next release
- It will be integrated with Microsoft Teams
- It will have JSONPath-based status field monitoring

Reference: 
https://github.com/infracloudio/botkube
https://www.botkube.io 

From Live Chat

* A Service selector matches labels with AND semantics: if the selector has two labels and a pod carries only one of them, that pod is not part of the Service.
* Deployments are not associated with any particular node, so one cannot get a node IP address from a deployment.
* Better to go with the Prometheus Operator model, which is part of the kube-prometheus project. It installs node-exporter as a DaemonSet, which acts as an agent on each node.
* Prometheus can also work with plain Docker containers, without K8s.
* Splunk and Kibana can both be used together.
https://blogs.halodoc.io/production-grade-eks-logging-with-efk-via-sidecar/
* How is scaling designed for BotKube? Once in a while there can be many events from the K8s cluster; most of the time there are none.
Scaling is done using the K8s HPA only. Even when there are many events, each event needs very little CPU to process.

Meetup Recording : https://www.youtube.com/watch?v=bGnQep5bY6c

K8s Dashboard


CKAD: 4. Design


Each component should be decoupled from resources.

All resources have a transient relationship with the others.

K8s orchestration works with a series of agents, i.e., controllers (watch loops).

Based on the PodSpec, the scheduler determines the best node for deployment.

One CPU =
1 AWS vCPU, or
1 GCP core, or
1 Azure vCore, or
1 hyperthread on a bare-metal Intel processor with Hyperthreading

Label and Selector 

Label can be
- production / development
- department name
- team name
- primary application developer

Selectors are namespace-scoped, unless --all-namespaces is used:
--selector key=value
OR
-l key=value

If we use the command
kubectl create deployment design2 --image=nginx
then a pod is created with the label app=design2.
If we edit the label of that pod, the design2 Deployment will create another pod with the label app=design2.
If we delete the deployment design2, then only the pods whose label is app=design2 get deleted.

To show all labels, use the --show-labels option with the kubectl get command.
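
A minimal shell sketch of this label experiment (the replacement label value app=debug is just an arbitrary example):

# create the deployment; its pod carries the label app=design2
kubectl create deployment design2 --image=nginx
kubectl get pods --show-labels

# change the label on the existing pod; the deployment immediately
# creates a replacement pod that again carries app=design2
kubectl label pod -l app=design2 app=debug --overwrite
kubectl get pods --show-labels

# deleting the deployment removes only the pods labelled app=design2;
# the relabelled pod is left behind and must be deleted separately
kubectl delete deployment design2
kubectl get pods --show-labels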

Job

Job is for scenarios when you don’t want the process to keep running indefinitely.  

It does support parallel processing of a set of independent but related work items. These might be 
- emails to be sent, 
- frames to be rendered, 
- files to be transcoded, 
- ranges of keys in a NoSQL database to scan, and so on.

A Replication Controller manages Pods which are not expected to terminate (e.g. web servers), and a Job manages Pods that are expected to terminate (e.g. batch tasks).

Jobs are part of the batch API group.
A Job has the following parameters:
1. activeDeadlineSeconds: the Job can remain active for only this many seconds.
2. completions: how many successful completions are needed? Default 1.
3. parallelism: how many pods should run at a time? Default 1. With a value of 0, the Job is paused.
4. restartPolicy: {Never | OnFailure}. The default (Always) is not suitable for a Job.
5. backoffLimit: the Job is retried up to backoffLimit times; default 6.
6. ttlSecondsAfterFinished: default is 'never' (the finished Job is kept).

* If parallelism > completions, then the effective parallelism is capped at completions.

While debugging, set restartPolicy = Never. This policy applies to the pod, not to the Job.

The Job status is Failed if
- it restarted more than backoffLimit times, OR
- it ran for more than activeDeadlineSeconds;
otherwise the status is Complete.
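
A minimal Job manifest sketch tying these parameters together (the name, image, command, and values are hypothetical):

apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
spec:
  completions: 5             # five successful pod completions needed
  parallelism: 2             # at most two pods run at a time
  backoffLimit: 4            # give up after four retries
  activeDeadlineSeconds: 120
  template:
    spec:
      restartPolicy: Never   # Always is not suitable for a Job
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo processing item && sleep 5"]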

Delete Job
With --cascade=false, only the Job gets deleted, not its pods.

kubectl delete job job_name --cascade=false

CronJob
Linux-style cron schedule syntax:
Minute Hour Day-of-Month Month Day-of-Week
A field can be a comma-separated list: 1,2
It can be a range with a hyphen: 1-5
It can be * to indicate all values
It can be */ followed by a number to indicate a period: */2
* and ? have the same meaning.

CronJob creates multiple jobs as per schedule. The CronJob is only responsible for creating Jobs that match its schedule, and the Job in turn is responsible for the management of the Pods it represents.

It has the following parameters:

1. activeDeadlineSeconds: if a CronJob's pod sleeps for 30 seconds and activeDeadlineSeconds is 10, then none of the Jobs created by the CronJob reach the Completed state.

2. startingDeadlineSeconds: if the CronJob cannot be scheduled within this time, that run is counted as failed. A failure can also be due to the Forbid concurrency policy. After 100 such failures, no more Jobs will be scheduled.

Note: If startingDeadlineSeconds is set, the failure count is considered only over the last startingDeadlineSeconds; it should stay below 100.

3. concurrencyPolicy
Allow: concurrent Jobs are allowed (the default).
Forbid: if a second Job is scheduled before the earlier Job has finished, the new one is not allowed (skipped).
Replace: the currently running Job is replaced by the new one.

4. suspend
If set, all subsequent Jobs will not be scheduled.

5. successfulJobsHistoryLimit and failedJobsHistoryLimit
How many finished Jobs shall be kept.
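
A minimal CronJob manifest sketch using these parameters (the name, schedule, image, and values are hypothetical):

apiVersion: batch/v1beta1          # batch/v1 from Kubernetes 1.21 onwards
kind: CronJob
metadata:
  name: sample-cron
spec:
  schedule: "*/2 * * * *"          # every two minutes
  concurrencyPolicy: Forbid        # skip a run if the previous Job is still active
  startingDeadlineSeconds: 60
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  suspend: false
  jobTemplate:
    spec:
      activeDeadlineSeconds: 100
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: task
            image: busybox
            command: ["sh", "-c", "date; echo periodic task"]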


Terms for a multi-container pod
1. Ambassador: communicates with outside resources / outside the cluster, e.g. Envoy proxy
- proxies local connections
- reverse proxy
- limits HTTP requests
- re-routes to the outside world
2. Adapter: modifies the data that the primary container generates
3. Sidecar: helps provide a service that is not found in the primary container, e.g. logging (see the sketch below)

Flexibility: one application per pod
Granular scalability: one application per pod
Best inter-container performance: multiple applications per pod
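
A minimal sketch of the sidecar pattern above: a logging sidecar tailing the primary container's log file over a shared emptyDir volume (names and paths are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: web-with-logger
spec:
  volumes:
  - name: logs
    emptyDir: {}
  containers:
  - name: app                      # primary container writes its log file
    image: busybox
    command: ["sh", "-c", "while true; do date >> /var/log/app.log; sleep 5; done"]
    volumeMounts:
    - name: logs
      mountPath: /var/log
  - name: log-sidecar              # sidecar streams the same file to stdout
    image: busybox
    command: ["sh", "-c", "tail -F /var/log/app.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log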

Containerizing an application

- It should be stateless.
- It should be transient.
- Remove environment-specific configuration; it should come in via ConfigMaps and Secrets (see the sketch after this list).
- It is like converting a city bus into scooters.
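
A minimal sketch of pulling configuration from a ConfigMap and a Secret instead of baking it into the image (names, keys, and values are hypothetical):

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_MODE: production
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
stringData:
  DB_PASSWORD: changeme
---
apiVersion: v1
kind: Pod
metadata:
  name: configured-app
spec:
  containers:
  - name: app
    image: nginx
    envFrom:                 # all keys become environment variables
    - configMapRef:
        name: app-config
    - secretRef:
        name: app-secret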

After containerizing the application, just ask:

Q1: Is my application as decoupled as it could possibly be?
Q2: Are all components designed assuming the other components are transient? Will it work with Chaos Monkey?
Q3: Can I scale any particular component?
Q4: Have I used stable and open standards to meet my needs?

Managing Resource Usage

If a pod asks for more CPU than defined, then
- nothing happens (CPU usage is simply throttled)

If a pod asks for more memory than defined, the behavior is undefined:
- the pod may be restarted, OR
- the pod may be evicted from the node

If a pod asks for more memory than the node has, then
- the pod is evicted from the node

If a pod asks for more storage than defined, then
- the pod is evicted from the node

Resource Limits

1. CPU: cpu
2. Memory: memory
3. Huge Pages: hugepages-2Mi
4. Ephemeral Storage: ephemeral-storage
They apply at the container level.
The pod-level value is the sum of all the containers' values.

The resources can also be specified at the namespace/project quota level (a sketch follows the container-level example below).

Example container-level values:

limits:
  cpu: "1"
  memory: "1Gi"
requests:
  cpu: "0.5"
  memory: "500Mi"
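
A minimal ResourceQuota sketch for the namespace/project quota level mentioned above (name and values are hypothetical):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: default
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "8"
    limits.memory: 8Gi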

k describe node <node-name>
shows the node's capacity and the resources currently allocated on it.

We can specify a LimitRange object with default values. It applies within a namespace, and only takes effect if the LimitRanger admission controller is enabled.

apiVersion: v1
kind: LimitRange
metadata:
  name: limit-mem-cpu-per-container
spec:
  limits:
  - max:
      cpu: "800m"
      memory: "1Gi"
    min:
      cpu: "100m"
      memory: "99Mi"
    default:
      cpu: "700m"
      memory: "900Mi"
    defaultRequest:
      cpu: "110m"
      memory: "111Mi"
    type: Container

- While creating a container, if neither the memory request nor the limit is specified,
then the defaults from the LimitRange apply.
- While creating a container, if the memory request is not specified but the limit is, then the request is set equal to the limit.
- While creating a container, if the memory request is specified but the limit is not, then the default limit from the LimitRange is applied.


CNI 

* Some CNI plugins support Network Policies, e.g. Calico, Canal, Kube-router, Romana, Weave Net.

* Some CNI plugins support encryption of UDP and TCP traffic, e.g. Calico, Kopeio, Weave Net.

* Some CNI plugins support VXLAN, e.g. Canal, Flannel, Kopeio-networking, Weave Net.

* CNI plugins operate at layer 2 or layer 3:
Layer 2: Canal, Flannel, Kopeio-networking, Weave Net
Layer 3: Calico, Romana, Kube-router

* kubenet is a basic network plugin. It relies on the cloud provider for routing and cross-node networking.

./enter_pod.sh <pod-name>


#!/bin/sh
# Enter the pod's container namespaces from the node it is running on.

# container ID of the pod's second container (index 1), docker:// prefix stripped
containerId=$(kubectl get pods "$1" -o jsonpath='{.status.containerStatuses[1].containerID}' | sed -e "s/^docker:\/\///")
# host PID of that container's main process
pid=$(docker inspect --format '{{.State.Pid}}' "$containerId")
echo "$pid"
# enter the container's mount, UTS, IPC, network, and PID namespaces with a shell
sudo nsenter --target "$pid" --mount --uts --ipc --net --pid sh

CKAD: 3. Build


App Container (appc) is an open specification that defines several aspects of how to run applications in containers: an image format, runtime environment, and discovery protocol. rkt's native image format and runtime environment are those defined by the specification.

Clear Containers (from Intel) use the kvmtool mini-hypervisor, so each container is a VM with quick boot-up and a low memory footprint. They are not directly comparable with Docker, but acceptable for many use cases.

If we create a file inside a Docker container, it is actually located at
/var/snap/docker/common/var-lib-docker/aufs/diff/  
OR 
/var/lib/docker/aufs/diff/

Tools
1. Docker
2. buildah
- creates OCI images
- with or without a Dockerfile
- no superuser privilege needed
- Go-based API for easy integration
3. podman (pod manager)
- a replacement for "docker run"
- it is for container lifecycle management (LCM)
4. Kompose

sudo kompose convert -f docker-compose.yaml -o localregistry.yaml

latest is just a string (a tag). We need a process to tag the newest version as "latest" as and when it becomes available; otherwise there is no point.

k exec -it <pod-name> -- /bin/bash
Here, instead of /bin/bash, any tool available inside the container can be used.

readinessProbe and livenessProbe
1. exec: run a command inside the container
2. httpGet: a return code in the range 200-399 is a success
3. tcpSocket: try to open a connection on a predetermined port
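
A minimal sketch of both probe types on one container (port, path, and timings are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: probed-app
spec:
  containers:
  - name: web
    image: nginx
    readinessProbe:              # gate traffic until the app answers over HTTP
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
    livenessProbe:               # restart the container if the port stops responding
      tcpSocket:
        port: 80
      initialDelaySeconds: 15
      periodSeconds: 20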

To get the logs generated by etcd:
k -n kube-system logs etcd-<master-node-name>

The events for a pod can be listed with
k describe pod <pod-name>

We can use "--dry-run -o yaml" just to generate a YAML file.
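
For example, a sketch of generating a Deployment manifest without creating anything (newer kubectl spells the flag --dry-run=client):

kubectl create deployment web --image=nginx --dry-run=client -o yaml > web-deployment.yaml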

Minikube

To access a K8s service on Minikube we have a few approaches:

1. Make it a NodePort service

1.1 We can change the service type to NodePort with
k patch svc <service-name> -p '{"spec":{"type":"NodePort"}}'
Now, to access the NodePort service on Minikube, we need the IP address of the VirtualBox VM.

1.2
minikube ip
This command gives the IP address of the (combined master+worker) node.
Use the command
curl http://192.168.99.108:31754/v2/
i.e. curl http://<Minikube IP>:<NodePort>/v2/

1.3

Use the command
minikube service <service-name> --url
and you will get the service endpoint, e.g.
http://192.168.99.108:31754

i.e. http://<Minikube IP>:<NodePort>

We can open this URL in the default browser with the command
minikube service <service-name>

2.1

We can use the ClusterIP directly by adding a route to it via the Minikube VM:

sudo route add 10.100.88.2 gw 192.168.99.108
i.e. sudo route add <ClusterIP> gw <Minikube IP>

Registry

We should add the registry's ClusterIP to Docker as an insecure registry:

sudo vim /etc/docker/daemon.json

{ "insecure-registries":["10.110.186.162:5000"] }

Then restart the Docker service:
sudo systemctl restart docker.service