Real-Time Kubernetes Debugging, Monitoring and Alerting with BotKube


My takeaway points

Real-Time Kubernetes Debugging

1. Describe
* describe pod and see events
* describe service and see endpoints
2. Events
* kubectl get events
gives events in the current namespace
* kubectl get events --all-namespaces
3. exec with -it (interactive terminal) for a pod. Use -c to run inside a specific container.
4. kubectl debug is a K8s 1.18 alpha feature. First enable the alpha feature. We can dynamically insert a container into a pod for debugging purposes.
5. Logs
* kubectl logs
The container shall expose its logs using stdout
* kubectl logs --previous
provides the log of the previous run. It is useful if a pod keeps restarting.

Grafana Dashboard:

For K8s + Prometheus setup, relevant dashboards are :

Select one dashboard and import it into your Grafana by its ID, or download the JSON file and import that.


BotKube is for debugging and monitoring.
It can also run kubectl commands.

kubectl get clusterroles
lists all ClusterRoles defined in the cluster

BotKube can work with multiple clusters

We can add a custom filter written in Go to BotKube: 

BotKube next release
- It will be integrated with Microsoft Teams
- It will have JSONPath-based status:field monitoring


From Live Chat

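* If a service selector has two labels, a pod must carry both of them to be part of the service, because selector requirements are combined with AND. The reverse does work: a pod with extra labels still matches a selector that names only some of them.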
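A minimal sketch of how equality-based label selectors are evaluated, assuming plain dictionaries for the selector and the pod labels. The function name and the example labels are hypothetical and only illustrate the AND semantics.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """Return True only when every selector key/value pair is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Hypothetical service selector with two labels.
service_selector = {"app": "web", "tier": "frontend"}

# A pod with only one of the two labels is not selected.
print(selector_matches(service_selector, {"app": "web"}))  # False

# A pod with both labels (plus an extra one) is selected.
print(selector_matches(service_selector, {"app": "web", "tier": "frontend", "team": "a"}))  # True
```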
* Deployments are not associated with any node, so one cannot get a node IP address from a deployment.
* Better to go with the Prometheus Operator model that is part of the kube-prometheus project. It installs node-exporter as a DaemonSet, which acts as an agent on each node.
* Prometheus can also work with Docker containers, without K8s.
* Splunk and Kibana can be combined.
* How is scaling designed for BotKube? Once in a while there can be many events from the K8s cluster; most of the time there are none.
Scaling uses the K8s HPA only. Even when there are many events, each event needs little CPU to process.

Meetup Recording :

K8s Dashboard

CKAD : 4.Design

Each component should be decoupled from resources.

All resources have transient relationship with others.

K8s orchestration works with series of agents = controllers = watch loops.

The scheduler uses the PodSpec to determine the best node for a pod

One CPU =
1 AWS vCPU or
1 GCP Core
1 Azure vCore
1 Hyperthread in bare metal Intel CPU with Hyperthreading

Label and Selector 

Label can be
- production / development
- department name
- team name
- primary application developer

Selectors are namespace scoped, unless --all-namespaces is used
--selector key=value
-l key=value

If we use the command
kubectl create deployment design2 --image=nginx
then a pod is created with label app=design2.
If we edit the label of that pod, the design2 deployment creates another pod with label app=design2.
If we delete the design2 deployment, only the pods that still carry label app=design2 get deleted; the relabeled pod survives.

To show all labels, use the --show-labels option with the kubectl get command.


Job is for scenarios when you don’t want the process to keep running indefinitely.  

A Job also supports parallel processing of a set of independent but related work items. These might be 
- emails to be sent, 
- frames to be rendered, 
- files to be transcoded, 
- ranges of keys in a NoSQL database to scan, and so on.

A Replication Controller manages Pods which are not expected to terminate (e.g. web servers), and a Job manages Pods that are expected to terminate (e.g. batch tasks).

Jobs are part of the batch API group.
A Job has the following parameters
1. activeDeadlineSeconds : the job can remain alive for that many seconds only.
2. completions : how many pods must finish successfully? Default 1
3. parallelism : how many pods should run at a time? Default 1. With value 0, the job is paused.
4. restartPolicy : {Never | OnFailure}. The pod default is Always, which is not suitable for a Job
5. backoffLimit : the number of retries before the job is marked failed. Default 6
6. ttlSecondsAfterFinished : default is unset, so a finished job is never cleaned up automatically

* If parallelism > completions  then parallelism = completions 

While debugging, set restartPolicy = Never. This policy applies to the pod, not to the job.

The job status is Failed if
- it is restarted more than backoffLimit times OR
- it runs for more than activeDeadlineSeconds
else the status is Complete
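The rule above can be sketched as a small function. This is a simplified model of the note, not the Job controller's actual logic; the function and parameter names are assumptions chosen to mirror the Job fields.

```python
from typing import Optional


def job_status(pod_failures: int,
               elapsed_seconds: float,
               backoff_limit: int = 6,
               active_deadline_seconds: Optional[float] = None) -> str:
    """Return "Failed" or "Complete" for a finished job (simplified model)."""
    # Running longer than activeDeadlineSeconds fails the job.
    if active_deadline_seconds is not None and elapsed_seconds > active_deadline_seconds:
        return "Failed"
    # More retries than backoffLimit also fails the job.
    if pod_failures > backoff_limit:
        return "Failed"
    return "Complete"


print(job_status(pod_failures=2, elapsed_seconds=5))                               # Complete
print(job_status(pod_failures=0, elapsed_seconds=30, active_deadline_seconds=10))  # Failed
```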

Delete Job
With --cascade=false, only the job gets deleted, not its pods (newer kubectl versions spell this --cascade=orphan)

kubectl delete jobs job_name --cascade=false

Linux-style cron syntax is used for the schedule
A field can be a list of comma-separated values: 1,2
A field can be a range with a hyphen: 1-5
A field can be * to indicate all values
A field can be */ and a number to indicate a period: */2
* and ? have the same meaning.
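To make the field syntax concrete, here is a minimal sketch that expands one cron field into the values it matches. It is an illustration only, not a full cron parser: the function name is hypothetical, and a step is only handled in the forms `*/n` and `a-b/n`.

```python
def expand_cron_field(field: str, low: int, high: int) -> list:
    """Expand one cron field (for example "1,2", "1-5", "*", "*/2") into sorted values."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part in ("*", "?"):
            # * and ? both mean "every value in the allowed range".
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return sorted(values)


print(expand_cron_field("1,2", 0, 59))  # [1, 2]
print(expand_cron_field("1-5", 0, 6))   # [1, 2, 3, 4, 5]
print(expand_cron_field("*/2", 0, 6))   # [0, 2, 4, 6]
```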

CronJob creates multiple jobs as per schedule. The CronJob is only responsible for creating Jobs that match its schedule, and the Job in turn is responsible for the management of the Pods it represents.

It has the following parameters

1. activeDeadlineSeconds
If a CronJob's job sleeps for 30 seconds and activeDeadlineSeconds is 10, then none of the jobs created by the CronJob reaches the Complete state.

2. startingDeadlineSeconds
If a job cannot be started within this many seconds of its scheduled time, the run is counted as missed. A run can also be missed because of the Forbid policy. After more than 100 missed runs, no more jobs get scheduled.

Note: If startingDeadlineSeconds is set, missed runs are counted only within the last startingDeadlineSeconds. That count should stay below 100.

3. concurrencyPolicy
Forbid: if a new job is due before the earlier job has finished, the new job is not allowed to start

4. suspend
All subsequent jobs are not scheduled.

5. successfulJobsHistoryLimit and failedJobsHistoryLimit
How many finished jobs shall be kept
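The interplay of suspend and concurrencyPolicy can be sketched as a single decision. This is a simplified model with hypothetical names. Only Forbid comes from the notes; Allow and Replace are the other two values Kubernetes accepts, and both let the new run start (Replace additionally cancels the running job, which is not modelled here).

```python
def should_start_job(concurrency_policy: str,
                     previous_job_running: bool,
                     suspended: bool = False) -> bool:
    """Decide whether a scheduled CronJob run may start (simplified model)."""
    # A suspended CronJob starts nothing.
    if suspended:
        return False
    # Forbid skips the run while the earlier job is still active.
    if concurrency_policy == "Forbid" and previous_job_running:
        return False
    return True


print(should_start_job("Forbid", previous_job_running=True))   # False
print(should_start_job("Forbid", previous_job_running=False))  # True
```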

Terms for multi container pod
1. Ambassador : Communicate with outside resources / outside cluster. E.g. Envoy Proxy
- Proxy local connection
- Reverse Proxy
- Limits HTTP request
- Re-route to outside world
2. Adapter : Modify data that primary container generates
3. Sidecar : Provides a service that is not found in the primary container. E.g. logging

Flexibility : one application per pod
granular scalability : one application per pod
best inter container performance : Multiple application per pod

Containerizing an application

- It should be stateless
- It should be transient
- Remove the environment configuration. It should be via ConfigMap and Secrets
- it is like converting city bus to scooters 

After containerization of application, just ask

Q1 : Is my application as decoupled as it could possibly be?
Q2 : Are all components designed with the assumption that other components are transient? Will it work with Chaos Monkey?
Q3 : Can I scale any particular component?
Q4 : Have I used stable and open standards to meet my needs?

Managing Resource Usage

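If a pod uses more CPU than its limit, then
- Nothing happens to the pod; its CPU is throttled

If a pod uses more memory than its limit, then the behavior is undefined
- the container is restarted OR
- the pod is evicted from the node

If a pod uses more memory than the node has, then
- the pod is evicted from the node

If a pod uses more storage than its limit, then
- the pod is evicted from the node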

Resource Limits

1. CPU: cpu
2. Memory: memory
3. Huge Pages: hugepages-2Mi
4. Ephemeral Storage: ephemeral-storage
They apply at container level.
The pod-level value is the sum of all its containers' values.
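A minimal sketch of how the pod-level figure follows from the container-level values. The helper names and the two example containers are hypothetical; only the `m` CPU suffix and the `Ki`, `Mi`, `Gi` memory suffixes are handled.

```python
MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "0.5" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "500Mi" or "1Gi" to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)


def pod_total(containers: list) -> tuple:
    """Sum CPU (cores) and memory (bytes) over all containers of a pod."""
    cpu = sum(parse_cpu(c["cpu"]) for c in containers)
    memory = sum(parse_memory(c["memory"]) for c in containers)
    return cpu, memory


containers = [
    {"cpu": "500m", "memory": "500Mi"},
    {"cpu": "1", "memory": "1Gi"},
]
print(pod_total(containers))  # (1.5, 1598029824)
```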

The resources can also be specified at project quota level. 

resources:
  limits:
    cpu: "1"
    memory: "1Gi"
  requests:
    cpu: "0.5"
    memory: "500Mi"

k describe node "Node Name"

We can specify a LimitRange object with default values. It applies within a namespace, and only if the LimitRanger admission controller is enabled. 

apiVersion: v1
kind: LimitRange
metadata:
  name: limit-mem-cpu-per-container
spec:
  limits:
  - max:
      cpu: "800m"
      memory: "1Gi"
    min:
      cpu: "100m"
      memory: "99Mi"
    default:
      cpu: "700m"
      memory: "900Mi"
    defaultRequest:
      cpu: "110m"
      memory: "111Mi"
    type: Container

- While creating a container, if neither memory request nor limit is specified, then the LimitRange defaults (defaultRequest and default) apply.
- While creating a container, if the memory request is not specified and the limit is specified, then the request is set to the same value as the limit.
- While creating a container, if the memory request is specified and the limit is not specified, then the limit is set to the LimitRange default limit.
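The three defaulting cases can be sketched as one function. This is a simplified model of what the LimitRanger admission controller does for a single resource; the function name and dictionary shape are assumptions, and the `max`/`min` validation is left out.

```python
def apply_limit_range_defaults(container: dict, default: str, default_request: str) -> dict:
    """Fill in a missing request and/or limit from the LimitRange defaults."""
    request = container.get("request")
    limit = container.get("limit")
    if limit is None:
        # No limit given: use the LimitRange default limit.
        limit = default
        if request is None:
            # Neither given: the request comes from defaultRequest.
            request = default_request
    elif request is None:
        # Only the limit given: the request is set equal to the limit.
        request = limit
    return {"request": request, "limit": limit}


# Values taken from the memory entries of a LimitRange such as the one above.
print(apply_limit_range_defaults({}, default="900Mi", default_request="111Mi"))
print(apply_limit_range_defaults({"limit": "500Mi"}, default="900Mi", default_request="111Mi"))
print(apply_limit_range_defaults({"request": "200Mi"}, default="900Mi", default_request="111Mi"))
```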


* Some CNI plugins support Network Policies. E.g. Calico, Canal, Kube Router, Romana, Weave Net

* Some CNI plugins support encryption of UDP and TCP traffic. E.g. Calico, Kopeio, Weave Net

* Some CNI plugins allow VXLAN. E.g. Canal, Flannel, Kopeio-networking, Weave Net

* CNI plugins work at layer 2 or layer 3
Layer 2: Canal, Flannel, Kopeio-networking, Weave Net
Layer 3: Calico, Romana, Kube Router

* kubenet is a basic network plugin. It relies on the cloud provider for routing and cross-node networking

./ "pod name"


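# Resolve the container ID for the pod passed as $1 and strip the docker:// prefix
containerId=$(kubectl get pods "$1" -o jsonpath='{.status.containerStatuses[1].containerID}' | sed -e "s/^docker:\/\///")
# Look up the host PID of the container's main process
pid=$(docker inspect --format '{{.State.Pid}}' "$containerId")
echo "$pid"
# Start a shell inside the container's namespaces
sudo nsenter --target "$pid" --mount --uts --ipc --net --pid sh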

CKAD : 3. Build

App Container (appc) is an open specification that defines several aspects of how to run applications in containers: an image format, runtime environment, and discovery protocol. rkt's native image format and runtime environment are those defined by the specification.

Clear Containers (from Intel) uses the kvmtool mini-hypervisor, so each container is a VM with quick boot-up and a low memory footprint. It is not comparable with Docker but is acceptable for many use cases. 

If we create a file inside a Docker container, then it is actually located at 

1. Docker
2. buildah
- creates OCI images
- with or without a Dockerfile
- no superuser privilege needed
- Go-based API for easy integration
3. podman (pod manager)
- replacement for "docker run"
- it is for container life cycle management (LCM)
4. Kompose

sudo kompose convert -f docker-compose.yaml -o localregistry.yaml

latest is just a string. We need a process to tag and re-tag the newest version as "latest" as and when it becomes available. Otherwise the tag has no meaning. 

k exec -it -- /bin/bash
Here, instead of /bin/bash, any tool available inside the container can be used. The command runs in the container, not on the local host where kubectl is running. 

readinessProbe and livenessProbe
1. exec statement. Succeeds when the command exits with code 0
2. HTTP GET. Succeeds when the return value is 200-399
3. TCP. Tries to open a socket on a predetermined port
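The HTTP and TCP checks can be sketched in a few lines. This only illustrates the success criteria listed above, it is not how the kubelet implements probes; the function names are hypothetical.

```python
import socket


def http_probe_succeeded(status_code: int) -> bool:
    """An HTTP GET probe passes for any status code from 200 to 399."""
    return 200 <= status_code <= 399


def tcp_probe_succeeded(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """A TCP probe passes when a connection to the port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Refused, unreachable, or timed out: the probe fails.
        return False


print(http_probe_succeeded(302))  # True
print(http_probe_succeeded(404))  # False
```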

To get logs generated by etcd
k -n kube-system logs etcd

The events can be listed with
k describe pod  

We can use "--dry-run -o yaml" just to generate a YAML file


To access a K8s service on Minikube we have a few approaches

1. Make it a NodePort Service

1.1  We can change the service type to NodePort with
k patch svc -p '{"spec":{"type":"NodePort"}}'
Now to access the NodePort service on Minikube, we need the IP address of the VirtualBox VM. 

minikube ip
This command gives the IP address of the combined worker + master node.
Use the command 
curl http://"Minikube IP":"NodePort"/v2/


Use the command
minikube service --url
you will get service end point

http://"Minikube IP":"NodePort"

We can open this URL using default browser using command
minikube service 


We can use ClusterIP

sudo route add gw
sudo route add gw


We should add the registry to Docker as an insecure registry, using its ClusterIP 

sudo vim /etc/docker/daemon.json

{ "insecure-registries":[""] }

Then Restart Docker Service
sudo systemctl restart docker.service

CKAD : 2. K8s Architecture

Key takeaway points
  • All the configuration is defined in YAML and stored in JSON format
  • Container creation tools: Buildah, Podman, cri-o, containerd, frakti
  • Mesos has a multi-level scheduler for data center clusters
  • Evolution: Borg -> Mesos, Cloud Foundry, K8s, Omega
  • Replication Controller is now replaced by
  • - Deployment controller
  • - ReplicaSet
  • Deployment ensures that resources are available, such as (1) IP address and (2) storage. Then it deploys a ReplicaSet
  • So if we delete the ReplicaSet, then the deployment recreates it. 
  • If we delete the deployment, then its ReplicaSet and pods also get deleted (default cascading delete). The service remains.
  • If we delete the service, the pods are not deleted; only the service and its endpoints go away. 
  • A node has taints to discourage pod assignment, unless the pod has a matching toleration. A taint is expressed as key=value:effect
  • Annotations are not for K8s itself. They are for 3rd party tools
  • 'Cloud controller manager' is optional on the master node. If it is present, the kubelet shall be started with the option --cloud-provider=external
  • Pause container is used to get the IP address
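As an illustration of the key=value:effect taint format from the list above, here is a minimal sketch that parses a taint string and checks a toleration against it. The names and the example taint are hypothetical, and the matching is simplified (tolerationSeconds is ignored).

```python
from typing import NamedTuple


class Taint(NamedTuple):
    key: str
    value: str
    effect: str


def parse_taint(text: str) -> Taint:
    """Parse "key=value:effect" (the value part may be empty)."""
    key_value, effect = text.rsplit(":", 1)
    key, _, value = key_value.partition("=")
    return Taint(key, value, effect)


def tolerates(toleration: dict, taint: Taint) -> bool:
    """Return True when the toleration matches the taint."""
    # A toleration without an effect matches every effect.
    if toleration.get("effect") not in (None, taint.effect):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists ignores the value; a missing key matches every taint.
        return toleration.get("key") in (None, taint.key)
    return toleration.get("key") == taint.key and toleration.get("value", "") == taint.value


taint = parse_taint("dedicated=gpu:NoSchedule")
print(taint)  # Taint(key='dedicated', value='gpu', effect='NoSchedule')
print(tolerates({"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}, taint))  # True
```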
Useful commands
1.1 To run a pod without a YAML file
k run newpod --image=nginx --generator=run-pod/v1

1.2 To create a deployment without a YAML file
k create deployment firstpod --image=nginx

2. To know all taints
k describe nodes | grep -i taint

3.1 To know about all resources
k api-resources
NAME                              SHORTNAMES       APIGROUP                       NAMESPACED   KIND
bindings                                                                          true         Binding
componentstatuses                 cs                                              false        ComponentStatus
configmaps                        cm                                              true         ConfigMap
endpoints                         ep                                              true         Endpoints
events                            ev                                              true         Event
limitranges                       limits                                          true         LimitRange
namespaces                        ns                                              false        Namespace
nodes                             no                                              false        Node
persistentvolumeclaims            pvc                                             true         PersistentVolumeClaim
persistentvolumes                 pv                                              false        PersistentVolume
pods                              po                                              true         Pod
podtemplates                                                                      true         PodTemplate
replicationcontrollers            rc                                              true         ReplicationController
resourcequotas                    quota                                           true         ResourceQuota
secrets                                                                           true         Secret
serviceaccounts                   sa                                              true         ServiceAccount
services                          svc                                             true         Service
mutatingwebhookconfigurations               false        MutatingWebhookConfiguration
validatingwebhookconfigurations             false        ValidatingWebhookConfiguration
customresourcedefinitions         crd,crds           false        CustomResourceDefinition
apiservices                                       false        APIService
controllerrevisions                                apps                           true         ControllerRevision
daemonsets                        ds               apps                           true         DaemonSet
deployments                       deploy           apps                           true         Deployment
replicasets                       rs               apps                           true         ReplicaSet
statefulsets                      sts              apps                           true         StatefulSet
tokenreviews                                       false        TokenReview
localsubjectaccessreviews                           true         LocalSubjectAccessReview
selfsubjectaccessreviews                            false        SelfSubjectAccessReview
selfsubjectrulesreviews                             false        SelfSubjectRulesReview
subjectaccessreviews                                false        SubjectAccessReview
horizontalpodautoscalers          hpa              autoscaling                    true         HorizontalPodAutoscaler
cronjobs                          cj               batch                          true         CronJob
jobs                                               batch                          true         Job
certificatesigningrequests        csr                false        CertificateSigningRequest
leases                                               true         Lease
events                            ev                       true         Event
daemonsets                        ds               extensions                     true         DaemonSet
deployments                       deploy           extensions                     true         Deployment
ingresses                         ing              extensions                     true         Ingress
networkpolicies                   netpol           extensions                     true         NetworkPolicy
podsecuritypolicies               psp              extensions                     false        PodSecurityPolicy
replicasets                       rs               extensions                     true         ReplicaSet
network-attachment-definitions    net-attach-def                true         NetworkAttachmentDefinition
ingresses                         ing                  true         Ingress
networkpolicies                   netpol               true         NetworkPolicy
runtimeclasses                                               false        RuntimeClass
poddisruptionbudgets              pdb              policy                         true         PodDisruptionBudget
podsecuritypolicies               psp              policy                         false        PodSecurityPolicy
clusterrolebindings                            false        ClusterRoleBinding
clusterroles                                   false        ClusterRole
rolebindings                                   true         RoleBinding
roles                                          true         Role
priorityclasses                   pc                   false        PriorityClass
csidrivers                                                false        CSIDriver
csinodes                                                  false        CSINode
storageclasses                    sc                      false        StorageClass
volumeattachments                                         false        VolumeAttachment

3.2 We can list associated verbs with the command
k api-resources -o wide

3.3 We can list multiple resources as comma separated list
kubectl get deploy,rs,po,svc,ep

4. Under container we write:
- containerPort: 80

5. Under Service we write:
 - protocol: TCP
   port: 80

Kubernetes Resource Map

Docker image, without Internet

Let's assume you wish to run a K8s pod with a Docker container on a host that has no Internet connection. This post describes the steps. 

Run the command below on another host, where an Internet connection is present. 

docker save -o /

Then transfer the saved file to the target host using FTP or SCP

On the target host, run the following command

docker load -i

Now make sure that the YAML file for the pod or deployment contains the line below

imagePullPolicy: IfNotPresent
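Why this setting matters on a host without Internet can be sketched as a small decision function. It is a simplified model of the three imagePullPolicy values with a hypothetical function name: with IfNotPresent, an image already loaded locally is used without contacting a registry.

```python
def should_pull_image(policy: str, image_present_locally: bool) -> bool:
    """Return True when the kubelet would try to pull the image (simplified model)."""
    if policy == "Always":
        return True
    if policy == "Never":
        return False
    if policy == "IfNotPresent":
        return not image_present_locally
    raise ValueError(f"unknown imagePullPolicy: {policy}")


# The image was loaded with docker load, so no pull is attempted.
print(should_pull_image("IfNotPresent", image_present_locally=True))  # False
# With Always, a pull is attempted and would fail without Internet.
print(should_pull_image("Always", image_present_locally=True))        # True
```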

2019: Looking back with smile

Happy New Year 2020

For the last two years, I have reflected upon my life and achievements of the previous year. So here was my year 2019.


  • Attended satsang with Sri Sri Ravishankar Guruji. 
  • Read His commentary on "Shrimad Bhagavad Gita" in a WhatsApp group. Effective utilization of WhatsApp and of free time whenever I need to wait for anyone. 
  • Attended the Advanced Meditation silence course for the 7th time

Social Media:

  • Now this blog is super active. Maximum posts, 73, in this year. The majority are technical
  • Completed a series of online courses under "DevOps for Practitioners". A good refresher on the quality principles that we learnt during MBA. Also happy to know how they are applied in today's IT industry. 
  • Tried LinkedIn Premium 
  • GitHub: 
  • Earlier I created the category "Bookmark" on this blog. Now it is no longer valid. Technology is changing so fast. I keep updating many technical posts. 


Registered for Certified Kubernetes Application Developer (CKAD) exam.  
Attended many technical events, including Kubernetes Day India

Sanskrit Promotion: 

  • Conducted 6th free workshop about Spoken Sanskrit in Feb 2019
  • Encouraged school students for Sanskrit by solving their doubts.
  • Continued watching YouTube series by Dr. Baladevanand Sagar on topic "Learn Sanskrit, Be Modern !". 

  • After almost 10 years, I changed jobs. Here too I am learning new technology and working with the most talented people of our industry. 
  • Reconnected and had nice, meaningful conversations with old colleagues and professional people. 
  • Continue making new friends using Quick Ride

I read an excellent book, Transcendence. Read my book review here. I also read another interesting book, "The Phoenix Project". It is a business novel that applies ToC (Theory of Constraints) principles to DevOps. 


  • Watched a few movies:
    • Helaro, a Gujarati movie, was good. 
    • Mission Mangal, a Hindi movie, was the best
  • Watched documentaries on 
    • Indian Airforce
    • NaMo's "wild v/s man"

Next Year 2020:
  • Spend more time with people, family, friends. 
  • Regular in SAADHANA
  • Complete YouTube series by Dr. Baladevanand Sagar on topic "Learn Sanskrit, Be Modern !". 
  • Re-Start conducting Free Spoken Sanskrit classes. 
  • Continue enriching my personal blog. A new series will be started with my personal notes on the "Machine Learning" course on Coursera by Andrew Ng. 
  • Clean up all old Bookmarks in Google Chrome. 
  • Cultivate good habits and get rid of bad habits. 