CKAD : 8.Troubleshooting


Tools
* busybox container has shell
* DNS configuration file
* dig (for DNS lookup)
* tcpdump
* Prometheus
* Fluentd

Commands
k exec -ti "pod name" -- /bin/sh
(with shell completion enabled, typing the deployment name and pressing Tab completes the generated pod name)

Logging
Log Command
k logs "pod name"
- if the pod has multiple containers, this command fails and lists the container names, so it can also be used to discover them.
To get a live stream of logs, use the -f option (same as tail -f).
The actual command is: 
k logs "pod name" "container name"
k logs "pod name" --all-containers=true
k logs command has some useful options like
--tail="N"
--since="1h"
-l for selector
--max-log-requests="N" along with -l
-p for previous instance 
--timestamps=true to add a timestamp on each line.
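
For example (assuming a deployment named web whose pods carry the label app=web, both hypothetical names), the options can be combined:

k logs deploy/web --all-containers=true --tail=50 --timestamps=true
k logs -l app=web --since=1h --max-log-requests=10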
- If an application does not write its logs to stdout, we can deploy a sidecar container that either (1) streams the application's log files to its own stdout OR (2) runs a logging agent.
The kubelet uses the Docker logging driver to write container logs to local files; these logs are retrieved by the k logs command.
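
A minimal sketch of the first sidecar pattern (all names and the emptyDir layout are illustrative): the app writes to a shared volume and the sidecar tails that file to its own stdout, so k logs app-with-log-sidecar log-streamer shows the application log.

apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "while true; do date >> /var/log/app/app.log; sleep 5; done"]
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  - name: log-streamer              # sidecar: exposes the log file on its stdout
    image: busybox
    command: ["sh", "-c", "tail -f /var/log/app/app.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  volumes:
  - name: logs
    emptyDir: {}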
Tools
Elasticsearch, Logstash, Kibana (ELK) stack; Fluentd
The Fluentd agent runs as a DaemonSet. It feeds data into Elasticsearch; the data can then be visualized on a Kibana dashboard.

The kubelet is a non-containerized component. Its logs are found under /var/log/journal and can be read with journalctl (e.g. journalctl -u kubelet, or journalctl -a for everything).


Networking
- DNS, firewall, general connectivity, using standard Linux command-line tools
- Changes at switches, routers, or other network settings; inter-node networking. Review all recent infrastructure changes, relevant or seemingly irrelevant.

Security
- RBAC
- SELinux and AppArmor are important to check for network-centric applications.
- Disable security and test again.
- Refer to the tools' logs to find out which rules are being violated.
- Fix all the issues found (there may be several), then re-enable security.


Other Points
- check node logs for errors; make sure enough resources are allocated
- check pod logs and pod state
- troubleshoot pod DNS and pod networking
- API calls between (1) controllers < - > (2) the kube API server
- inter-node network issues: DNS, firewall

K8s troubleshooting is similar to data center troubleshooting. The main differences are:
- check pod state: pending and error states
- check for errors in log files
- check that resources are sufficient

Prometheus
Metric types: Counter, Gauge, Histogram (quantiles computed server side), Summary (quantiles computed client side)

MetricsServer
It keeps metrics only in an in-memory store (no persistent DB). Heapster is now deprecated in its favor.
With MetricsServer we can use command
k top node
k top pod

Jaeger
feature: 
- distributed context propagation
- transaction monitoring
- root cause analysis

Conformance Testing
Tool: 
1. Sonobuoy https://sonobuoy.io/ , https://github.com/vmware-tanzu/sonobuoy
2. https://github.com/cncf/k8s-conformance/blob/master/instructions.md
3. Heptio

It makes sure that
- workload on one distribution works on another. 
- API functions the same
- Minimum functionality exists. 

Misc

Inside a pod, each container has its own restart count, which we can check by running k describe pod. The pod's restart count is the sum of the restart counts of all its containers.

The nslookup FQDN command checks whether a DNS query gets resolved. Its configuration file is /etc/resolv.conf (note: resolv, not resolve).

If a pod is backed by a service, it can have a DNS name of the form
"hyphen-separated pod IP address"."service name"."namespace name".svc.cluster.local

If a pod is part of a deployment, the pod name is not a stable, absolute name.
If we change a label of a pod in the deployment with the --overwrite option, that pod is removed from the service and a new pod is created. The removed pod's DNS entry is deleted and the new pod's DNS entry is added.

To add label key=value on k8s object (e.g. pod) command is:
k label 'object type' 'object name' key=value

To overwrite label key=value1 
k label 'object type' 'object name' --overwrite key=value1

To remove label with key
k label 'object type' 'object name' key-

There is no DNS entry for a naked pod, nor for a pod that belongs to a DaemonSet.

With the wget command, we can check whether DNS resolution is working.
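
A quick way to run such checks is a throwaway busybox pod (busybox:1.28 is commonly used because its nslookup behaves well; the names below are placeholders):

k run tmp --image=busybox:1.28 --rm -it --restart=Never -- nslookup "service name"."namespace name".svc.cluster.local
k run tmp --image=busybox:1.28 --rm -it --restart=Never -- wget -qO- http://"service name":"port"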

Kube-proxy

We can check the kube-proxy log with:
k -n kube-system logs "kube-proxy pod name"

8.1: 11,13


Reference: 

https://kubernetes.io/docs/concepts/cluster-administration/logging/
https://kubernetes.io/docs/tasks/debug-application-cluster/logging-elasticsearch-kibana/

https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/
https://github.com/kubernetes/kubernetes/issues

CKAD
https://github.com/dgkanatsios/CKAD-exercises
https://github.com/lucassha/CKAD-resources

CKAD : 7.Exposing Applications


ClusterIP range is defined via API server startup option --service-cluster-ip-range
NodePort range is defined in cluster configuration. 
ExternalName has no port, no selector, no endpoints. Redirection happens at the DNS level.
The 'kubectl proxy' command creates a local proxy through which ClusterIP services can be reached. Useful for troubleshooting and development work.
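
A usage sketch (service name, namespace and port are placeholders); the path below is the usual API-server proxy path for an HTTP service:

k proxy --port=8001 &
curl http://localhost:8001/api/v1/namespaces/"namespace name"/services/"service name":"port"/proxy/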

If we create a LoadBalancer-type service on bare metal without deploying any load balancer, we can still access it as a NodePort service.

Grace Period

For immediate deletion, we should add:

--grace-period=0 --force

Pods and deployments have a terminationGracePeriodSeconds parameter in the spec section. It cannot be modified at runtime with kubectl edit; it can only be set at deployment time.
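
A sketch of where the field sits in a pod spec (60 is an illustrative value; the default is 30 seconds):

spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: nginx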

KubeProxy Mode
* K8s 1.0 userspace mode
* K8s 1.1 iptables introduced
* K8s 1.2 iptables became the default
it scales to roughly 5,000 worker nodes.
* K8s 1.9 IPVS, with configurable load-balancing algorithms:
- round-robin
- shortest expected delay
- least connection
- others. 
The IPVS kernel module must be installed and loaded.

The kube-proxy mode is configured via a startup flag, e.g.
--proxy-mode=iptables, --proxy-mode=ipvs, --proxy-mode=userspace

Accessing an application with a service
k expose deploy "deploy name" --port=80 --type=NodePort

We can also expose a pod as a service, provided the pod has a label:
k expose pod "pod name" --port=80 --type=NodePort

The targetPort value defaults to the value of port.
port is the service side of the endpoint: clusterIP:port.
targetPort is the port opened on the pod.
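
A minimal sketch illustrating the two fields (names and ports are illustrative): clients hit clusterIP:80, the pod listens on 8080.

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80          # exposed on the ClusterIP
    targetPort: 8080  # port opened by the pod's container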

A service can point to a service in a different namespace, or to a service outside the cluster.

ExternalName is used to access a resource external to the cluster; no selector is used.
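
A minimal ExternalName sketch (the external hostname is illustrative); cluster DNS answers with a CNAME, so no ports, selector, or endpoints are involved:

apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: db.example.com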

Ingress resource
match: both host and path
rules: only HTTP rules are used to direct traffic.
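
A minimal rule sketch using the networking.k8s.io/v1 schema (the host, path, and backend service name/port are illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /app
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80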

Use cases:
- fan-out to multiple services
- name-based virtual hosting
- TLS termination
- load balancing
- exposing low-numbered ports

Ingress Controller
Officially supported 
- nginx
- GCE
Community supported
- Traefik (pronounced Traffic) 
- HAProxy 
Other: 
- Contour
- Istio


An Ingress controller can be deployed as a DaemonSet. It has its own ServiceAccount, ClusterRole and ClusterRoleBinding. The ClusterRole typically grants (1) get (2) list (3) watch access to (1) services (2) endpoints (ep) (3) secrets and (4) ingress resources.
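
A sketch of the corresponding ClusterRole rules (the exact list and the API group for ingresses — extensions vs networking.k8s.io — vary per controller and cluster version):

rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "secrets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]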

An Ingress resource has rules; these are loosely comparable to (1) an Ingress Gateway (2) a VirtualService (3) a DestinationRule in Istio.

The Ingress resource is created in the same namespace as the services and deployments it routes to.

Traefik also has a nice UI, accessible on port 8080 by default.

Questions
What is difference between containerPort and targetPort ? 






Prometheus


1. Run this script

KUBEADM_SYSTEMD_CONF=/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
sed -e "/cadvisor-port=0/d" -i "$KUBEADM_SYSTEMD_CONF"
if ! grep -q "authentication-token-webhook=true" "$KUBEADM_SYSTEMD_CONF"; then
  sed -e "s/--authorization-mode=Webhook/--authentication-token-webhook=true --authorization-mode=Webhook/" -i "$KUBEADM_SYSTEMD_CONF"
fi
systemctl daemon-reload
systemctl restart kubelet

If needed, perform steps 2 to 6.
2. Change the "--bind-address" value from 127.0.0.1 to 0.0.0.0 in the two files below:

/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml

3. Run the two commands below (optional)

k delete pod kube-controller-manager-minikube -n kube-system
k delete pod kube-scheduler-minikube -n kube-system

4. run command
kubectl api-resources | grep deployment

5. run command
kubectl api-resources | grep DaemonSet
Make the required changes.

6. kubectl edit clusterrole system:node

Add the following section to the rules:

- apiGroups:
  - ""
  resources:
  - nodes/proxy
  verbs:
  - get

If needed, step 7:
7. k edit clusterrole prometheus

add "nodes/metrics"

URLs

http://127.0.0.1:30990/ Prometheus 
http://127.0.0.1:31400/  Grafana
http://127.0.0.1:30993/ Prometheus Alert Manager
=========================

The Prometheus resource declaratively describes the desired state of a Prometheus deployment, while a ServiceMonitor describes the set of targets to be monitored by Prometheus.
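
A minimal ServiceMonitor sketch (the label values and the port name "web" are illustrative); it selects Services by label and names the port to scrape:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web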

Reference : 
https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md

The Prometheus Operator reconciles services called prometheus-operated and alertmanager-operated, which are used as governing Services for the StatefulSets.

The custom resources that the Prometheus Operator introduces are:
  • Prometheus
  • ServiceMonitor
  • PodMonitor
  • Alertmanager
  • PrometheusRule

CKAD : 6.Security


Authentication
- X.509 client cert
- static token
- bearer or bootstrap token
- static password file
- service account
- OpenID connect tokens

Kube API server options
- basic-auth-file
- oidc-issuer-url
- token-auth-file
- authorization-webhook-config-file

Authorization
Kube API server option --authorization-mode
Values
- ABAC (additional API server option: --authorization-policy-file="file_name.json")
- RBAC
- Webhook
- AlwaysAllow
- AlwaysDeny

Authorization policy has user, group, namespace, verb (=operation) 

Role = Many rules
Rule = 
+ apiGroups
+ resources
+ resourceNames
+ verb (= operation) 

RoleBinding maps (1) a role and (2) a subject, which can be:
- a ServiceAccount, OR
- a User account (mapped with a context), OR
- a Group
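
A minimal Role + RoleBinding sketch (the namespace, names and ServiceAccount "mysa" are illustrative), granting read-only access to pods:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: mysa
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io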

A service account can be associated with a pod or a deployment using serviceAccountName (see the sketch after the list below).
It mounts a secret named "service account name"-token-"random suffix" at the path
/var/run/secrets/kubernetes.io/serviceaccount. This path has 3 files:
1. ca.crt
2. namespace
3. token
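
Sketch of attaching the account in a pod spec (the ServiceAccount name "mysa" is the illustrative one used above); the token secret is then mounted automatically at the path mentioned:

spec:
  serviceAccountName: mysa
  containers:
  - name: app
    image: nginx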

All these files store the secrets in plain text.
The values are stored in etcd in base64-encoded format.
We can verify with the commands:
base64 -d "file name"
echo "plain text" | base64

securityContext

Mapped to the pod or to containers inside the pod. E.g.
- UID of the process
- Linux capabilities (for containers)
- filesystem group
(Cluster-wide enforcement of such rules is done with PodSecurityPolicies, below.)

It can be present at the container level, the pod level, or both.
If both are defined, the securityContext at the container level overrides the one at the pod level.
If we set a pod-level securityContext with a non-root runAsUser, the nginx container is not able to start: it wants to create the path /var/cache/nginx/client_temp, which the stock image can only do when running as its default root user.
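
A sketch showing both levels (all values are illustrative); the container-level runAsUser overrides the pod-level one:

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:            # pod level
    runAsUser: 1000
    fsGroup: 2000
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    securityContext:          # container level, wins over the pod level
      runAsUser: 2000
      capabilities:
        add: ["NET_ADMIN"]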

PodSecurityPolicies (PSP) automate enforcement of securityContext settings.

To enable PSP, add PodSecurityPolicy to the API server's list of admission controllers.

Network Policies 
supported by CNI plugins: Calico, Romana, Cilium, Kube-router, WeaveNet 
Without a network policy, every pod can communicate with every other pod. So with network policies, first forbid communication from all pods to all pods, and then allow traffic as required:
- based on namespaceSelector
- based on podSelector (matchLabels) 
- to IP address + port
- from IP address + port

The policyTypes are Ingress and Egress. 
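
A sketch combining the selectors above (all labels and the port are illustrative): only pods labelled role=frontend in namespaces labelled env=prod may reach pods labelled app=db on TCP 3306.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-db
spec:
  podSelector:
    matchLabels:
      app: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          env: prod
      podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 3306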

For the WeaveNet CNI plugin, we must add an annotation with the network policy name on the namespace. The Flannel CNI plugin does not honor network policies.

To install the Calico CNI, download the latest calico.yaml file, then install it with:

k create -f calico.yaml

While starting minikube, pass these additional flags:

--extra-config=kubelet.network-plugin=cni --network-plugin=cni

We can add a whitelist as below.

ingress:
- from:
  - ipBlock:
      cidr: 192.168.0.0/16
  ports:
  - port: 80
    protocol: TCP



Allow all ingress traffic

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
spec:
  podSelector: {}
  ingress:
  - {}
  policyTypes:
  - Ingress

Allow all egress traffic

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-egress
spec:
  podSelector: {}
  egress:
  - {}
  policyTypes:
  - Egress

Deny all ingress traffic 

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress

Deny all egress traffic

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress

All policies are additive (a union), so there is no chance of conflict.
The whitelist can keep growing.

Capabilities:

We can run this command inside container

grep Cap /proc/1/status

CapInh: 00000000a80425fb 
CapPrm: 0000000000000000 
CapEff: 0000000000000000 
CapBnd: 00000000a80425fb 
CapAmb: 0000000000000000

The capability can be decoded with

capsh --decode=00000000a80425fb

Capabilities can be added under securityContext:

capabilities: 
  add: ["NET_ADMIN", "SYS_TIME", "MAC_ADMIN"]

References: 
https://github.com/kelseyhightower/kubernetes-the-hard-way
https://kubernetes.io/docs/reference/access-authn-authz/controlling-access/
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
https://kubernetes.io/docs/reference/access-authn-authz/abac/#examples
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
https://github.com/kubernetes/examples/blob/master/staging/podsecuritypolicy/rbac/README.md
https://github.com/ahmetb/kubernetes-network-policy-recipes
https://kubernetes.io/docs/concepts/services-networking/network-policies/

CKAD : 5. Deployment Configuration


A K8s volume shares the lifetime of the pod, not of the containers within it. If a container restarts, the data is available to the new container. A volume can be available to multiple pods, and a volume's life can be longer than the life of a pod. If multiple pods have a write access mode, data corruption is possible unless there is a locking mechanism.

A pod has volumes. A single volume can be made available to multiple containers within the pod via volumeMounts (a directory path). (If the volumeMounts specify different values, the path will be different within each container.) So a volume can be used for intra-pod communication.

28+ volume types are available
- rbd for Ceph, GlusterFS: block storage
- NFS, iSCSI: allow multiple readers
- gcePersistentDisk
- Others: azureDisk, azureFile, csi, downwardAPI, fc (fibre channel, has raw block volumes), Flocker, gitRepo, local, projected, portworxVolume, quobyte, scaleIO, secret, storageos, vsphereVolume, persistentVolumeClaim, awsElasticBlockStore, CephFS, cinder, flexVolume

PV: a K8s PV has a longer lifetime than a pod. A pod can claim a PV through a PVC.
PVs are cluster-scoped.

In-tree volume plugins are compiled along with the k8s binaries. Out-of-tree plugins (e.g. via CSI) let storage vendors develop a single driver, and the plugin can be containerized; such a plugin needs elevated access to the host node, so it is a security risk.

A pod can claim a PV through a PVC. Multiple pods can share a PVC; other pods cannot use the data inside the volume.

Pod : Volume : PVC
The volume is then mounted in the container.

Access Mode (accessModes):
1. RWO (ReadWriteOnce)
2. ROX (ReadOnlyMany)
3. RWX (ReadWriteMany)
Access modes are defined per node, not per pod.

To match a claim, the cluster:
1. groups all volumes with the same access mode,
2. then sorts the volumes by size, smallest to largest.

A PVC requests (1) an access mode and (2) a volume size; (3) storage class is an optional parameter.

PVC is namespaced scope
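
A minimal PVC sketch with those parameters (the name, size and optional class are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # storageClassName: manual    # optional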

phase
1. Provisioning : PV

A PV can be (1) empty or (2) a pre-populated volume.

1.1. emptyDir
1.2. hostPath : mounts a resource from the host node's file system
- directory
- file
- socket
- character device
- block device
path: 
* Each type has its own configuration settings.

The resource must already exist on the host node, unless the type is:
- DirectoryOrCreate
- FileOrCreate

When a PV is created using NFS storage, we must specify both server and path:

  nfs:
    server: nfs01
    # Exported path of your NFS server
    path: "/mysql"

2. Binding: PVC

k get pv
Here the CLAIM column indicates the name of the PVC.
Other columns are: Capacity, Access Modes, Reclaim Policy, Status (Bound | Available), StorageClass. StorageClass is "manual" if the PV was created using hostPath.

3. use

If we create a file in a path that is mounted into a container through the pod's PVC, then delete all pods that use this PVC and create them again, the file is still there, because the PVC itself was never deleted.

4. Release: when PVC is deleted
5. reclaim as per persistentVolumeReclaimPolicy
- Retain
- Delete
- Recycle 

With Retain, the PV is not available for any other PVC, even after the old PVC is deleted.

A StorageClass is used for dynamic provisioning: we do not need to define the PV, it is created automatically. In the PVC we use storageClassName to link the PVC with the storage class.
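
A dynamic-provisioning sketch (the GCE PD provisioner and all names are illustrative; substitute the provisioner for your environment):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast      # links the claim to the class above
  resources:
    requests:
      storage: 10Gi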

Secret 

k create secret generic "name of secret" --from-literal=key=value

To encrypt secrets at rest, use an EncryptionConfiguration object with (1) a key and (2) a provider identity. The kube-apiserver must be started with the --encryption-provider-config flag. Example providers: aescbc, kms. Other tools: Helm Secrets, HashiCorp Vault. All existing secrets must then be recreated to be encrypted.
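
A sketch of such a configuration file (the key name is illustrative and the secret value is a placeholder for a base64-encoded 32-byte key):

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: "base64-encoded 32-byte key"
  - identity: {}      # fallback: read/write unencrypted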

The max size of a secret is 1 MB.
We can have many secrets; there is no limit on their number.
They are stored in a tmpfs on the host node.

A secret can be exposed to a container as an environment variable or mounted as a file. Inside the container it appears as plain text.

Inside the pod it is referred to as: secretKeyRef

Secret has two maps
1. data
2. stringData: a write-only convenience field; it is not displayed in the output of kubectl get secret mysecret -o yaml

If a key is specified in both data and stringData, the value from stringData is used and the value in data is ignored.
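
A sketch of a Secret using both maps (values are illustrative); here stringData.username wins over data.username:

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  username: YWRtaW4=           # "admin", base64 encoded
stringData:
  username: administrator      # plain text; overrides the data entry above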

When we mount a secret, we can mount an individual key:

    volumeMounts:
    - name: foo
      mountPath: "/etc/foo"
      readOnly: true
  volumes:
  - name: foo
    secret:
      secretName: mysecret
      items:
      - key: username
        path: my-group/my-username

OR all keys of secret. 

    volumeMounts:
    - name: foo
      mountPath: "/etc/foo"
      readOnly: true
  volumes:
  - name: foo
    secret:
      secretName: mysecret

If we specify file permissions (mode), the value must be decimal in JSON; in YAML it can be written in octal.
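
For example, in the volume definition above one could add (0400 is an illustrative mode):

  volumes:
  - name: foo
    secret:
      secretName: mysecret
      defaultMode: 0400    # octal in YAML; the JSON equivalent is 256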

As a K8s 1.18 alpha feature, both ConfigMaps and Secrets can be made immutable by setting "immutable: true".

Secret resources reside in a namespace. Secrets can only be referenced by Pods in that same namespace.

A key named .secret-file under secret->data will create a hidden file when mounted.

The ability to watch and list all secrets in a cluster should be reserved for only the most privileged, system-level components.

Configmap:
1. Key value pair
2. Plain config file in any format

Data can come from
1. a single file
2. a collection of files in a single directory
3. a literal value (use --from-literal with the command k create cm 'cm name' --from-literal=key=value)

Inside pod
1. env var
2. volume
3. a file in a volume (the filename is the key, the file content is the value)
4. in pod command
5. set file name and its access mode in volume using ConfigMaps
ConfigMaps can be used by system components and controllers. 

Inside the pod it is referred to as: configMapKeyRef

If a ConfigMap is defined as K,V pairs and a pod volume is linked to that ConfigMap, the volume is mounted in the container at a specific path. Inside that directory there is one file per key K, and the content of each file is its value V.
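
A sketch of that wiring (cmone and the mountPath /etc/config are illustrative names used in this section):

    volumeMounts:
    - name: config
      mountPath: /etc/config      # one file per key appears here
  volumes:
  - name: config
    configMap:
      name: cmone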

If we want to expose these values as environment variables inside the container, we do not need to define any volume in the pod spec. Suppose we have a cm named cmone whose content is KONE=vone, and another cm named cmtwo whose content is KA=va and KB=vb.

Now, to load the env variable(s) from cmone, we use:

envFrom:
      - configMapRef:
          name: cmone

All K,V pairs from cmone are loaded into the container as env variables: env name = K, env value = V.

To load individual K,V pairs, use the syntax below (under env:):

        - name: KAA
          valueFrom:
            configMapKeyRef:
              name: cmtwo
              key: KA
        - name: KBB
          valueFrom:
            configMapKeyRef:
              name: cmtwo
              key: KB

Here we can change env variable name also. 

The same applies to secrets:

envFrom:
      - secretRef:
          name: sone

and

        - name: KAA
          valueFrom:
            secretKeyRef:
              name: stwo
              key: KA
        - name: KBB
          valueFrom:
            secretKeyRef:
              name: stwo
              key: KB

We can also define a readiness probe with the command ls /path/for/volumeMount, so that the container becomes ready only once the ConfigMap volume has been mounted.
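
A sketch of such a probe in the container spec (/etc/config is the illustrative mount path used above):

    readinessProbe:
      exec:
        command: ["ls", "/etc/config"]   # succeeds once the ConfigMap volume is mounted
      initialDelaySeconds: 5
      periodSeconds: 5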

Deployment Status


k get deploy grafana -o yaml
At the end, the Status section is present:
1. availableReplicas
2. observedGeneration: relevant for rollout and rollback; it tracks the generation of the deployment spec that the controller has observed.

Rolling update by
1. changing the replica count
k scale deploy "name of deployment" --replicas="new replica count"
2. editing the deployment and changing the container image to another version

A deployment has strategy with value “Recreate” or “RollingUpdate”.
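
Sketch of the strategy stanza for RollingUpdate (the maxSurge/maxUnavailable values are illustrative):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1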

We can use the --record option to set an annotation on the deployment; we can then roll back by revision. The --record option puts the command into the CHANGE-CAUSE column. Once we use --record, the same annotation is carried forward to future upgrades even if we do not use --record again.

Rollback is possible with the 'rollout undo' command, as below. Suppose the current revision is 5 and you roll back to 2: a new revision 6 is created and revision 2 disappears from the history (its record effectively becomes revision 6). If you are at revision 5 and run 'rollout undo' without a target, you go back to revision 4, which likewise reappears as revision 6. You then see revisions 1, 2, 3, 5, 6.

k rollout undo deploy "name of deployment" 
rollback to specific version with option --to-revision='number'
k rollout undo deploy "name of deployment" --to-revision=1

When we create a deployment, ReplicaSet A is automatically created.
When we roll out a change, a new ReplicaSet B is created; B's replica count increases from 0 to the replica count of the old ReplicaSet A, while A's replica count decreases from its original value to 0. If we roll back (rollout undo), the reverse happens.

If we use the 'k rolling-update' command (for replication controllers), the update stops if the client is closed.

We can see status
k rollout status deploy "name of deployment"

We can see all events about new pod creation and old pod termination with command 
k describe deploy "name of deployment"

We can pause and resume a rollout:

k rollout pause deploy "name of deployment"
k rollout resume deploy "name of deployment" 

k rollout history deploy "name of deployment"
provides all the revisions. 

We can see specific revision 
k rollout history deploy "name of deployment" --revision=1

We can see diff
k rollout history deploy "name of deployment" --revision=1 > one.out
k rollout history deploy "name of deployment" --revision=2 > two.out
diff one.out two.out

The changes are:
1. line 1 has revision number
2. pod-template-hash
3. Image

If we change only the replica count, a new revision is not created.

We can trigger rollout with command 

k set image deploy "name of deployment" "name of container"=nginx:1.9.1

Real-Time Kubernetes Debugging, Monitoring and Alerting with BotKube


Ref: https://www.meetup.com/k8s-cloudnative-online/events/269706847

My take away points

Real-Time Kubernetes Debugging

1 Describe
* describe the pod and see its events
* describe the service and see its endpoints
2 Events
* kubectl get events
gives events in that namespace
* kubectl get events --all-namespaces
3 exec with -it (interactive terminal) for pod. Use -c to run inside specific container. 
4. kubectl debug is a K8s 1.18 alpha feature (enable the alpha feature first). It lets us dynamically insert a container into a pod for debugging purposes.
5. Logs
* kubectl logs
The container should expose its logs on stdout.
* kubectl logs --previous
provides the log from the previous run; useful if a pod keeps restarting.

Grafana Dashboard: https://grafana.com/grafana/dashboards?orderBy=name&direction=asc

For K8s + Prometheus setup, relevant dashboards are : https://grafana.com/grafana/dashboards?dataSource=prometheus&direction=asc&orderBy=name&search=kubernetes

Select a dashboard and import it into your Grafana using its ID, or download the JSON file and import that.

 BotKube

BotKube is for debugging and monitoring.
It can also run kubectl commands.

kubectl get clusterroles
lists all the ClusterRoles defined in the cluster

BotKube can work with multiple clusters.

We can add custom filter in GoLang to BotKube: https://www.botkube.io/filters/ 

BotKube next release
- It will be integrated with Microsoft Teams
- It will have JSON-path-based status:field monitoring

Reference: 
https://github.com/infracloudio/botkube
https://www.botkube.io 

From Live Chat

* if service has two label and pod has only 1 out of 2 labels then also that pod is part of service due to OR condition.
* deployments are not associated with any node, so one cannot get node IP address from deployment
* Better to go with the Prometheus Operator model that is part of the kube-prometheus project. It installs node-exporter, a DaemonSet that acts as an agent on the nodes.
* Prometheus can work with docker container also, without K8s
* splunk and kibana both can be combined. 
https://blogs.halodoc.io/production-grade-eks-logging-with-efk-via-sidecar/
* How is scaling designed for BotKube? Once in a while there can be many events from the K8s cluster; most of the time there are none.
Scaling uses the K8s HPA only. Even with many events, processing each event needs very little CPU.

Meetup Recording : https://www.youtube.com/watch?v=bGnQep5bY6c