CKAD: 8. Troubleshooting


Tools
* busybox container (it includes a shell)
* DNS configuration file (/etc/resolv.conf)
* dig (for DNS lookup)
* tcpdump
* Prometheus
* Fluentd

Commands
k exec -ti "deployment name"+Tab -- /bin/sh
(type the deployment name and press Tab to autocomplete the full pod name)
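
As a rough sketch of how the busybox and exec entries above might be used together, the helpers below open a shell either in a throwaway busybox pod or in an existing pod. The function names and the default pod name "debug" are made up for illustration, and kubectl is spelled out instead of the k alias so the snippet also works in scripts.

```shell
# debug_shell [NAME]: start a temporary busybox pod and open a shell in it.
# --rm deletes the pod when the shell exits. "debug" is a hypothetical default name.
debug_shell() {
  name="${1:-debug}"
  kubectl run "$name" -it --rm --restart=Never --image=busybox -- /bin/sh
}

# pod_shell POD [CONTAINER]: open a shell in an existing pod,
# optionally choosing one container with -c.
pod_shell() {
  if [ -n "${2:-}" ]; then
    kubectl exec -ti "$1" -c "$2" -- /bin/sh
  else
    kubectl exec -ti "$1" -- /bin/sh
  fi
}
```

This only works if the container image actually ships /bin/sh; minimal images may not.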

Logging
Log Command
k logs "pod name"
- If there are multiple containers inside the pod, this command can also be used to find out the names of the containers.
- To get a live stream of logs, use the -f option, the same way we add -f to tail: tail -f
To pick one container, the full command is:
k logs "pod name" "container name"
To get logs from every container in the pod:
k logs "pod name" --all-containers=true
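The k logs command has some useful options:
--tail=N to show only the last N lines
--since=1h to show only logs newer than the given duration
-l for a label selector
--max-log-requests=N along with -l, to limit how many logs are followed concurrently
-p for the previous instance of the container
--timestamps=true to add a timestamp to each line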
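
The options above can be combined. The following is a minimal sketch, not a canonical recipe: the function names are invented for illustration, the values (20 lines, 1 hour, 10 requests) are arbitrary, and the flag spellings follow kubectl logs --help (--timestamps, --max-log-requests).

```shell
# Assumption: "k" in these notes is an alias for kubectl. A function is used
# here because aliases are not expanded in non-interactive scripts.
k() { kubectl "$@"; }

# recent_logs POD [CONTAINER]: last 20 lines from the past hour, with timestamps.
recent_logs() {
  if [ -n "${2:-}" ]; then
    k logs "$1" -c "$2" --tail=20 --since=1h --timestamps=true
  else
    k logs "$1" --tail=20 --since=1h --timestamps=true
  fi
}

# previous_logs POD: logs of the previous (crashed or restarted) instance.
previous_logs() {
  k logs "$1" -p
}

# follow_by_label SELECTOR: stream logs from all pods matching a label selector.
follow_by_label() {
  k logs -f -l "$1" --all-containers=true --max-log-requests=10
}
```

For example, previous_logs is usually the first thing to try when a pod is in CrashLoopBackOff, because the current instance may not have logged anything yet.
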
- If the application does not write its logs to stdout, we can deploy a sidecar container that either (1) streams the application logs to its own stdout OR (2) runs a logging agent.
The container runtime writes container stdout and stderr to local files on the node (with Docker, through its logging driver). The k logs command retrieves these logs through the kubelet.
Tools
Elasticsearch, Logstash, Kibana stack (ELK), Fluentd
The Fluentd agent usually runs as a DaemonSet. It feeds data to Elasticsearch, and the data can then be visualized in a Kibana dashboard.

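The kubelet is a non-container component. On systemd nodes, its log is found in the journal under the "/var/log/journal" folder. It is accessed with the command journalctl -a (add -u kubelet to show only the kubelet unit).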
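
A small sketch of reading the kubelet log on a node, assuming the node uses systemd and the unit is named kubelet (typical for kubeadm clusters, but not guaranteed). The function name and the default time window are illustrative only.

```shell
# kubelet_logs [SINCE]: show kubelet journal entries since a given time.
# Assumption: the kubelet runs as a systemd unit called "kubelet".
kubelet_logs() {
  since="${1:-1 hour ago}"
  journalctl -u kubelet --since "$since" --no-pager
}
```

Run it on the node itself (for example over SSH), not inside a pod: kubelet_logs "10 minutes ago"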


Networking
- DNS, firewall, and general connectivity, checked with standard Linux command-line tools
- Changes to switches, routes, or other network settings. Inter-node networking. Look at all recent infrastructure changes, even the ones that seem irrelevant.

Security
- RBAC
- SELinux and AppArmor are important to check, especially for network-centric applications:
1. Disable the security tool and test again.
2. Refer to the tool's logs to find the rule violations.
3. Fix the issues (there may be several) and then re-enable security.


Other Points
- Check node logs for errors. Make sure enough resources are allocated.
- Check pod logs and the state of the pod.
- Troubleshoot pod DNS and the pod network.
- Check API calls between (1) the controllers and (2) the kube API server.
- Check inter-node network issues: DNS, firewall.

K8s troubleshooting is similar to data center troubleshooting. The main differences are:
- Check the pod state: Pending and Error states
- Check for errors in log files
- Check that resources are sufficient

Prometheus
Metric types: Counter, Gauge, Histogram (quantiles calculated on the server side), Summary (quantiles calculated on the client side)

MetricsServer
It keeps metrics only in memory. It replaces Heapster, which is now deprecated.
With MetricsServer we can use command
k top node
k top pod
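
A few variations on the two commands above, as a hedged sketch. The function names are invented for illustration, and all of them return an error unless MetricsServer is installed in the cluster.

```shell
# busiest_pods [cpu|memory]: pod usage across all namespaces, sorted by the metric.
busiest_pods() {
  kubectl top pod --all-namespaces --sort-by="${1:-cpu}"
}

# pod_container_usage POD: usage broken down per container inside one pod.
pod_container_usage() {
  kubectl top pod "$1" --containers
}

# busiest_nodes [cpu|memory]: node usage sorted by the metric.
busiest_nodes() {
  kubectl top node --sort-by="${1:-cpu}"
}
```

Comparing the output of pod_container_usage with the resource limits shown by k describe pod is one way to spot a container that is close to being OOM-killed or throttled.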

Jaeger
Features:
- distributed context propagation
- transaction monitoring
- root cause analysis

Conformance Testing
Tools:
1. Sonobuoy https://sonobuoy.io/ , https://github.com/vmware-tanzu/sonobuoy
2. CNCF conformance instructions: https://github.com/cncf/k8s-conformance/blob/master/instructions.md
3. Heptio (the original developer of Sonobuoy)

Conformance testing makes sure that
- a workload on one distribution works on another
- the API functions the same way
- minimum functionality exists

Misc

Inside a pod, each container has its own restart count. We can check it by running the command k describe pod "pod name". The pod's restart count is the sum of the restart counts of all its containers.

The command nslookup "FQDN" checks whether a DNS query gets resolved. The resolver configuration file is /etc/resolv.conf (not resolve).
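
The check can be run from inside a pod. This is only a sketch: the function name is made up, the pod's image must contain cat and nslookup (busybox does), and the default name kubernetes.default.svc.cluster.local assumes the cluster domain is cluster.local.

```shell
# check_pod_dns POD [NAME]: print the pod's resolver configuration,
# then try to resolve NAME from inside the pod.
check_pod_dns() {
  pod="$1"
  name="${2:-kubernetes.default.svc.cluster.local}"
  kubectl exec "$pod" -- cat /etc/resolv.conf
  kubectl exec "$pod" -- nslookup "$name"
}
```

If the nameserver line in /etc/resolv.conf does not point at the cluster DNS service IP, or the search line lacks the pod's namespace, short service names will not resolve even though the DNS pods are healthy.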

If a pod is exposed by a service, then it can have a DNS name of the form
"hyphen separated IP address"."service name"."namespace name".svc.cluster.local

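The "hyphen separated IP address" part of such a name is just the pod IP with the dots replaced by hyphens. A tiny sketch of that conversion follows; the function name and the IP address are hypothetical.

```shell
# dashed_ip IP: turn a dotted IPv4 address into the hyphen-separated DNS label.
dashed_ip() {
  printf '%s\n' "$1" | tr '.' '-'
}

# Example with a hypothetical pod IP in the "default" namespace.
# Assumption: the cluster DNS serves per-namespace pod records under pod.cluster.local.
#   nslookup "$(dashed_ip 10.244.1.5).default.pod.cluster.local"
```

The pod IP itself can be read from the IP column of k get pod -o wide.
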
If a pod is part of a deployment, then its pod name is generated and is not a stable name.
If we change a label of any pod in a deployment with the --overwrite option, so that the pod no longer matches the selector, then it is removed from the service and a new pod is created to replace it. The removed pod's DNS entry is also removed and the new pod's DNS entry is added.

To add the label key=value to a k8s object (e.g. a pod), the command is:
k label 'object type' 'object name' key=value

To overwrite the label with key=value1:
k label 'object type' 'object name' --overwrite key=value1

To remove the label with the given key:
k label 'object type' 'object name' key-
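
To watch the effect of relabeling a deployment's pod on its service, something like the following sketch could be used. The function names are invented for illustration, and the pod name, label, and service name passed to them would come from your own cluster.

```shell
# show_selection SERVICE: list pods with their labels, then the endpoints
# currently behind the service.
show_selection() {
  kubectl get pods --show-labels
  kubectl get endpoints "$1"
}

# relabel_pod POD KEY=VALUE: overwrite one label on a pod.
relabel_pod() {
  kubectl label pod "$1" --overwrite "$2"
}
```

Running show_selection before and after relabel_pod should show the relabeled pod dropping out of the endpoints list and a replacement pod appearing, provided the changed label is part of the selector.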

There is no DNS entry for a naked pod (a pod that is not managed by a controller). There is also no entry for a pod that belongs to a DaemonSet.

With the wget command, we can check whether DNS resolution is working.

Kube-proxy

We can check the kube-proxy log with
k -n kube-system logs "kube-proxy pod name"
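
A fuller form of that check might look like the sketch below. It assumes the kube-proxy pods carry the label k8s-app=kube-proxy, which is the kubeadm default but may differ on other distributions. The function names are illustrative only.

```shell
# kube_proxy_pods: list the kube-proxy pods and the nodes they run on.
# Assumption: the pods are labeled k8s-app=kube-proxy (kubeadm default).
kube_proxy_pods() {
  kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide
}

# kube_proxy_logs [LINES]: last LINES log lines from each kube-proxy pod.
kube_proxy_logs() {
  kubectl -n kube-system logs -l k8s-app=kube-proxy --tail="${1:-50}"
}
```

Since kube-proxy runs one pod per node, kube_proxy_pods helps pick the pod on the node where a service is unreachable.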

8.1: 11,13


Reference: 

https://kubernetes.io/docs/concepts/cluster-administration/logging/
https://kubernetes.io/docs/tasks/debug-application-cluster/logging-elasticsearch-kibana/

https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/
https://github.com/kubernetes/kubernetes/issues

CKAD
https://github.com/dgkanatsios/CKAD-exercises
https://github.com/lucassha/CKAD-resources
