CKAD : 8.Troubleshooting
Tools
* busybox container has shell
* DNS configuration file
* dig (for DNS lookup)
* tcpdump
* Prometheus
* Fluentd
Commands
k exec -ti "deployment name"+Tab -- /bin/sh
Logging
Log Command
k logs "pod name"
- if the pod has multiple containers, this command fails and lists their names, so it can also be used to find out the container names.
To get a live stream of logs, use the -f option (same as tail -f).
The actual command is:
k logs "pod name" "container name"
k logs "pod name" --all-containers=true
k logs command has some useful options like
--tail="N"
--since="1h"
-l for selector
--max-log-requests="N" along with -l
-p for previous instance
--timestamps=true to add a timestamp on each line.
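For example, the options can be combined (the deployment name and label below are placeholders, not from these notes):
k logs deploy/nginx --all-containers=true --tail=100 --since=1h --timestamps
k logs -l app=nginx --max-log-requests=10 --tail=20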
- If the application does not write logs to stdout, we can deploy a sidecar container that either (1) streams the application's log files to its own stdout, OR (2) runs a logging agent.
The kubelet uses the Docker logging driver to write container logs to local files; the k logs command retrieves these logs.
Tools
Elastic Search, Logstash, Kibana Stack (ELK), Fluentd
The Fluentd agent runs as a DaemonSet. It feeds data to Elasticsearch, which can then be visualized in a Kibana dashboard.
kubelet is a non-containerized component; its logs are found in the /var/log/journal folder and can be read with journalctl (e.g. journalctl -u kubelet).
Networking
- Check DNS, firewalls, and general connectivity using standard Linux command-line tools.
- Check for changes at switches, routers, or other network settings, and for inter-node networking issues. Review all recent infrastructure changes, relevant or seemingly irrelevant.
Security
- RBAC,
- SELinux and AppArmor are important to check, for network-centric applications.
- Temporarily disable security and test again.
- Check the tools' logs to find rule violations.
- Fix the (possibly multiple) issues, then re-enable security.
Other Points
- check node logs for errors. Make sure enough resources are allocated
- pod logs and state of pod
- troubleshoot pod DNS and pod network
- API calls between the controllers and the kube-apiserver
- inter node network issue: DNS, firewall
K8s Troubleshooting is similar to data center troubleshooting. Main differences are:
- Check pod state: Pending and Error states
- Check for errors in log files
- Check whether resources are sufficient
Prometheus
Metric types: Counter, Gauge, Histogram (quantiles computed server side), Summary (quantiles computed client side)
MetricsServer
It has only an in-memory DB (no persistent storage). Heapster is now deprecated; Metrics Server replaces it.
With MetricsServer we can use command
k top node
k top pod
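For example (the --sort-by flag needs a reasonably recent kubectl):
k top pod -n kube-system --containers
k top pod --sort-by=cpu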
Jaeger
Features:
- distributed context propagation
- transaction monitoring
- root cause analysis
Conformance Testing
Tools:
1. Sonobuoy https://sonobuoy.io/ , https://github.com/vmware-tanzu/sonobuoy
2. https://github.com/cncf/k8s-conformance/blob/master/instructions.md
3. Heptio
It makes sure that
- workload on one distribution works on another.
- API functions the same
- Minimum functionality exists.
Misc
Inside a pod, each container has its own restart count; we can check it with the command k describe pod "pod name". The pod's restart count is the sum of the restart counts of all its containers.
The nslookup "FQDN" command checks whether a DNS query gets resolved. Its configuration file is /etc/resolv.conf (note: resolv, not resolve).
If a pod is behind a service, it can have a DNS name of the form:
"hyphen separated IP address"."pod name"."service name"."namespace name".svc.cluster.local
If the pod is part of a deployment, the pod name is generated, so it is not a stable, absolute name.
If we change a label of any pod in a deployment (with the --overwrite option), that pod is removed from the service and a new pod is created. The removed pod's DNS entry is also removed and the new pod's DNS entry is added.
To add label key=value on k8s object (e.g. pod) command is:
k label 'object type' 'object name' key=value
To overwrite label key=value1
k label 'object type' 'object name' --overwrite key=value1
To remove label with key
k label 'object type' 'object name' key-
There is no DNS entry for a naked pod, nor for a pod that belongs to a DaemonSet.
The wget command can also be used to check whether DNS resolution is working.
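For example, DNS resolution and service reachability can be checked from a throwaway busybox pod (the image tag and names below are placeholders):
k run tmp --image=busybox:1.28 --rm -it --restart=Never -- nslookup kubernetes.default
k exec -ti "pod name" -- wget -qO- http://"service name":80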
Kube-proxy
We can check the kube-proxy logs with:
k -n kube-system logs "kube-proxy pod name"
8.1: 11,13
Reference:
https://kubernetes.io/docs/concepts/cluster-administration/logging/
https://kubernetes.io/docs/tasks/debug-application-cluster/logging-elasticsearch-kibana/
https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/
https://github.com/kubernetes/kubernetes/issues
CKAD
https://github.com/dgkanatsios/CKAD-exercises
https://github.com/lucassha/CKAD-resources
CKAD : 7.Exposing Applications
ClusterIP range is defined via API server startup option --service-cluster-ip-range
NodePort range is defined in cluster configuration.
ExternalName has no port, no selector, and no endpoints. Redirection happens at the DNS level.
The 'kubectl proxy' command creates a local proxy for accessing ClusterIP services. Useful for troubleshooting and development work.
If we create a LoadBalancer-type service on bare metal without deploying any load balancer, we can still access it as a NodePort service.
Grace Period
We should add --grace-period=0 --force for immediate deletion.
Pods and deployments have a terminationGracePeriodSeconds parameter in the spec section. It cannot be modified at runtime with kubectl edit; it can only be set at deployment time.
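For example (the pod name is a placeholder):
k delete pod "pod name" --grace-period=0 --force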
KubeProxy Mode
* K8s 1.0 userspace mode
* K8s 1.1 iptables introduced
* K8s 1.2 iptables became the default
It allows a maximum of approximately 5000 worker nodes.
* K8s 1.9 ipvs, with configurable load-balancing algorithms:
- round-robin
- shortest expected delay
- least connection
- others.
The IPVS kernel module must be installed and running.
The kube-proxy mode is configured via a startup flag (--proxy-mode) or the mode field in the kube-proxy configuration:
mode=iptables, mode=ipvs, mode=userspace
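To see which mode a cluster is using, one way (assuming a kubeadm-style kube-proxy ConfigMap and the usual log format) is:
k -n kube-system get cm kube-proxy -o yaml | grep mode
k -n kube-system logs "kube-proxy pod name" | grep -i proxier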
Accessing an application with a service
k expose deploy "deploy name" --port=80 --type=NodePort
We can also expose a pod as a service, provided the pod has a label:
k expose pod "pod name" --port=80 --type=NodePort
The targetPort value defaults to the value of port.
port is part of endpoint: clusterIP:port
targetPort is opened at pod.
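A minimal sketch of how these fields relate in a NodePort service manifest (names and numbers are placeholders):
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80          # clusterIP:port, used by clients inside the cluster
    targetPort: 8080  # port opened in the pod
    nodePort: 30080   # port opened on every node (must be in the NodePort range)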
A service can point to a service in a different namespace, or to a service outside the cluster.
ExternalName is used to access a resource external to the cluster; no selector is used.
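A sketch of an ExternalName service (the external DNS name is a placeholder); the cluster DNS simply returns a CNAME for externalName:
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: db.example.com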
Ingress resource
Matches on both host and path.
rules: only HTTP rules are used to direct traffic.
Use cases (a minimal example manifest follows this list):
- fan-out to services
- name-based hosting
- TLS
- load balancing
- exposing low-numbered ports
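A minimal fan-out Ingress sketch matching the host/path rules above (the host, paths, and service names are placeholders; the apiVersion shown is networking.k8s.io/v1, which may differ on older clusters):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fanout-example
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc
            port:
              number: 80
      - path: /web
        pathType: Prefix
        backend:
          service:
            name: web-svc
            port:
              number: 80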
Ingress Controller
Officially supported
- nginx
- GCE
Community supported
- Traefik (pronounced Traffic)
- HAProxy
Other:
- Contour
- Istio
An ingress controller can be deployed as a DaemonSet. It has its own ServiceAccount, ClusterRole and ClusterRoleBinding. The ClusterRole grants (1) get, (2) list, and (3) watch access to (1) services, (2) endpoints, (3) secrets and (4) ingress resources.
An Ingress resource has rules; these are loosely similar to (1) the Ingress Gateway, (2) VirtualService and (3) DestinationRule in Istio.
The Ingress resource is created in the same namespace as the services and deployments it routes to.
Traefik also has a nice UI, accessible by default on port 8080.
Questions
What is the difference between containerPort and targetPort?
Prometheus
1. Run this script:
# Remove the deprecated cadvisor-port flag and enable the kubelet authentication token webhook
KUBEADM_SYSTEMD_CONF=/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
sed -e "/cadvisor-port=0/d" -i "$KUBEADM_SYSTEMD_CONF"
if ! grep -q "authentication-token-webhook=true" "$KUBEADM_SYSTEMD_CONF"; then
  sed -e "s/--authorization-mode=Webhook/--authentication-token-webhook=true --authorization-mode=Webhook/" -i "$KUBEADM_SYSTEMD_CONF"
fi
systemctl daemon-reload
systemctl restart kubelet
If needed, follow steps 2 to 6.
2. Change the "--bind-address" value from 127.0.0.1 to 0.0.0.0 in the two files below:
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml
3. Run the two commands below (optional):
k delete pod kube-controller-manager-minikube -n kube-system
k delete pod kube-scheduler-minikube -n kube-system
4. Run the command:
kubectl api-resources | grep deployment
and make the required changes in the Deployment sections of the YAML (apiVersion apps/v1 instead of extensions/v1beta1), as per https://stackoverflow.com/questions/58481850/no-matches-for-kind-deployment-in-version-extensions-v1beta1
5. Run the command:
kubectl api-resources | grep DaemonSet
and make the corresponding changes for DaemonSets.
6. kubectl edit clusterrole system:node
and add the following entry under rules:
- apiGroups:
  - ""
  resources:
  - nodes/proxy
  verbs:
  - get
If needed, do step 7.
7. k edit clusterrole prometheus and add "nodes/metrics" to the resources list.
URLs
http://127.0.0.1:30990/ Prometheus
http://127.0.0.1:31400/ Grafana
http://127.0.0.1:30993/ Prometheus Alert Manager
=========================
The Prometheus resource declaratively describes the desired state of a Prometheus deployment, while a ServiceMonitor describes the set of targets to be monitored by Prometheus.
Reference :
https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md
The Prometheus Operator reconciles
services called prometheus-operated and alertmanager-operated, which are used as governing Services for the StatefulSets.
The custom resources that the Prometheus Operator introduces are:
Prometheus
ServiceMonitor
PodMonitor
Alertmanager
PrometheusRule
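A sketch of a ServiceMonitor, based on the getting-started guide referenced above (the labels and port name are placeholders):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web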
CKAD : 6.Security
Authentication
- X.509 client cert
- static token
- bearer or bootstrap token
- static password file
- service account
- OpenID connect tokens
Kube API server options
- basic-auth-file
- oidc-issuer-url
- token-auth-file
- authorization-webhook-config-file
Authorization
Kube API server option --authorization-mode
Values:
- ABAC (API server additional option: --authorization-policy-file="file_name.json")
- RBAC
- Webhook
- AlwaysAllow
- AlwaysDeny
Authorization policy has user, group, namespace, verb (=operation)
Role = Many rules
Rule =
+ apiGroups
+ resources
+ resourceNames
+ verb (= operation)
RoleBinding maps (1) role and
2.1 Service Account OR
2.2 User Account (mapped with context) OR
2.3 Group
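A minimal sketch of a Role and a RoleBinding tying it to a service account (all names are placeholders):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: app-sa
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io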
The service account can be associated with pod or with deployment using serviceAccountName
It mounts a secret named "service account name"-token-"random suffix" at the path
/var/run/secrets/kubernetes.io/serviceaccount. This path has 3 files:
1. ca.crt
2. namespace
3. token
All files store the secret values in plain text.
The values are stored in etcd in base64-encoded format.
We can verify with the commands:
base64 -d "file name"
echo "plain text" | base64
securityContext
Applied to a pod or to the containers inside the pod, e.g.:
- UID of process
- Linux capabilities (for containers)
- filesystem group
securityContext defines privilege and access-control settings for a pod or container (cluster-wide enforcement is handled by PodSecurityPolicies, below).
It can be present at the container level, the pod level, or both.
If both are defined, the securityContext at the container level overrides the securityContext at the pod level.
If we set a pod-level securityContext with runAsUser, the nginx container may fail to start: it wants to create the path /var/cache/nginx/client_temp, which it can only do when running as its default user.
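A sketch of pod- and container-level securityContext fields (the values are illustrative only):
apiVersion: v1
kind: Pod
metadata:
  name: secured-pod
spec:
  securityContext:        # pod level
    runAsUser: 1000
    fsGroup: 2000
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    securityContext:      # container level, overrides the pod level
      runAsUser: 3000
      capabilities:
        add: ["NET_ADMIN"]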
PodSecurityPolicies (PSP) automate the enforcement of securityContext settings.
To enable PSP, add PodSecurityPolicy to the API server's admission controllers (--enable-admission-plugins).
Network Policies
supported by CNI plugins: Calico, Romana, Cilium, Kube-router, WeaveNet
Without any network policy, all pods can communicate with all pods. So with network policies, we first forbid communication from all pods to all pods, and then allow traffic as per requirements:
- based on namespaceSelector
- based on podSelector (matchLabels)
- to IP address + port
- from IP address + port
The policyTypes are Ingress and Egress.
For the WeaveNet CNI plugin, we should add an annotation with the network policy name on the namespace. The Flannel CNI plugin does not honor network policies.
To use Calico, the Calico CNI must be installed: download the latest calico.yaml file, then install it with:
k create -f calico.yaml
While starting minikube, pass these additional flags:
--extra-config=kubelet.network-plugin=cni --network-plugin=cni
We can add a whitelist entry as below:
ingress:
- from:
  - ipBlock:
      cidr: 192.168.0.0/16
  ports:
  - port: 80
    protocol: TCP
Allow all ingress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
spec:
  podSelector: {}
  ingress:
  - {}
  policyTypes:
  - Ingress
Allow all egress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-egress
spec:
  podSelector: {}
  egress:
  - {}
  policyTypes:
  - Egress
Deny all ingress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
Deny all egress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress
All policies are additive (a union), so there is no chance of conflict.
The whitelist can keep growing.
Capabilities:
We can run this command inside container
grep Cap /proc/1/status
CapInh: 00000000a80425fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
The capability can be decoded with
capsh --decode=00000000a80425fb
Capability can be added under securityContext
capabilities:
add: ["NET_ADMIN", "SYS_TIME", "MAC_ADMIN"]
References:
https://github.com/kelseyhightower/kubernetes-the-hard-way
https://kubernetes.io/docs/reference/access-authn-authz/controlling-access/
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
https://kubernetes.io/docs/reference/access-authn-authz/abac/#examples
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
https://github.com/kubernetes/examples/blob/master/staging/podsecuritypolicy/rbac/README.md
https://github.com/ahmetb/kubernetes-network-policy-recipes
https://kubernetes.io/docs/concepts/services-networking/network-policies/
CKAD : 5. Deployment Configuration
A K8s volume shares the lifetime of the pod, not of the containers within it: if a container restarts, the data is still available to the new container. A volume can be available to multiple pods, and a volume's life can be longer than the life of a pod. If multiple pods have write access, data corruption is possible unless there is a locking mechanism.
A pod has volumes. A single volume can be made available to multiple containers within the pod using a volumeMount (a directory path). If the volumeMount has a different value in each container, the path will differ per container. This can be used for intra-pod communication.
28+ volume types available
- rbd for Ceph, GlusterFS: block storage.
- NFS, iSCSI : for multiple reading
- gcePersistentDisk
- Others: azureDisk, azureFile, csi, downwardAPI, fc (fibre channel, supports raw block volumes), flocker, gitRepo, local, projected, portworxVolume, quobyte, scaleIO, secret, storageos, vsphereVolume, persistentVolumeClaim, awsElasticBlockStore, cephfs, cinder, flexVolume
PV: a K8s PersistentVolume has a longer lifetime than a pod. A pod can claim a PV through a PVC.
PV is cluster scoped.
Volume plugins were traditionally in-tree, i.e. compiled along with the k8s binaries. The Container Storage Interface (CSI) allows out-of-tree plugins.
Out-of-tree plugins let storage vendors develop a single driver, and the plugin can be containerized. It needs elevated access to the host node, so it is a security risk.
A pod claims a PV through a PVC. Multiple pods can share a PVC; other pods cannot use the data inside the volume.
Pod : Volume : PVC
The volume is then mounted in the container.
Access Mode (accessModes) :
1. RWO (Read Write Once)
2. ROX (Read Only Many)
3. RWX (Read Write Many)
Access modes apply per node. To match a claim, the cluster:
1. groups all volumes with the same access mode,
2. then sorts the volumes by size, smallest to largest.
A PVC requests (1) an access mode and (2) a volume size; (3) storage class is an optional parameter.
A PVC is namespace-scoped.
phase
1. Provisioning : PV
PV can be of type (1) empty or (2) pre-populated volume.
1.1. emptyDir
1.2. hostPath : mounts resource from host node file system
- directory
- file
- socket
- character device
- block device
path:
* All types have their own configuration settings.
The resource must already exist on the host node, unless the type is one of:
- DirectoryOrCreate
- FileOrCreate
When a PV is created using NFS storage, we must specify both the server and the path:
nfs:
  server: nfs01
  # Exported path of your NFS server
  path: "/mysql"
2. Binding: PVC
k get pv
Here the CLAIM column indicates the name of the PVC.
Other columns are: Capacity, Access Modes, Reclaim Policy, Status (Bound | Available), StorageClass. StorageClass shows "manual" if the PV was created using hostPath.
3. Use
If we create a file in a path that is mounted in a container via the pod's PVC, then delete all pods that use this PVC and create them again, the file is still there, because the PVC was not deleted.
4. Release: when PVC is deleted
5. reclaim as per persistentVolumeReclaimPolicy
- Retain
- Delete
- Recycle
With Retain, the PV is not available for any other PVC, even after the old PVC is deleted.
A StorageClass is used for dynamic provisioning: we do not need to define the PV, it is created automatically. In the PVC we use storageClassName to link the PVC with the StorageClass.
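A sketch of a PVC that requests dynamic provisioning (the StorageClass name is a placeholder and must exist in the cluster):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard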
Secret
k create secret generic "name of secret" --from-literal=key=value
To encrypt secrets at rest, use an EncryptionConfiguration object with (1) a key and (2) a provider identity. The kube-apiserver should be started with the --encryption-provider-config flag. Example providers: aescbc, kms. Other tools: Helm Secrets, HashiCorp Vault. Existing secrets then need to be recreated.
Max size of a secret = 1 MB.
We can have many secrets; there is no hard limit.
They are stored in a tmpfs on the host node.
A secret can be exposed to a container as an environment variable or mounted as a file. It is stored as plain text inside the container.
Inside pod it is referred as: secretKeyRef
Secret has two maps
1. data
2. stringData: It is a write-only convenience field. Not display in output of command kubectl get secret mysecret -o yaml
If a key is specified in both data and stringData, the value given in stringData is used and the value in data is ignored.
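A sketch of a Secret using both maps (values are placeholders; data values must be base64-encoded, stringData values are plain text):
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  username: YWRtaW4=        # base64 of "admin"
stringData:
  config.yaml: |
    apiUrl: https://example.com/api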
When we mount a secret, we can mount an individual key:
volumeMounts:
- name: foo
  mountPath: "/etc/foo"
  readOnly: true
volumes:
- name: foo
  secret:
    secretName: mysecret
    items:
    - key: username
      path: my-group/my-username
OR all keys of secret.
volumeMounts:
- name: foo
  mountPath: "/etc/foo"
  readOnly: true
volumes:
- name: foo
  secret:
    secretName: mysecret
If we specify file permission in JSON, it should be decimal. In YAML it should be octal.
In K8s 1.18, as an alpha feature, both ConfigMaps and Secrets can be made immutable by specifying "immutable: true".
Secret resources reside in a namespace. Secrets can only be referenced by Pods in that same namespace.
A key with a leading dot (secret -> data -> .secret-file) creates a hidden file when mounted.
The ability to watch and list all secrets in a cluster should be reserved for only the most privileged, system-level components.
Configmap:
1. Key value pair
2. Plain config file in any format
Data can come from:
1. a single file
2. a collection of files in a single directory
3. a literal value (use --from-literal with the command k create cm 'cm name' --from-literal=key=value)
Inside pod
1. env var
2. volume
3. a file in volume (a filename is key, file content is value)
4. in pod command
5. set file name and its access mode in volume using ConfigMaps
ConfigMaps can be used by system components and controllers.
Inside pod it is referred as: configMapKeyRef
If a ConfigMap is defined as K,V pairs and a pod volume is linked with that ConfigMap, the volume is mounted in the container at a specific path. Inside that directory there is one file per key K, and the content of each file is the value V.
If we only want environment variables inside the container, no volume is needed. Suppose we have a cm named cmone whose content is KONE=vone, and another cm named cmtwo whose content is KA=va and KB=vb.
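These two example ConfigMaps could be created like this (same names as in the text):
k create cm cmone --from-literal=KONE=vone
k create cm cmtwo --from-literal=KA=va --from-literal=KB=vb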
Now, to load everything from cmone (here a single key) as environment variables, we use:
envFrom:
- configMapRef:
    name: cmone
All K,V pairs from cmone will be loaded into the container as env variables: env name = K, env value = V.
If we want to load individual K,V pairs, use the syntax below (under env:):
- name: KAA
  valueFrom:
    configMapKeyRef:
      name: cmtwo
      key: KA
- name: KBB
  valueFrom:
    configMapKeyRef:
      name: cmtwo
      key: KB
Here we can change env variable name also.
Same thing applies for secret
envFrom:
- secretRef:
    name: sone
and
- name: KAA
  valueFrom:
    secretKeyRef:
      name: stwo
      key: KA
- name: KBB
  valueFrom:
    secretKeyRef:
      name: stwo
      key: KB
We can also define a readiness probe with the command ls /path/for/volumeMount, so the container only becomes ready once the ConfigMap volume is mounted.
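A sketch of such a readiness probe in the container spec (the path is whatever volumeMount path is used above):
readinessProbe:
  exec:
    command:
    - ls
    - /path/for/volumeMount
  initialDelaySeconds: 5
  periodSeconds: 5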
Deployment Status
k get deploy grafana -o yaml
At the end, the status section is present, with fields such as:
1. availableReplicas
2. observedGeneration: used in rollout and rollback situations; this parameter reflects the current generation/revision number.
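For example, to read these fields directly (deployment name as in the notes above):
k get deploy grafana -o jsonpath='{.status.observedGeneration}'
k get deploy grafana -o jsonpath='{.status.availableReplicas}'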
We can trigger changes by:
1. changing the replica count:
k scale deploy "name of deployment" --replicas="new replica count"
2. editing the deployment and changing the container image to another version (this performs a rolling update).
A deployment has strategy with value “Recreate” or “RollingUpdate”.
We can use the --record option to set an annotation on the deployment; it appears in the CHANGE-CAUSE column of the rollout history and helps when rolling back. Once we use --record, the same annotation carries over to future upgrades if we do not use --record again.
Rollback is possible with the 'rollout undo' command, as below. Suppose the current revision is 5 and you roll back to revision 2: a new revision 6 is created, revision 2 is removed from the history, and its record is carried over to revision 6. If you are at revision 5 and run 'rollout undo' without options, you go back to revision 4: a new revision 6 is created and 4 disappears from the history (effectively renamed to 6), so you see revisions 1, 2, 3, 5, 6.
k rollout undo deploy "name of deployment"
rollback to specific version with option --to-revision='number'
k rollout undo deploy "name of deployment" --to-revision=1
When we create a deployment, a ReplicaSet A is created automatically.
When we roll out an update, a new ReplicaSet B is created. B's replica count increases from 0 to the replica count of the old ReplicaSet A, while A's replica count decreases from its original value to 0. If we roll back (rollout undo), the reverse happens.
(With the old 'k rolling-update' command, the update stops if the client is closed.)
We can see status
k rollout status deploy "name of deployment"
We can see all events about new pod creation and old pod termination with command
k describe deploy "name of deployment"
we can pause and resume deployment.
k rollout pause deploy "name of deployment"
k rollout resume deploy "name of deployment"
k rollout history deploy "name of deployment"
provides all the revisions.
We can see specific revision
k rollout history deploy "name of deployment" --revision=1
We can see the diff between two revisions:
k rollout history deploy "name of deployment" --revision=1 > one.out
k rollout history deploy "name of deployment" --revision=2 > two.out
diff one.out two.out
The changes are:
1. line 1 has revision number
2. pod-template-hash
3. Image
If we change replica count, then new revision is not created.
We can trigger rollout with command
k set image deploy "name of deployment" "name of container"=nginx:1.9.1
Real-Time Kubernetes Debugging, Monitoring and Alerting with BotKube
Ref: https://www.meetup.com/k8s-cloudnative-online/events/269706847
My take away points
Real-Time Kubernetes Debugging
1. Describe
* describe a pod and see its events
* describe a service and see its endpoints
2. Events
* kubectl get events
gives the events in that namespace
* kubectl get events --all-namespaces
3. exec with -it (interactive terminal) into a pod. Use -c to run inside a specific container.
4. kubectl debug is a K8s 1.18 alpha feature (the alpha feature gate must be enabled first). It lets us dynamically insert a container into a running pod for debugging.
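A hedged sketch of the 1.18 alpha syntax (the command and flags changed in later releases; the pod, image, and container names are placeholders):
kubectl alpha debug -it "pod name" --image=busybox --target="container name"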
5. Logs
* kubectl logs,
The container should expose its logs on stdout.
* kubectl logs --previous
provides the logs from the previous run; useful if a pod keeps restarting.
Grafana Dashboard: https://grafana.com/grafana/dashboards?orderBy=name&direction=asc
For K8s + Prometheus setup, relevant dashboards are : https://grafana.com/grafana/dashboards?dataSource=prometheus&direction=asc&orderBy=name&search=kubernetes
Select a dashboard and import it into your Grafana by its ID, or download the JSON file and import that.
BotKube
BotKube is for debugging and monitoring
It can also run kubectl commands.
kubectl get clusterroles
lists all ClusterRoles.
BotKube can work with multiple clusters.
We can add custom filter in GoLang to BotKube: https://www.botkube.io/filters/
BotKube next release
- It will be integrated with Microsoft Teams
- It will have JSON-path-based status-field monitoring
Reference:
https://github.com/infracloudio/botkube
https://www.botkube.io
From Live Chat
* if a service has two labels and a pod has only 1 of the 2 labels, that pod is still part of the service due to an OR condition.
* deployments are not associated with any node, so one cannot get node IP address from deployment
* Better to go with the Prometheus Operator model that is part of the kube-prometheus project. It installs node-exporter, a DaemonSet that acts as an agent on the nodes.
* Prometheus can work with docker container also, without K8s
* splunk and kibana both can be combined.
* https://blogs.halodoc.io/production-grade-eks-logging-with-efk-via-sidecar/
* How is scaling designed for BotKube? Once in a while there can be many events from the K8s cluster; most of the time there are none.
Scaling uses the K8s HPA only. Even with many events, processing each event needs little CPU.
Meetup Recording : https://www.youtube.com/watch?v=bGnQep5bY6c