==========
Kubernetes
==========


1. Design
=========

The Kubernetes API provides primitives (building blocks) for applications, in order to:

1. deploy them
2. maintain them
3. scale them

1.1 Pod
=======

* The smallest unit of scheduling.
* A Pod is one or more co-located containers, plus options that control how the container(s) should run.
* Each Pod has an IP address that is unique within the cluster.
* Pods can be managed directly through the Kubernetes API or by a controller.
* Pods are ephemeral and disposable.
* A Pod provides a way to set environment variables, mount storage, and feed other information to its containers.
* Phases: Pending, Running, Succeeded, Failed, Unknown. CrashLoopBackOff is not a phase; it is the status shown when a container keeps crashing and is being restarted with an increasing back-off delay.
* A Pod is essentially an implementation of the "composite container" pattern:

  * A Pod can have zero or more sidecar containers. For example, Istio adds a sidecar (Envoy proxy) container to each Pod.
  * A Pod can have zero or more ambassador containers. An ambassador proxies a local connection (to 127.0.0.1) to the outside world.
  * A Pod can have zero or more adapter containers. An adapter standardizes the output of the main container.

Summary: a computer is a collection of resources: processing, memory, disk, and network interfaces. In Kubernetes, the Pod is the new computer.

Pod Implementation

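* The "pause" container is the parent container for the rest of the containers in the Pod.
* The other containers share the network namespace, the IPC namespace and (when process namespace sharing is enabled) the PID namespace with the pause container. Volumes are defined at the Pod level, so every container in the Pod can mount them.
* When the PID namespace is shared, the pause container runs as PID 1 and also reaps zombie processes created by the other containers.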

1.2 Labels, Selectors and namespace
===================================

Labels

* Key-value pairs
* Attached to objects such as Pods and Nodes
* Used as a grouping mechanism

Selectors

1. Equality-based selectors (=, ==, !=)
2. Set-based selectors (in, notin, and key existence)

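Kubernetes supports two kinds of selectors:

1. Label selectors, which match on labels
2. Field selectors, which match on resource fields such as metadata.name or status.phase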
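
The difference between equality-based and set-based requirements can be shown with a minimal sketch. The function name ``matches`` and the dictionary format below are hypothetical; they loosely mirror the ``matchLabels`` and ``matchExpressions`` fields of a Kubernetes label selector, and this is not the Kubernetes implementation. All requirements are ANDed together.

```python
from typing import Dict, List, Optional


def matches(labels: Dict[str, str],
            match_labels: Optional[Dict[str, str]] = None,
            match_expressions: Optional[List[dict]] = None) -> bool:
    """Return True if `labels` satisfies every requirement (requirements are ANDed)."""
    # Equality-based part: every key=value pair must be present.
    for key, value in (match_labels or {}).items():
        if labels.get(key) != value:
            return False
    # Set-based part: In, NotIn, Exists, DoesNotExist.
    for expr in match_expressions or []:
        key, op = expr["key"], expr["operator"]
        values = set(expr.get("values", []))
        if op == "In":
            if labels.get(key) not in values:   # a missing key never matches In
                return False
        elif op == "NotIn":
            if key in labels and labels[key] in values:  # a missing key matches NotIn
                return False
        elif op == "Exists":
            if key not in labels:
                return False
        elif op == "DoesNotExist":
            if key in labels:
                return False
        else:
            raise ValueError(f"unknown operator: {op}")
    return True


if __name__ == "__main__":
    pod = {"app": "web", "tier": "frontend"}
    print(matches(pod, {"app": "web"}))  # True
    print(matches(pod, match_expressions=[
        {"key": "tier", "operator": "In", "values": ["backend"]}]))  # False
```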

Namespace

* Namespaces provide multiple virtual clusters backed by the same physical cluster.
* They divide cluster resources among multiple users, using resource quotas.
* Kubernetes starts with 3 initial namespaces (newer versions add a fourth, kube-node-lease):

  1. "default", for objects with no other namespace
  2. "kube-system", for objects created by the Kubernetes system
  3. "kube-public", reserved for cluster usage; anyone can read it

* A namespace is a non-overlapping set of Kubernetes objects.
* Object names must be unique within a namespace.


1.3 Controllers
===============

* Controllers are watch loops.
* They manage a set of Pods according to labels and selectors.
* A reconciliation loop drives the cluster from its actual state toward the desired state; to do so, it queries the API server.
* Benefits:

  1. Application reliability
  2. Scaling
  3. Load balancing
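
The reconciliation idea can be sketched for a replica count, which is roughly what a ReplicaSet or Replication controller does. This is a hypothetical, in-memory illustration: the ``Cluster`` class stands in for the API server, and real controllers watch the API server rather than polling a Python object.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """Hypothetical stand-in for the API server: Pod name -> labels."""
    pods: Dict[str, Dict[str, str]] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def list_pods(self, selector: Dict[str, str]) -> List[str]:
        return sorted(name for name, labels in self.pods.items()
                      if all(labels.get(k) == v for k, v in selector.items()))

    def create_pod(self, prefix: str, labels: Dict[str, str]) -> str:
        name = f"{prefix}-{next(self._ids)}"
        self.pods[name] = dict(labels)
        return name

    def delete_pod(self, name: str) -> None:
        del self.pods[name]


def reconcile(cluster: Cluster, name: str, selector: Dict[str, str], replicas: int) -> int:
    """One observe-diff-act pass; returns Pods created (positive) or deleted (negative)."""
    actual = cluster.list_pods(selector)   # observe the actual state
    diff = replicas - len(actual)          # diff against the desired state
    if diff > 0:                           # act: scale up
        for _ in range(diff):
            cluster.create_pod(name, selector)
    elif diff < 0:                         # act: scale down
        for pod in actual[:-diff]:
            cluster.delete_pod(pod)
    return diff
```

Running ``reconcile`` repeatedly is safe: once the actual state matches the desired state, a pass returns 0 and does nothing, and deleting a Pod by hand simply causes the next pass to create a replacement.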

Examples: 

1. Replication controller: scales Pods up and down and maintains the correct number of Pods. It facilitates horizontal scaling and ensures that Pods are resilient to host or application failures: if a container goes down or a host becomes unavailable, Pods are restarted (on different hosts if necessary) to maintain the target number of replicas. It has since been replaced by the Deployment controller and ReplicaSets.

2. Deployment controller: provides declarative updates (e.g. from a YAML file) for Pods and ReplicaSets. When the Deployment's PodTemplateSpec is updated, a new ReplicaSet is created for the new version of the Pods. If the new version is not OK, you can roll back to the old ReplicaSet.

3. DaemonSet controller: runs one copy of a Pod on each node, so a specific Pod can run on all nodes. A "nodeSelector" restricts it to a subset of nodes.
4. Job controller: runs Pods to completion.
5. Endpoints controller: joins Services and Pods together.
6. Namespace controller.
7. Service account and token controllers, for access management.
8. Node controller: manages the state of worker nodes.
9. StatefulSet: manages the deployment and scaling of a set of Pods and provides guarantees about (1) the ordering and (2) the uniqueness of these Pods. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods.
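
The StatefulSet ordering and identity guarantees can be illustrated with a short sketch. A StatefulSet named ``web`` with N replicas gets Pods named web-0 to web-(N-1); with the default OrderedReady Pod management policy, Pods are created in ascending ordinal order and removed in descending order. The two functions below are hypothetical helpers that only model this naming and ordering; they are not controller code.

```python
from typing import List, Tuple


def pod_names(statefulset: str, replicas: int) -> List[str]:
    """Stable, ordinal-based Pod names: <name>-0 ... <name>-(replicas-1)."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def scale_plan(statefulset: str, current: int, desired: int) -> List[Tuple[str, str]]:
    """Ordered actions for scaling from `current` to `desired` replicas.

    Scale-up creates Pods in ascending ordinal order; scale-down deletes
    them in descending order (highest ordinal first).
    """
    if desired >= current:
        return [("create", f"{statefulset}-{i}") for i in range(current, desired)]
    return [("delete", f"{statefulset}-{i}") for i in range(current - 1, desired - 1, -1)]


if __name__ == "__main__":
    print(pod_names("web", 3))       # ['web-0', 'web-1', 'web-2']
    print(scale_plan("web", 3, 1))   # [('delete', 'web-2'), ('delete', 'web-1')]
```

Because a name such as web-1 is always recreated with the same ordinal, it can be paired with the same storage and DNS name after a restart, which is what the "sticky identity" above refers to.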


* Kinds of controllers:

  * ReplicaSets
  * Deployments
  * DaemonSets
  * Jobs
  * Services

1.4 Services
============

* A Service is a set of Pods that work together, e.g. one tier of a multi-tier application.
* The set is defined by labels and a selector.
* Pods are discovered through Services rather than by their individual IP addresses.
* A Service load-balances requests across its Pods: it acts as the load balancer and front end for a collection of Pods. The backend Pod is picked round-robin in the legacy userspace proxy mode and by the default IPVS scheduler, and at random in iptables mode.
* Services are the stable point of contact for container workloads and are reachable by name through the cluster's internal DNS server.
* A Service's IP address remains stable and can be exposed to the outside world via an Ingress. It abstracts away the number of Pods, as well as the per-Pod virtual IP addresses, which can change as Pods are scheduled to different cluster hosts.
* A Service handles access policies for inbound requests (useful for resource control, security, etc.).
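
Round-robin selection behind a stable Service IP can be sketched as follows. ``RoundRobinService`` and the IP addresses are hypothetical; in a real cluster the selection happens in the data path (kube-proxy or IPVS), and the endpoint list comes from the Endpoints object maintained by the endpoints controller.

```python
import itertools
from typing import Iterator, List


class RoundRobinService:
    """Hypothetical sketch: a stable virtual IP fronting a changing set of Pod IPs."""

    def __init__(self, cluster_ip: str, endpoints: List[str]) -> None:
        self.cluster_ip = cluster_ip  # stays the same while Pods come and go
        self.update_endpoints(endpoints)

    def update_endpoints(self, endpoints: List[str]) -> None:
        # Called when Pods are added or removed; clients keep using cluster_ip.
        self._endpoints = list(endpoints)
        self._cycle: Iterator[str] = itertools.cycle(self._endpoints)

    def pick(self) -> str:
        """Return the next backend Pod IP in round-robin order."""
        if not self._endpoints:
            raise RuntimeError("no ready endpoints behind this Service")
        return next(self._cycle)


if __name__ == "__main__":
    svc = RoundRobinService("10.96.0.10", ["10.1.0.5", "10.1.0.6", "10.1.0.7"])
    print([svc.pick() for _ in range(4)])
    # ['10.1.0.5', '10.1.0.6', '10.1.0.7', '10.1.0.5']
```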

Exposing a Service outside the cluster

* NodePort
* LoadBalancer
* Ingress

Load balancers:

* HAProxy
* Traefik
* F5, etc.

2. Architecture
===============

* Master-worker (master-slave) architecture

The master node is controlled with kubectl, the CLI for Kubernetes. kubectl uses a kubeconfig file that stores the server information and the authentication information needed to access the API server. For production, use a cluster of at least 3 nodes.

In production, the master node has add-ons such as:

- DNS service
- Cluster-level logging, e.g. with the third-party Fluentd, which filters, buffers and routes log messages
- Resource monitoring


2.1 Control Plane
=================

2.1.1 etcd
==========

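* Key-value data store (its storage engine, bbolt, is based on a B+tree)
* Runs as a single member, or as a cluster of members with one leader and several followers (Raft consensus)
* Stores the cluster's configuration data, including ConfigMaps
* Represents the overall state of the cluster
* Other components monitor changes (through the API server), because etcd provides reliable watch queries
* Stores job scheduling information, Pod details, storage information, etc.
* It can also store ThirdPartyResources (since replaced by CustomResourceDefinitions). For example, if a third-party resource named "cron-tab.alpha.ianlewis.org" with version v1 exists in the default namespace, the corresponding custom controller can list its objects with an HTTP GET through the API server (here reached via ``kubectl proxy``, which listens on localhost:8001 by default)::

    http://localhost:8001/apis/alpha.ianlewis.org/v1/namespaces/default/crontabs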
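
The same HTTP GET can be sketched with Python's standard library. This assumes ``kubectl proxy`` is running locally on its default port 8001 (so the proxy handles authentication) and uses the group, version and plural name from the example above; ``custom_objects_url`` and ``list_custom_objects`` are hypothetical helper names.

```python
import json
import urllib.request


def custom_objects_url(group: str, version: str, namespace: str, plural: str,
                       base: str = "http://localhost:8001") -> str:
    """Build the API path for a namespaced custom resource collection."""
    return f"{base}/apis/{group}/{version}/namespaces/{namespace}/{plural}"


def list_custom_objects(group: str, version: str, namespace: str, plural: str) -> dict:
    """GET the collection through kubectl proxy and decode the JSON list."""
    url = custom_objects_url(group, version, namespace, plural)
    with urllib.request.urlopen(url) as resp:  # requires `kubectl proxy` to be running
        return json.load(resp)


if __name__ == "__main__":
    print(custom_objects_url("alpha.ianlewis.org", "v1", "default", "crontabs"))
    # http://localhost:8001/apis/alpha.ianlewis.org/v1/namespaces/default/crontabs
```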

2.1.2 API server
================

* Serves a REST API (JSON over HTTP)
* Validates REST requests and updates the API objects' state in etcd
* Performs CRUD operations in etcd for Kubernetes object data
* Lets clients configure workloads and containers across the worker nodes

2.1.3 Scheduler
===============

* Pluggable
* Matches resource "supply" to workload "demand"
* Selects the node on which each Pod runs
* Inputs:

  - resource availability
  - resource utilization
  - resource requirements
  - QoS
  - affinity requirements
  - anti-affinity requirements
  - data locality
  - policy
  - user specifications

* Supports user-defined custom schedulers
* Workload patterns:

  - ReplicaSets and Deployments
  - StatefulSets for stateful services (formerly called PetSets)
  - DaemonSets
  - Jobs (run to completion)
  - CronJobs

* Pod lifecycle hooks ("Pod start" and "Pod stop", i.e. postStart and preStop)
* A "rescheduler" for guaranteed scheduling
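
The scheduler's two basic steps can be sketched in a few lines: first filter out nodes that cannot satisfy the Pod's requests, then score the remaining nodes. The ``Node`` fields and the "most resources left after placement" scoring rule are simplifying assumptions for illustration. The real kube-scheduler runs many pluggable filter and score plugins (affinity, taints, data locality, and so on).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    cpu_free: float                      # allocatable CPU cores not yet requested
    mem_free: int                        # allocatable memory (MiB) not yet requested
    labels: Dict[str, str] = field(default_factory=dict)


def schedule(nodes: List[Node], cpu: float, mem: int,
             node_selector: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Pick a node for a Pod requesting `cpu` cores and `mem` MiB; None means Pending."""
    selector = node_selector or {}
    # Filter (formerly "predicates"): enough supply for the demand, and matching labels.
    feasible = [n for n in nodes
                if n.cpu_free >= cpu and n.mem_free >= mem
                and all(n.labels.get(k) == v for k, v in selector.items())]
    if not feasible:
        return None
    # Score (formerly "priorities"): prefer the node with the most resources left over.
    best = max(feasible, key=lambda n: (n.cpu_free - cpu, n.mem_free - mem))
    return best.name
```

Spreading by leftover capacity is only one possible policy; packing Pods onto the fullest feasible node would simply use ``min`` instead of ``max``.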

2.1.4 controller manager
========================
* A controller is a daemon that constantly compares the desired state of the cluster (as stored in etcd) with the actual state, and then takes the necessary corrective action: an Observe - Diff - Act cycle.
* Controllers use the API server's watch API to learn when Kubernetes objects are added, modified, or deleted.
* The controller manager should be reachable from the worker nodes of the cluster.
* kube-controller-manager is a single process that runs (1) the DaemonSet controller, (2) the Replication controller, and many more, as listed in section 1.3.
* It communicates with the API server to create, update, and delete (1) Pods, (2) Service endpoints, (3) etc.

* ReplicaSets are a "low-level" type in Kubernetes; DaemonSets and Deployments are "high-level" types.

Cloud-specific controller loops have been moved out of kube-controller-manager into the separate cloud-controller-manager, which interacts with third-party (cloud provider) tools for cluster management and reporting. When it is used, each kubelet must be started with the --cloud-provider=external flag.


2.2 Kubernetes Node (worker node or minion node)
================================================

Node = worker = minion.

* Runs a container runtime (e.g. Docker, rkt or CRI-O) and the components below.

2.2.1 Kubelet (K8S Node Agent)
==============================

* Sends heartbeats that report the health of the node.
* Communicates with the API server to find out whether a Pod is to be run on this node.
* If so, runs the Pod's containers via the container engine.
* Mounts the Pod's Secrets, ConfigMaps and volumes. Volumes are scoped to the Pod.
* Reports Pod and node states back to the API server (on the master node), after running health checks.
* Works from PodSpecs (e.g. YAML files) that describe Pods. A PodSpec can come from the API server, an HTTP endpoint, or a file.
* It is effectively a "Pod controller".

2.2.2 Kube-proxy
================

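* Network proxy and load balancer
* Routes traffic to containers based on IP address and port
* Adds iptables rules that connect node IP addresses and Service cluster IP addresses
* Runs as a process on every worker node
* 3 modes:

  1. Userspace mode (legacy): monitors Services and Endpoints and opens a random high-numbered port for each Service to proxy traffic.
  2. iptables mode: installs iptables rules that select a backend Pod at random.
  3. IPVS mode: uses the kernel's IPVS load balancer; intended as a more scalable replacement for iptables mode.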
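
In iptables mode the backend is chosen without keeping any state: for n endpoints, the rule for endpoint i (counting from 0) matches with probability 1/(n-i) when the earlier rules did not match, so every endpoint ends up with the same overall probability 1/n. The simulation below illustrates that cascade; the function names are hypothetical and ``random`` stands in for the iptables ``statistic`` match.

```python
import random
from collections import Counter
from typing import List


def rule_probabilities(n: int) -> List[float]:
    """Per-rule match probability for n endpoints: 1/n, 1/(n-1), ..., 1/1."""
    return [1.0 / (n - i) for i in range(n)]


def pick_endpoint(endpoints: List[str], rng: random.Random) -> str:
    """Walk the rules in order, as the iptables chain does, until one matches."""
    for endpoint, p in zip(endpoints, rule_probabilities(len(endpoints))):
        if rng.random() < p:
            return endpoint
    return endpoints[-1]  # not reached: the last rule has p = 1.0


def overall_probabilities(n: int) -> List[float]:
    """Exact chance that each endpoint is chosen (each works out to 1/n)."""
    remaining, result = 1.0, []
    for p in rule_probabilities(n):
        result.append(remaining * p)
        remaining *= 1.0 - p
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(pick_endpoint(["pod-a", "pod-b", "pod-c"], rng) for _ in range(30000))
    print(counts)  # each count should be close to 10000
```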


The master node communicates with the kubelet, while end-user traffic goes through kube-proxy.

2.2.3 cAdvisor
==============

Agent that collects resource usage statistics for containers (built into the kubelet).

2.2.4 container tooling 
=======================

E.g. Docker, rkt.

The kubelet can talk directly to containerd through the Container Runtime Interface (CRI), without needing Docker.

2.2.5 supervisord
=================

Restarts components as and when needed.

2.2.6 kube-dns
==============

It resolves Kubernetes Service DNS names to IP addresses. (CoreDNS has since replaced kube-dns as the default cluster DNS server.)

* kube2haproxy: high-availability HAProxy auto-configuration and automatic service discovery for Kubernetes, https://github.com/AdoHe/kube2haproxy


Other alternatives
==================

  1. Docker Swarm
  2. Kubernetes (to get started: kubernetes.io)
  3. Mesos Marathon
  4. Amazon ECS (Amazon Elastic Container Service, originally EC2 Container Service)

     1. A task is roughly the equivalent of a Pod.
     2. It has its own image repository (Amazon ECR).
     3. A task can be part of a CloudFormation stack, so tasks, queues and EC2 volumes can be started and cleaned up together.
     4. To get started: https://aws.amazon.com/ecs/

  5. AWS Fargate https://aws.amazon.com/fargate
  6. Google Kubernetes Engine (tip: Ctrl+L clears the screen in Google Cloud Shell, like clear/cls)
  7. Microsoft Azure Kubernetes Service (AKS)
  8. HashiCorp Nomad
  9. Cloud Foundry
  10. Rackspace
  11. Oracle Cloud Infrastructure
  12. Docker Compose: single machine, not for large scale. One command, "docker compose up", brings up the containers, volumes and networks.
  13. Rancher



K8s Installation
================

kubeadm is a tool to install Kubernetes on any cloud.

1. Install Docker.
2. On the master node, run ``kubeadm init`` and note the join token it prints.
3. On each worker node, run ``kubeadm join`` with the join token, so that all nodes join the cluster.
4. Set up the Pod network, which must satisfy these rules:

   1. All containers can communicate with all other containers without NAT.
   2. All nodes can communicate with all containers without NAT.
   3. The IP address that a container sees itself as is the same IP address that all others see for that container.

flannel and Weave Net are good starting points for such networking; Calico and Romana are other options.

For details, see the Container Network Interface (CNI) project, which Kubernetes uses for networking among containers: https://github.com/containernetworking/cni

kops installs Kubernetes clusters on AWS; Azure and GCP have similar tools.

Logging and Monitoring
======================

Logstash, Fluentd or Filebeat, running in a Pod, ship the logs to Elasticsearch, and Kibana is used to view them.

* cAdvisor collects container usage statistics; it runs per node.
* Heapster runs as a Pod in the cluster (it has since been retired in favor of metrics-server). It collects data from the kubelet on each node, and the kubelet in turn collects it from cAdvisor. Heapster groups all the information by Pod, with the relevant labels.
* Prometheus is a framework for application metrics, built on a time-series database.

All three tools above can send data to Grafana for visualization.

Enterprise tools: Datadog, Riverbed

Authentication and Authorization
================================

Users:

1. Normal users: users managed outside Kubernetes, e.g. in LDAP or SSO.
2. Service accounts:

   * Managed by the Kubernetes API server
   * Bound to a specific namespace
   * Their credentials are stored in Secrets

Each authenticated user is described by:

1. Username
2. UID
3. Groups: used for authorization
4. Extra fields

Popular authentication

1. X.509 client certificates (the default in many setups), signed by a CA within the Kubernetes cluster.

The API server is configured with a flag such as --client-ca-file=/etc/kubernetes/ssl/ca.pem and uses that CA certificate to verify the certificates that clients present.

How to generate Client certificate (authenticated by API server CA)?
--------------------------------------------------------------------
1.1. Create a private key for your user. In this example, we will name the file manoj.key::

    openssl genrsa -out manoj.key 2048

1.2. Create a certificate signing request, manoj.csr, using the private key you just created (manoj.key in this example). Make sure you specify your username and group in the -subj section (CN is for the username and O for the group). In this example, we will use manoj as the username and bitnami as the group::

    openssl req -new -key manoj.key -out manoj.csr -subj "/CN=manoj/O=bitnami"

1.3. Locate your Kubernetes cluster certificate authority (CA). The CA is responsible for approving the request and generating the certificate needed to access the cluster API. Its location is normally /etc/kubernetes/ssl/; check that the files ca.pem and ca-key.pem exist there. (kubeadm clusters use /etc/kubernetes/pki/ca.crt and /etc/kubernetes/pki/ca.key instead.)

1.4. Generate the final certificate, manoj.crt, by approving (signing) the certificate signing request manoj.csr that you made earlier. If your cluster CA is in a different location, substitute that location for /etc/kubernetes/ssl below. In this example, the certificate will be valid for 500 days::

    openssl x509 -req -in manoj.csr -CA /etc/kubernetes/ssl/ca.pem -CAkey /etc/kubernetes/ssl/ca-key.pem -CAcreateserial -out manoj.crt -days 500

2. Static token files (bearer token authentication)
3. OpenID Connect
4. Webhook token authentication
5. Basic authentication (removed in Kubernetes 1.19)
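
For static token files, the API server is started with --token-auth-file pointing at a CSV file whose rows are token, user name, user UID and optional groups (a double-quoted, comma-separated list), and clients send an "Authorization: Bearer <token>" header. The sketch below parses such a file and authenticates a header value. The tokens and names are made up, and ``load_token_file`` and ``authenticate`` are hypothetical helpers, not the API server's code.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class UserInfo:
    name: str
    uid: str
    groups: Tuple[str, ...]


def load_token_file(text: str) -> Dict[str, UserInfo]:
    """Parse rows of token,user,uid[,"group1,group2"] into a token -> user map."""
    users: Dict[str, UserInfo] = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) < 3:
            raise ValueError(f"need at least token,user,uid: {row!r}")
        token, name, uid = row[0], row[1], row[2]
        # The optional 4th column is a quoted, comma-separated list of groups.
        groups = tuple(g for col in row[3:] for g in col.split(",") if g)
        users[token] = UserInfo(name, uid, groups)
    return users


def authenticate(authorization: str, users: Dict[str, UserInfo]) -> Optional[UserInfo]:
    """Map an Authorization header value ('Bearer <token>') to a user, or None."""
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer" or not token:
        return None
    return users.get(token.strip())
```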

Popular authorization

1. ABAC: Attribute-Based Access Control. Access is granted by policies based on attributes of:

   - users
   - resources
   - objects
   - environments, etc.

2. RBAC: Role-Based Access Control. Rules grant verbs such as get, list and watch on resources. The important objects, defined in YAML, are Roles, ClusterRoles, RoleBindings and ClusterRoleBindings.
3. Webhook


RBAC and ABAC can only be applied to users who have already been identified by the authentication process.


A RoleBinding binds a Role to (1) users, (2) groups, or (3) service accounts.
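
How a RoleBinding grants access can be sketched as a lookup: a request (user, verb, resource, namespace) is allowed if some binding in that namespace names the user or one of the user's groups and points at a Role with a rule covering the verb and resource. RBAC is purely additive, so there are no deny rules. The classes below are hypothetical simplifications of the Role and RoleBinding objects (no ClusterRoles, API groups or resource names).

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Rule:
    verbs: Set[str]        # e.g. {"get", "list", "watch"}; "*" means all verbs
    resources: Set[str]    # e.g. {"pods"}; "*" means all resources


@dataclass
class Role:
    name: str
    namespace: str
    rules: List[Rule]


@dataclass
class RoleBinding:
    namespace: str
    role: str              # name of a Role in the same namespace
    # Subjects as "user:<name>" or "group:<name>". A service account authenticates
    # as user "system:serviceaccount:<namespace>:<name>".
    subjects: Set[str]


def allowed(user: str, groups: Set[str], verb: str, resource: str, namespace: str,
            roles: List[Role], bindings: List[RoleBinding]) -> bool:
    """Return True if any binding grants `verb` on `resource` in `namespace`."""
    identities = {f"user:{user}"} | {f"group:{g}" for g in groups}
    for binding in bindings:
        if binding.namespace != namespace or not (binding.subjects & identities):
            continue
        for role in roles:
            if role.name != binding.role or role.namespace != namespace:
                continue
            for rule in role.rules:
                if ({verb, "*"} & rule.verbs) and ({resource, "*"} & rule.resources):
                    return True
    return False  # nothing granted: RBAC denies by default
```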

Webhook authorization is for third-party integration or for defining complex sets of rules.
