1. Design

API: primitives (building blocks) to
1. deploy
2. maintain
3. scale

1.1 Pod

* Scheduling unit
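* Pod = one or more co-located containers, plus options for how the container(s) should run
* Pod has a unique IP within the cluster.
* Can be managed directly through the Kubernetes API or by a controller.
* Containers in a pod share storage, Linux namespaces, and the IP address
* Ephemeral and disposable
* Phases: Pending, Running, Succeeded, Failed, Unknown. CrashLoopBackOff is not a phase; it is the status reported when a container keeps crashing and is restarted with a growing back-off delay.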
* Pod is like an implementation of "composite container pattern"
** A pod can have zero or more sidecar containers. Istio adds one sidecar container to each pod.
** A pod can have zero or more ambassador containers. An ambassador proxies a local connection to the outside world.
** A pod can have zero or more adapter containers. An adapter standardizes the output.

Summary : A computer is a collection of resources, some processing, memory, disk, and network interfaces. In K8s the pod is the new computer.

Pod Implementation

* The "pause container" is the parent container for the rest of the containers within the pod.
* The other containers share the network and IPC namespaces with the pause container, and the PID namespace as well when PID namespace sharing is enabled.
* With a shared PID namespace, the "pause container" also reaps the zombie processes created by the child containers.

1.2 Labels, Selectors and namespace


* Labels are key-value pairs
* Attached to objects such as pods and nodes
* Grouping mechanism


Label selectors come in two forms:

1. Equality-based selector (=, == and !=)
2. Set-based selector (in, notin, exists)
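
To make the two selector forms concrete, here is a minimal Python sketch of how a single requirement could be evaluated against an object's labels. The function names `matches_equality` and `matches_set` are hypothetical and are not part of any Kubernetes client library; the sketch only mirrors the documented semantics, including the detail that `!=` and `notin` also match objects that lack the key.

```python
from typing import Dict, Iterable

Labels = Dict[str, str]


def matches_equality(labels: Labels, key: str, op: str, value: str) -> bool:
    """Evaluate one equality-based requirement: key = value or key != value."""
    if op in ("=", "=="):
        return labels.get(key) == value
    if op == "!=":
        # Objects without the key also satisfy a != requirement.
        return labels.get(key) != value
    raise ValueError(f"unknown equality operator: {op}")


def matches_set(labels: Labels, key: str, op: str,
                values: Iterable[str] = ()) -> bool:
    """Evaluate one set-based requirement: in, notin, or exists."""
    allowed = set(values)
    if op == "in":
        return key in labels and labels[key] in allowed
    if op == "notin":
        # Objects without the key also satisfy a notin requirement.
        return labels.get(key) not in allowed
    if op == "exists":
        return key in labels
    raise ValueError(f"unknown set operator: {op}")
```

A full selector is a list of such requirements that must all hold, so a caller would combine the results with `all(...)`.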

Selectors have two types

1. Label selector
2. Field selector


A namespace provides multiple virtual clusters backed by the same physical cluster.
It is used to divide cluster resources among multiple users by means of resource quotas.
K8s has a "default" namespace.
Basically, a namespace is a non-overlapping set of K8s objects.

1.3 Controllers

* Manage a set of pods as per "Labels and Selectors"
* A reconciliation loop drives the cluster state from actual to desired
* Benefits
1. App reliability
2. Scaling
3. Load balancing

* Kinds of controllers

1. Replication controller: scales up and down and maintains the correct number of pods. It facilitates horizontal scaling and ensures that pods are resilient to host or application failures. If a container goes down or a host becomes unavailable, pods are restarted on different hosts as necessary to maintain the target number of replicas. It has since been replaced by the Deployment controller and ReplicaSet.

2. Deployment controller: declarative updates (YAML file) for pods and ReplicaSets. When the PodTemplateSpec is updated, a new ReplicaSet is created with the new version of the pod. If the new version is not OK, roll back to the old ReplicaSet.

3. DaemonSet controller: runs one copy of a pod on each node, either on all nodes or only on the nodes chosen with "nodeSelector".
4. Job controller
5. Endpoints controller: joins services and pods together
6. Namespace controller
7. Service account and token controllers: for access management
8. Node controller: manages worker states
9. StatefulSet: manages the deployment and scaling of a set of pods and provides guarantees about (1) the ordering and (2) the uniqueness of these pods. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its pods.
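
The ordering and uniqueness guarantees of a StatefulSet can be illustrated with a small Python sketch. It assumes the default ordered pod management, where pods are named `<set name>-<ordinal>`, created in ascending ordinal order, and removed in descending order. The helper names `stateful_pod_names` and `scale_order` are made up for this illustration.

```python
from typing import List, Tuple


def stateful_pod_names(set_name: str, replicas: int) -> List[str]:
    """Each pod gets a stable, unique name built from its ordinal index."""
    return [f"{set_name}-{ordinal}" for ordinal in range(replicas)]


def scale_order(set_name: str, current: int, desired: int) -> List[Tuple[str, str]]:
    """Return the (action, pod name) steps in the order they would happen."""
    if desired >= current:
        # Scale up: create the missing ordinals from lowest to highest.
        return [("create", f"{set_name}-{i}") for i in range(current, desired)]
    # Scale down: delete the highest ordinals first.
    return [("delete", f"{set_name}-{i}")
            for i in range(current - 1, desired - 1, -1)]
```

Because the name depends only on the set name and the ordinal, a replacement pod for `db-1` is again called `db-1`, which is what "sticky identity" refers to.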

1.4 Services

* A set of pods that work together, e.g. one tier in a multi-tier application
* The set is defined by labels and a selector.
* Kubernetes discovers pods through services
* A service spreads requests across its pods (round-robin in kube-proxy's user space mode, random selection in iptables mode). It acts as a load balancer and front end to a collection of pods.
* Services are the point of contact for container workloads and are reachable by name via an internal DNS server.
* A service's IP address remains stable and can be exposed to the outside world via an Ingress. It abstracts away the number of pods and each pod's IP address, which can change as pods are scheduled onto different cluster hosts.
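
As a rough illustration of the points above, the following Python sketch models a service as a label selector plus a round-robin counter. The `Pod` and `RoundRobinService` classes are hypothetical teaching aids; a real cluster does this inside kube-proxy and the endpoints controller rather than in application code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    name: str
    ip: str
    labels: Dict[str, str]


class RoundRobinService:
    """A stable front end that picks one backing pod per request."""

    def __init__(self, selector: Dict[str, str]) -> None:
        self.selector = selector
        self._next = 0  # index of the next endpoint to use

    def endpoints(self, pods: List[Pod]) -> List[Pod]:
        # A pod backs the service when every selector pair matches its labels.
        return [p for p in pods
                if all(p.labels.get(k) == v for k, v in self.selector.items())]

    def pick(self, pods: List[Pod]) -> Pod:
        backends = self.endpoints(pods)
        if not backends:
            raise LookupError("service has no matching pods")
        chosen = backends[self._next % len(backends)]
        self._next += 1
        return chosen
```

The caller only ever talks to the service object, so pods can come and go (and change IP) without the caller noticing, as long as at least one pod still matches the selector.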

Service Discovery

* NodePort
* Load Balancer
* Ingress

Load Balancers:

* HAProxy
* Traefik
* F5 etc.

2. Architecture

* Master-slave

The master node is controlled with kubectl.
kubectl uses a kubeconfig file that stores the server information and the authentication information needed to access the API server.
For production, use a cluster of at least 3 nodes.

2.1 Control plane (C-plane)

2.1.1 etcd

* Key-value data store
* Holds the configuration data of the cluster, e.g. ConfigMaps
* Represents the overall state of the cluster
* Other components watch for changes in etcd
* It stores job scheduling info, pod details, storage information, etc.
* It can also store a ThirdPartyResource (since superseded by CustomResourceDefinition). Suppose there is a third-party resource named "cron-tab.alpha.ianlewis.org" with version v1 in the default namespace; the corresponding custom controller can access it using HTTP GET:

http://localhost:8001/apis/alpha.ianlewis.org/v1/namespaces/default/crontabs
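
The URL above follows the general pattern for namespaced resources in a named API group. A small Python sketch of how such a URL is assembled is shown below; the helper name `custom_resource_url` is an assumption for illustration, and the base address `http://localhost:8001` is taken from the example (it is the default address of a local `kubectl proxy`).

```python
def custom_resource_url(base: str, group: str, version: str,
                        namespace: str, plural: str) -> str:
    """Build the list URL for a namespaced resource in a named API group."""
    path = f"/apis/{group}/{version}/namespaces/{namespace}/{plural}"
    return base.rstrip("/") + path


url = custom_resource_url(
    base="http://localhost:8001",
    group="alpha.ianlewis.org",
    version="v1",
    namespace="default",
    plural="crontabs",
)
print(url)
```

A custom controller would issue an HTTP GET against this URL (for example with `urllib.request.urlopen(url)`) and parse the JSON list that comes back.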

2.1.2 API server

* JSON over HTTP
* Validates REST requests and updates the state of API objects in etcd
* It performs CRUD operations on K8s object data.
* This lets clients configure workloads and containers across the worker nodes

2.1.3 Scheduler

* Pluggable
* Matches resource "supply" to workload "demand"
* Selects the node on which to run a pod
* Inputs
- resource availability
- resource utilization
- resource requirements
- QoS
- affinity requirements
- anti-affinity requirements
- data locality
- policy
- user specification
* Supports the use of user-defined custom schedulers
* Workload patterns
- ReplicaSets and Deployments
- StatefulSets for services (old name: PetSets)
- DaemonSets
- Jobs (run to completion)
- CronJobs
* "Pod start" and "pod stop" hooks
* "Rescheduler" for guaranteed scheduling
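
Matching supply to demand can be pictured as a two-step process: filter out nodes that cannot run the pod, then score the remaining nodes and pick the best one. The Python sketch below is a deliberately simplified model under that assumption. The `Node` and `PodRequest` types, the use of only CPU, memory, and a node selector as inputs, and the "most resources left" scoring rule are all choices made for this example, not the actual scheduler implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cpu_millis: int
    free_mem_mib: int
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class PodRequest:
    name: str
    cpu_millis: int
    mem_mib: int
    node_selector: Dict[str, str] = field(default_factory=dict)


def feasible(node: Node, pod: PodRequest) -> bool:
    """Filter step: the node must have room and carry the requested labels."""
    fits = (node.free_cpu_millis >= pod.cpu_millis
            and node.free_mem_mib >= pod.mem_mib)
    selected = all(node.labels.get(k) == v
                   for k, v in pod.node_selector.items())
    return fits and selected


def _fraction_left(free: int, requested: int) -> float:
    return (free - requested) / free if free else 0.0


def score(node: Node, pod: PodRequest) -> float:
    """Score step: prefer the node with the most headroom after placement."""
    return (_fraction_left(node.free_cpu_millis, pod.cpu_millis)
            + _fraction_left(node.free_mem_mib, pod.mem_mib)) / 2


def schedule(pod: PodRequest, nodes: List[Node]) -> Optional[str]:
    """Return the chosen node name, or None when the pod must stay pending."""
    candidates = [n for n in nodes if feasible(n, pod)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(n, pod)).name
```

Other inputs from the list, such as affinity, anti-affinity, or data locality, would slot in as additional filter conditions or score terms.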

2.1.4 controller manager
* A controller is a daemon that constantly compares the desired state of the cluster, as recorded in etcd, with the actual state and then takes the necessary corrective action. This is the Observe - Diff - Act cycle.
* Controllers use the Watch API of the API server to learn about added, deleted, and modified K8s objects.
* The controller manager should be reachable from the worker nodes of the cluster.
* It is the process that runs (1) the DaemonSet controller, (2) the Replication controller, and many more, as per section 1.3
* It communicates with the API server to create, update, and delete (1) pods, (2) service endpoints, (3) etc.
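
The Observe - Diff - Act cycle can be sketched in a few lines of Python. In this simplified model, the desired and actual states are plain dictionaries mapping a workload name to a replica count; the function names `diff`, `act`, and `reconcile` are illustrative only, and a real controller would observe through the Watch API and act through API server calls instead of mutating a dictionary.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, int]  # (verb, workload name, number of pods)


def diff(desired: Dict[str, int], actual: Dict[str, int]) -> List[Action]:
    """Diff step: work out what must change to reach the desired state."""
    actions: List[Action] = []
    for name in sorted(set(desired) | set(actual)):
        want, have = desired.get(name, 0), actual.get(name, 0)
        if have < want:
            actions.append(("create", name, want - have))
        elif have > want:
            actions.append(("delete", name, have - want))
    return actions


def act(actual: Dict[str, int], actions: List[Action]) -> None:
    """Act step: apply the corrective actions to the (simulated) cluster."""
    for verb, name, count in actions:
        delta = count if verb == "create" else -count
        actual[name] = actual.get(name, 0) + delta
        if actual[name] == 0:
            del actual[name]


def reconcile(desired: Dict[str, int], actual: Dict[str, int],
              max_rounds: int = 10) -> int:
    """Loop until nothing is left to do; return the rounds that acted."""
    for rounds in range(max_rounds):
        actions = diff(desired, actual)  # observe + diff
        if not actions:
            return rounds
        act(actual, actions)
    raise RuntimeError("state did not converge")
```

The loop is level-triggered: it looks at the whole current state each round rather than at individual events, so a missed event is corrected on the next pass.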

2.2 Kubernetes Node (worker OR minion)

= Worker = Minion 
* Runs a container runtime (e.g. Docker, rkt) and the components below

2.2.1 Kubelet (K8S Node Agent) 

* Sends heartbeats that report the health of the node.
* It communicates with the API server to see whether a pod is to be run on this node.
* If yes, it runs the pod's containers via the container engine.
* It mounts the pod's secrets and volumes. Volumes are scoped to the pod.
* After health checks, it reports the pod and node states back to the API server (master node).
* It uses a PodSpec, a YAML file that describes a pod. The PodSpec can come from the API server, an HTTP endpoint, or a file.
* It is effectively a "pod controller".

2.2.2 Kube-proxy

* Network proxy + load balancer
* Routes traffic to a container based on IP + port
* It adds iptables rules that connect node IP addresses and cluster IP addresses.
* Runs as a process on every worker node
* 3 modes
1. User space mode
2. iptables mode
3. ipvs mode

The master node communicates with the kubelet, and the end user communicates with kube-proxy.

2.2.3 cAdvisor

Agent to collect resource usage. 

2.2.4 container tooling 

e.g. Docker

2.2.5 supervisord

Restarts components as and when needed.

2.2.6 kube-dns

It resolves Kubernetes service DNS names to IP addresses. 
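
As an illustration of the naming scheme, the Python sketch below builds the fully qualified name of a service and resolves short names the way a pod's DNS search path would. It assumes the default cluster domain `cluster.local`; the `ToyClusterDNS` class and its methods are hypothetical stand-ins, not the kube-dns implementation.

```python
from typing import Dict, Optional


def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Fully qualified DNS name of a service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


class ToyClusterDNS:
    """In-memory lookup table from service DNS names to cluster IPs."""

    def __init__(self, cluster_domain: str = "cluster.local") -> None:
        self.cluster_domain = cluster_domain
        self._records: Dict[str, str] = {}

    def register(self, service: str, namespace: str, cluster_ip: str) -> None:
        name = service_dns_name(service, namespace, self.cluster_domain)
        self._records[name] = cluster_ip

    def resolve(self, name: str,
                client_namespace: str = "default") -> Optional[str]:
        # Try the name as given, then expand it with the search domains
        # a pod in client_namespace would have.
        candidates = [
            name,
            f"{name}.{client_namespace}.svc.{self.cluster_domain}",
            f"{name}.svc.{self.cluster_domain}",
        ]
        for candidate in candidates:
            if candidate in self._records:
                return self._records[candidate]
        return None
```

With this scheme a pod can reach a service in its own namespace by the bare service name, and a service in another namespace as `<service>.<namespace>`.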

* High Availability HAProxy auto configuration and auto service discovery for Kubernetes. https://github.com/AdoHe/kube2haproxy 

Other alternatives

  1. Docker Swarm
  2. Kubernetes. To get started: kubernetes.io
  3. Mesos Marathon
  4. Amazon ECS (Amazon EC2 container service)
    1. Task == Pod
    2. It has its own repository. 
    3. Task can be part of CloudFormation stack. Task, Queue, EC2 Volume all together in CloudFormation to start and to cleanup
    4. To get started https://aws.amazon.com/ecs/
  5. AWS Fargate https://aws.amazon.com/fargate
  6. Google Kubernetes Engine (^L = clear = cls at Google Cloud Shell)
  7. Microsoft Azure Kubernetes Services (AKS)
  8. Hashicorp Nomad
  9. Cloud Foundry
  10. Rackspace
  11. Oracle Cloud Infrastructure 
  12. Docker Compose : Single machine. Not for large scale. With one command, "docker compose up" it will bring up : containers, volumes, networks
  13. Rancher


K8s Installation

kubeadm is a tool to install k8s on any cloud.

1. Install Docker.
2. Run 'kubeadm init' and note the join token.
3. On each worker node, run 'kubeadm join' with the join token, so that all nodes join the cluster.
4. Set up the pod network (n/w), which must satisfy:
4.1 All containers can communicate with all other containers without NAT.
4.2 All nodes can communicate with all containers without NAT.
4.3 The IP address a container sees for itself is the same one that all others see for that container.

"flannel" and "weave-net" are good starting points for such networking. A few more tools: "calico" and "Romana".

For details: K8s uses the Container Network Interface (CNI), https://github.com/containernetworking/cni, for networking among containers.

kops is a tool to install a k8s cluster on AWS. Azure and GCP have similar tools.

Logging and Monitoring

Logstash, Fluentd, or Filebeat, running in a pod, ship the logs to Elasticsearch, where they can be viewed with Kibana.

* cAdvisor collects container usage statistics. It runs per node.
* Heapster runs as a pod in the cluster. It collects data from the kubelet on each node; the kubelet collects it from cAdvisor. Heapster groups all information by pod, with the relevant labels.
* Prometheus is a framework for application metrics. It is a time series DB.

All 3 of the above tools send data to Grafana for visualization.

Enterprise tools : Datadog, Riverbed

Authentication and Authorization

1. Normal users : Users in LDAP or SSO 
2. Service accounts
* Managed by the Kubernetes API server
* Bound to a specific namespace
* Their credentials are stored in secrets

Attributes associated with an authenticated request:

1. Username
2. UID
3. Groups: used for authorization.
4. Extra fields

Popular authentication

1. client certs
2. static token files
3. OpenID connect
4. Webhook mode

Popular authorization

1. ABAC : Attribute based Access control
2. RBAC: Role-based Access control
3. Webhook

A role binding binds a role to (1) a user, (2) a group, or (3) a service account.
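
How a role binding turns into an allow decision can be sketched as follows. This is a toy Python model, assuming a role is just a set of (verb, resource) pairs inside one namespace; the `Role`, `Subject`, and `RoleBinding` classes and the `is_allowed` function are invented for this example and do not mirror the real API objects field for field.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Role:
    name: str
    namespace: str
    rules: FrozenSet[Tuple[str, str]]  # allowed (verb, resource) pairs


@dataclass(frozen=True)
class Subject:
    kind: str  # "User", "Group", or "ServiceAccount"
    name: str


@dataclass
class RoleBinding:
    role: Role
    subjects: List[Subject]


def is_allowed(requester: Subject, groups: Iterable[str], verb: str,
               resource: str, namespace: str,
               bindings: Iterable[RoleBinding]) -> bool:
    """Allow when any binding in the namespace grants the verb on the resource."""
    group_subjects = {Subject("Group", g) for g in groups}
    for binding in bindings:
        if binding.role.namespace != namespace:
            continue
        # The binding applies if it names the requester or one of its groups.
        bound = any(s == requester or s in group_subjects
                    for s in binding.subjects)
        if bound and (verb, resource) in binding.role.rules:
            return True
    # RBAC is purely additive: nothing granted means denied.
    return False
```

There are no deny rules in this model, which matches RBAC: access is the union of everything the applicable bindings grant.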

Webhook mode is for 3rd-party integration or for defining a complex set of rules.

