7. Workload Considerations : Tracee


Tracee (uses eBPF): monitors system calls and kernel events.

- It captures: (1) precise timestamp, (2) uts_name, (3) UID, (4) command (COMM), (5) PID, (6) TID/host, (7) return code (RET), (8) event, and (9) arguments.

- At least 3 volume locations are needed: (1) /lib/modules, (2) /usr/src, (3) /tmp/tracee.

- Tracee provides in-depth tracing of a container or pod.


Tracee has multiple options. The important ones are:

list: lists the system calls and other events that can be traced.

trace: selects which events to trace, by pid, uid, uts, mntns, pidns, command (comm), system call, etc. Comparison operators can be used to filter.

Examples:

  --trace pid=new                                              | only trace events from new processes

  --trace pid=510,1709                                         | only trace events from pid 510 or pid 1709

  --trace p=510 --trace p=1709                                 | only trace events from pid 510 or pid 1709 (same as above)

  --trace container=new                                        | only trace events from newly created containers

  --trace container                                            | only trace events from containers

  --trace c                                                    | only trace events from containers (same as above)

  --trace '!container'                                         | only trace events from the host

  --trace uid=0                                                | only trace events from uid 0

  --trace mntns=4026531840                                     | only trace events from mntns id 4026531840

  --trace pidns!=4026531836                                    | only trace events from pidns id not equal to 4026531836

  --trace 'uid>0'                                              | only trace events from uids greater than 0

  --trace 'pid>0' --trace 'pid<1000'                           | only trace events from pids between 0 and 1000

  --trace 'u>0' --trace u!=1000                                | only trace events from uids greater than 0 but not 1000

  --trace event=execve,open                                    | only trace execve and open events

  --trace set=fs                                               | trace all file-system related events

  --trace s=fs --trace e!=open,openat                          | trace all file-system related events, but not open(at)

  --trace uts!=ab356bc4dd554                                   | don't trace events from uts name ab356bc4dd554

  --trace comm=ls                                              | only trace events from ls command

  --trace close.fd=5                                           | only trace 'close' events that have 'fd' equals 5

  --trace openat.pathname=/tmp*                                | only trace 'openat' events that have 'pathname' prefixed by "/tmp"

  --trace openat.pathname!=/tmp/1,/bin/ls                      | don't trace 'openat' events that have 'pathname' equals /tmp/1 or /bin/ls

  --trace comm=bash --trace follow                             | trace all events that originated from bash or from one of the processes spawned by bash

  --trace container=new  | all the events from container created after issuing this command

capture: saves suspicious artifacts. We can capture (1) files written to a specific path, (2) files executed from a specific path, and (3) memory regions that had both write (W) and execute (X) access.

Examples:
  --capture exec                                           | capture executed files into the default output directory
  --capture all --capture dir:/my/dir --capture clear-dir  | delete /my/dir/out and then capture all supported artifacts into it
  --capture write=/usr/bin/* --capture write=/etc/*        | capture files that were written into anywhere under /usr/bin/ or /etc/

output: controls the output format and the output file path, and whether to include (1) the execution environment and (2) the stack trace.

Examples:
  --output json --output option:eot                        | output as json and add an EOT event
  --output gotemplate=/path/to/my.tmpl                     | output as the provided go template
  --output out-file:/my/out --output err-file:/my/err      | output to /my/out and errors to /my/err

5. Securing Kube-APIServer: RBAC


 We can use

 kubectl auth reconcile -f "filename.yaml"

to create missing RBAC objects and, for namespaced objects, the containing namespace. It does not create service accounts.

We can also do a dry run with

 kubectl auth reconcile -f "filename.yaml" --dry-run=client

--remove-extra-permissions removes extra permissions from roles.

--remove-extra-subjects removes extra subjects from bindings.

The kubectl auth reconcile command will ignore any resources that are not Role, RoleBinding, ClusterRole, and ClusterRoleBinding objects, so you can safely run reconcile on the full set of manifests. Next we can run kubectl apply command. 

With kubectl apply, we cannot update the roleRef of an existing RoleBinding, because it is immutable. kubectl auth reconcile can do it: it deletes and recreates the binding when the roleRef changes.

All the above points are applicable to ClusterRole and ClusterRoleBinding also. 

Reference: https://www.mankier.com/1/kubectl-auth-reconcile

====================================

By default, a pod that does not specify a service account gets the SA named "default" of its own namespace. This holds in every namespace.

====================================

In a RoleBinding, if kind = User then the name alone is sufficient.

subjects:
- kind: User
  name: dan

If kind = ServiceAccount then we need to specify both name and namespace.

subjects:
- kind: ServiceAccount
  name: simple-sa
  namespace: prod-b
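
A minimal sketch of a complete RoleBinding that combines both subject kinds shown above. The binding name, the namespace prod-a, and the Role pod-reader are hypothetical; only dan, simple-sa and prod-b come from these notes. A manifest like this can be passed to kubectl auth reconcile -f.

```yaml
# Sketch only: "read-pods", "prod-a" and "pod-reader" are assumed names.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: prod-a
roleRef:                      # immutable once the binding exists
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
subjects:
- kind: User                  # a user needs only a name
  apiGroup: rbac.authorization.k8s.io
  name: dan
- kind: ServiceAccount        # a service account needs name and namespace
  name: simple-sa
  namespace: prod-b
```

Note that the service account may live in a different namespace (prod-b) than the binding itself.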

6. Networking : Network Policy


- We cannot use namespaceSelector to choose the target pods. The namespaceSelector is only for (1) to and (2) from. The target pods are always in the policy's own namespace.

- If the policy isolates pods for ingress but does not mention any ingress rules at all, then no source is allowed.

- If we mention an empty list, it also means no source is allowed. ingress: []

==================================

- For (1) to and (2) from, if you omit the namespaceSelector, no other namespace is selected, so a podSelector there matches only pods in the namespace the NetworkPolicy is deployed to.

To allow all traffic from current namespace

ingress:

- from:

  - podSelector: {}

==================================

- if we mention

ingress: [{}]

OR

ingress:

- {}

then it means traffic is allowed from all pods in all namespaces + from outside the K8s cluster

- if we mention

  ingress:

  - from:

    - namespaceSelector: {}

then it means all pods from all namespaces. Sources outside the cluster are excluded.

==================================

- All policies are additive (union), so there is no chance of conflict. The whitelist can keep growing. Traffic is allowed if at least one rule allows it.

By default, if no policies exist in a namespace, then all ingress and egress traffic is allowed to and from pods in that namespace.

- Network Policy is a connection-level filter. It is not evaluated per packet.

- Network Policy does not terminate established connections.

- Cluster-level network policy is not part of the core API. It is implemented by Calico.

==================================

Best practices

1. First block all ingress/egress in a namespace

2. start whitelisting for each app

3. While applying egress rule, we have to allow DNS, as it is needed in most cases, to resolve service FQDN
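
The practices above can be sketched as two policies, assuming a hypothetical namespace team-a. The first blocks everything for all pods in the namespace; the second re-opens DNS (port 53 over UDP and TCP) so that service FQDNs still resolve. Per-app whitelist policies would then be added on top.

```yaml
# Step 1: block all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-a          # assumed namespace
spec:
  podSelector: {}            # {} selects all pods in the namespace
  policyTypes:
  - Ingress
  - Egress
---
# Step 3: allow DNS lookups from every pod, to any destination.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - ports:                   # no "to" section: any destination, these ports only
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

Because policies are additive, the second policy does not weaken the first for anything other than DNS.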

==================================

- If no policyTypes are specified on a NetworkPolicy, then Ingress is always set by default.

- Egress is added to policyTypes only if the NetworkPolicy has any egress rules.

==================================

This is OR condition

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector: {}   # required field; {} targets every pod in the namespace
  ingress:
  - from:
    - ipBlock:
        cidr: 172.17.0.0/16
        except:
        - 172.17.1.0/24
    - namespaceSelector:
        matchLabels:
          KEY: VALUE
    - podSelector:
        matchLabels:
          KEY: VALUE

Here: any pod whose namespace has label KEY=VALUE, OR any pod in the NetworkPolicy's namespace (default) that has label KEY=VALUE, OR any source whose IP address is in 172.17.0.0/16 but not in 172.17.1.0/24.

==================================

This is AND condition

  ingress:

  - from:
    - namespaceSelector:
        matchLabels:
          user: alice
      podSelector:
        matchLabels:
          role: client

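Here: pods that have label role=client AND whose namespace has label user=alice.

If podSelector is {}, combining it with a namespaceSelector in the same entry (AND) selects the same pods as the namespaceSelector alone. As a separate entry (OR), podSelector: {} additionally allows every pod in the policy's own namespace.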

==================================

This is also AND condition

  ingress:
  - from:
...........
    ports:
    - protocol: TCP
      port: 6379

The port is the pod's containerPort, not the Service port.

We can have multiple rules by using multiple "- from" and/or multiple "- to" entries.

==================================

To allow all traffic from all namespace

(1) 

ingress:

- from:

  - podSelector: {}

    namespaceSelector: {}    

(2) 

ingress:

- from:

  - namespaceSelector: {}    

==================================

Port is always destination port, for both ingress and egress. 

==================================

We can block egress traffic going outside the cluster by (1) allowing egress only to all namespaces

egress:

- to:

  - namespaceSelector: {}

(2) using an empty list, which blocks all egress, inside the cluster as well

egress: []
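
As a sketch, option (1) written out as a complete policy looks like this. The policy name is hypothetical, and podSelector: {} is an assumption that applies the rule to every pod in the namespace.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-in-cluster-only   # assumed name
  namespace: default
spec:
  podSelector: {}                # all pods in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}      # pods in any namespace; no ipBlock, so nothing external
```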

==================================

First, let's isolate both ingress and egress traffic for the target pods chosen by podSelector. These pods belong to the same namespace as the NetworkPolicy. Here, all pods with label role=db in the default namespace are isolated.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress

Reference:

https://kubernetes.io/docs/concepts/services-networking/network-policies/

https://github.com/ahmetb/kubernetes-network-policy-recipes

https://www.youtube.com/watch?v=3gGpMmYeEO8

8. Issue Detection


Cyber Kill Chain

  • Reconnaissance 
  • Weaponization: Client application data file: PDF, DOC
  • Delivery: E-mail attachment, website, USB removable media
  • Exploitation: 
  • Installation: 
  • Command and Control 
  • Actions on Objectives

HIDS

An agent or cron job that
- scans for anomalies on the local node
- checks system resources against a database of attributes (MD5 sum, timestamps, permissions, etc.) for each resource.

HIDS Tools

1. AIDE: shipped with enterprise Linux. It is a highly configurable HIDS.
2. Tripwire: similar to AIDE. Extra features: (1) commercial management console, (2) real-time auditing agent.
3. OSSEC: open source host-based intrusion detection system. It performs (1) log analysis, (2) file integrity checks, (3) policy monitoring, (4) rootkit detection, (5) real-time alerting, (6) active response.

Important files. There are many files; the full list can be obtained with the command
sudo ls -R /var/ossec

Main config file: /var/ossec/etc/ossec.conf
Rule files: /var/ossec/rules
Active response scripts: /var/ossec/active-response/bin
CIS benchmarks for different OSes: /var/ossec/etc/shared
All log files written by OSSEC: /var/ossec/logs

NIDS
Tools that collect traffic from networking devices, then analyze the traffic for attack signatures and other anomalies.

NIDS Tools

1. SNORT: maintained by Cisco. Cisco adds new rules and shares them with subscribers. After 30 days these new rules are available to the community.

2. Suricata: (1) real time IDS (2) IPS (3) NSM (4) offline pcap processing. Lua-script support for complex threat detection. 

Important files
(1) /etc/suricata/suricata.yaml If we edit this file, then run the command
sudo suricata-update
and the new rules will be added to
(2) /var/lib/suricata/rules/classification.config and
(3) /var/lib/suricata/rules/suricata.rules
Next, see the /var/log/suricata folder
(4) suricata.log
(5) stats.log
(6) fast.log
(7) eve.json, which we can filter with | jq 'select(.event_type=="alert")'
(8) (9) /etc/suricata/enable.conf and disable.conf to enable/disable rules. Then run sudo suricata-update


ML Based Tools Vendors

1. NeuVector
2. StackRox
3. Threat Stack
4. Trend Micro

Acronyms

AIDE: Advanced Intrusion Detection Environment

C2: Command and Control

COOP: COntinuity of OPeration

CVEs: Common Vulnerabilities and Exposures

DR: Disaster Recovery

HIDS: Host Intrusion Detection System

IDS: Intrusion Detection System

IPS: Inline Intrusion Prevention System

LM-CIRT: Lockheed Martin Computer Incident Response Team

NIDS: Network Intrusion Detection System

NSM: Network Security Monitoring 

NVD: National Vulnerability Database

PIDS: Physical Intrusion Detection System

US-CERT: United States Computer Emergency Readiness Team

Website

https://seclists.org/

https://cve.mitre.org/cve/


7. Workload Considerations : AppArmor


  • AppArmor is less comprehensive than SELinux, but simpler.
  • It is available on Debian and SUSE Linux distributions.
  • It supplements the UNIX Discretionary Access Control (DAC) model by providing Mandatory Access Control (MAC).
  • Its learning mode (complain mode) is similar to SELinux's Permissive mode: profile violations are logged but not prevented. This log can be turned into a profile.
  • No security labels are needed, so it is filesystem-neutral.
  • An administrator can associate a security profile with a program.
  • Unlike SELinux, instead of direct labeling of objects, security policy is applied to pathnames.

K8s

  • The AppArmor profile must be available on the worker node so that a pod can use it. The profile can be added to worker nodes during installation with Ansible or Puppet, or with a DaemonSet.
  • To disable AppArmor for the entire cluster, pass --feature-gates=AppArmor=false.
  • AppArmor profiles can be managed using PSP.
  • If the AppArmor kernel module is available, then

sudo systemctl [start|stop|restart|status] apparmor

  • To load or not load at boot time

sudo systemctl [enable|disable] apparmor

  • To see current status

sudo apparmor_status

Modes

1. Enforced mode

Default mode

aa-enforce

2. Complain

also called learning mode

aa-complain 

Profiles

- Pre-packaged profiles

- installed along with new software

- installed with the AppArmor package: apparmor-profiles

- stored at /etc/apparmor.d

- "man apparmor.d" provides documentation. 

Other utilities

  • apparmor-notify: summary of AppArmor log messages
  • disable: unload a single profile and do not load it during boot
  • easyprof: helps to set up a basic AppArmor profile for a program
  • logprof: scans the log. If it finds any AppArmor event that is not covered by existing profiles, it suggests a profile update.
  • genprof: creates a new complain-mode profile, using existing profiles as input. It runs logprof to scan AppArmor events. For each entry in the system log it offers the options (A) Allow (D) Deny (I) Ignore (N) New (G) Glob last piece (Q) Quit, until Quit is selected. Then the new profile is created.
  • Bane: AppArmor profile generator for Docker containers. It automatically installs the profile in the directory /etc/apparmor.d/containers/

List all AppArmor utilities using

rpm -qil apparmor-utils | grep bin

Access control to assign in AppArmor profile

  • r : Read
  • w : Write
  • m : Memory map as executable
  • k : File locking
  • l : Create hard links
  • ix : Execute and inherit this profile
  • Px : Execute another profile after cleaning environment
  • Ux : Execute unconfined after cleaning environment.

--feature-gates=AppArmor=true|false

Add this annotation to the pod metadata

container.apparmor.security.beta.kubernetes.io/<container_name>: <profile_ref>

Note: This is the container name, not the pod name.

The profile reference uses the profile name, not the profile file name.

E.g. container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-deny-write

Some file under /etc/apparmor.d/ should define this profile, k8s-apparmor-example-deny-write.

profile_ref

- runtime/default

- localhost/<profile_name>

- unconfined 
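
A minimal sketch of a pod that uses the annotation above, assuming the profile k8s-apparmor-example-deny-write is already loaded on the node. The pod name, image and command are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor            # assumed pod name
  annotations:
    # Key ends with the container name ("hello"), value is the profile_ref.
    container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-deny-write
spec:
  containers:
  - name: hello                   # must match the name used in the annotation key
    image: busybox:1.28
    command: ["sh", "-c", "echo 'Hello AppArmor!' && sleep 1h"]
```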

For PSP: 

apparmor.security.beta.kubernetes.io/defaultProfileName: <profile_ref>

apparmor.security.beta.kubernetes.io/allowedProfileNames: <profile_ref>[,others...]

7. Workload Considerations : SELinux


SELinux is about rules for which process can access which files, directories, ports etc.

SELinux meets Common Criteria, FIPS standard. SELinux has granular settings, based on user, role, category, sensitivity level etc.  SELinux is available on Debian, Redhat and SUSE Linux distribution. 

SELinux has 3 conceptual quantities 

(1) Contexts: labels for files, processes and ports. A context consists of user, role, type, and level. Use the -Z option to see a context and the chcon command to change it. Commands with extended support for -Z: ps, ls, cp, mv, mkdir

By default, the file context does not change when we move a file.

Use the restorecon command to restore the context as per the parent directory.

Use the 'semanage fcontext' command to set the default context for future objects in a directory. To apply it to existing objects, use the restorecon command.

semanage fcontext is provided by the policycoreutils-python package.

(2) Rules : access control 

(3) Policies : Set of rules. 

Default policy is to deny any access. Rules are added to allow access. Access decisions are cached in the "Access Vector Cache" (AVC).

SELinux enforcement mode

Refer file /etc/selinux/config OR /etc/sysconfig/selinux

1. Enforcing

- SELinux is operative
- by default access denied. 
- all audited violations are logged, except the ones with dontaudit 

sudo setenforce Enforcing 

2. Permissive

- SELinux is operative
- Access is allowed. but warning generated for denied access. 
- dontaudit event remains silent

sudo setenforce Permissive 

3. Disabled

- SELinux is completely disabled. 
- reboot the system to enter or exit this mode. 
- After enabling SELinux again, first boot will take longer time. 

SELINUX=disabled at config file
OR
add kernel parameter selinux=0

The getenforce and setenforce commands read and change the current mode. The sestatus utility displays the current mode and policy. The seinfo command shows more details, including the policy file.

Default SELinux Policies 

Sensitivity levels and categories are not used in default policy. 

1. Targeted
Not for init process
Not for user process
for network service process
memory restrictions for all process, to avoid buffer overflow

2. Minimum
same as targeted, but only applicable to selected process

3. Multi-Level Security (MLS)
fine-grained security domains with particular policies

Changing the policy needs a reboot and time-consuming file re-labeling.

SELinux Booleans

possible values: on or off
commands
  • setsebool
  • getsebool
  • semanage boolean -l

Monitoring SELinux Access

Install the setroubleshoot-server package and restart the auditd daemon.
Raw errors are tagged as AVC errors and appended to audit.log.
These tools collect issues at runtime, log them, and suggest solutions.


7. Workload Considerations :


1. Static Analysis

Clair

Two Parts

1. Service wrapper: HTTP Interface , Notifier, Notification Storage

2. ClairCore: Download vulnerabilities, compare against index of image

3 Phase/Function

1. Download image layers, scan and generate IndexReport

2. Compare IndexReport with known vulnerabilities

3. As per configuration for notifier, notify about vulnerability. 

It uses alpine-secdb

Trivy

It retrieves vuln-list

Trivy checks middle layers of image

Easy to integrate with CICD

2. Dynamic Analysis

Linux commands: perf, ftrace

Tracee (uses eBPF): monitors system calls and kernel events.

- It captures: (1) precise timestamp, (2) uts_name, (3) UID, (4) command (COMM), (5) PID, (6) TID/host, (7) return code (RET), (8) event, and (9) arguments.

- At least 3 volume locations are needed: (1) /lib/modules, (2) /usr/src, (3) /tmp/tracee.

- Tracee provides in-depth tracing of a container or pod.

Falco by Sysdig: multiple components (user space program, configuration, driver) working together to evaluate system calls against rules and generate alerts when a rule is broken.

A rules file can define lists. A rule can refer to a list. A list can also be part of a macro or of another list.

A rule has 5 key-value pairs: (1) name (key: rule), (2) description (key: desc), (3) condition: filtering expression for events, (4) output, (5) priority (emergency, alert, critical, error, warning, notice, informational, debug).

A rule has 4 optional key-value pairs: (1) enabled, default is true. (2) tags (filesystem, software_mgmt, process, database, host, shell, container, cis, users, network). The -T option disables rules with a given tag; the -t option runs only rules with a given tag. (3) warn_evttypes, default is true. (4) skip-if-unknown-filter, default is false.
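
To make the list, macro and rule relationship concrete, here is a small illustrative rules file. The list, macro and rule names are made up for this sketch and are not taken from Falco's shipped rule set; the filter fields (evt.type, proc.name, container.id, user.name, container.name, proc.cmdline) are standard Falco fields.

```yaml
# A list: a named collection that macros, rules and other lists can reference.
- list: my_shell_binaries            # assumed name
  items: [bash, sh, zsh]

# A macro: a reusable condition fragment; here it refers to the list.
- macro: my_shell_started            # assumed name
  condition: evt.type = execve and proc.name in (my_shell_binaries)

# A rule: the 5 required keys plus two optional ones (tags, enabled).
- rule: Shell started in a container
  desc: A shell process was executed inside a container
  condition: my_shell_started and container.id != host
  output: "Shell in container (user=%user.name container=%container.name cmd=%proc.cmdline)"
  priority: WARNING
  tags: [container, shell]
  enabled: true
```

With tags set like this, the rule could be switched off from the command line with -T shell, as described above.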

initContainer based approach

Insert an initContainer using a dynamic admission controller.

The initContainer in the pod spec contains the scan/verification tool.

Only if the initContainer exits with code zero is the rest of the pod spec passed to the container engine for execution.

Example: cloud security tools by TrendMicro

3. Immutable container 

Check periodically as security spring scanning. 

Verify: 

* Does the container have a read/write file system?

* Can the container elevate to a privileged user?

* other such features. 

1. SELinux: Debian, RH, SUSE

* SELinux meets Common Criteria, FIPS standard. SELinux has granular settings, based on user, role, category, sensitivity level etc.  

2. AppArmor: Debian, SUSE

* AppArmor is less comprehensive than SELinux, but simpler

3. Smack (Simplified MAC Kernel) used with Yocto Linux and Automotive Grade Linux. 

4. TOMOYO (by NTT Data corporation) pathname based MAC (Mandatory Access Control)

Use only one tool instead of cascading multiple tools, so there is no confusion about which tool is responsible.

5. seccomp: Linux kernel feature. In its first iteration, the only allowed system calls were read, write, exit, and sigreturn. With mode 2, a BPF filter determines which system calls are allowed.

In K8s, seccomp is used for (1) syscall auditing and (2) denial of disallowed calls. A pod whose container makes a denied call can enter the CrashLoopBackOff state.


spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/audit.json
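
For context, a sketch of a complete pod that uses the fragment above. The pod name, container name, image and args are assumptions for illustration; the profile path is resolved relative to the kubelet's seccomp directory, which is /var/lib/kubelet/seccomp by default.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: audit-pod                      # assumed name
spec:
  securityContext:
    seccompProfile:
      type: Localhost                  # other types: RuntimeDefault, Unconfined
      localhostProfile: profiles/audit.json
  containers:
  - name: test-container               # assumed container
    image: hashicorp/http-echo:1.0
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false
```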



6. Networking


Session state: New, Established, Related, Invalid.

Related: (1) related DNS queries, (2) netfilter needs a protocol-specific module. E.g. FTP and VoIP require an extra kernel module.

Invalid: out-of-sequence traffic.

To match on state, specify the module with "-m state --state" or "-m conntrack --ctstate". The state module is a subset of the conntrack module.

Anatomy of filter

1. Where to apply filter? (input, output, forward) chain



2. Which traffic to filter? (source and destination match criteria) 

3. What action? Chains are grouped in tables, as per action (filter, NAT, mangle, raw, security).

This anatomy is applicable to both firewalls and network policies.

chain v/s action

Table     PreRouting  Input  Output  Forward  PostRouting
raw       Y           N      Y       N        N
mangle    Y           Y      Y       Y        Y
nat       Y           N      Y       N        Y
filter    N           Y      Y       Y        N
security  N           Y      Y       Y        N

Filter table in Input, Output and Forward chains with actions: ACCEPT, DROP, REJECT
NAT table in PreRouting, Output and PostRouting with actions SNAT, DNAT, MASQUERADE
Mangle in all chains for specialized packet alteration
Raw in PreRouting and Output, to configure exceptions from connection tracking. Action: NOTRACK
Security in Input, Output and Forward with actions SECMARK, CONNSECMARK

Actions: ACCEPT, DROP, REJECT, SNAT, DNAT, MASQUERADE, NOTRACK, SECMARK, CONNSECMARK
Matches: address (L1 | L3), protocol, port, state

Please refer to the iptables-extensions(8) and firewalld.richlanguage(5) man pages for limiting connection rate etc.

netfilter is in the kernel.
User-space front ends:
1. iptables

iptables variants
1. iptables
2. ip6tables
3. ebtables

useful command to list all tables
iptables -vnL

on top of iptables
1. fwbuilder (GUI)
2. turtle firewall (GUI)
3. ipmenu (CLI)

Note: direct changes done in netfilter chains are not visible at GUI. 

2. firewalld-service
on top of firewalld-service
1. firewall-config
2. firewall-cmd

firewalld features
- timed rules
- rich language for specific firewall rules
- NAT support
- firewall zones
- DBUS API

firewalld support
- Network Manager
- libvirt
- docker
- fail2ban (intrusion prevention software framework-Python)
etc.

Each firewall zone has a "zone"_direct chain. Firewalld allows inserting rules at the front of this chain.

Netfilter hooks

nftables replaces iptables, ip6tables, arptables, and ebtables.
They are all built on top of netfilter. netfilter has predefined hooks: raw, filter, NAT, mangle, security
netfilter hooks are for different types of packets:
ip, ip6, inet, arp, bridge, netdev. netdev handles packets from ingress.

It can be invoked with the 'nft' command. Pass a file with 'nft -f'. A shell script can also be used, where each line starts with nft.

firewalld can use FirewallBackend=nftables| iptables  in  /etc/firewalld/firewalld.conf file.

nftables configuration file
/etc/nftables.conf (Ubuntu)
/etc/sysconfig/nftables.conf (Fedora) it includes /etc/nftables/*.nft


Calico uses the WorkloadEndpoint resource to configure communication between Calico containers and the host. The HostEndpoint resource represents a host interface.

Calico GlobalNetworkPolicy configures connectivity rules that apply to WorkloadEndpoints and HostEndpoints in all namespaces. It has precedence over Profiles. Profiles were used before Calico NetworkPolicy was functional.

Calico n/w policy has

* (1) policy ordering/priority, (2) deny rules, and (3) more flexible match rules, over default K8s policy. 

* K8s n/w policy is only for pods. Calico n/w policy is for pod, VM, host interfaces.

* Along with Istio, it supports layer 5-7 match criteria and cryptographic identity.

* It works for all cloud providers.
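
A sketch of how policy ordering and an explicit deny rule look in a Calico GlobalNetworkPolicy, two things the default K8s NetworkPolicy cannot express. The policy name and the labels app and trust are hypothetical.

```yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-redis-from-low-trust   # assumed name
spec:
  order: 100                        # lower order is evaluated first
  selector: app == 'redis'          # endpoints this policy applies to, in all namespaces
  types:
  - Ingress
  ingress:
  - action: Deny                    # explicit deny rule
    protocol: TCP
    source:
      selector: trust == 'low'
    destination:
      ports:
      - 6379
  - action: Allow                   # everything else is allowed by this policy
```

Rules inside one policy are evaluated in the order listed, so the Deny rule takes effect before the catch-all Allow.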

========

* If neither Ingress nor Egress is specified, the default is Ingress.

* If there is no policy, all traffic is allowed for a pod.

* If there is an ingress policy, only the ingress traffic it allows is permitted.

* If there is an egress policy, only the egress traffic it allows is permitted.

* If there is no policy, all traffic is denied for a node.

Reference: https://docs.projectcalico.org/security/calico-network-policy

WireGuard: VPN

- easy to use

- less feature

- speed

- with Calico clusters

Ingress Controller: Envoy Proxy, NGINX, Traefik, Ambassador


We need to add annotations accordingly to "Ingress" resource

kubernetes.io/ingress.class: haproxy

kubernetes.io/ingress.class: nginx

We can use https://nip.io/ to get a DNS name that contains an IP address and resolves to that IP.

In local setup, without Load Balancer, when we use NodePort, we have to use higher port in HOST, while using curl to ingress controller. E.g. 

 curl http://192.168.49.3 -H 'Host: nginx.192.168.49.3.nip.io:32735'
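
A sketch of an Ingress resource that would match the curl call above, assuming an NGINX ingress controller and a hypothetical backend Service named nginx listening on port 80.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-demo                        # assumed name
  annotations:
    kubernetes.io/ingress.class: nginx    # selects the ingress controller
spec:
  rules:
  - host: nginx.192.168.49.3.nip.io       # nip.io name that resolves to 192.168.49.3
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx                   # assumed Service
            port:
              number: 80
```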

Service Mesh: Istio (security features: peer authentication, authorization, identity management, zero-trust networking), Linkerd (for security), Contour (VMware), Aspen Mesh (by F5, which also purchased NGINX; Aspen Mesh is built on Istio)

5. Securing Kube-APIServer: PSP, IAM, CIS


Pod Security Policy (PSP)

- A set of rules

- provide/modify default values for fields

- change pod

- PSP ordered by name before applied. 

- Deprecated in K8s 1.21

- Removed in K8s 1.25

Even if you are only planning on changing a single value, the policy file must contain several entries. Sample PSP, where pod can do anything

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: basicpolicy
spec:
  privileged: true
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  allowedCapabilities:
  - '*'
  volumes:
  - '*'

Most commonly changed parameters

1. privileged

2. runAsUser

Reference: https://kubernetes.io/docs/concepts/policy/pod-security-policy/

For allowedUnsafeSysctls  and forbiddenSysctls 

  • kernel (common prefix: kernel.)

    • kernel.shm*,
    • kernel.msg*,
    • kernel.sem,
  • networking (common prefix: net.)
  • virtual memory (common prefix: vm.)
  • MDADM (common prefix: dev.)
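
  seLinux:
    rule: RunAsAny

means the PSP places no restriction on SELinux options. The example policy in the Kubernetes docs uses it on the assumption that nodes use AppArmor instead of SELinux.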

If the PodSecurityPolicy admission plugin is enabled but no PSP is defined, then by default any new pod creation will fail.

In order to use a PSP, the requesting user or the target pod's service account must be authorized to use the policy, by allowing the use verb on the policy.

With the plugin enabled and an appropriate policy, only direct pod creation is allowed, not creation through a deployment or replicaset. To allow pod creation via deployment -> replicaset, use the following ClusterRole and ClusterRoleBinding. There are three methods; they differ in the subject that is bound to the role.

1. 

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: use-restricted-psp
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - restricted-psp
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: restricted-role-bind
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: use-restricted-psp
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts

2. 

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp:restricted
rules:
- apiGroups:
  - policy
  resourceNames:   # resourceNames is optional. It gives control over an individual PSP.
  - psp.restricted
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp:restricted:binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:restricted
subjects:
  - kind: ServiceAccount
    name: replicaset-controller
    namespace: kube-system
If we use RoleBinding instead of ClusterRoleBinding then it is for same namespace

3. kubectl -n "namespace" create role "anyNameForRole" \
    --verb=use \
    --resource=podsecuritypolicy \
    --resource-name="pspName"   # --resource-name is optional
kubectl -n "namespace" create rolebinding "anyNameForRoleBinding" \
    --role="anyNameForRole" \
    --serviceaccount=namespace:default

The replicaset controller uses the default SA, so we should be able to create a deployment with the above 2 commands as well.

If the controller manager connects to the API server through the trusted/insecure port, then all PSPs are allowed, as authorization (and authentication) is bypassed.

After enabling PodSecurityPolicy admission control plugin, we should have

1. This policy

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default-allow-all
spec:
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'
  fsGroup:
    rule: RunAsAny
  hostIPC: true
  hostNetwork: true
  hostPID: true
  hostPorts:
  - max: 65535
    min: 0
  privileged: true
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'

2. We need a ClusterRole that allows the use verb on PSPs. A ClusterRole is cluster-scoped, so no namespace is needed

k create clusterrole cr --verb=use --resource=psp

3. To add any new PSP, it should have min these fields

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example
spec:
  privileged: false  # Don't allow privileged pods!
  # The rest fills in some required fields.
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'

4. In each NS, we should have a rolebinding. system:serviceaccounts is a group, so it is bound with --group.

k -n team-red create rolebinding rb --clusterrole=cr --group=system:serviceaccounts

OR

We can have a clusterrolebinding, which applies to all namespaces

k create clusterrolebinding crb --clusterrole=cr --group=system:serviceaccounts

References

https://banzaicloud.com/blog/pod-security-policy/

https://www.suse.com/c/rancher_blog/enhancing-kubernetes-security-with-pod-security-policies-part-2/

IAM using tools: keycloak , Active Directory, Amazon IAM

CIS: It provides a huge amount of free and paid resources to improve IT security: tools, including benchmarks, scanning tools, threat tools, and hardened images. The CIS-CAT® Pro tool evaluates a target system against known issues and performance configurations. CIS also offers dashboards to view the ongoing state of compliance and security considerations.

For a minikube setup, we need to install the kube-bench tool on each node and run the tests. The test results recommend remediation steps for failure and warning cases. We can also run job.yaml on the K8s cluster.

Have a look to summary of CIS for K8s in this Excel file