7. Workload Considerations : Tracee
Tracee (uses eBPF): monitors system calls and kernel events.
- It captures: (1) precise timestamp, (2) uts_name, (3) UID, (4) command (COMM), (5) PID, (6) TID/host, (7) return code (RET), (8) event, and (9) arguments.
- At least 3 volume locations are needed: (1) /lib/modules, (2) /usr/src, (3) /tmp/tracee.
- Tracee provides in-depth tracing of a container or pod.
Tracee has multiple options. The important ones are:
list: lists the system calls and other events that can be traced.
trace: selects which events to trace, by pid, uid, uts, mntns, pidns, command (comm), system call, etc. We can use comparison operators to filter.
--trace pid=new | only trace events from new processes
--trace pid=510,1709 | only trace events from pid 510 or pid 1709
--trace p=510 --trace p=1709 | only trace events from pid 510 or pid 1709 (same as above)
--trace container=new | only trace events from newly created containers
--trace container | only trace events from containers
--trace c | only trace events from containers (same as above)
--trace '!container' | only trace events from the host
--trace uid=0 | only trace events from uid 0
--trace mntns=4026531840 | only trace events from mntns id 4026531840
--trace pidns!=4026531836 | only trace events from pidns id not equal to 4026531836
--trace 'uid>0' | only trace events from uids greater than 0
--trace 'pid>0' --trace 'pid<1000' | only trace events from pids between 0 and 1000
--trace 'u>0' --trace u!=1000 | only trace events from uids greater than 0 but not 1000
--trace event=execve,open | only trace execve and open events
--trace set=fs | trace all file-system related events
--trace s=fs --trace e!=open,openat | trace all file-system related events, but not open(at)
--trace uts!=ab356bc4dd554 | don't trace events from uts name ab356bc4dd554
--trace comm=ls | only trace events from ls command
--trace close.fd=5 | only trace 'close' events that have 'fd' equals 5
--trace openat.pathname=/tmp* | only trace 'openat' events that have 'pathname' prefixed by "/tmp"
--trace openat.pathname!=/tmp/1,/bin/ls | don't trace 'openat' events that have 'pathname' equals /tmp/1 or /bin/ls
--trace comm=bash --trace follow | trace all events that originated from bash or from one of the processes spawned by bash
--trace container=new | all the events from container created after issuing this command
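The filters above are combined by repeating the --trace flag. A minimal Python sketch of assembling such a command line is shown here. build_tracee_args is a hypothetical helper (it is not part of Tracee), and it assumes the binary is called tracee and accepts --trace exactly as listed in these notes; other Tracee releases may name the flag differently.

```python
import shlex

def build_tracee_args(filters, binary="tracee"):
    """Return an argv list with one --trace flag per filter expression.

    'binary' and this helper are illustrative assumptions, not Tracee API.
    """
    args = [binary]
    for expr in filters:
        args += ["--trace", expr]
    return args

# Example: file-system events from newly created containers, excluding open/openat.
argv = build_tracee_args(["container=new", "set=fs", "event!=open,openat"])

# shlex.join quotes arguments such as '!=' expressions so the line is shell-safe.
print(shlex.join(argv))
```

Building an argv list (instead of one string) avoids shell quoting mistakes with filters such as '!container' or 'uid>0'.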
5. Securing Kube-APIServer: RBAC
We can use
kubectl auth reconcile -f "filename.yaml"
to create missing RBAC objects and, for namespaced objects, their missing namespaces. It does not create ServiceAccounts.
We can also run with
kubectl auth reconcile -f "filename.yaml" --dry-run=client
--remove-extra-permissions removes extra permissions from roles
--remove-extra-subjects removes extra subjects from bindings
The kubectl auth reconcile command will ignore any resources that are not Role, RoleBinding, ClusterRole, and ClusterRoleBinding objects, so you can safely run reconcile on the full set of manifests. Next we can run kubectl apply command.
With kubectl apply, we cannot update the roleRef of a RoleBinding, because it is immutable. kubectl auth reconcile can do it: it deletes and recreates the binding when the roleRef has to change.
All the above points are applicable to ClusterRole and ClusterRoleBinding also.
Reference: https://www.mankier.com/1/kubectl-auth-reconcile
In every namespace, a pod that does not specify a ServiceAccount gets the SA named "default" of its own namespace.
In a RoleBinding subject with kind: User, only the name is sufficient.
- kind: User
  name: dan
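As a rough illustration of the User subject above, the following Python sketch builds a complete RoleBinding and prints it as JSON, which kubectl accepts in place of YAML. The binding name read-pods, the Role name pod-reader, and the namespace default are assumptions made for the example; only the user dan comes from the notes.

```python
import json

def role_binding(name, namespace, role, user):
    """Build a RoleBinding manifest that grants 'role' to a single User subject."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        # For kind User, the name alone identifies the subject.
        "subjects": [{"kind": "User", "name": user}],
        # roleRef is immutable once the binding exists.
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role,
        },
    }

# 'read-pods' and 'pod-reader' are hypothetical names for illustration.
binding = role_binding("read-pods", "default", "pod-reader", "dan")
print(json.dumps(binding, indent=2))
```

The printed JSON could be saved to a file and passed to kubectl auth reconcile -f or kubectl apply -f.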
6. Networking : Network Policy
- We cannot use namespaceSelector for the target pods (spec.podSelector). The namespaceSelector is only for (1) to and (2) from.
- If the policy applies to Ingress but lists no ingress rules at all (no from/podSelector entries), it allows traffic from none of the pods.
- If we mention an empty list, ingress: [], it also allows traffic from none of the pods.
- For (1) to and (2) from, if you omit the namespaceSelector, no other namespaces are selected, so traffic is allowed only from the namespace the NetworkPolicy is deployed to.
To allow all traffic from the current namespace:
- from:
  - podSelector: {}
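- If we mention
ingress:
- {}
then it allows traffic from all pods in all namespaces and from outside the K8s cluster.
- If we mention
- from:
  - namespaceSelector: {}
then it allows traffic from all pods in all namespaces, but not from outside the cluster.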
- All policies are additive (a union), so policies cannot conflict. The whitelist can keep growing. Traffic is allowed if at least one rule allows it.
- By default, if no policies exist in a namespace, then all ingress and egress traffic is allowed to and from pods in that namespace
- NetworkPolicy is a connection-level filter. It is not applied to individual packets.
- NetworkPolicy does not terminate established connections.
- Cluster-level network policy is not part of the core API. It is implemented by Calico.
Best practices
1. First, block all ingress/egress in a namespace.
2. Then start whitelisting for each app.
3. When applying egress rules, allow DNS, as it is needed in most cases to resolve Service FQDNs.
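The best practices above can be sketched as two manifests: a default-deny policy for the whole namespace, plus a policy that re-allows DNS egress. This is a minimal sketch, not a definitive setup. The policy names and the namespace team-red are assumptions for the example, and port 53 over UDP and TCP is assumed to be where cluster DNS listens.

```python
import json

def default_deny(namespace):
    """Select every pod in the namespace and allow no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},                     # {} selects all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed, so nothing is allowed
        },
    }

def allow_dns_egress(namespace):
    """Allow every pod in the namespace to send DNS queries on port 53."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            # A rule with only 'ports' and no 'to' matches any destination on those ports.
            "egress": [{
                "ports": [
                    {"protocol": "UDP", "port": 53},
                    {"protocol": "TCP", "port": 53},
                ],
            }],
        },
    }

# 'team-red' is an illustrative namespace name.
manifests = [default_deny("team-red"), allow_dns_egress("team-red")]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Because policies are additive, the DNS policy widens the default deny instead of conflicting with it. Per-app whitelisting policies can then be added in the same way.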
- If no policyTypes are specified on a NetworkPolicy, then Ingress is always set by default.
- Egress is added to policyTypes by default only if the NetworkPolicy has any egress rules.
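The policyTypes defaulting described above can be modelled in a few lines of Python. effective_policy_types is a hypothetical helper that only mimics the documented behaviour on a spec written as a dict. It is not the API server's code.

```python
def effective_policy_types(spec):
    """Return the policyTypes that apply to a NetworkPolicy spec (given as a dict)."""
    if spec.get("policyTypes"):
        return list(spec["policyTypes"])   # explicit value wins
    types = ["Ingress"]                    # Ingress is always set by default
    if spec.get("egress"):                 # Egress only when egress rules exist
        types.append("Egress")
    return types

# No policyTypes and no egress rules: only Ingress applies.
only_ingress = effective_policy_types({"podSelector": {}})

# No policyTypes but one egress rule: both apply.
both = effective_policy_types({"podSelector": {}, "egress": [{"to": []}]})

print(only_ingress, both)
```

One consequence is that an egress default-deny policy (no egress rules) has to list Egress in policyTypes explicitly, otherwise it only affects ingress.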
This is an OR condition (separate entries in the from list):
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  ingress:
  - from:
    - ipBlock:
    - namespaceSelector:
    - podSelector:
Here traffic is allowed from: any pod whose namespace has label key=value, OR any pod in the NetworkPolicy's own namespace (default) that has label key=value, OR any source with an IP address in the ipBlock.
This is an AND condition (both selectors in the same from entry):
- namespaceSelector:
    matchLabels:
      user: alice
  podSelector:
    matchLabels:
      role: client
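This is also an AND condition (the source must match from AND the port must match ports):
- from:
  ports:
  - protocol: TCP
    port: 6379
The port is the pod's containerPort, not the Service port.
We can have multiple rules, using multiple "- from" and/or multiple "- to" entries.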
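The OR versus AND behaviour of from entries can be sketched as a small Python model. This is an illustration under simplifying assumptions: only matchLabels selectors are handled, ipBlock peers and ports are not modelled, and all function names are hypothetical.

```python
def labels_match(selector, labels):
    """matchLabels-only selector: every key/value must be present; {} matches everything."""
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())

def peer_allows(peer, policy_ns, src_ns, src_ns_labels, src_pod_labels):
    """One entry under 'from': selectors inside the same entry are ANDed."""
    ns_sel = peer.get("namespaceSelector")
    pod_sel = peer.get("podSelector")
    if ns_sel is None:
        ns_ok = src_ns == policy_ns            # no namespaceSelector: policy's own namespace
    else:
        ns_ok = labels_match(ns_sel, src_ns_labels)
    pod_ok = True if pod_sel is None else labels_match(pod_sel, src_pod_labels)
    return ns_ok and pod_ok

def rule_allows(from_peers, **src):
    """Separate entries in the 'from' list are ORed."""
    return any(peer_allows(p, **src) for p in from_peers)

ns_alice = {"matchLabels": {"user": "alice"}}
pod_client = {"matchLabels": {"role": "client"}}

and_rule = [{"namespaceSelector": ns_alice, "podSelector": pod_client}]    # one entry
or_rule = [{"namespaceSelector": ns_alice}, {"podSelector": pod_client}]   # two entries

# A pod labelled role=server in a namespace labelled user=alice.
src = dict(policy_ns="default", src_ns="other",
           src_ns_labels={"user": "alice"}, src_pod_labels={"role": "server"})

print("AND rule allows:", rule_allows(and_rule, **src))
print("OR rule allows:", rule_allows(or_rule, **src))
```

The only difference between the two rules is one leading "-" in the YAML, which is why the indentation of podSelector under from deserves a second look when reviewing a policy.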
To allow all traffic from all namespaces, either of these works:
- from:
  - podSelector: {}
    namespaceSelector: {}
or
- from:
  - namespaceSelector: {}
Port is always destination port, for both ingress and egress.
We can block egress traffic going outside the cluster by (1) allowing egress only to pods in all namespaces
- to:
  - namespaceSelector: {}
or (2) an empty list, which also blocks egress inside the cluster
egress: []
First, let's isolate both Ingress and Egress traffic for the target pods chosen by podSelector. These pods belong to the same namespace as the NetworkPolicy. Here, all pods with label role=db in the default namespace are isolated.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress
8. Issue Detection
Cyber Kill Chain
- Reconnaissance
- Weaponization: Client application data file: PDF, DOC
- Delivery: E-mail attachment, website, USB removable media
- Exploitation:
- Installation:
- Command and Control
- Actions on Objectives
AIDE: Advanced Intrusion Detection Environment
C2: Command and Control
COOP: Continuity of Operations
CVEs: Common Vulnerabilities and Exposures
DR: Disaster Recovery
HIDS: Host Intrusion Detection System
IDS: Intrusion Detection System
IPS: Inline Intrusion Prevention System
LM-CIRT: Lockheed Martin Computer Incident Response Team
NIDS: Network Intrusion Detection System
NSM: Network Security Monitoring
NVD: National Vulnerability Database
PIDS: Physical Intrusion Detection System
US-CERT: United States Computer Emergency Readiness Team
7. Workload Considerations : AppArmor
- AppArmor is less complete than SELinux, but simpler.
- It is available on Debian and SUSE Linux distributions.
- It supplements the UNIX Discretionary Access Control (DAC) model by providing Mandatory Access Control (MAC).
- Its learning mode (complain mode) is similar to SELinux's permissive mode: profile violations are logged but not prevented. This log can be turned into a profile.
- No security labels are needed, so it is filesystem-neutral
- Administrator can associate security profile to program.
- Unlike SELinux: instead of direct labeling of objects, security policy is applied to pathnames.
- The AppArmor profile must be available on the worker node so that a pod can use it. Profiles can be added to worker nodes during installation with Ansible or Puppet, or with a DaemonSet.
- To disable AppArmor for the entire cluster, pass --feature-gates=AppArmor=false.
- AppArmor profiles can be managed using PSP.
- If AppArmor kernel module is available then
sudo systemctl [start|stop|restart|status] apparmor
- To load or not load at boot time
sudo systemctl [enable|disable] apparmor
- To see current status
sudo apparmor_status
1. Enforced mode
Default mode
2. Complain
also called learning mode
- Pre-packaged profiles
- are installed along with new software
- are installed with the AppArmor package apparmor-profiles
- are stored at /etc/apparmor.d
- "man apparmor.d" provides documentation.
Other utilities
- aa-notify (package apparmor-notify): summary of AppArmor log messages
- aa-disable: unloads a single profile and does not load it during boot
- aa-easyprof: helps to set up a basic AppArmor profile for a program
- aa-logprof: scans the log. If it finds any AppArmor event that is not covered by existing profiles, it suggests profile changes.
- aa-genprof: creates a new profile in complain mode, using existing profiles as input, and runs logprof to scan AppArmor events. For each entry in the system log it offers the options (A) Allow, (D) Deny, (I) Ignore, (N) New, (G) Glob last piece, (Q) Quit, until Quit is selected. Then the new profile is created.
- Bane: AppArmor profile generator for Docker containers. It automatically installs the profile in the directory /etc/apparmor.d/containers/
List all AppArmor utilities using
rpm -qil apparmor-utils | grep bin
Access control to assign in AppArmor profile
- r : Read
- w : Write
- m : Memory map as executable
- k : File locking
- l : Create hard links
- ix : Execute and inherit this profile
- Px : Execute another profile after cleaning environment
- Ux : Execute unconfined after cleaning environment.
Add this metadata to pod
container.apparmor.security.beta.kubernetes.io/<container_name>: <profile_ref>
Note: This is the container name, not the pod name.
<profile_ref> is the profile name, not the profile file name.
E.g. container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-deny-write
Some file under /etc/apparmor.d/ should define this profile k8s-apparmor-example-deny-write.
<profile_ref> can be:
- runtime/default
- localhost/<profile_name>
- unconfined
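To show where the annotation goes, the following Python sketch builds a pod manifest with the annotation from the example above and prints it as JSON for kubectl. The container name hello and the profile name come from these notes. The pod name hello-apparmor, the busybox image, and the sleep command are assumptions for illustration.

```python
import json

ANNOTATION_PREFIX = "container.apparmor.security.beta.kubernetes.io/"

def apparmor_pod(pod_name, container_name, image, profile_ref):
    """Build a Pod whose named container runs under the given AppArmor profile_ref."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            # The annotation key ends with the CONTAINER name, not the pod name.
            "annotations": {ANNOTATION_PREFIX + container_name: profile_ref},
        },
        "spec": {
            "containers": [{
                "name": container_name,
                "image": image,
                "command": ["sh", "-c", "sleep 3600"],
            }],
        },
    }

# Pod name and image are hypothetical; the profile must already be loaded on the node.
pod = apparmor_pod("hello-apparmor", "hello", "busybox",
                   "localhost/k8s-apparmor-example-deny-write")
print(json.dumps(pod, indent=2))
```

If the referenced profile is not loaded on the node where the pod is scheduled, the pod is expected to fail to start, which is why the profile has to be distributed to every worker node first.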
For PSP:
apparmor.security.beta.kubernetes.io/defaultProfileName: <profile_ref>
apparmor.security.beta.kubernetes.io/allowedProfileNames: <profile_ref>[,others...]
7. Workload Considerations : SELinux
SELinux is about rules for which process can access which files, directories, ports etc.
SELinux meets Common Criteria and FIPS standards. SELinux has granular settings based on user, role, category, sensitivity level, etc. SELinux is available on Debian, Red Hat, and SUSE Linux distributions.
(1) Contexts: labels for files, processes, and ports, made of user, role, type, and level. Use the -Z option to see a context and the chcon command to change it. Commands that support -Z: ps, ls, cp, mv, mkdir.
By default, a file's context does not change when we move the file.
Use the restorecon command to restore the context as per the parent directory.
Use the 'semanage fcontext' command to set the default context for future objects in a directory. To apply it to existing objects, use the restorecon command.
semanage is part of the policycoreutils-python package.
(2) Rules : access control
(3) Policies : Set of rules.
Default policy is to deny any access. Rules are added to allow access. Allowed actions are cached in the "Access Vector Cache" (AVC).
- setsebool
- getsebool
- semanage boolean -l
7. Workload Considerations :
1. Static Analysis
Clair has two parts:
1. Service wrapper: HTTP interface, notifier, notification storage
2. ClairCore: downloads vulnerabilities and compares them against the index of the image
3 phases/functions
1. Download image layers, scan them, and generate an IndexReport
2. Compare the IndexReport with known vulnerabilities
3. Notify about vulnerabilities, as per the notifier configuration.
It uses alpine-secdb
It retrieves vuln-list
Trivy checks middle layers of image
Easy to integrate with CICD
2. Dynamic Analysis
Linux commands: perf, ftrace
Tracee (uses eBPF): monitors system calls and kernel events.
- It captures: (1) precise timestamp, (2) uts_name, (3) UID, (4) command (COMM), (5) PID, (6) TID/host, (7) return code (RET), (8) event, and (9) arguments.
- At least 3 volume locations are needed: (1) /lib/modules, (2) /usr/src, (3) /tmp/tracee.
- Tracee provides in-depth tracing of a container or pod.
Falco by Sysdig: multiple components (user space program, configuration, driver) work together to evaluate system calls against rules and generate alerts when a rule is broken.
A rules file contains rules, macros, and lists. A rule can reference a list. A list can be included in macros and in other lists, as well as in rules.
A rule has 5 required key-value pairs: (1) rule: the name, (2) desc: the description, (3) condition: filtering expression for events, (4) output, (5) priority (emergency, alert, critical, error, warning, notice, informational, debug).
A rule has 4 optional key-value pairs: (1) enabled: default is true. (2) tags (filesystem, software_mgmt, process, database, host, shell, container, cis, users, network). The -T option disables rules with the given tag; the -t option runs only rules with the given tag. (3) warn_evttypes: default is true. (4) skip-if-unknown-filter: default is false.
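The rule structure described above can be checked with a short Python sketch. It assumes that the YAML keys for name and description are rule and desc, and that priorities are compared case-insensitively. rule_problems is a hypothetical helper, not part of Falco, and the example rule (including its condition fields) is illustrative only.

```python
REQUIRED_KEYS = ("rule", "desc", "condition", "output", "priority")
PRIORITIES = ("emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug")

def rule_problems(rule):
    """Return a list of human-readable problems found in one rule (a dict)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in rule]
    if "priority" in rule and str(rule["priority"]).lower() not in PRIORITIES:
        problems.append(f"unknown priority: {rule['priority']}")
    return problems

# Illustrative rule: alert when bash runs inside a container.
example_rule = {
    "rule": "shell_in_container",
    "desc": "notice shell activity within a container",
    "condition": "container.id != host and proc.name = bash",
    "output": "shell in a container (user=%user.name container_id=%container.id)",
    "priority": "WARNING",
    "tags": ["container", "shell"],   # optional key
}

print(rule_problems(example_rule))
print(rule_problems({"rule": "incomplete", "priority": "LOUD"}))
```

A check like this only covers the shape of a rule. Whether the condition expression is valid can only be decided by Falco itself when it loads the rules file.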
initContainer based approach
Insert an initContainer using a dynamic admission controller.
The initContainer in the pod spec contains the scan/verification tool.
Only if the initContainer exits with code zero is the rest of the pod spec passed to the container engine for execution.
Example: cloud security tools by TrendMicro:
3. Immutable container
Check periodically, as part of security scanning:
* Does the container have a read/write file system?
* Is the container able to escalate to privileged users?
* Other such features.
1. SELinux: Debian, RH, SUSE
* SELinux meets Common Criteria, FIPS standard. SELinux has granular settings, based on user, role, category, sensitivity level etc.
2. AppArmor: Debian, SUSE
* AppArmor is less complete than SELinux, but simpler
3. Smack (Simplified MAC Kernel) used with Yocto Linux and Automotive Grade Linux.
4. TOMOYO (by NTT Data corporation) pathname based MAC (Mandatory Access Control)
Use only one tool instead of cascading multiple tools, so there is no confusion about which tool is responsible.
5. seccomp: Linux kernel feature. In its first iteration (strict mode), the only allowed system calls are read, write, exit, and sigreturn. With mode 2 (filter mode), a BPF program determines which system calls are allowed.
In K8s, seccomp is used for (1) syscall auditing and (2) denial of disallowed calls. A pod whose container makes a denied call can enter the CrashLoopBackOff state.
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/audit.json
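A minimal sketch of the auditing case follows: a seccomp profile that logs every system call without blocking any, and a pod that references it through the Localhost profile type shown above. Seccomp profiles are JSON files, so Python's json module can emit both. The pod name audit-pod and the image are assumptions, and the profile path is assumed to be relative to the kubelet's seccomp directory (by default /var/lib/kubelet/seccomp).

```python
import json

# Audit-only profile: SCMP_ACT_LOG logs each syscall and lets it proceed.
audit_profile = {"defaultAction": "SCMP_ACT_LOG"}

def seccomp_pod(name, image, localhost_profile):
    """Build a Pod whose containers all run under a node-local seccomp profile."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "securityContext": {
                "seccompProfile": {
                    "type": "Localhost",
                    "localhostProfile": localhost_profile,
                },
            },
            "containers": [{"name": name, "image": image}],
        },
    }

# 'audit-pod' and 'nginx' are hypothetical values for illustration.
pod = seccomp_pod("audit-pod", "nginx", "profiles/audit.json")

print(json.dumps(audit_profile))      # content for profiles/audit.json on each node
print(json.dumps(pod, indent=2))      # manifest for kubectl apply -f
```

Starting with an audit profile and reading the logged calls is a common way to find out which system calls a workload needs before switching to a profile that denies everything else.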
6. Networking
Session state: New, Established, Related. For Related: (1) related DNS queries, (2) netfilter needs a protocol-specific module, e.g. FTP and VoIP require an extra kernel module.
Specify the module with "-m state --state" or "-m conntrack --ctstate" (the state module is a subset of the conntrack module).
Invalid: out-of-sequence traffic.
Anatomy of filter
1. Where to apply filter? (input, output, forward) chain
2. Which traffic to filter? (source and destination match criteria)
3. What action? chain are grouped in tables, as per action (filter, NAT, mangle, raw, security)
Applicable to both firewall and nwpolicy
chain v/s action
Action   | PreRouting | Input | Output | Forward | PostRouting |
raw      | Y          | N     | Y      | N       | N           |
mangle   | Y          | Y     | Y      | Y       | Y           |
nat      | Y          | N     | Y      | N       | Y           |
filter   | N          | Y     | Y      | Y       | N           |
security | N          | Y     | Y      | Y       | N           |
Calico GlobalNetworkPolicy configures connectivity rules for WorkloadEndpoints and HostEndpoints in all namespaces. It has precedence over Profiles. Profiles are used before Calico NetworkPolicy is functional.
Compared to the default K8s network policy, Calico network policy
* has (1) policy ordering/priority, (2) deny rules, and (3) more flexible match rules.
* applies to pods, VMs, and host interfaces, whereas K8s network policy is only for pods.
* along with Istio, supports securing layers 5-7 with match criteria and cryptographic identity.
* works with all cloud providers.
* If neither Ingress nor Egress is specified, the default is Ingress.
* If no policy applies, all traffic is allowed for a pod.
* If an ingress policy applies, only that ingress traffic is allowed.
* If an egress policy applies, only that egress traffic is allowed.
* If no policy applies, all traffic is denied for a node.
Reference: https://docs.projectcalico.org/security/calico-network-policy
WireGuard: VPN
- easy to use
- less feature
- speed
- with Calico clusters
Ingress Controller: Envoy Proxy, NGINX, Traefik, Ambassador
We need to add annotations accordingly to "Ingress" resource
kubernetes.io/ingress.class: haproxy
kubernetes.io/ingress.class: nginx
We can use https://nip.io/ to get a DNS name that contains an IP address and resolves to that IP.
In a local setup without a load balancer, when we use NodePort, we have to use the high (NodePort) port of the host when using curl against the ingress controller. E.g.
curl -H 'Host: nginx.'
Service Mesh: Istio (security features: peer authentication, authorization, identity management, zero-trust networking), Linkerd (for security), Contour (VMware), Aspen Mesh (by F5, built on Istio; F5 also purchased NGINX)
5. Securing Kube-APIServer: PSP, IAM, CIS
Pod Security Policy (PSP)
- A set of rules
- provide/modify default values for fields
- change pod
- PSPs are ordered by name before being applied.
- Deprecated in K8s 1.21
- Removed in K8s 1.25
Even if you are only planning on changing a single value, the policy file must contain several entries. Sample PSP where a pod can do anything:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: basicpolicy
spec:
  privileged: true
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
Most commonly changed parameters
1. privileged
2. runAsUser
Reference: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
For allowedUnsafeSysctls and forbiddenSysctls
- kernel (common prefix: kernel.)
- kernel.shm*,
- kernel.msg*,
- kernel.sem,
- networking (common prefix: net.)
- virtual memory (common prefix: vm.)
- MDADM (common prefix: dev.)
If we use a RoleBinding instead of a ClusterRoleBinding, then it applies only to the same namespace.
--resource=podsecuritypolicy \
--resource-name=" # This Is optional"
Pods created by the ReplicaSet controller use the default SA, so with the above 2 commands we should also be able to create a Deployment.
If the controller manager connects to the API server using the trusted/insecure port, then all PSPs are allowed, as authentication and authorization are bypassed.
After enabling PodSecurityPolicy admission control plugin, we should have
1. This policy
kind: PodSecurityPolicy
metadata:
  name: default-allow-all
spec:
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'
  fsGroup:
    rule: RunAsAny
  hostIPC: true
  hostNetwork: true
  hostPID: true
  hostPorts:
  - max: 65535
    min: 0
  privileged: true
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'
2. We need a ClusterRole that allows use of PSPs. A ClusterRole is cluster-scoped, so the -n flag has no effect here.
k -n team-red create clusterrole cr --verb=use --resource=psp
3. Any new PSP should have at least these fields:
kind: PodSecurityPolicy
metadata:
  name: example
spec:
  privileged: false  # Don't allow privileged pods!
  # The rest fills in some required fields.
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
4. In each NS, we should have a RoleBinding. system:serviceaccounts is a group, so it is bound with --group.
k -n team-red create rolebinding rb --clusterrole=cr --group=system:serviceaccounts
Or we can have one ClusterRoleBinding, which is cluster-scoped.
k create clusterrolebinding crb --clusterrole=cr --group=system:serviceaccounts
IAM using tools: keycloak , Active Directory, Amazon IAM
CIS provides a huge amount of free and paid resources to improve IT security. It provides tools, including benchmarks, scanning tools, threat tools, and hardened images. The CIS-CAT® Pro tool evaluates a target system against known issues and performance configurations. CIS also offers dashboards to view the ongoing state of compliance and security considerations.
For a minikube setup, we need to install the kube-bench tool on each individual node and run the tests. The test results recommend steps for the failure and warning cases. We can also run job.yaml on the K8s cluster.
Have a look at the summary of CIS for K8s in this Excel file