Kubernetes Admission Controllers
1. What is an admission controller
An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized.
The controllers are compiled into the kube-apiserver binary, and may only be configured by the cluster administrator. There are two special controllers: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. These execute the mutating and validating (respectively) admission control webhooks which are configured in the API.
Admission controllers limit requests to create, delete, modify objects or connect to proxy. They do not limit requests to read objects.
The admission control process proceeds in two phases. In the first phase, mutating admission controllers are run. In the second phase, validating admission controllers are run. Note that some controllers are both mutating and validating.
If any of the controllers in either phase reject the request, the entire request is rejected immediately and an error is returned to the end-user.
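The two phases described above can be sketched in a few lines of Python. This is only an illustration of the ordering and the fail-fast behaviour, not how kube-apiserver is implemented; the `admit` function, the `Rejected` exception, and both example controllers are hypothetical names introduced here.

```python
import copy


class Rejected(Exception):
    """Raised when any controller rejects the request."""


def admit(obj, mutators, validators):
    """Run mutating controllers first, then validating ones.

    Each mutator takes an object and returns the (possibly modified) object;
    it may also raise Rejected. Each validator returns an error string or None.
    """
    obj = copy.deepcopy(obj)          # leave the caller's object untouched
    for mutate in mutators:           # phase 1: mutating controllers
        obj = mutate(obj)
    for validate in validators:       # phase 2: validating controllers
        error = validate(obj)
        if error is not None:
            raise Rejected(error)     # first rejection fails the whole request
    return obj                        # the object that would be persisted


# Hypothetical controllers, for illustration only.
def add_default_label(pod):
    labels = pod.setdefault("metadata", {}).setdefault("labels", {})
    labels.setdefault("team", "unknown")
    return pod


def require_image_tag(pod):
    for container in pod["spec"]["containers"]:
        if ":" not in container["image"]:
            return f"container {container['name']!r} must pin an image tag"
    return None
```

Calling `admit(pod, [add_default_label], [require_image_tag])` returns the defaulted object, or raises `Rejected` as soon as one validator reports an error.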
1.1. Turn on/off an admission controller
The Kubernetes API server flag enable-admission-plugins takes a comma-delimited list of admission control plugins to invoke prior to modifying objects in the cluster.
kube-apiserver --enable-admission-plugins=NamespaceLifecycle,LimitRanger ...
The Kubernetes API server flag disable-admission-plugins takes a comma-delimited list of admission control plugins to be disabled, even if they are in the list of plugins enabled by default.
kube-apiserver --disable-admission-plugins=PodNodeSelector,AlwaysDeny ...
To see which admission plugins are enabled by default:
kube-apiserver -h | grep enable-admission-plugins
$ docker run --rm -it k8s.gcr.io/kube-apiserver:v1.22.3 kube-apiserver -h | grep enable-admission-plugins
--enable-admission-plugins strings admission plugins that should be enabled in addition to default enabled ones (NamespaceLifecycle, LimitRanger, ServiceAccount, TaintNodesByCondition, PodSecurity, Priority, DefaultTolerationSeconds, DefaultStorageClass, StorageObjectInUseProtection, PersistentVolumeClaimResize, RuntimeClass, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, MutatingAdmissionWebhook, ValidatingAdmissionWebhook, ResourceQuota). Comma-delimited list of admission plugins: AlwaysAdmit, AlwaysDeny, AlwaysPullImages, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, DefaultStorageClass, DefaultTolerationSeconds, DenyServiceExternalIPs, EventRateLimit, ExtendedResourceToleration, ImagePolicyWebhook, LimitPodHardAntiAffinityTopology, LimitRanger, MutatingAdmissionWebhook, NamespaceAutoProvision, NamespaceExists, NamespaceLifecycle, NodeRestriction, OwnerReferencesPermissionEnforcement, PersistentVolumeClaimResize, PersistentVolumeLabel, PodNodeSelector, PodSecurity, PodSecurityPolicy, PodTolerationRestriction, Priority, ResourceQuota, RuntimeClass, SecurityContextDeny, ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition, ValidatingAdmissionWebhook. The order of plugins in this flag does not matter.
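As a rough model of how the two flags combine with the defaults, the following Python sketch computes the resulting plugin set. The default list is copied from the v1.22.3 help output above; the function name `effective_plugins` is made up for this example, and the sketch does not model how the API server treats a plugin that is named in both flags.

```python
# Defaults copied from the kube-apiserver v1.22.3 help output shown above.
DEFAULT_PLUGINS = frozenset({
    "NamespaceLifecycle", "LimitRanger", "ServiceAccount",
    "TaintNodesByCondition", "PodSecurity", "Priority",
    "DefaultTolerationSeconds", "DefaultStorageClass",
    "StorageObjectInUseProtection", "PersistentVolumeClaimResize",
    "RuntimeClass", "CertificateApproval", "CertificateSigning",
    "CertificateSubjectRestriction", "DefaultIngressClass",
    "MutatingAdmissionWebhook", "ValidatingAdmissionWebhook", "ResourceQuota",
})


def effective_plugins(enable="", disable=""):
    """Return defaults plus --enable-admission-plugins minus --disable-admission-plugins.

    Both arguments are comma-delimited strings, as on the command line.
    """
    enabled = {name for name in enable.split(",") if name}
    disabled = {name for name in disable.split(",") if name}
    return (DEFAULT_PLUGINS | enabled) - disabled
```

For example, `effective_plugins(enable="NodeRestriction", disable="PodSecurity")` contains `NodeRestriction` and no longer contains `PodSecurity`.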
1.2. Dynamic Admission Control
In addition to compiled-in admission plugins, admission plugins can be developed as extensions and run as webhooks configured at runtime.
Admission webhooks are HTTP callbacks that receive admission requests and do something with them. You can define two types of admission webhooks: validating admission webhooks and mutating admission webhooks.
The webhook handles the AdmissionReview request sent by the API server, and sends back its decision as an AdmissionReview object in the same version it received.
Mutating admission webhooks are invoked first, and can modify objects sent to the API server to enforce custom defaults. After all object modifications are complete, and after the incoming object is validated by the API server, validating admission webhooks are invoked and can reject requests to enforce custom policies.
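A webhook's reply can be sketched as a small Python helper that builds the AdmissionReview response from the incoming request. The helper name `review_response` is hypothetical; the field names (`uid`, `allowed`, `status.message`, `patchType`, `patch`) follow the admission.k8s.io/v1 AdmissionReview schema, where a mutating webhook returns a base64-encoded JSON Patch.

```python
import base64
import json


def review_response(review, allowed, message=None, patch=None):
    """Build an AdmissionReview reply for an incoming AdmissionReview dict.

    `patch` is an optional list of JSON Patch operations (mutating webhooks only).
    """
    response = {
        "uid": review["request"]["uid"],   # must echo the request UID
        "allowed": allowed,
    }
    if message is not None:
        response["status"] = {"message": message}
    if patch is not None:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {
        "apiVersion": review["apiVersion"],  # reply in the version received
        "kind": "AdmissionReview",
        "response": response,
    }
```

A validating webhook would call it with only `allowed` (and optionally `message`), while a mutating webhook would additionally pass a patch such as `[{"op": "add", "path": "/metadata/labels/team", "value": "unknown"}]`.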
You can dynamically configure what resources are subject to what admission webhooks via ValidatingWebhookConfiguration or MutatingWebhookConfiguration.
You can use the following commands to inspect details about each config field:
$ kubectl explain mutatingwebhookconfigurations
$ kubectl explain validatingwebhookconfigurations
The following is an example ValidatingWebhookConfiguration; a mutating webhook configuration is similar.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: "pod-policy.kube-admission.io"
webhooks:
- name: "pod-policy.kube-admission.io"
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
    scope: "Namespaced"
  clientConfig:
    caBundle: LS0....
    service:
      namespace: "default"
      name: "kube-admission"
      path: /always-allow-delay-5s
  admissionReviewVersions: ["v1"]
  sideEffects: None
  timeoutSeconds: 10
Note: When using clientConfig.service, the server cert must be valid for <svc_name>.<svc_namespace>.svc.
A sample admission controller is also available on my GitHub: https://github.com/qqbuby/sample-kube-admission-controller.
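To make the rules section of the example configuration more concrete, here is a simplified Python sketch of how a single rule could be matched against a request. It is an approximation under stated assumptions: `"*"` is treated as a wildcard, subresources such as pods/status are not modeled, and the function name `rule_matches` is invented for this example.

```python
def _listed(allowed, value):
    """True if `value` is listed explicitly or covered by the "*" wildcard."""
    return "*" in allowed or value in allowed


def rule_matches(rule, *, group, version, resource, operation, namespaced):
    """Check one webhook rule against the attributes of a request."""
    scope = rule.get("scope", "*")
    # "Namespaced" matches only namespaced resources, "Cluster" only cluster-scoped ones.
    scope_ok = scope == "*" or (scope == "Namespaced") == namespaced
    return (
        _listed(rule["apiGroups"], group)
        and _listed(rule["apiVersions"], version)
        and _listed(rule["operations"], operation)
        and _listed(rule["resources"], resource)
        and scope_ok
    )


# The rule from the ValidatingWebhookConfiguration example above.
POD_RULE = {
    "apiGroups": [""],
    "apiVersions": ["v1"],
    "operations": ["CREATE"],
    "resources": ["pods"],
    "scope": "Namespaced",
}
```

With `POD_RULE`, a CREATE of a Pod matches, while an UPDATE of the same Pod or a CREATE of a Deployment in the apps group does not.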
2. Pod and Container Security Context
Principle of Least Privilege
In information security, computer science, and other fields, the principle of least privilege (PoLP), also known as the principle of minimal privilege or the principle of least authority, requires that in a particular abstraction layer of a computing environment, every module (such as a process, a user, or a program, depending on the subject) must be able to access only the information and resources that are necessary for its legitimate purpose.
A security context defines privilege and access control settings for a Pod or Container. Security context settings include, but are not limited to:
-
Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).
-
Security Enhanced Linux (SELinux): Objects are assigned security labels.
-
Running as privileged or unprivileged.
-
Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.
-
AppArmor: Use program profiles to restrict the capabilities of individual programs.
-
Seccomp: Filter a process’s system calls.
-
AllowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This boolean directly controls whether the no_new_privs flag is set on the container process. AllowPrivilegeEscalation is always true when the container is run as privileged or has CAP_SYS_ADMIN.
-
readOnlyRootFilesystem: Mounts the container’s root filesystem as read-only.
For more information about security mechanisms in Linux, see Overview of Linux Kernel Security Features.
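The AllowPrivilegeEscalation rule in the list above can be written down as a small decision function. This is a sketch, not Kubernetes source code: the function name is hypothetical, and treating an unset field as "allowed" is an assumption made here. Contradictory combinations (for example privileged together with an explicit false) may be rejected by API validation before they ever reach the runtime; the sketch only models the resulting flag.

```python
def no_new_privs(allow_privilege_escalation=None, privileged=False, added_capabilities=()):
    """Return True if the no_new_privs flag would be set on the container process."""
    has_sys_admin = any(
        cap in ("SYS_ADMIN", "CAP_SYS_ADMIN") for cap in added_capabilities
    )
    if privileged or has_sys_admin:
        effective_allow = True            # always true in these two cases
    elif allow_privilege_escalation is None:
        effective_allow = True            # ASSUMPTION: unset behaves like true
    else:
        effective_allow = allow_privilege_escalation
    return not effective_allow
```

Only an explicit `allowPrivilegeEscalation: false` on an unprivileged container without CAP_SYS_ADMIN results in `no_new_privs` being set.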
2.1. Set the security context for a Pod
To specify security settings for a Pod, include the securityContext field in the Pod specification.
The securityContext field is a PodSecurityContext object.
The security settings that you specify for a Pod apply to all Containers in the Pod.
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  volumes:
  - name: sec-ctx-vol
    emptyDir: {}
  containers:
  - name: sec-ctx-demo
    image: busybox:1
    stdin: true
    tty: true
    volumeMounts:
    - name: sec-ctx-vol
      mountPath: /data/demo
    securityContext:
      allowPrivilegeEscalation: false
In the configuration file, the runAsUser field specifies that for any Containers in the Pod, all processes run with user ID 1000. The runAsGroup field specifies the primary group ID of 3000 for all processes within any containers of the Pod. If this field is omitted, the primary group ID of the containers will be root(0). Any files created will also be owned by user 1000 and group 3000 when runAsGroup is specified. Since the fsGroup field is specified, all processes of the container are also part of the supplementary group ID 2000. The group owner of the volume /data/demo and of any files created in that volume will be group ID 2000.
$ kubectl apply -f pods/security/security-context.yaml
$ kubectl exec -it security-context-demo -- sh
/ $ id
uid=1000 gid=3000 groups=2000
/ $ ls -l /data
total 4
drwxrwsrwx 2 root 2000 4096 Dec 16 09:14 demo
/ $ touch /data/demo/testfile
/ $ ls -l /data/demo/testfile
-rw-r--r-- 1 1000 2000 0 Dec 16 09:15 /data/demo/testfile
/ $ stat /data/demo/
File: /data/demo/
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: 801h/2049d Inode: 3539320 Links: 2
Access: (2777/drwxrwsrwx) Uid: ( 0/ root) Gid: ( 2000/ UNKNOWN)
<...>
/ $ cat /etc/passwd
root:x:0:0:root:/root:/bin/sh
<...>
www-data:x:33:33:www-data:/var/www:/bin/false
operator:x:37:37:Operator:/var:/bin/false
nobody:x:65534:65534:nobody:/home:/bin/false
/ $ cat /etc/group
root:x:0:
<...>
nobody:x:65534:
/ $ exit
2.2. Set the security context for a Container
To specify security settings for a Container, include the securityContext field in the Container manifest.
The securityContext field is a SecurityContext object.
Security settings that you specify for a Container apply only to the individual Container, and they override settings made at the Pod level when there is overlap.
Container settings do not affect the Pod’s Volumes.
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-2
spec:
  securityContext:
    runAsUser: 1000
  containers:
  - name: sec-ctx-demo-2
    image: busybox:1
    stdin: true
    tty: true
    securityContext:
      runAsUser: 2000
      allowPrivilegeEscalation: false
$ kubectl apply -f pods/security/security-context-2.yaml
$ kubectl exec -it security-context-demo-2 -- sh
/ $ id
uid=2000 gid=0(root)
/ $ exit
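The override behaviour seen in the demo (uid 2000 instead of 1000) can be modeled as a dictionary merge. This is a simplified sketch: the helper name is hypothetical, and `SHARED_FIELDS` lists the fields assumed here to exist at both the Pod and the Container level; pod-only settings such as fsGroup are deliberately not inherited by the container view.

```python
# Fields assumed to exist in both PodSecurityContext and SecurityContext.
SHARED_FIELDS = (
    "runAsUser", "runAsGroup", "runAsNonRoot",
    "seLinuxOptions", "seccompProfile", "windowsOptions",
)


def effective_security_context(pod_sc, container_sc):
    """Merge Pod-level and Container-level settings for one container.

    Container values win where both levels set the same field.
    """
    effective = {k: v for k, v in pod_sc.items() if k in SHARED_FIELDS}
    effective.update(container_sc)
    return effective


pod_level = {"runAsUser": 1000}
container_level = {"runAsUser": 2000, "allowPrivilegeEscalation": False}
print(effective_security_context(pod_level, container_level))
# {'runAsUser': 2000, 'allowPrivilegeEscalation': False}
```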
2.3. Set capabilities for a Container
With Linux capabilities, you can grant certain privileges to a process without granting all the privileges of the root user. To add or remove Linux capabilities for a Container, include the capabilities field in the securityContext section of the Container manifest.
First, see what happens when you don’t include a capabilities field.
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-3
spec:
  containers:
  - name: sec-ctx-3
    image: k8s.gcr.io/echoserver:1.10
    ports:
    - containerPort: 8080
$ kubectl exec -it security-context-demo-3 -- sh
# id
uid=0(root) gid=0(root) groups=0(root)
# cat /proc/1/status | grep Cap
CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
# exit
Next, run a Container that is the same as the preceding container, except that it has additional capabilities set.
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-4
spec:
  containers:
  - name: sec-ctx-4
    image: k8s.gcr.io/echoserver:1.10
    ports:
    - containerPort: 8080
    securityContext:
      capabilities:
        add: ["NET_ADMIN", "SYS_TIME"]
$ kubectl exec -it security-context-demo-4 -- sh
# id
uid=0(root) gid=0(root) groups=0(root)
# cat /proc/1/status | grep Cap
CapInh: 00000000aa0435fb
CapPrm: 00000000aa0435fb
CapEff: 00000000aa0435fb
CapBnd: 00000000aa0435fb
CapAmb: 0000000000000000
# exit
Compare the capabilities of the two Containers:
00000000a80425fb
00000000aa0435fb
In the capability bitmap of the first container, bits 12 and 25 are clear. In the second container, bits 12 and 25 are set. Bit 12 is CAP_NET_ADMIN, and bit 25 is CAP_SYS_TIME. See capability.h for definitions of the capability constants.
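Linux capability constants have the form CAP_XXX, but the CAP_ prefix is omitted when listing capabilities in a container manifest. For example, CAP_SYS_TIME is written as SYS_TIME.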
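The bit arithmetic behind that comparison can be checked with a few lines of Python. The two masks are the CapEff values printed above; the `CAP_NAMES` table only covers the two bits discussed in the text (numbers as defined in capability.h), and the helper name `set_bits` is made up for this example.

```python
# Bit numbers for the two capabilities discussed above (from linux/capability.h).
CAP_NAMES = {12: "CAP_NET_ADMIN", 25: "CAP_SYS_TIME"}


def set_bits(mask):
    """Return the positions of all bits that are set in `mask`, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


first = int("00000000a80425fb", 16)    # container without a capabilities field
second = int("00000000aa0435fb", 16)   # container with NET_ADMIN and SYS_TIME added

added = set_bits(second & ~first)      # bits set only in the second container
print(added)                                            # [12, 25]
print([CAP_NAMES.get(b, f"bit {b}") for b in added])    # ['CAP_NET_ADMIN', 'CAP_SYS_TIME']
```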
2.4. Clean up
Delete the Pods:
kubectl delete pod security-context-demo
kubectl delete pod security-context-demo-2
kubectl delete pod security-context-demo-3
kubectl delete pod security-context-demo-4
3. What is a Pod Security Policy?
Kubernetes officially deprecated PodSecurityPolicy in version 1.21, and PodSecurityPolicy will be removed in version 1.25. It is being replaced by a new, simplified PodSecurity admission controller.
PodSecurityPolicy is a built-in admission controller that allows a cluster administrator to control security-sensitive aspects of the Pod specification. A PodSecurityPolicy resource defines the conditions that requests to create and update Pods on your cluster must meet.
In most Kubernetes clusters, RBAC (Role-Based Access Control) rules control access to these resources. list, get, create, edit, and delete are the sorts of API operations that RBAC cares about, but RBAC does not consider what settings are being put into the resources it controls.
To control what sorts of settings are allowed in the resources defined in your cluster, you need Admission Control in addition to RBAC.
Kubernetes SIG Security, SIG Auth, and a diverse collection of other community members have been working together for months to ensure that what’s coming next is going to be awesome. We have developed a Kubernetes Enhancement Proposal (KEP 2579) and a prototype for a new feature, currently being called by the temporary name "PSP Replacement Policy."
If your use of PSP is relatively simple, with a few policies and straightforward binding to service accounts in each namespace, you will likely find PSP Replacement Policy to be a good match for your needs. Evaluate your PSPs compared to the Kubernetes Pod Security Standards to get a feel for where you’ll be able to use the Restricted, Baseline, and Privileged policies. Please follow along with or contribute to the KEP and subsequent development, and try out the Alpha release of PSP Replacement Policy when it becomes available.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: privileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
spec:
  privileged: true
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'
  volumes:
  - '*'
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  hostIPC: true
  hostPID: true
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default,runtime/default'
    apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  # Allow core volume types.
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    # Assume that ephemeral CSI drivers & persistentVolumes set up by the cluster admin are safe to use.
    - 'csi'
    - 'persistentVolumeClaim'
    - 'ephemeral'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  readOnlyRootFilesystem: false
3.1. Policy Order
In addition to restricting pod creation and update, pod security policies can also be used to provide default values for many of the fields that they control. When multiple policies are available, the pod security policy controller selects policies according to the following criteria:
-
PodSecurityPolicies which allow the pod as-is, without changing defaults or mutating the pod, are preferred. The order of these non-mutating PodSecurityPolicies doesn’t matter.
-
If the pod must be defaulted or mutated, the first PodSecurityPolicy (ordered by name) to allow the pod is selected.
During update operations (during which mutations to pod specs are disallowed) only non-mutating PodSecurityPolicies are used to validate the pod.
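The selection criteria above can be sketched as follows. The sketch assumes that, for each policy, it has already been determined whether the policy allows the pod and whether doing so would require defaulting or mutating it; the `Verdict` type and `select_policy` function are hypothetical names, not part of Kubernetes.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    name: str       # PodSecurityPolicy name
    allows: bool    # the policy admits the pod
    mutates: bool   # admitting requires defaulting or mutating the pod


def select_policy(verdicts, is_update=False):
    """Return the name of the policy that would be used, or None if the pod is rejected."""
    allowed = [v for v in verdicts if v.allows]
    non_mutating = [v for v in allowed if not v.mutates]
    if non_mutating:
        # Any non-mutating policy will do; pick by name only to be deterministic.
        return min(v.name for v in non_mutating)
    if is_update:
        return None  # updates are validated against non-mutating policies only
    mutating = sorted(v.name for v in allowed)
    return mutating[0] if mutating else None  # first by name among mutating policies
```

For instance, with a mutating policy `a-defaults` and a non-mutating policy `z-permissive` that both allow the pod, `z-permissive` is chosen even though it sorts later by name.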
3.2. Enabling Pod Security Policies
Pod security policy control is implemented as an optional admission controller. PodSecurityPolicies are enforced by enabling the admission controller, but doing so without authorizing any policies will prevent any pods from being created in the cluster.
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    # ...
    - --enable-admission-plugins=NodeRestriction,PodSecurityPolicy
    # ...
$ kubectl create ns psp-test
namespace/psp-test created
$ kubectl create rolebinding -n psp-test default:edit --clusterrole edit --serviceaccount psp-test:default
rolebinding.rbac.authorization.k8s.io/default:edit created
$ kubectl --as system:serviceaccount:psp-test:default create -n psp-test -f- <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pause
spec:
  containers:
  - name: pause
    image: k8s.gcr.io/pause
EOF
Error from server (Forbidden): error when creating "STDIN": pods "pause" is forbidden: PodSecurityPolicy: unable to admit pod: []
$ kubectl delete ns psp-test
namespace "psp-test" deleted
3.3. Authorizing Policies
When a PodSecurityPolicy resource is created, it does nothing. In order to use it, the requesting user or target pod’s service account must be authorized to use the policy, by allowing the use verb on the policy.
Most Kubernetes pods are not created directly by users. Instead, they are typically created indirectly as part of a Deployment, ReplicaSet, or other templated controller via the controller manager. Granting the controller access to the policy would grant access for all pods created by that controller, so the preferred method for authorizing policies is to grant access to the pod’s service account.
RBAC is a standard Kubernetes authorization mode, and can easily be used to authorize use of policies.
First, a Role or ClusterRole needs to grant access to use the desired policies. The rules to grant access look like this:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: <role name>
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames:
  - <list of policies to authorize>
Then the (Cluster)Role is bound to the authorized user(s):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: <binding name>
roleRef:
  kind: ClusterRole
  name: <role name>
  apiGroup: rbac.authorization.k8s.io
subjects:
# Authorize all service accounts in a namespace (recommended):
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts:<authorized namespace>
# Authorize specific service accounts (not recommended):
- kind: ServiceAccount
  name: <authorized service account name>
  namespace: <authorized pod namespace>
# Authorize specific users (not recommended):
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: <authorized user name>
If a RoleBinding (not a ClusterRoleBinding) is used, it will only grant usage for pods being run in the same namespace as the binding. This can be paired with system groups to grant access to all pods run in the namespace:
# Authorize all service accounts in a namespace:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts
# Or equivalently, all authenticated users in a namespace:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:authenticated
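To see why the group subjects above cover a pod's service account, it helps to spell out the identity a service account authenticates as. The Python sketch below reproduces the standard user name and group names for a service account and checks them against a binding subject; the function names are hypothetical, and the namespace restriction of a RoleBinding is not modeled.

```python
def service_account_identity(namespace, name):
    """User name and groups that a service account authenticates with."""
    return {
        "user": f"system:serviceaccount:{namespace}:{name}",
        "groups": {
            "system:serviceaccounts",
            f"system:serviceaccounts:{namespace}",
            "system:authenticated",
        },
    }


def subject_matches(subject, namespace, name):
    """True if a (Cluster)RoleBinding subject applies to the given service account."""
    identity = service_account_identity(namespace, name)
    kind = subject["kind"]
    if kind == "ServiceAccount":
        return subject["namespace"] == namespace and subject["name"] == name
    if kind == "Group":
        return subject["name"] in identity["groups"]
    if kind == "User":
        return subject["name"] == identity["user"]
    return False
```

For the `default` service account in `psp-test`, a Group subject named `system:serviceaccounts:psp-test` matches, while `system:serviceaccounts:other` does not.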
3.4. kube-psp-advisor
Kubernetes Pod Security Policy Advisor (a.k.a. kube-psp-advisor) is an open-source tool from Sysdig. kube-psp-advisor scans the existing security contexts of Kubernetes resources such as Deployments, DaemonSets, and ReplicaSets, takes them as the reference model to enforce, and then automatically generates a Pod Security Policy for all the resources in the cluster.
$ kubectl krew install advise-psp
Updated the local copy of plugin index.
Installing plugin: advise-psp
Installed plugin: advise-psp
\
| Use this plugin:
| kubectl advise-psp
| Documentation:
| https://github.com/sysdiglabs/kube-psp-advisor
/
WARNING: You installed plugin "advise-psp" from the krew-index plugin repository.
These plugins are not audited for security by the Krew maintainers.
Run them at your own risk.
$ kubectl advise-psp inspect --namespace default --report
{
"podSecuritySpecs": {
"hostIPC": [],
"hostNetwork": [],
"hostPID": []
},
"podVolumeTypes": {
...
3.5. Example
$ kubectl apply -f - <<EOF
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp-hostpath
spec:
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
EOF
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp-hostpath created
$ kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp:hostpath
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames:
  - psp-hostpath
EOF
clusterrole.rbac.authorization.k8s.io/psp:hostpath unchanged
$ kubectl create ns psp-test
namespace/psp-test created
$ kubectl create rolebinding -n psp-test edit --clusterrole edit --serviceaccount psp-test:default
rolebinding.rbac.authorization.k8s.io/edit created
$ kubectl create rolebinding -n psp-test psp:hostpath --clusterrole psp:hostpath --serviceaccount psp-test:default
rolebinding.rbac.authorization.k8s.io/psp:hostpath created
$ kubectl apply -n psp-test --as system:serviceaccount:psp-test:default -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pause
spec:
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.6
EOF
pod/pause created
$ kubectl apply -n psp-test --as system:serviceaccount:psp-test:default -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hostpath
spec:
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.6
  volumes:
  - name: hostpath
    hostPath:
      path: /tmp
EOF
Error from server (Forbidden): error when creating "STDIN": pods "hostpath" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]
$ kubectl delete ns psp-test
namespace "psp-test" deleted
$ kubectl delete psp psp-hostpath
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy "psp-hostpath" deleted
$ kubectl delete clusterrole psp:hostpath
clusterrole.rbac.authorization.k8s.io "psp:hostpath" deleted
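The volume check that rejected the hostpath pod in this example can be approximated in Python. This is a sketch of the idea only: the function name is hypothetical, the message format simply mirrors the error shown above, and the real admission plugin checks many more fields than volume types.

```python
# Volume types allowed by the psp-hostpath policy in the example above.
ALLOWED_VOLUMES = ["configMap", "emptyDir", "projected", "secret", "downwardAPI"]


def disallowed_volumes(pod, allowed_types):
    """Return one error string per volume whose type is not allowed by the policy."""
    errors = []
    for index, volume in enumerate(pod["spec"].get("volumes", [])):
        # Every key other than "name" identifies the volume type.
        for volume_type in (key for key in volume if key != "name"):
            if "*" not in allowed_types and volume_type not in allowed_types:
                errors.append(
                    f'spec.volumes[{index}]: Invalid value: "{volume_type}": '
                    f"{volume_type} volumes are not allowed to be used"
                )
    return errors


hostpath_pod = {
    "spec": {
        "containers": [{"name": "pause", "image": "k8s.gcr.io/pause:3.6"}],
        "volumes": [{"name": "hostpath", "hostPath": {"path": "/tmp"}}],
    }
}
print(disallowed_volumes(hostpath_pod, ALLOWED_VOLUMES))
```

The pause pod without volumes yields an empty list, which corresponds to the `pod/pause created` result above.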
4. Pod Security Admission Controller
FEATURE STATE: Kubernetes v1.23 [beta]
The Kubernetes Pod Security Standards define different isolation levels for Pods. These standards let you define how you want to restrict the behavior of pods in a clear, consistent fashion.
Kubernetes offers a built-in Pod Security admission controller, the successor to PodSecurityPolicies.
Pod security restrictions are applied at the namespace level when pods are created.
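Namespace-level configuration is done through labels of the form `pod-security.kubernetes.io/<mode>: <level>`, where the modes are enforce, audit, and warn, and the levels are the three Pod Security Standards. The Python helper below builds and validates such a label set; the function name `pod_security_labels` is made up for this sketch, and optional version-pinning labels are left out.

```python
MODES = ("enforce", "audit", "warn")
LEVELS = ("privileged", "baseline", "restricted")


def pod_security_labels(**modes):
    """Build namespace labels for Pod Security admission, e.g. enforce="baseline"."""
    labels = {}
    for mode, level in modes.items():
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        labels[f"pod-security.kubernetes.io/{mode}"] = level
    return labels


print(pod_security_labels(enforce="baseline", warn="restricted"))
# {'pod-security.kubernetes.io/enforce': 'baseline', 'pod-security.kubernetes.io/warn': 'restricted'}
```

The resulting dictionary can be placed under `metadata.labels` of a Namespace manifest.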
5. References
-
https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/
-
https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
-
https://kubernetes.io/docs/concepts/policy/pod-security-policy/
-
https://kubernetes.io/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/
-
https://kubernetes.io/docs/concepts/security/pod-security-admission/
-
https://www.suse.com/c/rancher_blog/enhancing-kubernetes-security-with-pod-security-policies-part-2/
-
https://sysdig.com/blog/enable-kubernetes-pod-security-policy/
-
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
-
https://bridgecrew.io/blog/creating-a-secure-kubernetes-nginx-deployment-using-checkov/