From 9213f0775f4bbe225b6a9b5889e9541100609959 Mon Sep 17 00:00:00 2001 From: Nate W <4453979+nate-double-u@users.noreply.github.com> Date: Tue, 3 May 2022 17:13:38 -0700 Subject: [PATCH 01/77] Tracking commit for v1.25 docs From ed9ea9c4f26bf225d3359d86b519db29d22cc9d7 Mon Sep 17 00:00:00 2001 From: "Lubomir I. Ivanov" Date: Thu, 19 May 2022 22:09:27 +0300 Subject: [PATCH 02/77] kubeadm: apply changes around "master" taint for 1.25 The "master" taint is no longer applied on control plane nodes by kubeadm 1.25. Remove mentions of the taint from the documentation: - implementation details - create a kubeadm cluster - known labels / taints --- .../labels-annotations-taints/_index.md | 17 +++-------------- .../kubeadm/implementation-details.md | 8 +++----- .../tools/kubeadm/create-cluster-kubeadm.md | 13 ++++--------- .../tools/kubeadm/troubleshooting-kubeadm.md | 2 +- 4 files changed, 11 insertions(+), 29 deletions(-) diff --git a/content/en/docs/reference/labels-annotations-taints/_index.md b/content/en/docs/reference/labels-annotations-taints/_index.md index e68cb668d3081..c6baa12557825 100644 --- a/content/en/docs/reference/labels-annotations-taints/_index.md +++ b/content/en/docs/reference/labels-annotations-taints/_index.md @@ -493,9 +493,9 @@ The kubelet checks D-value of the size of `/proc/sys/kernel/pid_max` and the PID Example: `node.kubernetes.io/out-of-service:NoExecute` -A user can manually add the taint to a Node marking it out-of-service. If the `NodeOutOfServiceVolumeDetach` +A user can manually add the taint to a Node marking it out-of-service. If the `NodeOutOfServiceVolumeDetach` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled on -`kube-controller-manager`, and a Node is marked out-of-service with this taint, the pods on the node will be forcefully deleted if there are no matching tolerations on it and volume detach operations for the pods terminating on the node will happen immediately. This allows the Pods on the out-of-service node to recover quickly on a different node. +`kube-controller-manager`, and a Node is marked out-of-service with this taint, the pods on the node will be forcefully deleted if there are no matching tolerations on it and volume detach operations for the pods terminating on the node will happen immediately. This allows the Pods on the out-of-service node to recover quickly on a different node. {{< caution >}} Refer to @@ -627,7 +627,7 @@ This determines whether a user can modify the mode of the source volume when a {{< glossary_tooltip text="PersistentVolumeClaim" term_id="persistent-volume-claim" >}} is being created from a VolumeSnapshot. -Refer to [Converting the volume mode of a Snapshot](/docs/concepts/storage/volume-snapshots/#convert-volume-mode) +Refer to [Converting the volume mode of a Snapshot](/docs/concepts/storage/volume-snapshots/#convert-volume-mode) and the [Kubernetes CSI Developer Documentation](https://kubernetes-csi.github.io/docs/) for more information. ## Annotations used for audit @@ -695,14 +695,3 @@ Used on: Node Example: `node-role.kubernetes.io/control-plane:NoSchedule` Taint that kubeadm applies on control plane nodes to allow only critical workloads to schedule on them. - -### node-role.kubernetes.io/master - -Used on: Node - -Example: `node-role.kubernetes.io/master:NoSchedule` - -Taint that kubeadm applies on control plane nodes to allow only critical workloads to schedule on them. 
-
-{{< note >}} Starting in v1.20, this taint is deprecated in favor of `node-role.kubernetes.io/control-plane`
-and will be removed in v1.25.{{< /note >}}
diff --git a/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md b/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
index 74428b914834e..b98fdcd66e218 100644
--- a/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
+++ b/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
@@ -319,12 +319,10 @@ Please note that:
 As soon as the control plane is available, kubeadm executes following actions:
 
 - Labels the node as control-plane with `node-role.kubernetes.io/control-plane=""`
-- Taints the node with `node-role.kubernetes.io/master:NoSchedule` and `node-role.kubernetes.io/control-plane:NoSchedule`
+- Taints the node with `node-role.kubernetes.io/control-plane:NoSchedule`
 
-Please note that:
-
-1. The `node-role.kubernetes.io/master` taint is deprecated and will be removed in kubeadm version 1.25
-1. Mark control-plane phase phase can be invoked individually with the [`kubeadm init phase mark-control-plane`](/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-mark-control-plane) command
+Please note that the mark-control-plane phase can be invoked
+individually with the [`kubeadm init phase mark-control-plane`](/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-mark-control-plane) command.
 
 ### Configure TLS-Bootstrapping for node joining
 
diff --git a/content/en/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm.md b/content/en/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm.md
index d7897dfec5817..9a7e4056dc7e9 100644
--- a/content/en/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm.md
+++ b/content/en/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm.md
@@ -303,7 +303,7 @@ reasons.
 If you want to be able to schedule Pods on the control plane nodes, for example
 for a single machine Kubernetes cluster, run:
 
 ```bash
-kubectl taint nodes --all node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master-
+kubectl taint nodes --all node-role.kubernetes.io/control-plane-
 ```
 
 The output will look something like:
 
@@ -313,14 +313,9 @@ node "test-01" untainted
 ...
 ```
 
-This will remove the `node-role.kubernetes.io/control-plane` and
-`node-role.kubernetes.io/master` taints from any nodes that have them,
-including the control plane nodes, meaning that the scheduler will then be able
-to schedule Pods everywhere.
-
-{{< note >}}
-The `node-role.kubernetes.io/master` taint is deprecated and kubeadm will stop using it in version 1.25.
-{{< /note >}}
+This will remove the `node-role.kubernetes.io/control-plane:NoSchedule` taint
+from any nodes that have it, including the control plane nodes, meaning that the
+scheduler will then be able to schedule Pods everywhere.
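As an alternative to removing the taint cluster-wide, a single workload can tolerate it instead. A minimal sketch, assuming a plain Pod (the name and image here are illustrative, not part of this patch):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cp-tolerant-demo   # illustrative name
spec:
  tolerations:
  # Allows scheduling onto control plane nodes that still carry the taint
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"
  containers:
  - name: app
    image: registry.k8s.io/pause:3.8   # illustrative image
```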
### Joining your nodes {#join-nodes} diff --git a/content/en/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md b/content/en/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md index ac8d89ee3a17b..147ccd000e1e0 100644 --- a/content/en/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md +++ b/content/en/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md @@ -351,7 +351,7 @@ A known solution is to patch the kube-proxy DaemonSet to allow scheduling it on nodes regardless of their conditions, keeping it off of other nodes until their initial guarding conditions abate: ``` -kubectl -n kube-system patch ds kube-proxy -p='{ "spec": { "template": { "spec": { "tolerations": [ { "key": "CriticalAddonsOnly", "operator": "Exists" }, { "effect": "NoSchedule", "key": "node-role.kubernetes.io/master" }, { "effect": "NoSchedule", "key": "node-role.kubernetes.io/control-plane" } ] } } } }' +kubectl -n kube-system patch ds kube-proxy -p='{ "spec": { "template": { "spec": { "tolerations": [ { "key": "CriticalAddonsOnly", "operator": "Exists" }, { "effect": "NoSchedule", "key": "node-role.kubernetes.io/control-plane" } ] } } } }' ``` The tracking issue for this problem is [here](https://github.com/kubernetes/kubeadm/issues/1027). From edb74f1decc65770b3750364aefbeb88de8a5f27 Mon Sep 17 00:00:00 2001 From: "Lubomir I. Ivanov" Date: Tue, 7 Jun 2022 17:01:01 +0300 Subject: [PATCH 03/77] update kubeadm pages to use registry.k8s.io k8s.gcr.io is a deprecated in favor of registry.k8s.io. The kubeadm code in k/k was already changed to use the new domain name. --- .../setup-tools/kubeadm/implementation-details.md | 4 ++-- .../reference/setup-tools/kubeadm/kubeadm-init.md | 12 ++++++------ .../tools/kubeadm/create-cluster-kubeadm.md | 2 +- .../tools/kubeadm/high-availability.md | 6 +++--- .../tools/kubeadm/setup-ha-etcd-with-kubeadm.md | 4 ++-- 5 files changed, 14 insertions(+), 14 deletions(-) diff --git a/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md b/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md index 74428b914834e..038f210f73d7e 100644 --- a/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md +++ b/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md @@ -199,7 +199,7 @@ Static Pod manifest share a set of common properties: Please note that: -1. All images will be pulled from k8s.gcr.io by default. See [using custom images](/docs/reference/setup-tools/kubeadm/kubeadm-init/#custom-images) for customizing the image repository +1. All images will be pulled from registry.k8s.io by default. See [using custom images](/docs/reference/setup-tools/kubeadm/kubeadm-init/#custom-images) for customizing the image repository 2. In case of kubeadm is executed in the `--dry-run` mode, static Pods files are written in a temporary folder 3. Static Pod manifest generation for control plane components can be invoked individually with the [`kubeadm init phase control-plane all`](/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-control-plane) command @@ -289,7 +289,7 @@ a local etcd instance running in a Pod with following attributes: Please note that: -1. The etcd image will be pulled from `k8s.gcr.io` by default. See [using custom images](/docs/reference/setup-tools/kubeadm/kubeadm-init/#custom-images) for customizing the image repository +1. The etcd image will be pulled from `registry.k8s.io` by default. 
See [using custom images](/docs/reference/setup-tools/kubeadm/kubeadm-init/#custom-images) for customizing the image repository 2. in case of kubeadm is executed in the `--dry-run` mode, the etcd static Pod manifest is written in a temporary folder 3. Static Pod manifest generation for local etcd can be invoked individually with the [`kubeadm init phase etcd local`](/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-etcd) command diff --git a/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md b/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md index fc87e796c2ed7..8d51c1ef694e5 100644 --- a/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md +++ b/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md @@ -212,11 +212,11 @@ kubeadm config images pull You can pass `--config` to the above commands with a [kubeadm configuration file](#config-file) to control the `kubernetesVersion` and `imageRepository` fields. -All default `k8s.gcr.io` images that kubeadm requires support multiple architectures. +All default `registry.k8s.io` images that kubeadm requires support multiple architectures. ### Using custom images {#custom-images} -By default, kubeadm pulls images from `k8s.gcr.io`. If the +By default, kubeadm pulls images from `registry.k8s.io`. If the requested Kubernetes version is a CI label (such as `ci/latest`) `gcr.io/k8s-staging-ci-images` is used. @@ -225,18 +225,18 @@ Allowed customization are: * To provide `kubernetesVersion` which affects the version of the images. * To provide an alternative `imageRepository` to be used instead of - `k8s.gcr.io`. + `registry.k8s.io`. * To provide a specific `imageRepository` and `imageTag` for etcd or CoreDNS. -Image paths between the default `k8s.gcr.io` and a custom repository specified using +Image paths between the default `registry.k8s.io` and a custom repository specified using `imageRepository` may differ for backwards compatibility reasons. For example, -one image might have a subpath at `k8s.gcr.io/subpath/image`, but be defaulted +one image might have a subpath at `registry.k8s.io/subpath/image`, but be defaulted to `my.customrepository.io/image` when using a custom repository. To ensure you push the images to your custom repository in paths that kubeadm can consume, you must: -* Pull images from the defaults paths at `k8s.gcr.io` using `kubeadm config images {list|pull}`. +* Pull images from the defaults paths at `registry.k8s.io` using `kubeadm config images {list|pull}`. * Push images to the paths from `kubeadm config images list --config=config.yaml`, where `config.yaml` contains the custom `imageRepository`, and/or `imageTag` for etcd and CoreDNS. diff --git a/content/en/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm.md b/content/en/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm.md index d7897dfec5817..7d3490beccb67 100644 --- a/content/en/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm.md +++ b/content/en/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm.md @@ -87,7 +87,7 @@ After you initialize your control-plane, the kubelet runs normally. ### Preparing the required container images This step is optional and only applies in case you wish `kubeadm init` and `kubeadm join` -to not download the default container images which are hosted at `k8s.gcr.io`. +to not download the default container images which are hosted at `registry.k8s.io`. 
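The pre-pull workflow described next can be sketched as follows; it assumes the default image repository and a machine that still has registry access:

```bash
# Show which images kubeadm would use for this Kubernetes release
kubeadm config images list

# Pre-pull them so that `kubeadm init`/`kubeadm join` need no network access
kubeadm config images pull
```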
Kubeadm has commands that can help you pre-pull the required images
when creating a cluster without an internet connection on its nodes.
diff --git a/content/en/docs/setup/production-environment/tools/kubeadm/high-availability.md b/content/en/docs/setup/production-environment/tools/kubeadm/high-availability.md
index a26209e7a3c5c..43c317a215ee0 100644
--- a/content/en/docs/setup/production-environment/tools/kubeadm/high-availability.md
+++ b/content/en/docs/setup/production-environment/tools/kubeadm/high-availability.md
@@ -97,7 +97,7 @@ _See [External etcd topology](/docs/setup/production-environment/tools/kubeadm/h
 
 ### Container images
 
-Each host should have access read and fetch images from the Kubernetes container image registry, `k8s.gcr.io`.
+Each host should have read access to fetch images from the Kubernetes container image registry, `registry.k8s.io`.
 If you want to deploy a highly-available cluster where the hosts do not have access to pull images, this is possible. You must ensure by some other means that the correct container images are already available on the relevant hosts.
 
 ### Command line interface {#kubectl}
@@ -226,8 +226,8 @@ option. Your cluster requirements may need a different configuration.
 As stated in the command output, the certificate key gives access to cluster sensitive data, keep it secret!
 {{< /caution >}}
 
-1. Apply the CNI plugin of your choice:
-   [Follow these instructions](/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network)
+1. Apply the CNI plugin of your choice:
+   [Follow these instructions](/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network)
    to install the CNI provider. Make sure the configuration corresponds to the Pod CIDR specified in the
    kubeadm configuration file (if applicable).
 
diff --git a/content/en/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm.md b/content/en/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm.md
index 0573fb942e44d..6e2f336f0ca9d 100644
--- a/content/en/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm.md
+++ b/content/en/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm.md
@@ -31,7 +31,7 @@ etcd cluster of three members that can be used by kubeadm during cluster creatio
   the kubeadm config file.
 * Each host must have systemd and a bash compatible shell installed.
 * Each host must [have a container runtime, kubelet, and kubeadm installed](/docs/setup/production-environment/tools/kubeadm/install-kubeadm/).
-* Each host should have access to the Kubernetes container image registry (`k8s.gcr.io`) or list/pull the required etcd image using
+* Each host should have access to the Kubernetes container image registry (`registry.k8s.io`) or list/pull the required etcd image using
   `kubeadm config images list/pull`. This guide will setup etcd instances as
   [static pods](/docs/tasks/configure-pod-container/static-pod/) managed by a kubelet.
 * Some infrastructure to copy files between hosts.
For example `ssh` and `scp` @@ -276,7 +276,7 @@ on Kubernetes dual-stack support see [Dual-stack support with kubeadm](/docs/set ```sh docker run --rm -it \ --net host \ - -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl \ + -v /etc/kubernetes:/etc/kubernetes registry.k8s.io/etcd:${ETCD_TAG} etcdctl \ --cert /etc/kubernetes/pki/etcd/peer.crt \ --key /etc/kubernetes/pki/etcd/peer.key \ --cacert /etc/kubernetes/pki/etcd/ca.crt \ From 90dc4e70a670957b059b5a2b7496f05c91924496 Mon Sep 17 00:00:00 2001 From: "Lubomir I. Ivanov" Date: Tue, 7 Jun 2022 16:34:06 +0300 Subject: [PATCH 04/77] kubeadm-init.md: adjust info for UnversionedKubeletConfigMap The feature gate goes GA in 1.25 and becomes locked by default to "enabled". --- .../setup-tools/kubeadm/kubeadm-init.md | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md b/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md index fc87e796c2ed7..865da3332ce1c 100644 --- a/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md +++ b/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md @@ -147,15 +147,15 @@ directly to kubeadm is not supported. Instead, it is possible to pass them by List of feature gates: {{< table caption="kubeadm feature gates" >}} -Feature | Default | Alpha | Beta -:-------|:--------|:------|:----- -`PublicKeysECDSA` | `false` | 1.19 | - -`RootlessControlPlane` | `false` | 1.22 | - -`UnversionedKubeletConfigMap` | `true` | 1.22 | 1.23 +Feature | Default | Alpha | Beta | GA +:-------|:--------|:------|:-----|:---- +`PublicKeysECDSA` | `false` | 1.19 | - | - +`RootlessControlPlane` | `false` | 1.22 | - | - +`UnversionedKubeletConfigMap` | `true` | 1.22 | 1.23 | 1.25 {{< /table >}} {{< note >}} -Once a feature gate goes GA it is removed from this list as its value becomes locked to `true` by default. +Once a feature gate goes GA its value becomes locked to `true` by default. {{< /note >}} Feature gate descriptions: @@ -181,10 +181,6 @@ or `kubeadm upgrade apply`), kubeadm respects the value of `UnversionedKubeletCo (during `kubeadm join`, `kubeadm reset`, `kubeadm upgrade ...`), kubeadm attempts to use unversioned ConfigMap name first; if that does not succeed, kubeadm falls back to using the legacy (versioned) name for that ConfigMap. -{{< note >}} -Setting `UnversionedKubeletConfigMap` to `false` is supported but **deprecated**. -{{< /note >}} - ### Adding kube-proxy parameters {#kube-proxy} For information about kube-proxy parameters in the kubeadm configuration see: From fefcf469226ca245dfd7d30ba0d7cfed208bbeb0 Mon Sep 17 00:00:00 2001 From: "Lubomir I. Ivanov" Date: Mon, 13 Jun 2022 15:43:44 +0300 Subject: [PATCH 05/77] kubeadm: document the option to use kubeletconfiguration patches The 'kubeletconfiguration' patch target is a new one in 1.25. It allows to apply instance-specific configuration to kubelets in a kubeadm cluster by patching the base KubeletConfiguration object that is shared by all nodes. 
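For orientation, one such instance-specific patch file might look like the sketch below. The `kubeletconfiguration` target name follows the naming scheme documented in this patch; the directory, file name, and `maxPods` value are illustrative:

```yaml
# /etc/kubernetes/patches/kubeletconfiguration0+strategic.yaml (illustrative path)
# Merged on top of the cluster-wide base KubeletConfiguration for this node only.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 120   # illustrative per-node override
```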
--- .../tools/kubeadm/control-plane-flags.md | 33 ++++++++++--------- .../tools/kubeadm/kubelet-integration.md | 13 ++++---- 2 files changed, 25 insertions(+), 21 deletions(-) diff --git a/content/en/docs/setup/production-environment/tools/kubeadm/control-plane-flags.md b/content/en/docs/setup/production-environment/tools/kubeadm/control-plane-flags.md index f15315f3ad2d6..6b3724c8f7781 100644 --- a/content/en/docs/setup/production-environment/tools/kubeadm/control-plane-flags.md +++ b/content/en/docs/setup/production-environment/tools/kubeadm/control-plane-flags.md @@ -134,13 +134,13 @@ etcd: election-timeout: 1000 ``` -## Customizing the control plane with patches {#patches} +## Customizing with patches {#patches} {{< feature-state for_k8s_version="v1.22" state="beta" >}} Kubeadm allows you to pass a directory with patch files to `InitConfiguration` and `JoinConfiguration` -on individual nodes. These patches can be used as the last customization step before the control -plane component manifests are written to disk. +on individual nodes. These patches can be used as the last customization step before component configuration +is written to disk. You can pass this file to `kubeadm init` with `--config `: @@ -168,7 +168,8 @@ patches: The directory must contain files named `target[suffix][+patchtype].extension`. For example, `kube-apiserver0+merge.yaml` or just `etcd.json`. -- `target` can be one of `kube-apiserver`, `kube-controller-manager`, `kube-scheduler` and `etcd`. +- `target` can be one of `kube-apiserver`, `kube-controller-manager`, `kube-scheduler`, `etcd` +and `kubeletconfiguration`. - `patchtype` can be one of `strategic`, `merge` or `json` and these must match the patching formats [supported by kubectl](/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch). The default `patchtype` is `strategic`. @@ -183,20 +184,22 @@ flag, which must point to the same directory. `kubeadm upgrade` currently does n API structure that can be used for the same purpose. {{< /note >}} -## Customizing the kubelet +## Customizing the kubelet {#kubelet} -To customize the kubelet you can add a `KubeletConfiguration` next to the `ClusterConfiguration` or -`InitConfiguration` separated by `---` within the same configuration file. This file can then be passed to `kubeadm init`. +To customize the kubelet you can add a [`KubeletConfiguration`](/docs/reference/config-api/kubelet-config.v1beta1/) +next to the `ClusterConfiguration` or `InitConfiguration` separated by `---` within the same configuration file. +This file can then be passed to `kubeadm init` and kubeadm will apply the same base `KubeletConfiguration` +to all nodes in the cluster. -{{< note >}} -kubeadm applies the same `KubeletConfiguration` to all nodes in the cluster. To apply node -specific settings you can use kubelet flags as overrides by passing them in the `nodeRegistration.kubeletExtraArgs` -field supported by both `InitConfiguration` and `JoinConfiguration`. Some kubelet flags are deprecated, -so check their status in the [kubelet reference documentation](/docs/reference/command-line-tools-reference/kubelet) -before using them. -{{< /note >}} +For applying instance-specific configuration over the base `KubeletConfiguration` you can use the +[`kubeletconfiguration` patch target](#patches). + +Alternatively, you can use kubelet flags as overrides by passing them in the +`nodeRegistration.kubeletExtraArgs` field supported by both `InitConfiguration` and `JoinConfiguration`. 
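A hedged sketch of that flag-based alternative (the flag and its value are illustrative):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    node-ip: "192.0.2.10"   # illustrative per-node kubelet flag override
```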
+Some kubelet flags are deprecated, so check their status in the +[kubelet reference documentation](/docs/reference/command-line-tools-reference/kubelet) before using them. -For more details see [Configuring each kubelet in your cluster using kubeadm](/docs/setup/production-environment/tools/kubeadm/kubelet-integration) +For additional details see [Configuring each kubelet in your cluster using kubeadm](/docs/setup/production-environment/tools/kubeadm/kubelet-integration) ## Customizing kube-proxy diff --git a/content/en/docs/setup/production-environment/tools/kubeadm/kubelet-integration.md b/content/en/docs/setup/production-environment/tools/kubeadm/kubelet-integration.md index 63c1f718abf76..13814f411799a 100644 --- a/content/en/docs/setup/production-environment/tools/kubeadm/kubelet-integration.md +++ b/content/en/docs/setup/production-environment/tools/kubeadm/kubelet-integration.md @@ -87,20 +87,21 @@ networking, or other host-specific parameters. The following list provides a few - To specify the container runtime you must set its endpoint with the `--container-runtime-endpoint=` flag. -You can specify these flags by configuring an individual kubelet's configuration in your service manager, -such as systemd. +The recommended way of applying such instance-specific configuration is by using +[`KubeletConfiguration` patches](/docs/setup/production-environment/tools/kubeadm/control-plane-flags#patches). ## Configure kubelets using kubeadm -It is possible to configure the kubelet that kubeadm will start if a custom `KubeletConfiguration` +It is possible to configure the kubelet that kubeadm will start if a custom +[`KubeletConfiguration`](/docs/reference/config-api/kubelet-config.v1beta1/) API object is passed with a configuration file like so `kubeadm ... --config some-config-file.yaml`. By calling `kubeadm config print init-defaults --component-configs KubeletConfiguration` you can see all the default values for this structure. -Also have a look at the -[reference for the KubeletConfiguration](/docs/reference/config-api/kubelet-config.v1beta1/) -for more information on the individual fields. +It is also possible to apply instance-specific patches over the base `KubeletConfiguration`. +Have a look at [Customizing the kubelet](/docs/setup/production-environment/tools/kubeadm/control-plane-flags#kubelet) +for more details. 
### Workflow when using `kubeadm init` From 355413fc4b7243c68d921f259766ff2e0c07f6c0 Mon Sep 17 00:00:00 2001 From: Kristin Martin Date: Tue, 14 Jun 2022 16:03:26 -0700 Subject: [PATCH 06/77] update config.toml for 1.25 release --- config.toml | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/config.toml b/config.toml index 6ad3ac39f3f11..e445656f950a6 100644 --- a/config.toml +++ b/config.toml @@ -139,10 +139,10 @@ time_format_default = "January 02, 2006 at 3:04 PM PST" description = "Production-Grade Container Orchestration" showedit = true -latest = "v1.24" +latest = "v1.25" -fullversion = "v1.24.0" -version = "v1.24" +fullversion = "v1.25.0" +version = "v1.25" githubbranch = "main" docsbranch = "main" deprecated = false @@ -179,30 +179,37 @@ js = [ ] [[params.versions]] -fullversion = "v1.24.0" -version = "v1.24" -githubbranch = "v1.24.0" +fullversion = "v1.25.0" +version = "v1.25" +githubbranch = "v1.25.0" docsbranch = "main" url = "https://kubernetes.io" [[params.versions]] -fullversion = "v1.23.6" +fullversion = "v1.24.2" +version = "v1.24" +githubbranch = "v1.24.2" +docsbranch = "release-1.24" +url = "https://v1-24.docs.kubernetes.io" + +[[params.versions]] +fullversion = "v1.23.8" version = "v1.23" -githubbranch = "v1.23.6" +githubbranch = "v1.23.8" docsbranch = "release-1.23" url = "https://v1-23.docs.kubernetes.io" [[params.versions]] -fullversion = "v1.22.9" +fullversion = "v1.22.11" version = "v1.22" -githubbranch = "v1.22.9" +githubbranch = "v1.22.11" docsbranch = "release-1.22" url = "https://v1-22.docs.kubernetes.io" [[params.versions]] -fullversion = "v1.21.12" +fullversion = "v1.21.14" version = "v1.21" -githubbranch = "v1.21.12" +githubbranch = "v1.21.14" docsbranch = "release-1.21" url = "https://v1-21.docs.kubernetes.io" From cac34962ebadc7074c14c94735a4c6e12d27276e Mon Sep 17 00:00:00 2001 From: Kristin Martin Date: Tue, 14 Jun 2022 18:30:22 -0700 Subject: [PATCH 07/77] Remove 1.20 params block --- config.toml | 7 ------- 1 file changed, 7 deletions(-) diff --git a/config.toml b/config.toml index e445656f950a6..ade9eb83f0037 100644 --- a/config.toml +++ b/config.toml @@ -213,13 +213,6 @@ githubbranch = "v1.21.14" docsbranch = "release-1.21" url = "https://v1-21.docs.kubernetes.io" -[[params.versions]] -fullversion = "v1.20.15" -version = "v1.20" -githubbranch = "v1.20.15" -docsbranch = "release-1.20" -url = "https://v1-20.docs.kubernetes.io" - # User interface configuration [params.ui] # Enable to show the side bar menu in its compact state. From 92e5fc8f6934ff77bf3521ef8498b9f39f96e197 Mon Sep 17 00:00:00 2001 From: "Lubomir I. Ivanov" Date: Tue, 28 Jun 2022 00:03:45 +0300 Subject: [PATCH 08/77] Update content/en/docs/setup/production-environment/tools/kubeadm/kubelet-integration.md Co-authored-by: Rey Lejano --- .../production-environment/tools/kubeadm/kubelet-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/setup/production-environment/tools/kubeadm/kubelet-integration.md b/content/en/docs/setup/production-environment/tools/kubeadm/kubelet-integration.md index 13814f411799a..408ab8a3a2c0d 100644 --- a/content/en/docs/setup/production-environment/tools/kubeadm/kubelet-integration.md +++ b/content/en/docs/setup/production-environment/tools/kubeadm/kubelet-integration.md @@ -100,7 +100,7 @@ By calling `kubeadm config print init-defaults --component-configs KubeletConfig see all the default values for this structure. 
It is also possible to apply instance-specific patches over the base `KubeletConfiguration`. -Have a look at [Customizing the kubelet](/docs/setup/production-environment/tools/kubeadm/control-plane-flags#kubelet) +Have a look at [Customizing the kubelet](/docs/setup/production-environment/tools/kubeadm/control-plane-flags#customizing-the-kubelet) for more details. ### Workflow when using `kubeadm init` From 5d66e4b0d2231e8af5fbbec5887ca3ead97943d1 Mon Sep 17 00:00:00 2001 From: Sascha Grunert Date: Mon, 27 Jun 2022 10:52:09 +0200 Subject: [PATCH 09/77] Graduate SeccompDefault feature to beta We now update the documentation to reflect the current state of the feature. Refers to: https://github.com/kubernetes/enhancements/issues/2413 Signed-off-by: Sascha Grunert Co-authored-by: Tim Bannister Signed-off-by: Sascha Grunert --- .../feature-gates.md | 3 +- content/en/docs/tutorials/security/seccomp.md | 32 ++++++++++++------- 2 files changed, 22 insertions(+), 13 deletions(-) diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 8eb23b3a473eb..f769b081dece2 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -178,7 +178,8 @@ different Kubernetes components. | `RemainingItemCount` | `true` | Beta | 1.16 | | | `RotateKubeletServerCertificate` | `false` | Alpha | 1.7 | 1.11 | | `RotateKubeletServerCertificate` | `true` | Beta | 1.12 | | -| `SeccompDefault` | `false` | Alpha | 1.22 | | +| `SeccompDefault` | `false` | Alpha | 1.22 | 1.24 | +| `SeccompDefault` | `true` | Beta | 1.25 | | | `ServerSideFieldValidation` | `false` | Alpha | 1.23 | - | | `ServiceInternalTrafficPolicy` | `false` | Alpha | 1.21 | 1.21 | | `ServiceInternalTrafficPolicy` | `true` | Beta | 1.22 | | diff --git a/content/en/docs/tutorials/security/seccomp.md b/content/en/docs/tutorials/security/seccomp.md index 5a3fa4a641b17..48c991cc4aa57 100644 --- a/content/en/docs/tutorials/security/seccomp.md +++ b/content/en/docs/tutorials/security/seccomp.md @@ -39,7 +39,7 @@ profiles that give only the necessary privileges to your container processes. In order to complete all steps in this tutorial, you must install [kind](/docs/tasks/tools/#kind) and [kubectl](/docs/tasks/tools/#kubectl). -This tutorial shows some examples that are still alpha (since v1.22) and +This tutorial shows some examples that are still beta (since v1.25) and others that use only generally available seccomp functionality. You should make sure that your cluster is [configured correctly](https://kind.sigs.k8s.io/docs/user/quick-start/#setting-kubernetes-version) @@ -112,7 +112,7 @@ See [Nodes](https://kind.sigs.k8s.io/docs/user/configuration/#nodes) within the kind documentation about configuration for more details on this. This tutorial assumes you are using Kubernetes {{< param "version" >}}. -As an alpha feature, you can configure Kubernetes to use the profile that the +As a beta feature, you can configure Kubernetes to use the profile that the {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} prefers by default, rather than falling back to `Unconfined`. If you want to try that, see @@ -159,11 +159,12 @@ running within kind. 
## Enable the use of `RuntimeDefault` as the default seccomp profile for all workloads -{{< feature-state state="alpha" for_k8s_version="v1.22" >}} +{{< feature-state state="beta" for_k8s_version="v1.25" >}} -`SeccompDefault` is an optional kubelet -[feature gate](/docs/reference/command-line-tools-reference/feature-gates) as -well as corresponding `--seccomp-default` +To use seccomp profile defaulting, you must run the kubelet with the `SeccompDefault` +[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) enabled +(this is the default). You must also explicitly enable the defaulting behavior for each +node where you want to use this with the corresponding `--seccomp-default` [command line flag](/docs/reference/command-line-tools-reference/kubelet). Both have to be enabled simultaneously to use the feature. @@ -196,13 +197,20 @@ If you were introducing this feature into production-like cluster, the Kubernete recommends that you enable this feature gate on a subset of your nodes and then test workload execution before rolling the change out cluster-wide. -More detailed information about a possible upgrade and downgrade strategy can be -found in the [related Kubernetes Enhancement Proposal (KEP)](https://github.com/kubernetes/enhancements/tree/a70cc18/keps/sig-node/2413-seccomp-by-default#upgrade--downgrade-strategy). +You can find more detailed information about a possible upgrade and downgrade strategy +in the related Kubernetes Enhancement Proposal (KEP): +[Enable seccomp by default](https://github.com/kubernetes/enhancements/tree/9a124fd29d1f9ddf2ff455c49a630e3181992c25/keps/sig-node/2413-seccomp-by-default#upgrade--downgrade-strategy). -Since the feature is in alpha state it is disabled per default. To enable it, -pass the flags `--feature-gates=SeccompDefault=true --seccomp-default` to the -`kubelet` CLI or enable it via the [kubelet configuration -file](/docs/tasks/administer-cluster/kubelet-config-file/). To enable the +Seccomp defaulting for Pods is a beta feature in Kubernetes {{< skew currentVersion >}}, +and the corresponding `SeccompDefault` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) +is enabled by default. However, you still need to enable this defaulting for each node where +you would like to use it. + +If you are running a Kubernetes {{< skew currentVersion >}} cluster and want to enable Seccomp +defaulting, either run the kubelet with the `--seccomp-default` command line flag, or enable +Seccomp defaulting through the +[kubelet +configuration file](/docs/tasks/administer-cluster/kubelet-config-file/). 
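In configuration-file form, that might look like the following sketch; only the relevant fields are shown, and the values assume you want the defaulting turned on:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SeccompDefault: true   # beta in v1.25, enabled by default
seccompDefault: true     # opt this node in to RuntimeDefault seccomp
```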
To enable the feature gate in [kind](https://kind.sigs.k8s.io), ensure that `kind` provides the minimum required Kubernetes version and enables the `SeccompDefault` feature [in the kind configuration](https://kind.sigs.k8s.io/docs/user/quick-start/#enable-feature-gates-in-your-cluster): From df55ed55167707720f66508942452995f17823e8 Mon Sep 17 00:00:00 2001 From: Adrian Reber Date: Tue, 15 Feb 2022 13:12:05 +0100 Subject: [PATCH 10/77] Add documentation for container checkpointing feature (KEP 2008) Co-authored-by: Tim Bannister Signed-off-by: Adrian Reber --- .../feature-gates.md | 3 + .../reference/node/kubelet-checkpoint-api.md | 96 +++++++++++++++++++ 2 files changed, 99 insertions(+) create mode 100644 content/en/docs/reference/node/kubelet-checkpoint-api.md diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 9d1e67b3c05ae..262736a820ccc 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -64,6 +64,7 @@ different Kubernetes components. | `AnyVolumeDataSource` | `false` | Alpha | 1.18 | 1.23 | | `AnyVolumeDataSource` | `true` | Beta | 1.24 | | | `AppArmor` | `true` | Beta | 1.4 | | +| `CheckpointContainer` | `false` | Alpha | 1.25 | | | `CPUManager` | `false` | Alpha | 1.8 | 1.9 | | `CPUManager` | `true` | Beta | 1.10 | | | `CPUManagerPolicyAlphaOptions` | `false` | Alpha | 1.23 | | @@ -634,6 +635,8 @@ Each feature gate is designed for enabling/disabling a specific feature: flag `--service-account-extend-token-expiration=false`. Check [Bound Service Account Tokens](https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/1205-bound-service-account-tokens/README.md) for more details. +- `CheckpointContainer`: Enables the kubelet `checkpoint` API. + See [Kubelet Checkpoint API](/docs/reference/node/kubelet-checkpoint-api/) for more details. - `ControllerManagerLeaderMigration`: Enables Leader Migration for [kube-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#initial-leader-migration-configuration) and [cloud-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#deploy-cloud-controller-manager) diff --git a/content/en/docs/reference/node/kubelet-checkpoint-api.md b/content/en/docs/reference/node/kubelet-checkpoint-api.md new file mode 100644 index 0000000000000..13602a2cafc9c --- /dev/null +++ b/content/en/docs/reference/node/kubelet-checkpoint-api.md @@ -0,0 +1,96 @@ +--- +content_type: "reference" +title: Kubelet Checkpoint API +weight: 10 +--- + + +{{< feature-state for_k8s_version="v1.25" state="alpha" >}} + +Checkpointing a container is the functionality to create a stateful copy of a +running container. Once you have a stateful copy of a container, you could +move it to a different computer for debugging or similar purposes. + +If you move the checkpointed container data to a computer that's able to restore +it, that restored container continues to run at exactly the same +point it was checkpointed. You can also inspect the saved data, provided that you +have suitable tools for doing so. + +Creating a checkpoint of a container might have security implications. Typically +a checkpoint contains all memory pages of all processes in the checkpointed +container. This means that everything that used to be in memory is now available +on the local disk. 
This includes all private data and possibly keys used for
encryption. The underlying CRI implementations (the container runtime on that node)
should create the checkpoint archive to be only accessible by the `root` user. It
is still important to remember that if the checkpoint archive is transferred to another
system, all memory pages will be readable by the owner of the checkpoint archive.

## Operations {#operations}

### `post` checkpoint the specified container {#post-checkpoint}

Tell the kubelet to checkpoint a specific container from the specified Pod.

Consult the [Kubelet authentication/authorization reference](/docs/reference/command-line-tools-reference/kubelet-authentication-authorization)
for more information about how access to the kubelet checkpoint interface is
controlled.

The kubelet will request a checkpoint from the underlying
{{< glossary_tooltip term_id="cri" text="CRI" >}} implementation. In the checkpoint
request the kubelet will specify the name of the checkpoint archive as
`checkpoint-<podFullName>-<containerName>-<timestamp>.tar` and also request to
store the checkpoint archive in the `checkpoints` directory below its root
directory (as defined by `--root-dir`). This defaults to
`/var/lib/kubelet/checkpoints`.

The checkpoint archive is in _tar_ format, and could be listed using an implementation of
[`tar`](https://pubs.opengroup.org/onlinepubs/7908799/xcu/tar.html). The contents of the
archive depend on the underlying CRI implementation (the container runtime on that node).

#### HTTP Request {#post-checkpoint-request}

POST /checkpoint/{namespace}/{pod}/{container}

#### Parameters {#post-checkpoint-params}

- **namespace** (*in path*): string, required

  {{< glossary_tooltip term_id="namespace" >}}

- **pod** (*in path*): string, required

  {{< glossary_tooltip term_id="pod" >}}

- **container** (*in path*): string, required

  {{< glossary_tooltip term_id="container" >}}

- **timeout** (*in query*): integer

  Timeout in seconds to wait until the checkpoint creation is finished.
  If zero or no timeout is specified the default {{< glossary_tooltip
  term_id="cri" text="CRI" >}} timeout value will be used. Checkpoint
  creation time depends directly on the used memory of the container.
  The more memory a container uses the more time is required to create
  the corresponding checkpoint.

#### Response {#post-checkpoint-response}

200: OK

401: Unauthorized

404: Not Found (if the `CheckpointContainer` feature gate is disabled)

404: Not Found (if the specified `namespace`, `pod` or `container` cannot be found)

500: Internal Server Error (if the CRI implementation encounters an error during checkpointing (see error message for further details))

500: Internal Server Error (if the CRI implementation does not implement the checkpoint CRI API (see error message for further details))

{{< comment >}}
TODO: Add more information about return codes once CRI implementations have checkpoint/restore.
     This TODO cannot be fixed before the release, because the CRI implementations need
     the Kubernetes changes to be merged to implement the new CheckpointContainer CRI API
     call. We need to wait until after the 1.25 release to fix this.
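{{< /comment >}}

As an illustration, a request against this endpoint might look like the sketch below. The Pod coordinates are examples, and the authentication setup (here a client certificate pair) depends entirely on how your kubelet authentication/authorization is configured:

```bash
# Hypothetical call to the kubelet API (default port 10250) on the node
# that runs the Pod; the certificate and key paths are illustrative.
curl -X POST "https://localhost:10250/checkpoint/default/my-pod/my-container" \
  --insecure \
  --cert /path/to/client.crt \
  --key /path/to/client.key
```

{{< comment >}}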
+{{< /comment >}} From 5246ba43a95e5cb84ee530ec31530209c05c3607 Mon Sep 17 00:00:00 2001 From: Ayushman Mishra Date: Fri, 15 Jul 2022 12:23:27 +0530 Subject: [PATCH 11/77] Remove list of container runtimes tested with v1.24 --- .../troubleshooting-cni-plugin-related-errors.md | 5 ----- 1 file changed, 5 deletions(-) diff --git a/content/en/docs/tasks/administer-cluster/migrating-from-dockershim/troubleshooting-cni-plugin-related-errors.md b/content/en/docs/tasks/administer-cluster/migrating-from-dockershim/troubleshooting-cni-plugin-related-errors.md index 91a471290da8d..34e2b112efce2 100644 --- a/content/en/docs/tasks/administer-cluster/migrating-from-dockershim/troubleshooting-cni-plugin-related-errors.md +++ b/content/en/docs/tasks/administer-cluster/migrating-from-dockershim/troubleshooting-cni-plugin-related-errors.md @@ -13,11 +13,6 @@ To avoid CNI plugin-related errors, verify that you are using or upgrading to a container runtime that has been tested to work correctly with your version of Kubernetes. -For example, the following container runtimes are being prepared, or have already been prepared, for Kubernetes v1.24: - -* containerd v1.6.4 and later, v1.5.11 and later -* The CRI-O v1.24.0 and later - ## About the "Incompatible CNI versions" and "Failed to destroy network for sandbox" errors Service issues exist for pod CNI network setup and tear down in containerd From 765fd75fd01faa7313c8e4fa048c0f5f9bfb25b4 Mon Sep 17 00:00:00 2001 From: Nick Turner Date: Thu, 14 Jul 2022 09:53:39 -0700 Subject: [PATCH 12/77] Update alpha.kubernetes.io/provided-node-ip * The annotation is no longer set ONLY when --cloud-provider=external * Now, it is set on kubelet startup if the --cloud-provider flag is set at all, including the deprecated in-tree values like `aws` and `gcp`. --- content/en/docs/reference/labels-annotations-taints/_index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/reference/labels-annotations-taints/_index.md b/content/en/docs/reference/labels-annotations-taints/_index.md index 927c45c613e94..fba76a93030ff 100644 --- a/content/en/docs/reference/labels-annotations-taints/_index.md +++ b/content/en/docs/reference/labels-annotations-taints/_index.md @@ -420,7 +420,7 @@ Used on: Node The kubelet can set this annotation on a Node to denote its configured IPv4 address. -When kubelet is started with the "external" cloud provider, it sets this annotation on the Node to denote an IP address set from the command line flag (`--node-ip`). This IP is verified with the cloud provider as valid by the cloud-controller-manager. +When kubelet is started with the `--cloud-provider` flag set to any value (includes both external and legacy in-tree cloud providers), it sets this annotation on the Node to denote an IP address set from the command line flag (`--node-ip`). This IP is verified with the cloud provider as valid by the cloud-controller-manager. ### batch.kubernetes.io/job-completion-index From 641a8e2c0c7e91fae0d413fdb6d2a768d71a0133 Mon Sep 17 00:00:00 2001 From: Sascha Grunert Date: Mon, 18 Jul 2022 10:28:19 +0200 Subject: [PATCH 13/77] Improve 'Seccomp defaulting' feature name We're now rephrasing those two paragraphs to avoid confusing readers. 
Signed-off-by: Sascha Grunert --- content/en/docs/tutorials/security/seccomp.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/content/en/docs/tutorials/security/seccomp.md b/content/en/docs/tutorials/security/seccomp.md index 48c991cc4aa57..21ea7a5f62937 100644 --- a/content/en/docs/tutorials/security/seccomp.md +++ b/content/en/docs/tutorials/security/seccomp.md @@ -201,16 +201,17 @@ You can find more detailed information about a possible upgrade and downgrade st in the related Kubernetes Enhancement Proposal (KEP): [Enable seccomp by default](https://github.com/kubernetes/enhancements/tree/9a124fd29d1f9ddf2ff455c49a630e3181992c25/keps/sig-node/2413-seccomp-by-default#upgrade--downgrade-strategy). -Seccomp defaulting for Pods is a beta feature in Kubernetes {{< skew currentVersion >}}, -and the corresponding `SeccompDefault` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) -is enabled by default. However, you still need to enable this defaulting for each node where +Kubernetes {{< skew currentVersion >}} lets you configure the seccomp profile +that applies when the spec for a Pod doesn't define a specific seccomp profile. +This is a beta feature and the corresponding `SeccompDefault` [feature +gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled by +default. However, you still need to enable this defaulting for each node where you would like to use it. -If you are running a Kubernetes {{< skew currentVersion >}} cluster and want to enable Seccomp -defaulting, either run the kubelet with the `--seccomp-default` command line flag, or enable -Seccomp defaulting through the -[kubelet -configuration file](/docs/tasks/administer-cluster/kubelet-config-file/). To enable the +If you are running a Kubernetes {{< skew currentVersion >}} cluster and want to +enable the feature, either run the kubelet with the `--seccomp-default` command +line flag, or enable it through the [kubelet configuration +file](/docs/tasks/administer-cluster/kubelet-config-file/). To enable the feature gate in [kind](https://kind.sigs.k8s.io), ensure that `kind` provides the minimum required Kubernetes version and enables the `SeccompDefault` feature [in the kind configuration](https://kind.sigs.k8s.io/docs/user/quick-start/#enable-feature-gates-in-your-cluster): From 83bcd9aec1cf107556777298d1c11ff56f5ead31 Mon Sep 17 00:00:00 2001 From: Maciej Szulik Date: Tue, 26 Jul 2022 13:40:01 +0200 Subject: [PATCH 14/77] Promote CronJobTimeZone to beta --- content/en/docs/concepts/workloads/controllers/cron-jobs.md | 2 +- .../reference/command-line-tools-reference/feature-gates.md | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/content/en/docs/concepts/workloads/controllers/cron-jobs.md b/content/en/docs/concepts/workloads/controllers/cron-jobs.md index 2d2416226a8e6..1f64045422249 100644 --- a/content/en/docs/concepts/workloads/controllers/cron-jobs.md +++ b/content/en/docs/concepts/workloads/controllers/cron-jobs.md @@ -94,7 +94,7 @@ To generate CronJob schedule expressions, you can also use web tools like [cront ## Time zones For CronJobs with no time zone specified, the kube-controller-manager interprets schedules relative to its local time zone. 
-{{< feature-state for_k8s_version="v1.24" state="alpha" >}} +{{< feature-state for_k8s_version="v1.25" state="beta" >}} If you enable the `CronJobTimeZone` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/), you can specify a time zone for a CronJob (if you don't enable that feature gate, or if you are using a version of diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 2500db659e05a..3672690b34b66 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -285,7 +285,8 @@ different Kubernetes components. | `CronJobControllerV2` | `false` | Alpha | 1.20 | 1.20 | | `CronJobControllerV2` | `true` | Beta | 1.21 | 1.21 | | `CronJobControllerV2` | `true` | GA | 1.22 | - | -| `CronJobTimeZone` | `false` | Alpha | 1.24 | | +| `CronJobTimeZone` | `false` | Alpha | 1.24 | 1.24 | +| `CronJobTimeZone` | `true` | Beta | 1.25 | | | `CustomPodDNS` | `false` | Alpha | 1.9 | 1.9 | | `CustomPodDNS` | `true` | Beta| 1.10 | 1.13 | | `CustomPodDNS` | `true` | GA | 1.14 | - | From bea32223802f58795eed8a1907582f3c7c75ab26 Mon Sep 17 00:00:00 2001 From: Oksana Naumov Date: Fri, 8 Jul 2022 11:36:07 -0700 Subject: [PATCH 15/77] Move CSI Migration Portworx feature to Beta --- .../docs/reference/command-line-tools-reference/feature-gates.md | 1 + 1 file changed, 1 insertion(+) diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 2500db659e05a..15874fba3f400 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -85,6 +85,7 @@ different Kubernetes components. | `CSIMigrationGCE` | `true` | Beta | 1.23 | | | `CSIMigrationvSphere` | `false` | Beta | 1.19 | | | `CSIMigrationPortworx` | `false` | Alpha | 1.23 | | +| `CSIMigrationPortworx` | `false` | Beta | 1.25 | | | `csiMigrationRBD` | `false` | Alpha | 1.23 | | | `CSIVolumeHealth` | `false` | Alpha | 1.21 | | | `ContextualLogging` | `false` | Alpha | 1.24 | | From 37647c4ce8368e51782d0f08385afb12cc6f3f5a Mon Sep 17 00:00:00 2001 From: Jonathan Dobson Date: Thu, 28 Jul 2022 09:31:37 -0600 Subject: [PATCH 16/77] KEP-596: Move CSIInlineVolume to GA --- content/en/docs/concepts/storage/ephemeral-volumes.md | 5 +---- content/en/docs/concepts/storage/volumes.md | 5 ++--- .../reference/command-line-tools-reference/feature-gates.md | 5 +++-- 3 files changed, 6 insertions(+), 9 deletions(-) diff --git a/content/en/docs/concepts/storage/ephemeral-volumes.md b/content/en/docs/concepts/storage/ephemeral-volumes.md index 045bcafe768a9..4c81ebebe29f2 100644 --- a/content/en/docs/concepts/storage/ephemeral-volumes.md +++ b/content/en/docs/concepts/storage/ephemeral-volumes.md @@ -74,10 +74,7 @@ is managed by kubelet, or injecting different data. ### CSI ephemeral volumes -{{< feature-state for_k8s_version="v1.16" state="beta" >}} - -This feature requires the `CSIInlineVolume` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) -to be enabled. It is enabled by default starting with Kubernetes 1.16. +{{< feature-state for_k8s_version="v1.25" state="stable" >}} {{< note >}} CSI ephemeral volumes are only supported by a subset of CSI drivers. 
diff --git a/content/en/docs/concepts/storage/volumes.md b/content/en/docs/concepts/storage/volumes.md index fd994c97fa7fb..e580d623624c0 100644 --- a/content/en/docs/concepts/storage/volumes.md +++ b/content/en/docs/concepts/storage/volumes.md @@ -1179,8 +1179,7 @@ A `csi` volume can be used in a Pod in three different ways: * through a reference to a [PersistentVolumeClaim](#persistentvolumeclaim) * with a [generic ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes) -* with a [CSI ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes) -if the driver supports that (beta feature) +* with a [CSI ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes) if the driver supports that The following fields are available to storage administrators to configure a CSI persistent volume: @@ -1241,7 +1240,7 @@ You can set up your #### CSI ephemeral volumes -{{< feature-state for_k8s_version="v1.16" state="beta" >}} +{{< feature-state for_k8s_version="v1.25" state="stable" >}} You can directly configure CSI volumes within the Pod specification. Volumes specified in this way are ephemeral and do not diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 2500db659e05a..e736f596e7066 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -70,8 +70,6 @@ different Kubernetes components. | `CPUManagerPolicyBetaOptions` | `true` | Beta | 1.23 | | | `CPUManagerPolicyOptions` | `false` | Alpha | 1.22 | 1.22 | | `CPUManagerPolicyOptions` | `true` | Beta | 1.23 | | -| `CSIInlineVolume` | `false` | Alpha | 1.15 | 1.15 | -| `CSIInlineVolume` | `true` | Beta | 1.16 | - | | `CSIMigration` | `false` | Alpha | 1.14 | 1.16 | | `CSIMigration` | `true` | Beta | 1.17 | | | `CSIMigrationAWS` | `false` | Alpha | 1.14 | 1.16 | @@ -246,6 +244,9 @@ different Kubernetes components. | `CSIDriverRegistry` | `false` | Alpha | 1.12 | 1.13 | | `CSIDriverRegistry` | `true` | Beta | 1.14 | 1.17 | | `CSIDriverRegistry` | `true` | GA | 1.18 | - | +| `CSIInlineVolume` | `false` | Alpha | 1.15 | 1.15 | +| `CSIInlineVolume` | `true` | Beta | 1.16 | 1.24 | +| `CSIInlineVolume` | `true` | GA | 1.25 | - | | `CSIMigrationAWSComplete` | `false` | Alpha | 1.17 | 1.20 | | `CSIMigrationAWSComplete` | - | Deprecated | 1.21 | - | | `CSIMigrationAzureDisk` | `false` | Alpha | 1.15 | 1.18 | From 2dc5f254e9d9893f154e05dd516f8dd77216ae5b Mon Sep 17 00:00:00 2001 From: Kevin Delgado Date: Fri, 18 Mar 2022 20:47:13 +0000 Subject: [PATCH 17/77] Add Server Side Field Validation to API Concepts --- .../feature-gates.md | 3 +- .../docs/reference/using-api/api-concepts.md | 82 +++++++++++++++++++ 2 files changed, 84 insertions(+), 1 deletion(-) diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 2500db659e05a..d2ba70d53ae11 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -180,7 +180,8 @@ different Kubernetes components. 
| `RotateKubeletServerCertificate` | `true` | Beta | 1.12 | | | `SeccompDefault` | `false` | Alpha | 1.22 | 1.24 | | `SeccompDefault` | `true` | Beta | 1.25 | | -| `ServerSideFieldValidation` | `false` | Alpha | 1.23 | - | +| `ServerSideFieldValidation` | `false` | Alpha | 1.23 | 1.24 | +| `ServerSideFieldValidation` | `true` | Beta | 1.25 | | | `ServiceInternalTrafficPolicy` | `false` | Alpha | 1.21 | 1.21 | | `ServiceInternalTrafficPolicy` | `true` | Beta | 1.22 | | | `ServiceIPStaticSubrange` | `false` | Alpha | 1.24 | | diff --git a/content/en/docs/reference/using-api/api-concepts.md b/content/en/docs/reference/using-api/api-concepts.md index 1a722acffd229..0feca02fc2716 100644 --- a/content/en/docs/reference/using-api/api-concepts.md +++ b/content/en/docs/reference/using-api/api-concepts.md @@ -661,6 +661,88 @@ of single-resource API requests, then aggregates the responses if needed. By contrast, the Kubernetes API verbs **list** and **watch** allow getting multiple resources, and **deletecollection** allows deleting multiple resources. +## Field validation + +Kubernetes always validates the type of fields. For example, if a field in the +API is defined as a number, you cannot set the field to a text value. If a field +is defined as an array of strings, you can only provide an array. Some fields +allow you to omit them, other fields are required. Omitting a required field +from an API request is an error. + +If you make a request with an extra field, one that the cluster's control plane +does not recognize, then the behavior of the API server is more complicated. + +By default, the API server drops fields that it does not recognize +from an input that it receives (for example, the JSON body of a `PUT` request). + +There are two situations where the API server drops fields that you supplied in +an HTTP request. + +These situations are: + +1. The field is unrecognized because it is not in the resource's OpenAPI schema. (One + exception to this is for {{< glossary_tooltip + term_id="CustomResourceDefinition" text="CRDs" >}} that explicitly choose not to prune unknown + fields via `x-kubernetes-preserve-unknown-fields`). +2. The field is duplicated in the object. + +### Setting the field validation level + + {{< feature-state for_k8s_version="v1.25" state="beta" >}} + +Provided that the `ServerSideFieldValidation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled (disabled +by default in 1.23 and 1.24, enabled by default starting in 1.25), you can take +advantage of server side field validation to catch these unrecognized fields. + +When you use HTTP verbs that can submit data (`POST`, `PUT`, and `PATCH`), field +validation gives you the option to choose how you would like to be notified of +these fields that are being dropped by the API server. Possible levels of +validation are `Ignore`, `Warn`, and `Strict`. + +{{< note >}} +If you submit a request that specifies an unrecognized field, and that is also invalid for +a different reason (for example, the request provides a string value where the API expects +an integer), then the API server responds with a 400 Bad Request error response. + +You always receive an error response in this case, no matter what field validation level you requested. +{{< /note >}} + +Field validation is set by the `fieldValidation` query parameter. 
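For example, a create request that opts in to strict validation might be sketched like this; the proxy port, namespace, and manifest file are illustrative:

```bash
# Reach the API server through a local proxy (one convenient option)
kubectl proxy --port=8001 &

# Ask for strict server-side validation of the request body
curl -X POST "http://127.0.0.1:8001/api/v1/namespaces/default/pods?fieldValidation=Strict" \
  -H "Content-Type: application/json" \
  -d @pod.json
```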
The three
values that you can provide for this parameter are:

`Ignore`
: The API server succeeds in handling the request as it would without the erroneous fields
being set, dropping all unknown and duplicate fields and giving no indication it
has done so.

`Warn`
: (Default) The API server succeeds in handling the request, and reports a
warning to the client. The warning is sent using the `Warning:` response header,
adding one warning item for each unknown or duplicate field. For more
information about warnings and the Kubernetes API, see the blog article
[Warning: Helpful Warnings Ahead](/blog/2020/09/03/warnings/).

`Strict`
: The API server rejects the request with a 400 Bad Request error when it
detects any unknown or duplicate fields. The response message from the API
server specifies all the unknown or duplicate fields that the API server has
detected.

Tools that submit requests to the server (such as `kubectl`) might set their own
defaults that are different from the `Warn` validation level that the API server uses
by default.

The `kubectl` tool uses the `--validate` flag to set the level of field validation.
Historically, `--validate` was used to toggle client-side validation on or off as
a boolean flag. Since Kubernetes 1.25, kubectl uses server-side field validation
when sending requests to a server with this feature enabled, and falls back to
client-side validation only when it cannot connect to an API server with field
validation enabled. The flag accepts the values `ignore`, `warn`, and `strict`,
and also accepts `true` (equivalent to `strict`) and `false` (equivalent to
`ignore`). The default validation setting for kubectl is `--validate=true`,
which means strict server-side field validation.

## Dry-run

{{< feature-state for_k8s_version="v1.18" state="stable" >}}

From 753633545f5ff5e00febb998f8be5d6ef52ff77e Mon Sep 17 00:00:00 2001
From: ravisantoshgudimetla
Date: Sun, 31 Jul 2022 19:00:44 -0400
Subject: [PATCH 18/77] Promote PodOS field to GA

---
 content/en/docs/concepts/windows/intro.md         | 5 ++---
 content/en/docs/concepts/windows/user-guide.md    | 7 +++----
 .../command-line-tools-reference/feature-gates.md | 1 +
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/content/en/docs/concepts/windows/intro.md b/content/en/docs/concepts/windows/intro.md
index f3867ea86cdc4..ac5a0533490ff 100644
--- a/content/en/docs/concepts/windows/intro.md
+++ b/content/en/docs/concepts/windows/intro.md
@@ -88,13 +88,12 @@ section refers to several key workload abstractions and how they map to Windows.

 * OS field: The `.spec.os.name` field should be set to `windows` to indicate that
   the current Pod uses Windows containers.
-  The `IdentifyPodOS` feature gate needs to be enabled for this field to be recognized.

   {{< note >}}
-  Starting from 1.24, the `IdentifyPodOS` feature gate is in Beta stage and defaults to be enabled.
+  Starting from 1.25, the `IdentifyPodOS` feature gate is GA and enabled by default.
{{< /note >}}

-  If the `IdentifyPodOS` feature gate is enabled and you set the `.spec.os.name` field to `windows`,
+  If you set the `.spec.os.name` field to `windows`,
   you must not set the following fields in the `.spec` of that Pod:

   * `spec.hostPID`
diff --git a/content/en/docs/concepts/windows/user-guide.md b/content/en/docs/concepts/windows/user-guide.md
index c9e5775da8582..c40f6e7e68843 100644
--- a/content/en/docs/concepts/windows/user-guide.md
+++ b/content/en/docs/concepts/windows/user-guide.md
@@ -158,14 +158,13 @@ schedule Linux and Windows workloads to their respective OS-specific nodes.
 The recommended approach is outlined below, with one of its main goals being
 that this approach should not break compatibility for existing Linux workloads.

-If the `IdentifyPodOS` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is
-enabled, you can (and should) set `.spec.os.name` for a Pod to indicate the operating system
+Starting from 1.25, you should set `.spec.os.name` for a Pod to indicate the operating system
 that the containers in that Pod are designed for. For Pods that run Linux containers, set
 `.spec.os.name` to `linux`. For Pods that run Windows containers, set `.spec.os.name`
-to Windows.
+to `windows`.

 {{< note >}}
-Starting from 1.24, the `IdentifyPodOS` feature is in Beta stage and defaults to be enabled.
+Starting from 1.25, the `IdentifyPodOS` feature is GA and enabled by default.
 {{< /note >}}

 The scheduler does not use the value of `.spec.os.name` when assigning Pods to nodes. You should
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index 2500db659e05a..6777161ad047d 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -120,6 +120,7 @@ different Kubernetes components.
 | `HPAScaleToZero` | `false` | Alpha | 1.16 | |
 | `IdentifyPodOS` | `false` | Alpha | 1.23 | 1.23 |
 | `IdentifyPodOS` | `true` | Beta | 1.24 | |
+| `IdentifyPodOS` | `true` | GA | 1.25 | - |
 | `InTreePluginAWSUnregister` | `false` | Alpha | 1.21 | |
 | `InTreePluginAzureDiskUnregister` | `false` | Alpha | 1.21 | |
 | `InTreePluginAzureFileUnregister` | `false` | Alpha | 1.21 | |

From 2c306bcc12c4ac3444b0b9c60a4d65797205ce74 Mon Sep 17 00:00:00 2001
From: Humble Chirammal
Date: Mon, 25 Jul 2022 12:58:59 +0530
Subject: [PATCH 19/77] csi: add nodeExpandSecret KEP details

Ref# KEP: https://github.com/kubernetes/enhancements/pull/3173/
Implementation: https://github.com/kubernetes/kubernetes/pull/105963
Blog: https://github.com/kubernetes/website/pull/33979

Signed-off-by: Humble Chirammal
---
 content/en/docs/concepts/storage/volumes.md | 22 ++++++++++++++-----
 .../feature-gates.md                        |  3 +++
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/content/en/docs/concepts/storage/volumes.md b/content/en/docs/concepts/storage/volumes.md
index fd994c97fa7fb..53454677457fc 100644
--- a/content/en/docs/concepts/storage/volumes.md
+++ b/content/en/docs/concepts/storage/volumes.md
@@ -1218,16 +1218,28 @@ persistent volume:
   `ControllerPublishVolume` and `ControllerUnpublishVolume` calls. This field is
   optional, and may be empty if no secret is required. If the Secret
   contains more than one secret, all secrets are passed.
-* `nodeStageSecretRef`: A reference to the secret object containing - sensitive information to pass to the CSI driver to complete the CSI - `NodeStageVolume` call. This field is optional, and may be empty if no secret - is required. If the Secret contains more than one secret, all secrets - are passed. +`nodeExpandSecretRef`: A reference to the secret containing sensitive + information to pass to the CSI driver to complete the CSI + `NodeExpandVolume` call. This field is optional, and may be empty if no + secret is required. If the object contains more than one secret, all + secrets are passed. When you have configured secret data for node-initiated + volume expansion, the kubelet passes that data via the `NodeExpandVolume()` + call to the CSI driver. In order to use the `nodeExpandSecretRef` field, your + cluster should be running Kubernetes version 1.25 or later and you must enable + the [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/) + named `CSINodeExpandSecret` for each kube-apiserver and for the kubelet on every + node. You must also be using a CSI driver that supports or requires secret data during + node-initiated storage resize operations. * `nodePublishSecretRef`: A reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI `NodePublishVolume` call. This field is optional, and may be empty if no secret is required. If the secret object contains more than one secret, all secrets are passed. +* `nodeStageSecretRef`: A reference to the secret object containing + sensitive information to pass to the CSI driver to complete the CSI + `NodeStageVolume` call. This field is optional, and may be empty if no secret + is required. If the Secret contains more than one secret, all secrets + are passed. #### CSI raw block volume support diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 2500db659e05a..38faac0a1d178 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -86,6 +86,7 @@ different Kubernetes components. | `CSIMigrationvSphere` | `false` | Beta | 1.19 | | | `CSIMigrationPortworx` | `false` | Alpha | 1.23 | | | `csiMigrationRBD` | `false` | Alpha | 1.23 | | +| `CSINodeExpandSecret` | `false` | Alpha | 1.25 | | | `CSIVolumeHealth` | `false` | Alpha | 1.21 | | | `ContextualLogging` | `false` | Alpha | 1.24 | | | `CustomCPUCFSQuotaPeriod` | `false` | Alpha | 1.12 | | @@ -761,6 +762,8 @@ Each feature gate is designed for enabling/disabling a specific feature: from the Portworx in-tree plugin to Portworx CSI plugin. Requires Portworx CSI driver to be installed and configured in the cluster. - `CSINodeInfo`: Enable all logic related to the CSINodeInfo API object in `csi.storage.k8s.io`. +- `CSINodeExpandSecret`: Enable passing secret authentication data to a CSI driver for use + during a `NodeExpandVolume` CSI operation. - `CSIPersistentVolume`: Enable discovering and mounting volumes provisioned through a [CSI (Container Storage Interface)](https://git.k8s.io/design-proposals-archive/storage/container-storage-interface.md) compatible volume plugin. From c6ff69558c3813a6f61997451f9b8a688fd95894 Mon Sep 17 00:00:00 2001 From: "Paul S. 
Schweigert" Date: Tue, 28 Jun 2022 08:52:37 -0400 Subject: [PATCH 20/77] update feature gate docs for probe-level termination grace period Signed-off-by: Paul S. Schweigert --- .../feature-gates.md | 3 ++- ...igure-liveness-readiness-startup-probes.md | 23 ++++++++++--------- 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 8eb23b3a473eb..dabb3c9237342 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -168,7 +168,8 @@ different Kubernetes components. | `PodSecurity` | `false` | Alpha | 1.22 | 1.22 | | `PodSecurity` | `true` | Beta | 1.23 | | | `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 | -| `ProbeTerminationGracePeriod` | `false` | Beta | 1.22 | | +| `ProbeTerminationGracePeriod` | `false` | Beta | 1.22 | 1.24 | +| `ProbeTerminationGracePeriod` | `true` | Beta | 1.25 | | | `ProcMountType` | `false` | Alpha | 1.12 | | | `ProxyTerminatingEndpoints` | `false` | Alpha | 1.22 | | | `QOSReserved` | `false` | Alpha | 1.11 | | diff --git a/content/en/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes.md b/content/en/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes.md index b26baabe9e35a..5cb34353acfa6 100644 --- a/content/en/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes.md +++ b/content/en/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes.md @@ -481,7 +481,7 @@ to resolve it. ### Probe-level `terminationGracePeriodSeconds` -{{< feature-state for_k8s_version="v1.22" state="beta" >}} +{{< feature-state for_k8s_version="v1.25" state="beta" >}} Prior to release 1.21, the pod-level `terminationGracePeriodSeconds` was used for terminating a container that failed its liveness or startup probe. This @@ -489,22 +489,23 @@ coupling was unintended and may have resulted in failed containers taking an unusually long time to restart when a pod-level `terminationGracePeriodSeconds` was set. -In 1.21 and beyond, when the feature gate `ProbeTerminationGracePeriod` is -enabled, users can specify a probe-level `terminationGracePeriodSeconds` as -part of the probe specification. When the feature gate is enabled, and both a -pod- and probe-level `terminationGracePeriodSeconds` are set, the kubelet will -use the probe-level value. +In 1.25 and beyond, users can specify a probe-level `terminationGracePeriodSeconds` +as part of the probe specification. When both a pod- and probe-level +`terminationGracePeriodSeconds` are set, the kubelet will use the probe-level value. {{< note >}} -As of Kubernetes 1.22, the `ProbeTerminationGracePeriod` feature gate is only -available on the API Server. The kubelet always honors the probe-level -`terminationGracePeriodSeconds` field if it is present on a Pod. +Beginning in Kubernetes 1.25, the `ProbeTerminationGracePeriod` feature is enabled +by default. For users choosing to disable this feature, please note the following: -If you have existing Pods where the `terminationGracePeriodSeconds` field is set and +* The `ProbeTerminationGracePeriod` feature gate is only available on the API Server. +The kubelet always honors the probe-level `terminationGracePeriodSeconds` field if +it is present on a Pod. 
+ +* If you have existing Pods where the `terminationGracePeriodSeconds` field is set and you no longer wish to use per-probe termination grace periods, you must delete those existing Pods. -When you (or the control plane, or some other component) create replacement +* When you (or the control plane, or some other component) create replacement Pods, and the feature gate `ProbeTerminationGracePeriod` is disabled, then the API server ignores the Pod-level `terminationGracePeriodSeconds` field, even if a Pod or pod template specifies it. From 02b0dea3829ba7bf64dfb5b9b1d17c8ba4e6b898 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Filip=20K=C5=99epinsk=C3=BD?= Date: Fri, 29 Jul 2022 10:29:10 +0200 Subject: [PATCH 21/77] Promote DaemonSet MaxSurge to GA --- .../reference/command-line-tools-reference/feature-gates.md | 5 +++-- content/en/docs/tasks/manage-daemon/update-daemon-set.md | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 2500db659e05a..b7f8449abde14 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -90,8 +90,6 @@ different Kubernetes components. | `ContextualLogging` | `false` | Alpha | 1.24 | | | `CustomCPUCFSQuotaPeriod` | `false` | Alpha | 1.12 | | | `CustomResourceValidationExpressions` | `false` | Alpha | 1.23 | | -| `DaemonSetUpdateSurge` | `false` | Alpha | 1.21 | 1.21 | -| `DaemonSetUpdateSurge` | `true` | Beta | 1.22 | | | `DelegateFSGroupToCSIDriver` | `false` | Alpha | 1.22 | 1.22 | | `DelegateFSGroupToCSIDriver` | `true` | Beta | 1.23 | | | `DevicePlugins` | `false` | Alpha | 1.8 | 1.9 | @@ -304,6 +302,9 @@ different Kubernetes components. | `CustomResourceWebhookConversion` | `false` | Alpha | 1.13 | 1.14 | | `CustomResourceWebhookConversion` | `true` | Beta | 1.15 | 1.15 | | `CustomResourceWebhookConversion` | `true` | GA | 1.16 | - | +| `DaemonSetUpdateSurge` | `false` | Alpha | 1.21 | 1.21 | +| `DaemonSetUpdateSurge` | `true` | Beta | 1.22 | 1.24 | +| `DaemonSetUpdateSurge` | `true` | GA | 1.25 | - | | `DefaultPodTopologySpread` | `false` | Alpha | 1.19 | 1.19 | | `DefaultPodTopologySpread` | `true` | Beta | 1.20 | 1.23 | | `DefaultPodTopologySpread` | `true` | GA | 1.24 | - | diff --git a/content/en/docs/tasks/manage-daemon/update-daemon-set.md b/content/en/docs/tasks/manage-daemon/update-daemon-set.md index d6d6b68f198a2..53e959d0e2ca1 100644 --- a/content/en/docs/tasks/manage-daemon/update-daemon-set.md +++ b/content/en/docs/tasks/manage-daemon/update-daemon-set.md @@ -40,7 +40,7 @@ You may want to set [`.spec.minReadySeconds`](/docs/reference/kubernetes-api/workload-resources/daemon-set-v1/#DaemonSetSpec) (default to 0) and [`.spec.updateStrategy.rollingUpdate.maxSurge`](/docs/reference/kubernetes-api/workload-resources/daemon-set-v1/#DaemonSetSpec) -(a beta feature and defaults to 0) as well. +(defaults to 0) as well. ### Creating a DaemonSet with `RollingUpdate` update strategy From 2d03570219091038293353f759952cd1229ec1ba Mon Sep 17 00:00:00 2001 From: Tim Bannister Date: Mon, 1 Aug 2022 18:38:38 +0100 Subject: [PATCH 22/77] Update PSP annotation for v1.25 Write about the kubernetes.io/psp annotation in the past tense. 
--- .../reference/labels-annotations-taints/_index.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/content/en/docs/reference/labels-annotations-taints/_index.md b/content/en/docs/reference/labels-annotations-taints/_index.md index 3f70407618fba..a06fba6e81b2d 100644 --- a/content/en/docs/reference/labels-annotations-taints/_index.md +++ b/content/en/docs/reference/labels-annotations-taints/_index.md @@ -622,11 +622,14 @@ for more information. Example: `kubernetes.io/psp: restricted` -This annotation is only relevant if you are using [PodSecurityPolicies](/docs/concepts/security/pod-security-policy/). +Used on: Pod + +This annotation was only relevant if you were using [PodSecurityPolicies](/docs/concepts/security/pod-security-policy/). +Kubernetes v{{< skew currentVersion >}} does not support the PodSecurityPolicy API. -When the PodSecurityPolicy admission controller admits a Pod, the admission controller -modifies the Pod to have this annotation. -The value of the annotation is the name of the PodSecurityPolicy that was used for validation. +When the PodSecurityPolicy admission controller admitted a Pod, the admission controller +modified the Pod to have this annotation. +The value of the annotation was the name of the PodSecurityPolicy that was used for validation. ### seccomp.security.alpha.kubernetes.io/pod (deprecated) {#seccomp-security-alpha-kubernetes-io-pod} From ce898c50be7acd45a33521d762a5e58f9fb2e63b Mon Sep 17 00:00:00 2001 From: Tim Allclair Date: Mon, 1 Aug 2022 16:49:09 -0700 Subject: [PATCH 23/77] Update Pod Security Admission docs for graduation to stable --- .../security/pod-security-admission.md | 39 +++---------------- 1 file changed, 5 insertions(+), 34 deletions(-) diff --git a/content/en/docs/concepts/security/pod-security-admission.md b/content/en/docs/concepts/security/pod-security-admission.md index 60f87c04d9e09..fb4d8de0c2330 100644 --- a/content/en/docs/concepts/security/pod-security-admission.md +++ b/content/en/docs/concepts/security/pod-security-admission.md @@ -13,23 +13,16 @@ min-kubernetes-server-version: v1.22 -{{< feature-state for_k8s_version="v1.23" state="beta" >}} +{{< feature-state for_k8s_version="v1.25" state="stable" >}} The Kubernetes [Pod Security Standards](/docs/concepts/security/pod-security-standards/) define different isolation levels for Pods. These standards let you define how you want to restrict the behavior of pods in a clear, consistent fashion. -As a beta feature, Kubernetes offers a built-in _Pod Security_ {{< glossary_tooltip -text="admission controller" term_id="admission-controller" >}}, the successor -to [PodSecurityPolicies](/docs/concepts/security/pod-security-policy/). Pod security restrictions -are applied at the {{< glossary_tooltip text="namespace" term_id="namespace" >}} level when pods -are created. - -{{< note >}} -The PodSecurityPolicy API is deprecated and will be -[removed](/docs/reference/using-api/deprecation-guide/#v1-25) from Kubernetes in v1.25. -{{< /note >}} - +Kubernetes offers a built-in _Pod Security_ {{< glossary_tooltip text="admission controller" +term_id="admission-controller" >}} to enforce the Pod Security Standards. Pod security restrictions +are applied at the {{< glossary_tooltip text="namespace" term_id="namespace" >}} level when pods are +created. ## {{% heading "prerequisites" %}} @@ -37,31 +30,9 @@ To use this mechanism, your cluster must enforce Pod Security admission. 
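For a quick sketch of what enforcement looks like in practice (the namespace
name is illustrative), you label a namespace with the Pod Security Standard
level you want the built-in admission controller to apply:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace   # illustrative name
  labels:
    # Reject Pods that violate the "baseline" standard,
    # and warn about Pods that violate "restricted".
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
```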
### Built-in Pod Security admission enforcement

From Kubernetes v1.23, the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is a beta feature and is enabled by default.
This page is part of the documentation for Kubernetes v{{< skew currentVersion >}}.
If you are running a different version of Kubernetes, consult the documentation for that release.

### Alternative: installing the `PodSecurity` admission webhook {#webhook}

The `PodSecurity` admission logic is also available as a [validating admission webhook](https://git.k8s.io/pod-security-admission/webhook). This implementation is also beta.
For environments where the built-in `PodSecurity` admission plugin cannot be enabled, you can instead enable that logic via a validating admission webhook.

A pre-built container image, certificate generation scripts, and example manifests
are available at [https://git.k8s.io/pod-security-admission/webhook](https://git.k8s.io/pod-security-admission/webhook).

To install:
```shell
git clone https://github.com/kubernetes/pod-security-admission.git
cd pod-security-admission/webhook
make certs
kubectl apply -k .
```

{{< note >}}
The generated certificate is valid for 2 years. Before it expires,
regenerate the certificate or remove the webhook in favor of the built-in admission plugin.
{{< /note >}}

## Pod Security levels

From 45d0bdeaecb29c7df2c2951d781511c1b03fee64 Mon Sep 17 00:00:00 2001
From: Sascha Grunert
Date: Fri, 6 May 2022 12:06:49 +0200
Subject: [PATCH 24/77] Partly remove support for seccomp annotations

From the release notes of
https://github.com/kubernetes/kubernetes/pull/109819, we have to update
according to the following situation:

```
Action required: support for the alpha seccomp annotations
`seccomp.security.alpha.kubernetes.io/pod` and
`container.seccomp.security.alpha.kubernetes.io`, deprecated since v1.19,
has been partially removed. Kubelets no longer support the annotations,
use of the annotations in static pods is no longer supported, and the
seccomp annotations are no longer auto-populated when pods with seccomp
fields are created. Auto-population of the seccomp fields from the
annotations is planned to be removed in 1.27. Pods should use the
corresponding pod or container `securityContext.seccompProfile` field
instead.
```

Signed-off-by: Sascha Grunert
---
 .../en/docs/reference/labels-annotations-taints/_index.md | 6 ++++--
 content/en/docs/tutorials/security/seccomp.md             | 8 +++++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/reference/labels-annotations-taints/_index.md b/content/en/docs/reference/labels-annotations-taints/_index.md
index 3f70407618fba..3ad63fd7a12f9 100644
--- a/content/en/docs/reference/labels-annotations-taints/_index.md
+++ b/content/en/docs/reference/labels-annotations-taints/_index.md
@@ -630,7 +630,8 @@ The value of the annotation is the name of the PodSecurityPolicy that was used f

 ### seccomp.security.alpha.kubernetes.io/pod (deprecated) {#seccomp-security-alpha-kubernetes-io-pod}

-This annotation has been deprecated since Kubernetes v1.19 and will become non-functional in v1.25.
+This annotation has been deprecated since Kubernetes v1.19 and will become non-functional in a future release.
+Please use the corresponding pod or container `securityContext.seccompProfile` field instead.

 To specify security settings for a Pod, include the `securityContext` field in the Pod specification.
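As a minimal sketch of the field-based replacement for this annotation
(the Pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo   # illustrative name
spec:
  securityContext:
    # Pod-level seccomp configuration; this replaces the deprecated
    # seccomp.security.alpha.kubernetes.io/pod annotation.
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.k8s.io/pause:3.8   # illustrative image
```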
The [`securityContext`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) field
within a Pod's `.spec` defines pod-level security attributes. When you
[specify the security context for a Pod](/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod),
the settings you specify apply to all containers in that Pod.

### container.seccomp.security.alpha.kubernetes.io/[NAME] (deprecated) {#container-seccomp-security-alpha-kubernetes-io}

-This annotation has been deprecated since Kubernetes v1.19 and will become non-functional in v1.25.
+This annotation has been deprecated since Kubernetes v1.19 and will become non-functional in a future release.
+Please use the corresponding pod or container `securityContext.seccompProfile` field instead.

 The tutorial [Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/) takes
 you through the steps you follow to apply a seccomp profile to a Pod or to
 one of its containers. That tutorial covers the supported mechanism for configuring seccomp in Kubernetes,
diff --git a/content/en/docs/tutorials/security/seccomp.md b/content/en/docs/tutorials/security/seccomp.md
index 21ea7a5f62937..0c91c1013de0c 100644
--- a/content/en/docs/tutorials/security/seccomp.md
+++ b/content/en/docs/tutorials/security/seccomp.md
@@ -281,8 +281,14 @@ Here's a manifest for that Pod:
 The functional support for the already deprecated seccomp annotations
 `seccomp.security.alpha.kubernetes.io/pod` (for the whole pod) and
 `container.seccomp.security.alpha.kubernetes.io/[name]` (for a single container)
-is going to be removed with the release of Kubernetes v1.25. Please always use
+is going to be removed with a future release of Kubernetes. Please always use
 the native API fields in favor of the annotations.
+
+Since Kubernetes v1.25, kubelets no longer support the annotations, use of the
+annotations in static pods is no longer supported, and the seccomp annotations
+are no longer auto-populated when pods with seccomp fields are created.
+Auto-population of the seccomp fields from the annotations is planned to be
+removed in a future release.
{{< /note >}} Create the Pod in the cluster: From cafe6d258c91c3814d83c0655c8c6354e3eade1c Mon Sep 17 00:00:00 2001 From: Pushkar Joglekar Date: Tue, 19 Jul 2022 15:47:27 -0700 Subject: [PATCH 25/77] Fetch and Render CVE JSON feed - Pull JSON blob from queried issues - Use layout output formats + templates to generate HTML table and JSON blob - Add localized strings and caption for CVE feed - Add a new page to describe details about CVE feed and how to use it - Update existing pages and link the official CVE feed from it Co-authored-by: Neha Lohia Co-authored-by: Tim Bannister --- config.toml | 3 ++ .../docs/reference/issues-security/issues.md | 7 ++- .../issues-security/official-cve-feed.md | 44 +++++++++++++++++++ data/i18n/en/en.toml | 28 ++++++++++++ layouts/_default/cve-feed.json | 23 ++++++++++ layouts/shortcodes/cve-feed.html | 19 ++++++++ 6 files changed, 122 insertions(+), 2 deletions(-) create mode 100644 content/en/docs/reference/issues-security/official-cve-feed.md create mode 100644 layouts/_default/cve-feed.json create mode 100644 layouts/shortcodes/cve-feed.html diff --git a/config.toml b/config.toml index a153bdf11bbea..470be9773bfa7 100644 --- a/config.toml +++ b/config.toml @@ -169,6 +169,9 @@ algolia_docsearch = false # Enable Lunr.js offline search offlineSearch = false +# Official CVE feed bucket URL +cveFeedBucket = "https://storage.googleapis.com/k8s-cve-feed/official-cve-feed.json" + [params.pushAssets] css = [ "callouts", diff --git a/content/en/docs/reference/issues-security/issues.md b/content/en/docs/reference/issues-security/issues.md index 5e4ebe85c036a..3dbec5bdcc9d4 100644 --- a/content/en/docs/reference/issues-security/issues.md +++ b/content/en/docs/reference/issues-security/issues.md @@ -8,6 +8,9 @@ To report a security issue, please follow the [Kubernetes security disclosure pr Work on Kubernetes code and public issues are tracked using [GitHub Issues](https://github.com/kubernetes/kubernetes/issues/). -* [CVE-related issues](https://github.com/kubernetes/kubernetes/issues?utf8=%E2%9C%93&q=is%3Aissue+label%3Aarea%2Fsecurity+in%3Atitle+CVE) +* Official [list of known CVEs](/docs/reference/issues-security/official-cve-feed/) + (security vulnerabilities) that have been announced by the + [Security Response Committee](https://github.com/kubernetes/committee-security-response) +* [CVE-related GitHub issues](https://github.com/kubernetes/kubernetes/issues?utf8=%E2%9C%93&q=is%3Aissue+label%3Aarea%2Fsecurity+in%3Atitle+CVE) -Security-related announcements are sent to the [kubernetes-security-announce@googlegroups.com](https://groups.google.com/forum/#!forum/kubernetes-security-announce) mailing list. +Security-related announcements are sent to the [kubernetes-security-announce@googlegroups.com](https://groups.google.com/forum/#!forum/kubernetes-security-announce) mailing list. \ No newline at end of file diff --git a/content/en/docs/reference/issues-security/official-cve-feed.md b/content/en/docs/reference/issues-security/official-cve-feed.md new file mode 100644 index 0000000000000..6a48924e42b12 --- /dev/null +++ b/content/en/docs/reference/issues-security/official-cve-feed.md @@ -0,0 +1,44 @@ +--- +title: Official CVE Feed +weight: 25 +outputs: + - json + - html +layout: cve-feed +--- + +{{< feature-state for_k8s_version="v1.25" state="alpha" >}} + +This is a community maintained list of official CVEs announced by +the Kubernetes Security Response Committee. 
See +[Kubernetes Security and Disclosure Information](/docs/reference/issues-security/security/) +for more details. + +The Kubernetes project publishes a programmatically accessible +[JSON Feed](/docs/reference/issues-security/official-cve-feed/index.json) of +published security issues. You can access it by executing the following command: + +{{< comment >}} +`replace` is used to bypass known issue with rendering ">" +: https://github.com/gohugoio/hugo/issues/7229 in JSON layouts template +`layouts/_default/cve-feed.json` +{{< /comment >}} + +```shell +curl -v https://k8s.io/docs/reference/issues-security/official-cve-feed/index.json +``` + +{{< cve-feed >}} + + + +This feed is auto-refreshing with a noticeable but small lag (minutes to hours) +from the time a CVE is announced to the time it is accessible in this feed. + +The source of truth of this feed is a set of GitHub Issues, filtered by a controlled and +restricted label `official-cve-feed`. The raw data is stored in a Google Cloud +Bucket which is writable only by a small number of trusted members of the +Community. \ No newline at end of file diff --git a/data/i18n/en/en.toml b/data/i18n/en/en.toml index 6c57982f9b11e..831401536c58b 100644 --- a/data/i18n/en/en.toml +++ b/data/i18n/en/en.toml @@ -27,6 +27,34 @@ other = "Twitter" [community_youtube_name] other = "YouTube" + +[cve_id] +other = "CVE ID" + +[cve_issue_url] +other = "CVE GitHub Issue URL" + +[cve_json_external_url] +other = "external_url" + +[cve_json_id] +other = "id" + +[cve_json_summary] +other = "summary" + +[cve_json_url] +other = "url" + +[cve_summary] +other = "Issue Summary" + +[cve_table] +other = "Official Kubernetes CVE List" + +[cve_url] +other = "CVE URL" + [deprecation_title] other = "You are viewing documentation for Kubernetes version:" diff --git a/layouts/_default/cve-feed.json b/layouts/_default/cve-feed.json new file mode 100644 index 0000000000000..a185fde22fc77 --- /dev/null +++ b/layouts/_default/cve-feed.json @@ -0,0 +1,23 @@ +{ + "version": "https://jsonfeed.org/version/1.1", + "title": "Auto-refreshing Official CVE Feed", + "home_page_url": "https://kubernetes.io", + "feed_url": "https://kubernetes.io/docs/reference/issues-security/official-cve-feed/index.json", + "description": "Auto-refreshing official CVE feed for Kubernetes repository", + "authors": [ + { + "name": "Kubernetes Community", + "url": "https://www.kubernetes.dev" + } + ], + "items": [ + {{ range $i, $e := getJSON .Site.Params.cveFeedBucket }} + {{ if $i }}, {{ end }} + { + {{ T "cve_json_id" | jsonify }}: {{ .cve_id | jsonify }}, + {{ T "cve_json_url" | jsonify }}: {{ .issue_url | jsonify }}, + {{ T "cve_json_external_url" | jsonify }}: {{ .cve_url | jsonify}}, + {{ T "cve_json_summary" | jsonify }}: {{ replace (.summary | jsonify ) "\\u003e" ">" }} + }{{ end }} + ] +} diff --git a/layouts/shortcodes/cve-feed.html b/layouts/shortcodes/cve-feed.html new file mode 100644 index 0000000000000..1c04efab7ea8b --- /dev/null +++ b/layouts/shortcodes/cve-feed.html @@ -0,0 +1,19 @@ + + + + + + + + + + + {{ range $issues := getJSON .Site.Params.cveFeedBucket }} + + + + + + {{ end }} + +
<table>
  <caption>{{ T "cve_table" }}</caption>
  <thead>
    <tr>
      <th>{{ T "cve_id" }}</th>
      <th>{{ T "cve_summary"}}</th>
      <th>{{ T "cve_issue_url" }}</th>
    </tr>
  </thead>
  <tbody>
    {{ range $issues := getJSON .Site.Params.cveFeedBucket }}
    <tr>
      <td>{{ .cve_id | htmlEscape | safeHTML }}</td>
      <td>{{ .summary | htmlEscape | safeHTML }}</td>
      <td>#{{ .number }}</td>
    </tr>
    {{ end }}
  </tbody>
</table>
\ No newline at end of file From 5af4db73f3d9c7477eed87f2b6001e74370d8301 Mon Sep 17 00:00:00 2001 From: kerthcet Date: Fri, 15 Jul 2022 13:58:29 +0800 Subject: [PATCH 26/77] Feat: ga component config in kube-scheduler Signed-off-by: kerthcet --- .../en/docs/reference/scheduling/config.md | 52 +++++++++---------- 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/content/en/docs/reference/scheduling/config.md b/content/en/docs/reference/scheduling/config.md index 0911058ad46cc..5f1377ec43163 100644 --- a/content/en/docs/reference/scheduling/config.md +++ b/content/en/docs/reference/scheduling/config.md @@ -4,7 +4,7 @@ content_type: concept weight: 20 --- -{{< feature-state for_k8s_version="v1.19" state="beta" >}} +{{< feature-state for_k8s_version="v1.25" state="stable" >}} You can customize the behavior of the `kube-scheduler` by writing a configuration file and passing its path as a command line argument. @@ -78,7 +78,7 @@ extension points: least one bind plugin is required. 1. `postBind`: This is an informational extension point that is called after a Pod has been bound. -1. `multiPoint`: This is a config-only field that allows plugins to be enabled +1. `multiPoint`: This is a config-only field that allows plugins to be enabled or disabled for all of their applicable extension points simultaneously. For each extension point, you could disable specific [default plugins](#scheduling-plugins) @@ -231,13 +231,13 @@ only has one pending pods queue. ### Plugins that apply to multiple extension points {#multipoint} -Starting from `kubescheduler.config.k8s.io/v1beta3`, there is an additional field in the -profile config, `multiPoint`, which allows for easily enabling or disabling a plugin -across several extension points. The intent of `multiPoint` config is to simplify the +Starting from `kubescheduler.config.k8s.io/v1beta3`, there is an additional field in the +profile config, `multiPoint`, which allows for easily enabling or disabling a plugin +across several extension points. The intent of `multiPoint` config is to simplify the configuration needed for users and administrators when using custom profiles. -Consider a plugin, `MyPlugin`, which implements the `preScore`, `score`, `preFilter`, -and `filter` extension points. To enable `MyPlugin` for all its available extension +Consider a plugin, `MyPlugin`, which implements the `preScore`, `score`, `preFilter`, +and `filter` extension points. To enable `MyPlugin` for all its available extension points, the profile config looks like: ```yaml @@ -251,7 +251,7 @@ profiles: - name: MyPlugin ``` -This would equate to manually enabling `MyPlugin` for all of its extension +This would equate to manually enabling `MyPlugin` for all of its extension points, like so: ```yaml @@ -274,13 +274,13 @@ profiles: - name: MyPlugin ``` -One benefit of using `multiPoint` here is that if `MyPlugin` implements another -extension point in the future, the `multiPoint` config will automatically enable it +One benefit of using `multiPoint` here is that if `MyPlugin` implements another +extension point in the future, the `multiPoint` config will automatically enable it for the new extension. -Specific extension points can be excluded from `MultiPoint` expansion using -the `disabled` field for that extension point. This works with disabling default -plugins, non-default plugins, or with the wildcard (`'*'`) to disable all plugins. +Specific extension points can be excluded from `MultiPoint` expansion using +the `disabled` field for that extension point. 
This works with disabling default +plugins, non-default plugins, or with the wildcard (`'*'`) to disable all plugins. An example of this, disabling `Score` and `PreScore`, would be: ```yaml @@ -300,10 +300,10 @@ profiles: - name: '*' ``` -In `v1beta3`, all [default plugins](#scheduling-plugins) are enabled internally through `MultiPoint`. -However, individual extension points are still available to allow flexible -reconfiguration of the default values (such as ordering and Score weights). For -example, consider two Score plugins `DefaultScore1` and `DefaultScore2`, each with +In `v1beta3`, all [default plugins](#scheduling-plugins) are enabled internally through `MultiPoint`. +However, individual extension points are still available to allow flexible +reconfiguration of the default values (such as ordering and Score weights). For +example, consider two Score plugins `DefaultScore1` and `DefaultScore2`, each with a weight of `1`. They can be reordered with different weights like so: ```yaml @@ -318,10 +318,10 @@ profiles: weight: 5 ``` -In this example, it's unnecessary to specify the plugins in `MultiPoint` explicitly +In this example, it's unnecessary to specify the plugins in `MultiPoint` explicitly because they are default plugins. And the only plugin specified in `Score` is `DefaultScore2`. -This is because plugins set through specific extension points will always take precedence -over `MultiPoint` plugins. So, this snippet essentially re-orders the two plugins +This is because plugins set through specific extension points will always take precedence +over `MultiPoint` plugins. So, this snippet essentially re-orders the two plugins without needing to specify both of them. The general hierarchy for precedence when configuring `MultiPoint` plugins is as follows: @@ -363,8 +363,8 @@ profiles: - name: 'DefaultPlugin2' ``` -Note that there is no error for re-declaring a `MultiPoint` plugin in a specific -extension point. The re-declaration is ignored (and logged), as specific extension points +Note that there is no error for re-declaring a `MultiPoint` plugin in a specific +extension point. The re-declaration is ignored (and logged), as specific extension points take precedence. Besides keeping most of the config in one spot, this sample does a few things: @@ -380,14 +380,14 @@ kind: KubeSchedulerConfiguration profiles: - schedulerName: multipoint-scheduler plugins: - + # Disable the default QueueSort plugin queueSort: enabled: - name: 'CustomQueueSort' disabled: - name: 'DefaultQueueSort' - + # Enable custom Filter plugins filter: enabled: @@ -396,7 +396,7 @@ profiles: - name: 'DefaultPlugin2' disabled: - name: 'DefaultPlugin1' - + # Enable and reorder custom score plugins score: enabled: @@ -406,7 +406,7 @@ profiles: weight: 3 ``` -While this is a complicated example, it demonstrates the flexibility of `MultiPoint` config +While this is a complicated example, it demonstrates the flexibility of `MultiPoint` config as well as its seamless integration with the existing methods for configuring extension points. 
## Scheduler configuration migrations From e7a80c00563e7e3e0c83574960297966bc17af57 Mon Sep 17 00:00:00 2001 From: kerthcet Date: Wed, 3 Aug 2022 14:40:26 +0800 Subject: [PATCH 27/77] Graduate Component Config in kube-scheduler to GA Signed-off-by: kerthcet --- .../config-api/kube-scheduler-config.v1.md | 1384 +++++++++++++++++ .../en/docs/reference/scheduling/config.md | 40 +- 2 files changed, 1408 insertions(+), 16 deletions(-) create mode 100644 content/en/docs/reference/config-api/kube-scheduler-config.v1.md diff --git a/content/en/docs/reference/config-api/kube-scheduler-config.v1.md b/content/en/docs/reference/config-api/kube-scheduler-config.v1.md new file mode 100644 index 0000000000000..a2fb331d9ecfa --- /dev/null +++ b/content/en/docs/reference/config-api/kube-scheduler-config.v1.md @@ -0,0 +1,1384 @@ +--- +title: kube-scheduler Configuration (v1) +content_type: tool-reference +package: kubescheduler.config.k8s.io/v1 +auto_generated: true +--- + + +## Resource Types + + +- [DefaultPreemptionArgs](#kubescheduler-config-k8s-io-v1-DefaultPreemptionArgs) +- [InterPodAffinityArgs](#kubescheduler-config-k8s-io-v1-InterPodAffinityArgs) +- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration) +- [NodeAffinityArgs](#kubescheduler-config-k8s-io-v1-NodeAffinityArgs) +- [NodeResourcesBalancedAllocationArgs](#kubescheduler-config-k8s-io-v1-NodeResourcesBalancedAllocationArgs) +- [NodeResourcesFitArgs](#kubescheduler-config-k8s-io-v1-NodeResourcesFitArgs) +- [PodTopologySpreadArgs](#kubescheduler-config-k8s-io-v1-PodTopologySpreadArgs) +- [VolumeBindingArgs](#kubescheduler-config-k8s-io-v1-VolumeBindingArgs) + + + +## `ClientConnectionConfiguration` {#ClientConnectionConfiguration} + + +**Appears in:** + +- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1beta3-KubeSchedulerConfiguration) + +- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration) + + +
ClientConnectionConfiguration contains details for constructing a client.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `kubeconfig` [Required] | `string` | kubeconfig is the path to a KubeConfig file. |
| `acceptContentTypes` [Required] | `string` | acceptContentTypes defines the Accept header sent by clients when connecting to a server, overriding the default value of 'application/json'. This field will control all connections to the server used by a particular client. |
| `contentType` [Required] | `string` | contentType is the content type used when sending data to the server from this client. |
| `qps` [Required] | `float32` | qps controls the number of queries per second allowed for this connection. |
| `burst` [Required] | `int32` | burst allows extra queries to accumulate when a client is exceeding its rate. |

## `DebuggingConfiguration` {#DebuggingConfiguration}

**Appears in:**

- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1beta3-KubeSchedulerConfiguration)
- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration)

DebuggingConfiguration holds configuration for Debugging related features.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `enableProfiling` [Required] | `bool` | enableProfiling enables profiling via the web interface `host:port/debug/pprof/`. |
| `enableContentionProfiling` [Required] | `bool` | enableContentionProfiling enables lock contention profiling, if enableProfiling is true. |

## `FormatOptions` {#FormatOptions}

**Appears in:**

- [LoggingConfiguration](#LoggingConfiguration)

FormatOptions contains options for the different logging formats.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `json` [Required] | `JSONOptions` | [Experimental] JSON contains options for logging format "json". |

## `JSONOptions` {#JSONOptions}

**Appears in:**

- [FormatOptions](#FormatOptions)

JSONOptions contains options for logging format "json".

| Field | Type | Description |
| ----- | ---- | ----------- |
| `splitStream` [Required] | `bool` | [Experimental] SplitStream redirects error messages to stderr while info messages go to stdout, with buffering. The default is to write both to stdout, without buffering. |
| `infoBufferSize` [Required] | `k8s.io/apimachinery/pkg/api/resource.QuantityValue` | [Experimental] InfoBufferSize sets the size of the info stream when using split streams. The default is zero, which disables buffering. |

## `LeaderElectionConfiguration` {#LeaderElectionConfiguration}

**Appears in:**

- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1beta3-KubeSchedulerConfiguration)
- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration)

LeaderElectionConfiguration defines the configuration of leader election
clients for components that can run with leader election enabled.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `leaderElect` [Required] | `bool` | leaderElect enables a leader election client to gain leadership before executing the main loop. Enable this when running replicated components for high availability. |
| `leaseDuration` [Required] | `meta/v1.Duration` | leaseDuration is the duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. |
| `renewDeadline` [Required] | `meta/v1.Duration` | renewDeadline is the interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. This is only applicable if leader election is enabled. |
| `retryPeriod` [Required] | `meta/v1.Duration` | retryPeriod is the duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. |
| `resourceLock` [Required] | `string` | resourceLock indicates the resource object type that will be used to lock during leader election cycles. |
| `resourceName` [Required] | `string` | resourceName indicates the name of the resource object that will be used to lock during leader election cycles. |
| `resourceNamespace` [Required] | `string` | resourceNamespace indicates the namespace of the resource object that will be used to lock during leader election cycles. |
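As a sketch of how these fields are used, the values below match the usual
kube-scheduler defaults (to the best of this editor's knowledge) and would
appear under `leaderElection` in a `KubeSchedulerConfiguration` file:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true
  leaseDuration: 15s
  renewDeadline: 10s
  retryPeriod: 2s
  resourceLock: leases
  resourceName: kube-scheduler
  resourceNamespace: kube-system
```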
## `LoggingConfiguration` {#LoggingConfiguration}

**Appears in:**

- [KubeletConfiguration](#kubelet-config-k8s-io-v1beta1-KubeletConfiguration)

LoggingConfiguration contains logging options. Refer to Logs Options for more information.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `format` [Required] | `string` | Format Flag specifies the structure of log messages. The default value of format is `text`. |
| `flushFrequency` [Required] | `time.Duration` | Maximum number of nanoseconds (i.e. 1s = 1000000000) between log flushes. Ignored if the selected logging backend writes log messages without buffering. |
| `verbosity` [Required] | `uint32` | Verbosity is the threshold that determines which log messages are logged. Default is zero which logs only the most important messages. Higher values enable additional messages. Error messages are always logged. |
| `vmodule` [Required] | `VModuleConfiguration` | VModule overrides the verbosity threshold for individual files. Only supported for "text" log format. |
| `options` [Required] | `FormatOptions` | [Experimental] Options holds additional parameters that are specific to the different logging formats. Only the options for the selected format get used, but all of them get validated. |

## `VModuleConfiguration` {#VModuleConfiguration}

(Alias of `[]k8s.io/component-base/config/v1alpha1.VModuleItem`)

**Appears in:**

- [LoggingConfiguration](#LoggingConfiguration)

VModuleConfiguration is a collection of individual file names or patterns
and the corresponding verbosity threshold.

## `DefaultPreemptionArgs` {#kubescheduler-config-k8s-io-v1-DefaultPreemptionArgs}

DefaultPreemptionArgs holds arguments used to configure the DefaultPreemption plugin.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `apiVersion` | `string` | `kubescheduler.config.k8s.io/v1` |
| `kind` | `string` | `DefaultPreemptionArgs` |
| `minCandidateNodesPercentage` [Required] | `int32` | MinCandidateNodesPercentage is the minimum number of candidates to shortlist when dry running preemption as a percentage of number of nodes. Must be in the range [0, 100]. Defaults to 10% of the cluster size if unspecified. |
| `minCandidateNodesAbsolute` [Required] | `int32` | MinCandidateNodesAbsolute is the absolute minimum number of candidates to shortlist. The likely number of candidates enumerated for dry running preemption is given by the formula: `numCandidates = max(numNodes * minCandidateNodesPercentage, minCandidateNodesAbsolute)`. We say "likely" because there are other factors such as PDB violations that play a role in the number of candidates shortlisted. Must be at least 0 nodes. Defaults to 100 nodes if unspecified. |
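A hedged sketch of how these arguments are supplied in practice, via
`pluginConfig` in a scheduler profile (the values shown are the documented
defaults):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: DefaultPreemption
        args:
          # Dry-run preemption against at least 100 nodes,
          # or 10% of the cluster, whichever is larger.
          minCandidateNodesPercentage: 10
          minCandidateNodesAbsolute: 100
```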
## `InterPodAffinityArgs` {#kubescheduler-config-k8s-io-v1-InterPodAffinityArgs}

InterPodAffinityArgs holds arguments used to configure the InterPodAffinity plugin.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `apiVersion` | `string` | `kubescheduler.config.k8s.io/v1` |
| `kind` | `string` | `InterPodAffinityArgs` |
| `hardPodAffinityWeight` [Required] | `int32` | HardPodAffinityWeight is the scoring weight for existing pods with a matching hard affinity to the incoming pod. |

## `KubeSchedulerConfiguration` {#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration}

KubeSchedulerConfiguration configures a scheduler.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `apiVersion` | `string` | `kubescheduler.config.k8s.io/v1` |
| `kind` | `string` | `KubeSchedulerConfiguration` |
| `parallelism` [Required] | `int32` | Parallelism defines the amount of parallelism in algorithms for scheduling Pods. Must be greater than 0. Defaults to 16. |
| `leaderElection` [Required] | `LeaderElectionConfiguration` | LeaderElection defines the configuration of the leader election client. |
| `clientConnection` [Required] | `ClientConnectionConfiguration` | ClientConnection specifies the kubeconfig file and client connection settings for the proxy server to use when communicating with the apiserver. |
| `DebuggingConfiguration` [Required] | `DebuggingConfiguration` | (The members of `DebuggingConfiguration` are embedded into this type.) DebuggingConfiguration holds configuration for Debugging related features. TODO: We might wanna make this a substruct like Debugging componentbaseconfigv1alpha1.DebuggingConfiguration |
| `percentageOfNodesToScore` [Required] | `int32` | PercentageOfNodesToScore is the percentage of all nodes that, once found feasible for running a pod, the scheduler stops its search for more feasible nodes in the cluster. This helps improve the scheduler's performance. The scheduler always tries to find at least "minFeasibleNodesToFind" feasible nodes no matter what the value of this flag is. Example: if the cluster size is 500 nodes and the value of this flag is 30, then the scheduler stops finding further feasible nodes once it finds 150 feasible ones. When the value is 0, a default percentage (5% to 50%, based on the size of the cluster) of the nodes will be scored. |
| `podInitialBackoffSeconds` [Required] | `int64` | PodInitialBackoffSeconds is the initial backoff for unschedulable pods. If specified, it must be greater than 0. If this value is null, the default value (1s) will be used. |
| `podMaxBackoffSeconds` [Required] | `int64` | PodMaxBackoffSeconds is the max backoff for unschedulable pods. If specified, it must be greater than podInitialBackoffSeconds. If this value is null, the default value (10s) will be used. |
| `profiles` [Required] | `[]KubeSchedulerProfile` | Profiles are scheduling profiles that kube-scheduler supports. Pods can choose to be scheduled under a particular profile by setting its associated scheduler name. Pods that don't specify any scheduler name are scheduled with the "default-scheduler" profile, if present here. |
| `extenders` [Required] | `[]Extender` | Extenders are the list of scheduler extenders, each holding the values of how to communicate with the extender. These extenders are shared by all scheduler profiles. |
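Putting the top-level fields together, a minimal configuration file might look
like the sketch below (paths are illustrative); it is passed to the scheduler
with `kube-scheduler --config <path>`:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf   # illustrative path
leaderElection:
  leaderElect: true
profiles:
  - schedulerName: default-scheduler
```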
## `NodeAffinityArgs` {#kubescheduler-config-k8s-io-v1-NodeAffinityArgs}

NodeAffinityArgs holds arguments to configure the NodeAffinity plugin.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `apiVersion` | `string` | `kubescheduler.config.k8s.io/v1` |
| `kind` | `string` | `NodeAffinityArgs` |
| `addedAffinity` | `core/v1.NodeAffinity` | AddedAffinity is applied to all Pods additionally to the NodeAffinity specified in the PodSpec. That is, Nodes need to satisfy AddedAffinity AND .spec.NodeAffinity. AddedAffinity is empty by default (all Nodes match). When AddedAffinity is used, some Pods with affinity requirements that match a specific Node (such as Daemonset Pods) might remain unschedulable. |

## `NodeResourcesBalancedAllocationArgs` {#kubescheduler-config-k8s-io-v1-NodeResourcesBalancedAllocationArgs}

NodeResourcesBalancedAllocationArgs holds arguments used to configure the NodeResourcesBalancedAllocation plugin.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `apiVersion` | `string` | `kubescheduler.config.k8s.io/v1` |
| `kind` | `string` | `NodeResourcesBalancedAllocationArgs` |
| `resources` [Required] | `[]ResourceSpec` | Resources to be managed; the default is "cpu" and "memory" if not specified. |

## `NodeResourcesFitArgs` {#kubescheduler-config-k8s-io-v1-NodeResourcesFitArgs}

NodeResourcesFitArgs holds arguments used to configure the NodeResourcesFit plugin.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `apiVersion` | `string` | `kubescheduler.config.k8s.io/v1` |
| `kind` | `string` | `NodeResourcesFitArgs` |
| `ignoredResources` [Required] | `[]string` | IgnoredResources is the list of resources that the NodeResources fit filter should ignore. This doesn't apply to scoring. |
| `ignoredResourceGroups` [Required] | `[]string` | IgnoredResourceGroups defines the list of resource groups that the NodeResources fit filter should ignore. For example, if the group is ["example.com"], it will ignore all resource names that begin with "example.com", such as "example.com/aaa" and "example.com/bbb". A resource group name can't contain '/'. This doesn't apply to scoring. |
| `scoringStrategy` [Required] | `ScoringStrategy` | ScoringStrategy selects the node resource scoring strategy. The default strategy is LeastAllocated with an equal "cpu" and "memory" weight. |
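For example, a sketch of selecting the `LeastAllocated` scoring strategy with
equal CPU and memory weights (the documented default) through `pluginConfig`:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: LeastAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```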
## `PodTopologySpreadArgs` {#kubescheduler-config-k8s-io-v1-PodTopologySpreadArgs}

PodTopologySpreadArgs holds arguments used to configure the PodTopologySpread plugin.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `apiVersion` | `string` | `kubescheduler.config.k8s.io/v1` |
| `kind` | `string` | `PodTopologySpreadArgs` |
| `defaultConstraints` | `[]core/v1.TopologySpreadConstraint` | DefaultConstraints defines topology spread constraints to be applied to Pods that don't define any in `pod.spec.topologySpreadConstraints`. `.defaultConstraints[*].labelSelectors` must be empty, as they are deduced from the Pod's membership to Services, ReplicationControllers, ReplicaSets or StatefulSets. When not empty, `.defaultingType` must be "List". |
| `defaultingType` | `PodTopologySpreadConstraintsDefaulting` | DefaultingType determines how `.defaultConstraints` are deduced. Can be one of "System" or "List". "System": use Kubernetes-defined constraints that spread Pods among Nodes and Zones. "List": use constraints defined in `.defaultConstraints`. Defaults to "System". |

## `VolumeBindingArgs` {#kubescheduler-config-k8s-io-v1-VolumeBindingArgs}

VolumeBindingArgs holds arguments used to configure the VolumeBinding plugin.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `apiVersion` | `string` | `kubescheduler.config.k8s.io/v1` |
| `kind` | `string` | `VolumeBindingArgs` |
| `bindTimeoutSeconds` [Required] | `int64` | BindTimeoutSeconds is the timeout in seconds in the volume binding operation. The value must be a non-negative integer. The value zero indicates no waiting. If this value is nil, the default value (600) will be used. |
| `shape` | `[]UtilizationShapePoint` | Shape specifies the points defining the score function shape, which is used to score nodes based on the utilization of statically provisioned PVs. The utilization is calculated by dividing the total requested storage of the pod by the total capacity of feasible PVs on each node. Each point contains utilization (ranges from 0 to 100) and its associated score (ranges from 0 to 10). You can tune the priority by specifying different scores for different utilization numbers. The default shape points are: 0 for 0 utilization, and 10 for 100 utilization. All points must be sorted in increasing order by utilization. |
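As a sketch, the default shape described above corresponds to the following
`pluginConfig` (the values shown are the documented defaults and can be tuned):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: VolumeBinding
        args:
          bindTimeoutSeconds: 600   # the documented default
          shape:
            - utilization: 0
              score: 0
            - utilization: 100
              score: 10
```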
Extender holds the parameters used to communicate with the extender. If a verb is unspecified/empty, +it is assumed that the extender chose not to provide that extension.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FieldDescription
urlPrefix [Required]
+string +
+

URLPrefix at which the extender is available

+
filterVerb [Required]
+string +
+

Verb for the filter call, empty if not supported. This verb is appended to the URLPrefix when issuing the filter call to extender.

+
preemptVerb [Required]
+string +
+

Verb for the preempt call, empty if not supported. This verb is appended to the URLPrefix when issuing the preempt call to extender.

+
prioritizeVerb [Required]
+string +
+

Verb for the prioritize call, empty if not supported. This verb is appended to the URLPrefix when issuing the prioritize call to extender.

+
weight [Required]
+int64 +
+

The numeric multiplier for the node scores that the prioritize call generates. +The weight should be a positive integer

+
bindVerb [Required]
+string +
+

Verb for the bind call, empty if not supported. This verb is appended to the URLPrefix when issuing the bind call to extender. +If this method is implemented by the extender, it is the extender's responsibility to bind the pod to apiserver. Only one extender +can implement this function.

+
enableHTTPS [Required]
+bool +
+

EnableHTTPS specifies whether https should be used to communicate with the extender

+
tlsConfig [Required]
+ExtenderTLSConfig +
+

TLSConfig specifies the transport layer security config

+
httpTimeout [Required]
+meta/v1.Duration +
+

HTTPTimeout specifies the timeout duration for a call to the extender. A filter timeout fails the scheduling of the pod. A prioritize
+timeout is ignored; k8s/other extenders' priorities are used to select the node.

+
nodeCacheCapable [Required]
+bool +
+

NodeCacheCapable specifies that the extender is capable of caching node information, +so the scheduler should only send minimal information about the eligible nodes +assuming that the extender already cached full details of all nodes in the cluster

+
managedResources
+[]ExtenderManagedResource +
+

ManagedResources is a list of extended resources that are managed by +this extender.

+
+- A pod will be sent to the extender on the Filter, Prioritize and Bind
+  (if the extender is the binder) phases iff the pod requests at least
+  one of the extended resources in this list. If empty or unspecified,
+  all pods will be sent to this extender.
+- If IgnoredByScheduler is set to true for a resource, kube-scheduler
+  will skip checking the resource in predicates.
ignorable [Required]
+bool +
+

Ignorable specifies if the extender is ignorable, i.e. scheduling should not +fail when the extender returns an error or is not reachable.

+
+ +## `ExtenderManagedResource` {#kubescheduler-config-k8s-io-v1-ExtenderManagedResource} + + +**Appears in:** + +- [Extender](#kubescheduler-config-k8s-io-v1-Extender) + + +

ExtenderManagedResource describes the arguments of extended resources +managed by an extender.

+ + + + + + + + + + + + + + +
FieldDescription
name [Required]
+string +
+

Name is the extended resource name.

+
ignoredByScheduler [Required]
+bool +
+

IgnoredByScheduler indicates whether kube-scheduler should ignore this +resource when applying predicates.

+
+ +## `ExtenderTLSConfig` {#kubescheduler-config-k8s-io-v1-ExtenderTLSConfig} + + +**Appears in:** + +- [Extender](#kubescheduler-config-k8s-io-v1-Extender) + + +

ExtenderTLSConfig contains settings to enable TLS with extender

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FieldDescription
insecure [Required]
+bool +
+

Server should be accessed without verifying the TLS certificate. For testing only.

+
serverName [Required]
+string +
+

ServerName is passed to the server for SNI and is used in the client to check server +certificates against. If ServerName is empty, the hostname used to contact the +server is used.

+
certFile [Required]
+string +
+

Server requires TLS client certificate authentication

+
keyFile [Required]
+string +
+

Server requires TLS client certificate authentication

+
caFile [Required]
+string +
+

Trusted root certificates for server

+
certData [Required]
+[]byte +
+

CertData holds PEM-encoded bytes (typically read from a client certificate file). +CertData takes precedence over CertFile

+
keyData [Required]
+[]byte +
+

KeyData holds PEM-encoded bytes (typically read from a client certificate key file). +KeyData takes precedence over KeyFile

+
caData [Required]
+[]byte +
+

CAData holds PEM-encoded bytes (typically read from a root certificates bundle). +CAData takes precedence over CAFile

+
+ +## `KubeSchedulerProfile` {#kubescheduler-config-k8s-io-v1-KubeSchedulerProfile} + + +**Appears in:** + +- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration) + + +

KubeSchedulerProfile is a scheduling profile.

+ + + + + + + + + + + + + + + + + +
FieldDescription
schedulerName [Required]
+string +
+

SchedulerName is the name of the scheduler associated to this profile. +If SchedulerName matches with the pod's "spec.schedulerName", then the pod +is scheduled with this profile.

+
plugins [Required]
+Plugins +
+

Plugins specify the set of plugins that should be enabled or disabled. +Enabled plugins are the ones that should be enabled in addition to the +default plugins. Disabled plugins are any of the default plugins that +should be disabled. +When no enabled or disabled plugin is specified for an extension point, +default plugins for that extension point will be used if there is any. +If a QueueSort plugin is specified, the same QueueSort Plugin and +PluginConfig must be specified for all profiles.

+
pluginConfig [Required]
+[]PluginConfig +
+

PluginConfig is an optional set of custom plugin arguments for each plugin. +Omitting config args for a plugin is equivalent to using the default config +for that plugin.

+
+ +## `Plugin` {#kubescheduler-config-k8s-io-v1-Plugin} + + +**Appears in:** + +- [PluginSet](#kubescheduler-config-k8s-io-v1-PluginSet) + + +

Plugin specifies a plugin name and its weight when applicable. Weight is used only for Score plugins.

+ + + + + + + + + + + + + + +
FieldDescription
name [Required]
+string +
+

Name defines the name of plugin

+
weight [Required]
+int32 +
+

Weight defines the weight of plugin, only used for Score plugins.

+
+ +## `PluginConfig` {#kubescheduler-config-k8s-io-v1-PluginConfig} + + +**Appears in:** + +- [KubeSchedulerProfile](#kubescheduler-config-k8s-io-v1-KubeSchedulerProfile) + + +

PluginConfig specifies arguments that should be passed to a plugin at the time of initialization. +A plugin that is invoked at multiple extension points is initialized once. Args can have arbitrary structure. +It is up to the plugin to process these Args.

+ + + + + + + + + + + + + + +
FieldDescription
name [Required]
+string +
+

Name defines the name of plugin being configured

+
args [Required]
+k8s.io/apimachinery/pkg/runtime.RawExtension +
+

Args defines the arguments passed to the plugins at the time of initialization. Args can have arbitrary structure.

+
+ +## `PluginSet` {#kubescheduler-config-k8s-io-v1-PluginSet} + + +**Appears in:** + +- [Plugins](#kubescheduler-config-k8s-io-v1-Plugins) + + +

PluginSet specifies enabled and disabled plugins for an extension point. +If an array is empty, missing, or nil, default plugins at that extension point will be used.

+ + + + + + + + + + + + + + +
FieldDescription
enabled [Required]
+[]Plugin +
+

Enabled specifies plugins that should be enabled in addition to default plugins. +If the default plugin is also configured in the scheduler config file, the weight of plugin will +be overridden accordingly. +These are called after default plugins and in the same order specified here.

+
disabled [Required]
+[]Plugin +
+

Disabled specifies default plugins that should be disabled. +When all default plugins need to be disabled, an array containing only one "*" should be provided.

+
+ +## `Plugins` {#kubescheduler-config-k8s-io-v1-Plugins} + + +**Appears in:** + +- [KubeSchedulerProfile](#kubescheduler-config-k8s-io-v1-KubeSchedulerProfile) + + +

Plugins include multiple extension points. When specified, the list of plugins for +a particular extension point are the only ones enabled. If an extension point is +omitted from the config, then the default set of plugins is used for that extension point. +Enabled plugins are called in the order specified here, after default plugins. If they need to +be invoked before default plugins, default plugins must be disabled and re-enabled here in desired order.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FieldDescription
queueSort [Required]
+PluginSet +
+

QueueSort is a list of plugins that should be invoked when sorting pods in the scheduling queue.

+
preFilter [Required]
+PluginSet +
+

PreFilter is a list of plugins that should be invoked at "PreFilter" extension point of the scheduling framework.

+
filter [Required]
+PluginSet +
+

Filter is a list of plugins that should be invoked when filtering out nodes that cannot run the Pod.

+
postFilter [Required]
+PluginSet +
+

PostFilter is a list of plugins that are invoked after filtering phase, but only when no feasible nodes were found for the pod.

+
preScore [Required]
+PluginSet +
+

PreScore is a list of plugins that are invoked before scoring.

+
score [Required]
+PluginSet +
+

Score is a list of plugins that should be invoked when ranking nodes that have passed the filtering phase.

+
reserve [Required]
+PluginSet +
+

Reserve is a list of plugins invoked when reserving/unreserving resources +after a node is assigned to run the pod.

+
permit [Required]
+PluginSet +
+

Permit is a list of plugins that control binding of a Pod. These plugins can prevent or delay binding of a Pod.

+
preBind [Required]
+PluginSet +
+

PreBind is a list of plugins that should be invoked before a pod is bound.

+
bind [Required]
+PluginSet +
+

Bind is a list of plugins that should be invoked at the "Bind" extension point of the scheduling framework.
+The scheduler calls these plugins in order. The scheduler skips the rest of these plugins as soon as one returns success.

+
postBind [Required]
+PluginSet +
+

PostBind is a list of plugins that should be invoked after a pod is successfully bound.

+
multiPoint [Required]
+PluginSet +
+

MultiPoint is a simplified config section to enable plugins for all valid extension points. +Plugins enabled through MultiPoint will automatically register for every individual extension +point the plugin has implemented. Disabling a plugin through MultiPoint disables that behavior. +The same is true for disabling "*" through MultiPoint (no default plugins will be automatically registered). +Plugins can still be disabled through their individual extension points.

+

In terms of precedence, plugin config follows this basic hierarchy:

+1. Specific extension points
+2. Explicitly configured MultiPoint plugins
+3. The set of default plugins, as MultiPoint plugins
+
+This implies that a higher precedence plugin will run first and overwrite any settings within MultiPoint.
+Explicitly user-configured plugins also take a higher precedence over default plugins.
+Within this hierarchy, an Enabled setting takes precedence over Disabled. For example, if a plugin is
+set in both multiPoint.Enabled and multiPoint.Disabled, the plugin will be enabled. Similarly,
+including multiPoint.Disabled = '*' and multiPoint.Enabled = pluginA will still register that specific
+plugin through MultiPoint. This follows the same behavior as all other extension point configurations.
+ +## `PodTopologySpreadConstraintsDefaulting` {#kubescheduler-config-k8s-io-v1-PodTopologySpreadConstraintsDefaulting} + +(Alias of `string`) + +**Appears in:** + +- [PodTopologySpreadArgs](#kubescheduler-config-k8s-io-v1-PodTopologySpreadArgs) + + +

PodTopologySpreadConstraintsDefaulting defines how to set default constraints +for the PodTopologySpread plugin.

+ + + + +## `RequestedToCapacityRatioParam` {#kubescheduler-config-k8s-io-v1-RequestedToCapacityRatioParam} + + +**Appears in:** + +- [ScoringStrategy](#kubescheduler-config-k8s-io-v1-ScoringStrategy) + + +

RequestedToCapacityRatioParam defines the RequestedToCapacityRatio parameters.

+ + + + + + + + + + + +
FieldDescription
shape [Required]
+[]UtilizationShapePoint +
+

Shape is a list of points defining the scoring function shape.

+
+ +## `ResourceSpec` {#kubescheduler-config-k8s-io-v1-ResourceSpec} + + +**Appears in:** + +- [NodeResourcesBalancedAllocationArgs](#kubescheduler-config-k8s-io-v1-NodeResourcesBalancedAllocationArgs) + +- [ScoringStrategy](#kubescheduler-config-k8s-io-v1-ScoringStrategy) + + +

ResourceSpec represents a single resource.

+ + + + + + + + + + + + + + +
FieldDescription
name [Required]
+string +
+

Name of the resource.

+
weight [Required]
+int64 +
+

Weight of the resource.

+
+ +## `ScoringStrategy` {#kubescheduler-config-k8s-io-v1-ScoringStrategy} + + +**Appears in:** + +- [NodeResourcesFitArgs](#kubescheduler-config-k8s-io-v1-NodeResourcesFitArgs) + + +

ScoringStrategy defines the ScoringStrategyType for the node resource plugin.

+ + + + + + + + + + + + + + + + + +
FieldDescription
type [Required]
+ScoringStrategyType +
+

Type selects which strategy to run.

+
resources [Required]
+[]ResourceSpec +
+

Resources to consider when scoring. +The default resource set includes "cpu" and "memory" with an equal weight. +Allowed weights go from 1 to 100. +Weight defaults to 1 if not specified or explicitly set to 0.

+
requestedToCapacityRatio [Required]
+RequestedToCapacityRatioParam +
+

Arguments specific to RequestedToCapacityRatio strategy.

+
+ +## `ScoringStrategyType` {#kubescheduler-config-k8s-io-v1-ScoringStrategyType} + +(Alias of `string`) + +**Appears in:** + +- [ScoringStrategy](#kubescheduler-config-k8s-io-v1-ScoringStrategy) + + +

ScoringStrategyType is the type of scoring strategy used in the NodeResourcesFit plugin.

+ + + + +## `UtilizationShapePoint` {#kubescheduler-config-k8s-io-v1-UtilizationShapePoint} + + +**Appears in:** + +- [VolumeBindingArgs](#kubescheduler-config-k8s-io-v1-VolumeBindingArgs) + +- [RequestedToCapacityRatioParam](#kubescheduler-config-k8s-io-v1-RequestedToCapacityRatioParam) + + +

UtilizationShapePoint represents a single point of the priority function shape.

+ + + + + + + + + + + + + + +
FieldDescription
utilization [Required]
+int32 +
+

Utilization (x axis). Valid values are 0 to 100. Fully utilized node maps to 100.

+
score [Required]
+int32 +
+

Score assigned to given utilization (y axis). Valid values are 0 to 10.

+
diff --git a/content/en/docs/reference/scheduling/config.md b/content/en/docs/reference/scheduling/config.md index 5f1377ec43163..37888407d7bda 100644 --- a/content/en/docs/reference/scheduling/config.md +++ b/content/en/docs/reference/scheduling/config.md @@ -20,19 +20,25 @@ by implementing one or more of these extension points. You can specify scheduling profiles by running `kube-scheduler --config `, using the -KubeSchedulerConfiguration ([v1beta2](/docs/reference/config-api/kube-scheduler-config.v1beta2/) -or [v1beta3](/docs/reference/config-api/kube-scheduler-config.v1beta3/)) +KubeSchedulerConfiguration ([v1beta3](/docs/reference/config-api/kube-scheduler-config.v1beta3/) +or [v1](/docs/reference/config-api/kube-scheduler-config.v1/)) struct. A minimal configuration looks as follows: ```yaml -apiVersion: kubescheduler.config.k8s.io/v1beta2 +apiVersion: kubescheduler.config.k8s.io/v1 kind: KubeSchedulerConfiguration clientConnection: kubeconfig: /etc/srv/kubernetes/kube-scheduler/kubeconfig ``` + {{< note >}} + KubeSchedulerConfiguration [v1beta2](/docs/reference/config-api/kube-scheduler-config.v1beta2/) + is deprecated in v1.25 and will be removed in v1.26. Please migrate KubeSchedulerConfiguration to + [v1beta3](/docs/reference/config-api/kube-scheduler-config.v1beta3/) or [v1](/docs/reference/config-api/kube-scheduler-config.v1/) + before upgrading Kubernetes to v1.25. + {{< /note >}} ## Profiles A scheduling Profile allows you to configure the different stages of scheduling @@ -85,7 +91,7 @@ For each extension point, you could disable specific [default plugins](#scheduli or enable your own. For example: ```yaml -apiVersion: kubescheduler.config.k8s.io/v1beta2 +apiVersion: kubescheduler.config.k8s.io/v1 kind: KubeSchedulerConfiguration profiles: - plugins: @@ -172,11 +178,6 @@ extension points: You can also enable the following plugins, through the component config APIs, that are not enabled by default: -- `SelectorSpread`: Favors spreading across nodes for Pods that belong to - {{< glossary_tooltip text="Services" term_id="service" >}}, - {{< glossary_tooltip text="ReplicaSets" term_id="replica-set" >}} and - {{< glossary_tooltip text="StatefulSets" term_id="statefulset" >}}. - Extension points: `preScore`, `score`. - `CinderLimits`: Checks that [OpenStack Cinder](https://docs.openstack.org/cinder/) volume limits can be satisfied for the node. Extension points: `filter`. @@ -192,7 +193,7 @@ profiles: one with the default plugins and one with all scoring plugins disabled. ```yaml -apiVersion: kubescheduler.config.k8s.io/v1beta2 +apiVersion: kubescheduler.config.k8s.io/v1 kind: KubeSchedulerConfiguration profiles: - schedulerName: default-scheduler @@ -241,7 +242,7 @@ and `filter` extension points. 
To enable `MyPlugin` for all its available extension points, the profile config looks like:
 
 ```yaml
-apiVersion: kubescheduler.config.k8s.io/v1beta3
+apiVersion: kubescheduler.config.k8s.io/v1
 kind: KubeSchedulerConfiguration
 profiles:
   - schedulerName: multipoint-scheduler
@@ -255,7 +256,7 @@ This would equate to manually enabling `MyPlugin` for all of its extension
 points, like so:
 
 ```yaml
-apiVersion: kubescheduler.config.k8s.io/v1beta3
+apiVersion: kubescheduler.config.k8s.io/v1
 kind: KubeSchedulerConfiguration
 profiles:
   - schedulerName: non-multipoint-scheduler
@@ -284,7 +285,7 @@ plugins, non-default plugins, or with the wildcard (`'*'`) to disable all plugin
 An example of this, disabling `Score` and `PreScore`, would be:
 
 ```yaml
-apiVersion: kubescheduler.config.k8s.io/v1beta3
+apiVersion: kubescheduler.config.k8s.io/v1
 kind: KubeSchedulerConfiguration
 profiles:
   - schedulerName: non-multipoint-scheduler
@@ -300,14 +301,15 @@ profiles:
       - name: '*'
 ```
 
-In `v1beta3`, all [default plugins](#scheduling-plugins) are enabled internally through `MultiPoint`.
+Starting from `kubescheduler.config.k8s.io/v1beta3`, all [default plugins](#scheduling-plugins)
+are enabled internally through `MultiPoint`.
 However, individual extension points are still available to allow flexible
 reconfiguration of the default values (such as ordering and Score weights). For
 example, consider two Score plugins `DefaultScore1` and `DefaultScore2`, each
 with a weight of `1`. They can be reordered with different weights like so:
 
 ```yaml
-apiVersion: kubescheduler.config.k8s.io/v1beta3
+apiVersion: kubescheduler.config.k8s.io/v1
 kind: KubeSchedulerConfiguration
 profiles:
   - schedulerName: multipoint-scheduler
@@ -342,7 +344,7 @@ To demonstrate the above hierarchy, the following example is based on these plug
 A valid sample configuration for these plugins would be:
 
 ```yaml
-apiVersion: kubescheduler.config.k8s.io/v1beta3
+apiVersion: kubescheduler.config.k8s.io/v1
 kind: KubeSchedulerConfiguration
 profiles:
   - schedulerName: multipoint-scheduler
@@ -451,6 +453,11 @@ as well as its seamless integration with the existing methods for configuring ex
   * `NodeAffinity` from 1 to 2
   * `TaintToleration` from 1 to 3
 {{% /tab %}}
+
+{{% tab name="v1beta3 → v1" %}}
+* The scheduler plugin `SelectorSpread` is removed. Instead, use the `PodTopologySpread` plugin (enabled by default)
+  to achieve similar behavior, as shown in the sketch below.
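+
+  A minimal sketch of one way to configure `PodTopologySpread` defaults for this purpose
+  (the constraint values below are illustrative assumptions, not prescribed settings):
+
+  ```yaml
+  apiVersion: kubescheduler.config.k8s.io/v1
+  kind: KubeSchedulerConfiguration
+  profiles:
+    - schedulerName: default-scheduler
+      pluginConfig:
+        - name: PodTopologySpread
+          args:
+            # "List" means the constraints below replace the system defaults.
+            defaultingType: List
+            defaultConstraints:
+              - maxSkew: 1
+                topologyKey: kubernetes.io/hostname
+                whenUnsatisfiable: ScheduleAnyway
+  ```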
+{{% /tab %}} {{< /tabs >}} ## {{% heading "whatsnext" %}} @@ -459,3 +466,4 @@ as well as its seamless integration with the existing methods for configuring ex * Learn about [scheduling](/docs/concepts/scheduling-eviction/kube-scheduler/) * Read the [kube-scheduler configuration (v1beta2)](/docs/reference/config-api/kube-scheduler-config.v1beta2/) reference * Read the [kube-scheduler configuration (v1beta3)](/docs/reference/config-api/kube-scheduler-config.v1beta3/) reference +* Read the [kube-scheduler configuration (v1)](/docs/reference/config-api/kube-scheduler-config.v1/) reference From e45b10af3ae6d1f8c7a6ff91f6af4c124e2d5c47 Mon Sep 17 00:00:00 2001 From: kerthcet Date: Wed, 3 Aug 2022 15:49:57 +0800 Subject: [PATCH 28/77] Graduate Component Config in kube-scheduler to GA Signed-off-by: kerthcet --- .../config-api/kube-scheduler-config.v1.md | 1384 ----------------- .../en/docs/reference/scheduling/policies.md | 3 +- 2 files changed, 1 insertion(+), 1386 deletions(-) delete mode 100644 content/en/docs/reference/config-api/kube-scheduler-config.v1.md diff --git a/content/en/docs/reference/config-api/kube-scheduler-config.v1.md b/content/en/docs/reference/config-api/kube-scheduler-config.v1.md deleted file mode 100644 index a2fb331d9ecfa..0000000000000 --- a/content/en/docs/reference/config-api/kube-scheduler-config.v1.md +++ /dev/null @@ -1,1384 +0,0 @@ ---- -title: kube-scheduler Configuration (v1) -content_type: tool-reference -package: kubescheduler.config.k8s.io/v1 -auto_generated: true ---- - - -## Resource Types - - -- [DefaultPreemptionArgs](#kubescheduler-config-k8s-io-v1-DefaultPreemptionArgs) -- [InterPodAffinityArgs](#kubescheduler-config-k8s-io-v1-InterPodAffinityArgs) -- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration) -- [NodeAffinityArgs](#kubescheduler-config-k8s-io-v1-NodeAffinityArgs) -- [NodeResourcesBalancedAllocationArgs](#kubescheduler-config-k8s-io-v1-NodeResourcesBalancedAllocationArgs) -- [NodeResourcesFitArgs](#kubescheduler-config-k8s-io-v1-NodeResourcesFitArgs) -- [PodTopologySpreadArgs](#kubescheduler-config-k8s-io-v1-PodTopologySpreadArgs) -- [VolumeBindingArgs](#kubescheduler-config-k8s-io-v1-VolumeBindingArgs) - - - -## `ClientConnectionConfiguration` {#ClientConnectionConfiguration} - - -**Appears in:** - -- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1beta3-KubeSchedulerConfiguration) - -- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration) - - -

ClientConnectionConfiguration contains details for constructing a client.

- - - - - - - - - - - - - - - - - - - - - - - -
FieldDescription
kubeconfig [Required]
-string -
-

kubeconfig is the path to a KubeConfig file.

-
acceptContentTypes [Required]
-string -
-

acceptContentTypes defines the Accept header sent by clients when connecting to a server, overriding the -default value of 'application/json'. This field will control all connections to the server used by a particular -client.

-
contentType [Required]
-string -
-

contentType is the content type used when sending data to the server from this client.

-
qps [Required]
-float32 -
-

qps controls the number of queries per second allowed for this connection.

-
burst [Required]
-int32 -
-

burst allows extra queries to accumulate when a client is exceeding its rate.

-
- -## `DebuggingConfiguration` {#DebuggingConfiguration} - - -**Appears in:** - -- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1beta3-KubeSchedulerConfiguration) -- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration) - - - -

DebuggingConfiguration holds configuration for Debugging related features.

- - - - - - - - - - - - - - -
FieldDescription
enableProfiling [Required]
-bool -
-

enableProfiling enables profiling via web interface host:port/debug/pprof/

-
enableContentionProfiling [Required]
-bool -
-

enableContentionProfiling enables lock contention profiling, if -enableProfiling is true.

-
- -## `FormatOptions` {#FormatOptions} - - -**Appears in:** - -- [LoggingConfiguration](#LoggingConfiguration) - - -

FormatOptions contains options for the different logging formats.

- - - - - - - - - - - -
FieldDescription
json [Required]
-JSONOptions -
-

[Experimental] JSON contains options for logging format "json".

-
- -## `JSONOptions` {#JSONOptions} - - -**Appears in:** - -- [FormatOptions](#FormatOptions) - - -

JSONOptions contains options for logging format "json".

- - - - - - - - - - - - - - -
FieldDescription
splitStream [Required]
-bool -
-

[Experimental] SplitStream redirects error messages to stderr while -info messages go to stdout, with buffering. The default is to write -both to stdout, without buffering.

-
infoBufferSize [Required]
-k8s.io/apimachinery/pkg/api/resource.QuantityValue -
-

[Experimental] InfoBufferSize sets the size of the info stream when -using split streams. The default is zero, which disables buffering.

-
- -## `LeaderElectionConfiguration` {#LeaderElectionConfiguration} - - -**Appears in:** - -- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1beta3-KubeSchedulerConfiguration) - -- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration) - - -

LeaderElectionConfiguration defines the configuration of leader election -clients for components that can run with leader election enabled.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
FieldDescription
leaderElect [Required]
-bool -
-

leaderElect enables a leader election client to gain leadership -before executing the main loop. Enable this when running replicated -components for high availability.

-
leaseDuration [Required]
-meta/v1.Duration -
-

leaseDuration is the duration that non-leader candidates will wait -after observing a leadership renewal until attempting to acquire -leadership of a led but unrenewed leader slot. This is effectively the -maximum duration that a leader can be stopped before it is replaced -by another candidate. This is only applicable if leader election is -enabled.

-
renewDeadline [Required]
-meta/v1.Duration -
-

renewDeadline is the interval between attempts by the acting master to -renew a leadership slot before it stops leading. This must be less -than or equal to the lease duration. This is only applicable if leader -election is enabled.

-
retryPeriod [Required]
-meta/v1.Duration -
-

retryPeriod is the duration the clients should wait between attempting -acquisition and renewal of a leadership. This is only applicable if -leader election is enabled.

-
resourceLock [Required]
-string -
-

resourceLock indicates the resource object type that will be used to lock -during leader election cycles.

-
resourceName [Required]
-string -
-

resourceName indicates the name of resource object that will be used to lock -during leader election cycles.

-
resourceNamespace [Required]
-string -
-

resourceName indicates the namespace of resource object that will be used to lock -during leader election cycles.

-
- -## `LoggingConfiguration` {#LoggingConfiguration} - - -**Appears in:** - -- [KubeletConfiguration](#kubelet-config-k8s-io-v1beta1-KubeletConfiguration) - - -

LoggingConfiguration contains logging options -Refer Logs Options for more information.

- - - - - - - - - - - - - - - - - - - - - - - -
FieldDescription
format [Required]
-string -
-

Format Flag specifies the structure of log messages. -default value of format is text

-
flushFrequency [Required]
-time.Duration -
-

Maximum number of nanoseconds (i.e. 1s = 1000000000) between log -flushes. Ignored if the selected logging backend writes log -messages without buffering.

-
verbosity [Required]
-uint32 -
-

Verbosity is the threshold that determines which log messages are -logged. Default is zero which logs only the most important -messages. Higher values enable additional messages. Error messages -are always logged.

-
vmodule [Required]
-VModuleConfiguration -
-

VModule overrides the verbosity threshold for individual files. -Only supported for "text" log format.

-
options [Required]
-FormatOptions -
-

[Experimental] Options holds additional parameters that are specific -to the different logging formats. Only the options for the selected -format get used, but all of them get validated.

-
- -## `VModuleConfiguration` {#VModuleConfiguration} - -(Alias of `[]k8s.io/component-base/config/v1alpha1.VModuleItem`) - -**Appears in:** - -- [LoggingConfiguration](#LoggingConfiguration) - - -

VModuleConfiguration is a collection of individual file names or patterns -and the corresponding verbosity threshold.

- - - - - - -## `DefaultPreemptionArgs` {#kubescheduler-config-k8s-io-v1-DefaultPreemptionArgs} - - - -

DefaultPreemptionArgs holds arguments used to configure the -DefaultPreemption plugin.

- - - - - - - - - - - - - - - - - -
FieldDescription
apiVersion
string
kubescheduler.config.k8s.io/v1
kind
string
DefaultPreemptionArgs
minCandidateNodesPercentage [Required]
-int32 -
-

MinCandidateNodesPercentage is the minimum number of candidates to -shortlist when dry running preemption as a percentage of number of nodes. -Must be in the range [0, 100]. Defaults to 10% of the cluster size if -unspecified.

-
minCandidateNodesAbsolute [Required]
-int32 -
-

MinCandidateNodesAbsolute is the absolute minimum number of candidates to -shortlist. The likely number of candidates enumerated for dry running -preemption is given by the formula: -numCandidates = max(numNodes * minCandidateNodesPercentage, minCandidateNodesAbsolute) -We say "likely" because there are other factors such as PDB violations -that play a role in the number of candidates shortlisted. Must be at least -0 nodes. Defaults to 100 nodes if unspecified.

-
- -## `InterPodAffinityArgs` {#kubescheduler-config-k8s-io-v1-InterPodAffinityArgs} - - - -

InterPodAffinityArgs holds arguments used to configure the InterPodAffinity plugin.

- - - - - - - - - - - - - - -
FieldDescription
apiVersion
string
kubescheduler.config.k8s.io/v1
kind
string
InterPodAffinityArgs
hardPodAffinityWeight [Required]
-int32 -
-

HardPodAffinityWeight is the scoring weight for existing pods with a -matching hard affinity to the incoming pod.

-
- -## `KubeSchedulerConfiguration` {#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration} - - - -

KubeSchedulerConfiguration configures a scheduler

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
FieldDescription
apiVersion
string
kubescheduler.config.k8s.io/v1
kind
string
KubeSchedulerConfiguration
parallelism [Required]
-int32 -
-

Parallelism defines the amount of parallelism in algorithms for scheduling a Pods. Must be greater than 0. Defaults to 16

-
leaderElection [Required]
-LeaderElectionConfiguration -
-

LeaderElection defines the configuration of leader election client.

-
clientConnection [Required]
-ClientConnectionConfiguration -
-

ClientConnection specifies the kubeconfig file and client connection -settings for the proxy server to use when communicating with the apiserver.

-
DebuggingConfiguration [Required]
-DebuggingConfiguration -
(Members of DebuggingConfiguration are embedded into this type.) -

DebuggingConfiguration holds configuration for Debugging related features -TODO: We might wanna make this a substruct like Debugging componentbaseconfigv1alpha1.DebuggingConfiguration

-
percentageOfNodesToScore [Required]
-int32 -
-

PercentageOfNodesToScore is the percentage of all nodes that once found feasible -for running a pod, the scheduler stops its search for more feasible nodes in -the cluster. This helps improve scheduler's performance. Scheduler always tries to find -at least "minFeasibleNodesToFind" feasible nodes no matter what the value of this flag is. -Example: if the cluster size is 500 nodes and the value of this flag is 30, -then scheduler stops finding further feasible nodes once it finds 150 feasible ones. -When the value is 0, default percentage (5%--50% based on the size of the cluster) of the -nodes will be scored.

-
podInitialBackoffSeconds [Required]
-int64 -
-

PodInitialBackoffSeconds is the initial backoff for unschedulable pods. -If specified, it must be greater than 0. If this value is null, the default value (1s) -will be used.

-
podMaxBackoffSeconds [Required]
-int64 -
-

PodMaxBackoffSeconds is the max backoff for unschedulable pods. -If specified, it must be greater than podInitialBackoffSeconds. If this value is null, -the default value (10s) will be used.

-
profiles [Required]
-[]KubeSchedulerProfile -
-

Profiles are scheduling profiles that kube-scheduler supports. Pods can -choose to be scheduled under a particular profile by setting its associated -scheduler name. Pods that don't specify any scheduler name are scheduled -with the "default-scheduler" profile, if present here.

-
extenders [Required]
-[]Extender -
-

Extenders are the list of scheduler extenders, each holding the values of how to communicate -with the extender. These extenders are shared by all scheduler profiles.

-
- -## `NodeAffinityArgs` {#kubescheduler-config-k8s-io-v1-NodeAffinityArgs} - - - -

NodeAffinityArgs holds arguments to configure the NodeAffinity plugin.

- - - - - - - - - - - - - - -
FieldDescription
apiVersion
string
kubescheduler.config.k8s.io/v1
kind
string
NodeAffinityArgs
addedAffinity
-core/v1.NodeAffinity -
-

AddedAffinity is applied to all Pods additionally to the NodeAffinity -specified in the PodSpec. That is, Nodes need to satisfy AddedAffinity -AND .spec.NodeAffinity. AddedAffinity is empty by default (all Nodes -match). -When AddedAffinity is used, some Pods with affinity requirements that match -a specific Node (such as Daemonset Pods) might remain unschedulable.

-
- -## `NodeResourcesBalancedAllocationArgs` {#kubescheduler-config-k8s-io-v1-NodeResourcesBalancedAllocationArgs} - - - -

NodeResourcesBalancedAllocationArgs holds arguments used to configure NodeResourcesBalancedAllocation plugin.

- - - - - - - - - - - - - - -
FieldDescription
apiVersion
string
kubescheduler.config.k8s.io/v1
kind
string
NodeResourcesBalancedAllocationArgs
resources [Required]
-[]ResourceSpec -
-

Resources to be managed, the default is "cpu" and "memory" if not specified.

-
- -## `NodeResourcesFitArgs` {#kubescheduler-config-k8s-io-v1-NodeResourcesFitArgs} - - - -

NodeResourcesFitArgs holds arguments used to configure the NodeResourcesFit plugin.

- - - - - - - - - - - - - - - - - - - - -
FieldDescription
apiVersion
string
kubescheduler.config.k8s.io/v1
kind
string
NodeResourcesFitArgs
ignoredResources [Required]
-[]string -
-

IgnoredResources is the list of resources that NodeResources fit filter -should ignore. This doesn't apply to scoring.

-
ignoredResourceGroups [Required]
-[]string -
-

IgnoredResourceGroups defines the list of resource groups that NodeResources fit filter should ignore. -e.g. if group is ["example.com"], it will ignore all resource names that begin -with "example.com", such as "example.com/aaa" and "example.com/bbb". -A resource group name can't contain '/'. This doesn't apply to scoring.

-
scoringStrategy [Required]
-ScoringStrategy -
-

ScoringStrategy selects the node resource scoring strategy. -The default strategy is LeastAllocated with an equal "cpu" and "memory" weight.

-
- -## `PodTopologySpreadArgs` {#kubescheduler-config-k8s-io-v1-PodTopologySpreadArgs} - - - -

PodTopologySpreadArgs holds arguments used to configure the PodTopologySpread plugin.

- - - - - - - - - - - - - - - - - -
FieldDescription
apiVersion
string
kubescheduler.config.k8s.io/v1
kind
string
PodTopologySpreadArgs
defaultConstraints
-[]core/v1.TopologySpreadConstraint -
-

DefaultConstraints defines topology spread constraints to be applied to -Pods that don't define any in pod.spec.topologySpreadConstraints. -.defaultConstraints[*].labelSelectors must be empty, as they are -deduced from the Pod's membership to Services, ReplicationControllers, -ReplicaSets or StatefulSets. -When not empty, .defaultingType must be "List".

-
defaultingType
-PodTopologySpreadConstraintsDefaulting -
-

DefaultingType determines how .defaultConstraints are deduced. Can be one -of "System" or "List".

-
-- "System": Use kubernetes defined constraints that spread Pods among
-  Nodes and Zones.
-- "List": Use constraints defined in .defaultConstraints.

Defaults to "System".

-
- -## `VolumeBindingArgs` {#kubescheduler-config-k8s-io-v1-VolumeBindingArgs} - - - -

VolumeBindingArgs holds arguments used to configure the VolumeBinding plugin.

- - - - - - - - - - - - - - - - - -
FieldDescription
apiVersion
string
kubescheduler.config.k8s.io/v1
kind
string
VolumeBindingArgs
bindTimeoutSeconds [Required]
-int64 -
-

BindTimeoutSeconds is the timeout in seconds in volume binding operation. -Value must be non-negative integer. The value zero indicates no waiting. -If this value is nil, the default value (600) will be used.

-
shape
-[]UtilizationShapePoint -
-

Shape specifies the points defining the score function shape, which is -used to score nodes based on the utilization of statically provisioned -PVs. The utilization is calculated by dividing the total requested -storage of the pod by the total capacity of feasible PVs on each node. -Each point contains utilization (ranges from 0 to 100) and its -associated score (ranges from 0 to 10). You can turn the priority by -specifying different scores for different utilization numbers. -The default shape points are:

-
-- 0 for 0 utilization
-- 10 for 100 utilization
-
-All points must be sorted in increasing order by utilization.
- -## `Extender` {#kubescheduler-config-k8s-io-v1-Extender} - - -**Appears in:** - -- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration) - - -

Extender holds the parameters used to communicate with the extender. If a verb is unspecified/empty, -it is assumed that the extender chose not to provide that extension.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
FieldDescription
urlPrefix [Required]
-string -
-

URLPrefix at which the extender is available

-
filterVerb [Required]
-string -
-

Verb for the filter call, empty if not supported. This verb is appended to the URLPrefix when issuing the filter call to extender.

-
preemptVerb [Required]
-string -
-

Verb for the preempt call, empty if not supported. This verb is appended to the URLPrefix when issuing the preempt call to extender.

-
prioritizeVerb [Required]
-string -
-

Verb for the prioritize call, empty if not supported. This verb is appended to the URLPrefix when issuing the prioritize call to extender.

-
weight [Required]
-int64 -
-

The numeric multiplier for the node scores that the prioritize call generates. -The weight should be a positive integer

-
bindVerb [Required]
-string -
-

Verb for the bind call, empty if not supported. This verb is appended to the URLPrefix when issuing the bind call to extender. -If this method is implemented by the extender, it is the extender's responsibility to bind the pod to apiserver. Only one extender -can implement this function.

-
enableHTTPS [Required]
-bool -
-

EnableHTTPS specifies whether https should be used to communicate with the extender

-
tlsConfig [Required]
-ExtenderTLSConfig -
-

TLSConfig specifies the transport layer security config

-
httpTimeout [Required]
-meta/v1.Duration -
-

HTTPTimeout specifies the timeout duration for a call to the extender. Filter timeout fails the scheduling of the pod. Prioritize -timeout is ignored, k8s/other extenders priorities are used to select the node.

-
nodeCacheCapable [Required]
-bool -
-

NodeCacheCapable specifies that the extender is capable of caching node information, -so the scheduler should only send minimal information about the eligible nodes -assuming that the extender already cached full details of all nodes in the cluster

-
managedResources
-[]ExtenderManagedResource -
-

ManagedResources is a list of extended resources that are managed by -this extender.

-
-- A pod will be sent to the extender on the Filter, Prioritize and Bind
-  (if the extender is the binder) phases iff the pod requests at least
-  one of the extended resources in this list. If empty or unspecified,
-  all pods will be sent to this extender.
-- If IgnoredByScheduler is set to true for a resource, kube-scheduler
-  will skip checking the resource in predicates.
ignorable [Required]
-bool -
-

Ignorable specifies if the extender is ignorable, i.e. scheduling should not -fail when the extender returns an error or is not reachable.

-
- -## `ExtenderManagedResource` {#kubescheduler-config-k8s-io-v1-ExtenderManagedResource} - - -**Appears in:** - -- [Extender](#kubescheduler-config-k8s-io-v1-Extender) - - -

ExtenderManagedResource describes the arguments of extended resources -managed by an extender.

- - - - - - - - - - - - - - -
FieldDescription
name [Required]
-string -
-

Name is the extended resource name.

-
ignoredByScheduler [Required]
-bool -
-

IgnoredByScheduler indicates whether kube-scheduler should ignore this -resource when applying predicates.

-
- -## `ExtenderTLSConfig` {#kubescheduler-config-k8s-io-v1-ExtenderTLSConfig} - - -**Appears in:** - -- [Extender](#kubescheduler-config-k8s-io-v1-Extender) - - -

ExtenderTLSConfig contains settings to enable TLS with extender

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
FieldDescription
insecure [Required]
-bool -
-

Server should be accessed without verifying the TLS certificate. For testing only.

-
serverName [Required]
-string -
-

ServerName is passed to the server for SNI and is used in the client to check server -certificates against. If ServerName is empty, the hostname used to contact the -server is used.

-
certFile [Required]
-string -
-

Server requires TLS client certificate authentication

-
keyFile [Required]
-string -
-

Server requires TLS client certificate authentication

-
caFile [Required]
-string -
-

Trusted root certificates for server

-
certData [Required]
-[]byte -
-

CertData holds PEM-encoded bytes (typically read from a client certificate file). -CertData takes precedence over CertFile

-
keyData [Required]
-[]byte -
-

KeyData holds PEM-encoded bytes (typically read from a client certificate key file). -KeyData takes precedence over KeyFile

-
caData [Required]
-[]byte -
-

CAData holds PEM-encoded bytes (typically read from a root certificates bundle). -CAData takes precedence over CAFile

-
- -## `KubeSchedulerProfile` {#kubescheduler-config-k8s-io-v1-KubeSchedulerProfile} - - -**Appears in:** - -- [KubeSchedulerConfiguration](#kubescheduler-config-k8s-io-v1-KubeSchedulerConfiguration) - - -

KubeSchedulerProfile is a scheduling profile.

- - - - - - - - - - - - - - - - - -
FieldDescription
schedulerName [Required]
-string -
-

SchedulerName is the name of the scheduler associated to this profile. -If SchedulerName matches with the pod's "spec.schedulerName", then the pod -is scheduled with this profile.

-
plugins [Required]
-Plugins -
-

Plugins specify the set of plugins that should be enabled or disabled. -Enabled plugins are the ones that should be enabled in addition to the -default plugins. Disabled plugins are any of the default plugins that -should be disabled. -When no enabled or disabled plugin is specified for an extension point, -default plugins for that extension point will be used if there is any. -If a QueueSort plugin is specified, the same QueueSort Plugin and -PluginConfig must be specified for all profiles.

-
pluginConfig [Required]
-[]PluginConfig -
-

PluginConfig is an optional set of custom plugin arguments for each plugin. -Omitting config args for a plugin is equivalent to using the default config -for that plugin.

-
- -## `Plugin` {#kubescheduler-config-k8s-io-v1-Plugin} - - -**Appears in:** - -- [PluginSet](#kubescheduler-config-k8s-io-v1-PluginSet) - - -

Plugin specifies a plugin name and its weight when applicable. Weight is used only for Score plugins.

- - - - - - - - - - - - - - -
FieldDescription
name [Required]
-string -
-

Name defines the name of plugin

-
weight [Required]
-int32 -
-

Weight defines the weight of plugin, only used for Score plugins.

-
- -## `PluginConfig` {#kubescheduler-config-k8s-io-v1-PluginConfig} - - -**Appears in:** - -- [KubeSchedulerProfile](#kubescheduler-config-k8s-io-v1-KubeSchedulerProfile) - - -

PluginConfig specifies arguments that should be passed to a plugin at the time of initialization. -A plugin that is invoked at multiple extension points is initialized once. Args can have arbitrary structure. -It is up to the plugin to process these Args.

- - - - - - - - - - - - - - -
FieldDescription
name [Required]
-string -
-

Name defines the name of plugin being configured

-
args [Required]
-k8s.io/apimachinery/pkg/runtime.RawExtension -
-

Args defines the arguments passed to the plugins at the time of initialization. Args can have arbitrary structure.

-
- -## `PluginSet` {#kubescheduler-config-k8s-io-v1-PluginSet} - - -**Appears in:** - -- [Plugins](#kubescheduler-config-k8s-io-v1-Plugins) - - -

PluginSet specifies enabled and disabled plugins for an extension point. -If an array is empty, missing, or nil, default plugins at that extension point will be used.

- - - - - - - - - - - - - - -
FieldDescription
enabled [Required]
-[]Plugin -
-

Enabled specifies plugins that should be enabled in addition to default plugins. -If the default plugin is also configured in the scheduler config file, the weight of plugin will -be overridden accordingly. -These are called after default plugins and in the same order specified here.

-
disabled [Required]
-[]Plugin -
-

Disabled specifies default plugins that should be disabled. -When all default plugins need to be disabled, an array containing only one "*" should be provided.

-
- -## `Plugins` {#kubescheduler-config-k8s-io-v1-Plugins} - - -**Appears in:** - -- [KubeSchedulerProfile](#kubescheduler-config-k8s-io-v1-KubeSchedulerProfile) - - -

Plugins include multiple extension points. When specified, the list of plugins for -a particular extension point are the only ones enabled. If an extension point is -omitted from the config, then the default set of plugins is used for that extension point. -Enabled plugins are called in the order specified here, after default plugins. If they need to -be invoked before default plugins, default plugins must be disabled and re-enabled here in desired order.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
FieldDescription
queueSort [Required]
-PluginSet -
-

QueueSort is a list of plugins that should be invoked when sorting pods in the scheduling queue.

-
preFilter [Required]
-PluginSet -
-

PreFilter is a list of plugins that should be invoked at "PreFilter" extension point of the scheduling framework.

-
filter [Required]
-PluginSet -
-

Filter is a list of plugins that should be invoked when filtering out nodes that cannot run the Pod.

-
postFilter [Required]
-PluginSet -
-

PostFilter is a list of plugins that are invoked after filtering phase, but only when no feasible nodes were found for the pod.

-
preScore [Required]
-PluginSet -
-

PreScore is a list of plugins that are invoked before scoring.

-
score [Required]
-PluginSet -
-

Score is a list of plugins that should be invoked when ranking nodes that have passed the filtering phase.

-
reserve [Required]
-PluginSet -
-

Reserve is a list of plugins invoked when reserving/unreserving resources -after a node is assigned to run the pod.

-
permit [Required]
-PluginSet -
-

Permit is a list of plugins that control binding of a Pod. These plugins can prevent or delay binding of a Pod.

-
preBind [Required]
-PluginSet -
-

PreBind is a list of plugins that should be invoked before a pod is bound.

-
bind [Required]
-PluginSet -
-

Bind is a list of plugins that should be invoked at "Bind" extension point of the scheduling framework. -The scheduler call these plugins in order. Scheduler skips the rest of these plugins as soon as one returns success.

-
postBind [Required]
-PluginSet -
-

PostBind is a list of plugins that should be invoked after a pod is successfully bound.

-
multiPoint [Required]
-PluginSet -
-

MultiPoint is a simplified config section to enable plugins for all valid extension points. -Plugins enabled through MultiPoint will automatically register for every individual extension -point the plugin has implemented. Disabling a plugin through MultiPoint disables that behavior. -The same is true for disabling "*" through MultiPoint (no default plugins will be automatically registered). -Plugins can still be disabled through their individual extension points.

-

In terms of precedence, plugin config follows this basic hierarchy

-1. Specific extension points
-2. Explicitly configured MultiPoint plugins
-3. The set of default plugins, as MultiPoint plugins
-
-This implies that a higher precedence plugin will run first and overwrite any settings within MultiPoint.
-Explicitly user-configured plugins also take a higher precedence over default plugins.
-Within this hierarchy, an Enabled setting takes precedence over Disabled. For example, if a plugin is
-set in both multiPoint.Enabled and multiPoint.Disabled, the plugin will be enabled. Similarly,
-including multiPoint.Disabled = '*' and multiPoint.Enabled = pluginA will still register that specific
-plugin through MultiPoint. This follows the same behavior as all other extension point configurations.
- -## `PodTopologySpreadConstraintsDefaulting` {#kubescheduler-config-k8s-io-v1-PodTopologySpreadConstraintsDefaulting} - -(Alias of `string`) - -**Appears in:** - -- [PodTopologySpreadArgs](#kubescheduler-config-k8s-io-v1-PodTopologySpreadArgs) - - -

PodTopologySpreadConstraintsDefaulting defines how to set default constraints -for the PodTopologySpread plugin.

- - - - -## `RequestedToCapacityRatioParam` {#kubescheduler-config-k8s-io-v1-RequestedToCapacityRatioParam} - - -**Appears in:** - -- [ScoringStrategy](#kubescheduler-config-k8s-io-v1-ScoringStrategy) - - -

RequestedToCapacityRatioParam define RequestedToCapacityRatio parameters

- - - - - - - - - - - -
FieldDescription
shape [Required]
-[]UtilizationShapePoint -
-

Shape is a list of points defining the scoring function shape.

-
- -## `ResourceSpec` {#kubescheduler-config-k8s-io-v1-ResourceSpec} - - -**Appears in:** - -- [NodeResourcesBalancedAllocationArgs](#kubescheduler-config-k8s-io-v1-NodeResourcesBalancedAllocationArgs) - -- [ScoringStrategy](#kubescheduler-config-k8s-io-v1-ScoringStrategy) - - -

ResourceSpec represents a single resource.

- - - - - - - - - - - - - - -
FieldDescription
name [Required]
-string -
-

Name of the resource.

-
weight [Required]
-int64 -
-

Weight of the resource.

-
- -## `ScoringStrategy` {#kubescheduler-config-k8s-io-v1-ScoringStrategy} - - -**Appears in:** - -- [NodeResourcesFitArgs](#kubescheduler-config-k8s-io-v1-NodeResourcesFitArgs) - - -

ScoringStrategy define ScoringStrategyType for node resource plugin

- - - - - - - - - - - - - - - - - -
FieldDescription
type [Required]
-ScoringStrategyType -
-

Type selects which strategy to run.

-
resources [Required]
-[]ResourceSpec -
-

Resources to consider when scoring. -The default resource set includes "cpu" and "memory" with an equal weight. -Allowed weights go from 1 to 100. -Weight defaults to 1 if not specified or explicitly set to 0.

-
requestedToCapacityRatio [Required]
-RequestedToCapacityRatioParam -
-

Arguments specific to RequestedToCapacityRatio strategy.

-
- -## `ScoringStrategyType` {#kubescheduler-config-k8s-io-v1-ScoringStrategyType} - -(Alias of `string`) - -**Appears in:** - -- [ScoringStrategy](#kubescheduler-config-k8s-io-v1-ScoringStrategy) - - -

ScoringStrategyType the type of scoring strategy used in NodeResourcesFit plugin.

- - - - -## `UtilizationShapePoint` {#kubescheduler-config-k8s-io-v1-UtilizationShapePoint} - - -**Appears in:** - -- [VolumeBindingArgs](#kubescheduler-config-k8s-io-v1-VolumeBindingArgs) - -- [RequestedToCapacityRatioParam](#kubescheduler-config-k8s-io-v1-RequestedToCapacityRatioParam) - - -

UtilizationShapePoint represents single point of priority function shape.

- - - - - - - - - - - - - - -
FieldDescription
utilization [Required]
-int32 -
-

Utilization (x axis). Valid values are 0 to 100. Fully utilized node maps to 100.

-
score [Required]
-int32 -
-

Score assigned to given utilization (y axis). Valid values are 0 to 10.

-
diff --git a/content/en/docs/reference/scheduling/policies.md b/content/en/docs/reference/scheduling/policies.md index 5a28b41769112..d64d5f1bf58df 100644 --- a/content/en/docs/reference/scheduling/policies.md +++ b/content/en/docs/reference/scheduling/policies.md @@ -16,5 +16,4 @@ This scheduling policy is not supported since Kubernetes v1.23. Associated flags * Learn about [scheduling](/docs/concepts/scheduling-eviction/kube-scheduler/) * Learn about [kube-scheduler Configuration](/docs/reference/scheduling/config/) -* Read the [kube-scheduler configuration reference (v1beta3)](/docs/reference/config-api/kube-scheduler-config.v1beta3/) - +* Read the [kube-scheduler configuration reference (v1)](/docs/reference/config-api/kube-scheduler-config.v1/) From 3ad2a7885bdd0b1a7cf760a6bb3d9fe8a6cd1fa5 Mon Sep 17 00:00:00 2001 From: Lee Verberne Date: Fri, 29 Jul 2022 15:29:15 +0200 Subject: [PATCH 29/77] Mark EphemeralContainers as GA in 1.25 Co-authored-by: Tim Bannister --- .../en/docs/concepts/workloads/pods/ephemeral-containers.md | 2 +- .../reference/command-line-tools-reference/feature-gates.md | 5 +++-- .../docs/tasks/debug/debug-application/debug-running-pod.md | 2 +- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/content/en/docs/concepts/workloads/pods/ephemeral-containers.md b/content/en/docs/concepts/workloads/pods/ephemeral-containers.md index 0a70fedd6f2bb..cc894debafd8c 100644 --- a/content/en/docs/concepts/workloads/pods/ephemeral-containers.md +++ b/content/en/docs/concepts/workloads/pods/ephemeral-containers.md @@ -9,7 +9,7 @@ weight: 80 -{{< feature-state state="beta" for_k8s_version="v1.23" >}} +{{< feature-state state="stable" for_k8s_version="v1.25" >}} This page provides an overview of ephemeral containers: a special type of container that runs temporarily in an existing {{< glossary_tooltip term_id="pod" >}} to diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 0771736f4fa4c..60ac46e0ad3e8 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -105,8 +105,6 @@ different Kubernetes components. | `DownwardAPIHugePages` | `true` | Beta | 1.22 | | | `EndpointSliceTerminatingCondition` | `false` | Alpha | 1.20 | 1.21 | | `EndpointSliceTerminatingCondition` | `true` | Beta | 1.22 | | -| `EphemeralContainers` | `false` | Alpha | 1.16 | 1.22 | -| `EphemeralContainers` | `true` | Beta | 1.23 | | | `ExpandedDNSConfig` | `false` | Alpha | 1.22 | | | `ExperimentalHostUserNamespaceDefaulting` | `false` | Beta | 1.5 | | | `GracefulNodeShutdown` | `false` | Alpha | 1.20 | 1.20 | @@ -334,6 +332,9 @@ different Kubernetes components. 
| `EndpointSliceProxying` | `false` | Alpha | 1.18 | 1.18 | | `EndpointSliceProxying` | `true` | Beta | 1.19 | 1.21 | | `EndpointSliceProxying` | `true` | GA | 1.22 | - | +| `EphemeralContainers` | `false` | Alpha | 1.16 | 1.22 | +| `EphemeralContainers` | `true` | Beta | 1.23 | 1.24 | +| `EphemeralContainers` | `true` | GA | 1.25 | - | | `EvenPodsSpread` | `false` | Alpha | 1.16 | 1.17 | | `EvenPodsSpread` | `true` | Beta | 1.18 | 1.18 | | `EvenPodsSpread` | `true` | GA | 1.19 | - | diff --git a/content/en/docs/tasks/debug/debug-application/debug-running-pod.md b/content/en/docs/tasks/debug/debug-application/debug-running-pod.md index a810c60efad26..34da36d33a04c 100644 --- a/content/en/docs/tasks/debug/debug-application/debug-running-pod.md +++ b/content/en/docs/tasks/debug/debug-application/debug-running-pod.md @@ -378,7 +378,7 @@ For more details, see [Get a Shell to a Running Container]( ## Debugging with an ephemeral debug container {#ephemeral-container} -{{< feature-state state="beta" for_k8s_version="v1.23" >}} +{{< feature-state state="stable" for_k8s_version="v1.25" >}} {{< glossary_tooltip text="Ephemeral containers" term_id="ephemeral-container" >}} are useful for interactive troubleshooting when `kubectl exec` is insufficient From 9d7efb1a734f621f743a8afbd19f8fed5be993bc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Filip=20K=C5=99epinsk=C3=BD?= Date: Fri, 29 Jul 2022 10:14:06 +0200 Subject: [PATCH 30/77] Promote StatefulSet MinReadySeconds to GA --- .../concepts/workloads/controllers/statefulset.md | 12 +++++------- .../command-line-tools-reference/feature-gates.md | 5 +++-- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/content/en/docs/concepts/workloads/controllers/statefulset.md b/content/en/docs/concepts/workloads/controllers/statefulset.md index 1687399abd376..5c7270dc28f80 100644 --- a/content/en/docs/concepts/workloads/controllers/statefulset.md +++ b/content/en/docs/concepts/workloads/controllers/statefulset.md @@ -138,15 +138,13 @@ Provisioner. ### Minimum ready seconds -{{< feature-state for_k8s_version="v1.23" state="beta" >}} +{{< feature-state for_k8s_version="v1.25" state="stable" >}} `.spec.minReadySeconds` is an optional field that specifies the minimum number of seconds for which a newly -created Pod should be ready without any of its containers crashing, for it to be considered available. -Please note that this feature is beta and enabled by default. Please opt out by unsetting the -StatefulSetMinReadySeconds flag, if you don't -want this feature to be enabled. This field defaults to 0 (the Pod will be considered -available as soon as it is ready). To learn more about when a Pod is considered ready, see -[Container Probes](/docs/concepts/workloads/pods/pod-lifecycle/#container-probes). +created Pod should be running and ready without any of its containers crashing, for it to be considered available. +This is used to check progression of a rollout when using a [Rolling Update](#rolling-updates) strategy. +This field defaults to 0 (the Pod will be considered available as soon as it is ready). To learn more about when +a Pod is considered ready, see [Container Probes](/docs/concepts/workloads/pods/pod-lifecycle/#container-probes). 
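+
+As a minimal sketch of where the field goes (the name, labels, and image below
+are illustrative placeholders, not taken from this page), `minReadySeconds`
+sits at the top level of the StatefulSet `spec`:
+
+```yaml
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: web              # hypothetical name
+spec:
+  minReadySeconds: 10    # each Pod must stay ready for 10s to count as available
+  serviceName: "nginx"
+  replicas: 3
+  selector:
+    matchLabels:
+      app: nginx
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      containers:
+        - name: nginx
+          image: nginx:1.21   # placeholder image
+```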
## Pod Identity diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 2500db659e05a..3d0c3e3580587 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -187,8 +187,6 @@ different Kubernetes components. | `SizeMemoryBackedVolumes` | `false` | Alpha | 1.20 | 1.21 | | `SizeMemoryBackedVolumes` | `true` | Beta | 1.22 | | | `StatefulSetAutoDeletePVC` | `false` | Alpha | 1.22 | | -| `StatefulSetMinReadySeconds` | `false` | Alpha | 1.22 | 1.22 | -| `StatefulSetMinReadySeconds` | `true` | Beta | 1.23 | | | `StorageVersionAPI` | `false` | Alpha | 1.20 | | | `StorageVersionHash` | `false` | Alpha | 1.14 | 1.14 | | `StorageVersionHash` | `true` | Beta | 1.15 | | @@ -486,6 +484,9 @@ different Kubernetes components. | `StartupProbe` | `false` | Alpha | 1.16 | 1.17 | | `StartupProbe` | `true` | Beta | 1.18 | 1.19 | | `StartupProbe` | `true` | GA | 1.20 | - | +| `StatefulSetMinReadySeconds` | `false` | Alpha | 1.22 | 1.22 | +| `StatefulSetMinReadySeconds` | `true` | Beta | 1.23 | 1.24 | +| `StatefulSetMinReadySeconds` | `true` | GA | 1.25 | - | | `StorageObjectInUseProtection` | `true` | Beta | 1.10 | 1.10 | | `StorageObjectInUseProtection` | `true` | GA | 1.11 | - | | `StreamingProxyRedirects` | `false` | Beta | 1.5 | 1.5 | From 29d9fa5a5f6cfac4364d59df83dd6d044675bff0 Mon Sep 17 00:00:00 2001 From: Tim Allclair Date: Fri, 5 Aug 2022 14:39:39 -0700 Subject: [PATCH 31/77] Remove prerequisites --- content/en/docs/concepts/security/pod-security-admission.md | 5 ----- 1 file changed, 5 deletions(-) diff --git a/content/en/docs/concepts/security/pod-security-admission.md b/content/en/docs/concepts/security/pod-security-admission.md index fb4d8de0c2330..57e4fd98003b5 100644 --- a/content/en/docs/concepts/security/pod-security-admission.md +++ b/content/en/docs/concepts/security/pod-security-admission.md @@ -8,7 +8,6 @@ description: > Standards. content_type: concept weight: 20 -min-kubernetes-server-version: v1.22 --- @@ -24,10 +23,6 @@ term_id="admission-controller" >}} to enforce the Pod Security Standards. Pod se are applied at the {{< glossary_tooltip text="namespace" term_id="namespace" >}} level when pods are created. -## {{% heading "prerequisites" %}} - -To use this mechanism, your cluster must enforce Pod Security admission. - ### Built-in Pod Security admission enforcement This page is part of the documentation for Kubernetes v{{< skew currentVersion >}}. From ecc7ed5a74bc45883215e8a40e1cd766e2679911 Mon Sep 17 00:00:00 2001 From: David Porter Date: Tue, 19 Jul 2022 12:50:18 -0700 Subject: [PATCH 32/77] Add cgroupv2 docs Signed-off-by: David Porter --- .../en/docs/concepts/architecture/cgroups.md | 134 ++++++++++++++++++ .../container-runtimes.md | 110 ++++++++------ 2 files changed, 198 insertions(+), 46 deletions(-) create mode 100644 content/en/docs/concepts/architecture/cgroups.md diff --git a/content/en/docs/concepts/architecture/cgroups.md b/content/en/docs/concepts/architecture/cgroups.md new file mode 100644 index 0000000000000..0cccf8c9dd84e --- /dev/null +++ b/content/en/docs/concepts/architecture/cgroups.md @@ -0,0 +1,134 @@ +--- +title: Cgroup V2 +content_type: concept +weight: 50 +--- + + + +On Linux, {{< glossary_tooltip text="control groups" term_id="cgroup" >}} +are used to constrain resources that are allocated to processes. 
+ +{{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the +underlying container runtime need to interface with control groups to enforce +[resource mangement for pods and +containers](/docs/concepts/configuration/manage-resources-containers/) and set +resources such as cpu/memory requests and limits. + +There are two versions of cgroups in linux: cgroupv1 and cgroupv2. Cgroupv2 is +the new generation of the cgroup API. + + + + +## Cgroup version 2 {#cgroup-v2} +{{< feature-state for_k8s_version="v1.25" state="stable" >}} + +Cgroup v2 is the next version of the cgroup Linux API. Cgroup v2 provides a +unified control system, which provides enhanced resource management +capabilities. + +The new version offers several improvements over cgroup v1, some of these improvements are: + +- cleaner and easier to use API with a unified hierarchy +- safe sub-tree delegation to containers +- newer features like Pressure Stall Information +- enhanced accounting and isolation across multiple resources + - accounting for network memory + + +Some kubernetes features exclusively rely on on cgroupv2 for enhanced resource +management and isolation. For example, the +[MemoryQoS](/blog/2021/11/26/qos-memory-resources/) feature improves memory QoS +and relies on cgroupv2 primitives. New upcoming resource management +capabilities in kubelet will depend on cgroupv2 as well. + + +## Using cgroupv2 + +To use cgroupv2, it is recommended to use a Linux distribution which enables +cgroupv2 out of the box. Most new modern linux distributions have switched over +to cgroupv2 by default. + +To check if your distribution is using cgroupv2, follow the steps [below](#check-cgroup-version). + +To use cgroupv2 the following requirements must be met: + +* OS distribution enables cgroupv2 +* Linux Kernel version is >= 5.8 +* Container runtime supports cgroupv2 + * [containerd](https://containerd.io/) since 1.4 + * [cri-o](https://cri-o.io/) since 1.20 +* Kubelet and container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver) + +### Linux Distribution cgroupv2 support + +Many Linux Distributions have already switched over to use cgroupv2 by default, for example: + + +* Container Optimized OS M97 +* Ubuntu (since 21.10, 22.04+ recommended) +* Debian GNU/Linux (since Debian 11 buster) +* Fedora (since 31) +* Arch Linux (since April 2021) +* RHEL and RHEL-like distributions (since 9) + +To check if your distribution is using cgroupv2, refer to your distribution's +documentation or follow the steps [below](#check-cgroup-version) to verify the +configuration. + +You can also enable cgroupv2 manually on your Linux distribution by modifying +the kernel boot arguments in the GRUB command line, and setting +`systemd.unified_cgroup_hierarchy=1`, however it's recommended to use a +distribution that already enables cgroupv2 by default. + + +### Migrating to cgroupv2 + +To migrate to cgroupv2, update to a newer kernel version that enables cgroupv2 +by default, ensure your container runtime supports cgroupv2, and configure +kubelet and container runtime are configured to use the [systemd cgroup +driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver). + +Kubelet will automatically detect that the OS is running on cgroupv2 and will +perform accordingly, no additional configuration is required. 
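To confirm that a node meets the kernel requirement listed above (Linux 5.8 or newer), you can print the running kernel release; the exact output format depends on the distribution:

```shell
# the reported release should be 5.8 or newer for cgroup v2
uname -r
```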
+ +There should not be any noticeable difference in the user experience when +switching to cgroup v2, unless users are accessing the cgroup file system +directly, either on the node or from within the containers. + +Cgroup V2 uses a new API as compared to cgroup V1, so if there are any +applications that directly access the cgroup file system, they need to be +updated to newer versions that support cgroupv2. For example: + +* Some third party monitoring and security agents may be dependent on cgroup filesystem. + Update them to the latest versions that support cgroupv2 +* If you are running [cAdvisor](https://github.com/google/cadvisor) as a + daemonset for monitoring pods and containers, update it to latest version (v0.45.0) +* If you use JDK (Java workload), prefer to use JDK 11.0.16 and later or JDK 15 + and later, which [fully support + cgroupv2](https://bugs.openjdk.org/browse/JDK-8230305) + + +## Identifying cgroup version used on Linux Nodes {#check-cgroup-version} + +The cgroup version is dependent on the Linux distribution being used and the +default cgroup version configured on the OS. To check which cgroup version your +OS Distro is using, you can run the `stat -fc %T /sys/fs/cgroup/` command on +the node and check if the output is `cgroup2fs`: + +```shell +# On a cgroupv2 node: +$ stat -fc %T /sys/fs/cgroup/ +cgroup2fs + +# On a cgroupv1 node: +$ stat -fc %T /sys/fs/cgroup/ +tmpfs +``` + +## {{% heading "whatsnext" %}} + +- Learn more about [cgroups](https://man7.org/linux/man-pages/man7/cgroups.7.html) +- Learn more about [container runtime](/docs/concepts/architecture/cri) +- Learn more about [cgroup drivers](/docs/setup/production-environment/container-runtimes#cgroup-drivers) diff --git a/content/en/docs/setup/production-environment/container-runtimes.md b/content/en/docs/setup/production-environment/container-runtimes.md index 44f098ccfcf69..82f08adec7e12 100644 --- a/content/en/docs/setup/production-environment/container-runtimes.md +++ b/content/en/docs/setup/production-environment/container-runtimes.md @@ -87,22 +87,72 @@ sudo sysctl --system On Linux, {{< glossary_tooltip text="control groups" term_id="cgroup" >}} are used to constrain resources that are allocated to processes. +Both {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the +underlying container runtime need to interface with control groups to enforce +[resource mangement for pods and +containers](/docs/concepts/configuration/manage-resources-containers/) and set +resources such as cpu/memory requests and limits. To interface with control +groups, kubelet and container runtime need to use a "cgroup driver". It's +critical that both kubelet and the container runtime cgroup driver match and +are configured the same. + +There are two cgroup drivers available: + +* [`cgroupfs`](#cgroupfs-cgroup-driver) +* [`systemd`](#systemd-cgroup-driver) + +### cgroupfs driver {#cgroupfs-cgroup-driver} + +The `cgroupfs` driver is the default cgroup driver in kubelet. When `cgroupfs` +driver is used, kubelet and the container runtime will directly interface with +the cgroup filesystem to configure cgroups. + +The `cgroupfs` is **not** recommended to be used when +[systemd](https://www.freedesktop.org/wiki/Software/systemd/) is choosen as the +init system since systemd expects there to only be a single cgroup manager on +the system. Additionally, if [cgroupv2](/docs/concepts/architecture/cgroups) is +used, it's also recommended to use the `systemd` cgroup driver instead of +`cgroupfs`. 
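One way to check which cgroup driver a kubelet is currently configured with is to inspect its configuration file; `/var/lib/kubelet/config.yaml` is the path kubeadm uses by default, and other setups may place it elsewhere:

```shell
# prints the configured driver, for example "cgroupDriver: systemd"
grep cgroupDriver /var/lib/kubelet/config.yaml
```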
+ +### systemd cgroup driver {#systemd-cgroup-driver} + When [systemd](https://www.freedesktop.org/wiki/Software/systemd/) is chosen as the init system for a Linux distribution, the init process generates and consumes a root control group (`cgroup`) and acts as a cgroup manager. -Systemd has a tight integration with cgroups and allocates a cgroup per systemd unit. It's possible -to configure your container runtime and the kubelet to use `cgroupfs`. Using `cgroupfs` alongside -systemd means that there will be two different cgroup managers. -A single cgroup manager simplifies the view of what resources are being allocated -and will by default have a more consistent view of the available and in-use resources. -When there are two cgroup managers on a system, you end up with two views of those resources. -In the field, people have reported cases where nodes that are configured to use `cgroupfs` -for the kubelet and Docker, but `systemd` for the rest of the processes, become unstable under -resource pressure. +Systemd has a tight integration with cgroups and allocates a cgroup per systemd +unit. As a result, when using `systemd` as the init system, but `cgroupfs` +driver, there will be two different cpu managers on the system which is +undesirable. + +A single cgroup manager simplifies the view of what resources are being +allocated and will by default have a more consistent view of the available and +in-use resources. When there are two cgroup managers on a system, you end up +with two views of those resources. In the field, people have reported cases +where nodes that are configured to use `cgroupfs` for the kubelet and container +runtime, but `systemd` for the rest of the processes, become unstable under +resource pressure. Changing the settings such that your container runtime and +kubelet use `systemd` as the cgroup driver stabilized the system. + +Additionally, if your OS distribution is using [cgroupv2](/docs/concepts/architecture/cgroups), it is highly +recommended to use the `systemd` cgroup driver. + +To set `systemd` as the cgroup driver edit the +[`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/) +option of `cgroupDriver` and set it to `systemd`. For example: + +```yaml +apiVersion: kubelet.config.k8s.io/v1beta1 +kind: KubeletConfiguration +... rest of config ... +cgroupDriver: systemd +``` -Changing the settings such that your container runtime and kubelet use `systemd` as the cgroup driver -stabilized the system. To configure this for Docker, set `native.cgroupdriver=systemd`. +If kubelet is configured with `systemd` as cgroupDriver, the container runtime +must also be configured to use the `systemd` as the cgroup driver. If using +containerd, it can be configured to use systemd cgroup driver as described +[here](#containerd-systemd). [CRI-O](#cri-o) already defaults to systemd cgroup +driver. For other container runtimes, refer to their specific documentation. {{< caution >}} Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation. @@ -114,41 +164,6 @@ If you have automation that makes it feasible, replace the node with another usi configuration, or reinstall it using automation. {{< /caution >}} -### Cgroup version 2 {#cgroup-v2} - -Cgroup v2 is the next version of the cgroup Linux API. Differently than cgroup v1, there is a single -hierarchy instead of a different one for each controller. 
-
-The new version offers several improvements over cgroup v1, some of these improvements are:
-
-- cleaner and easier to use API
-- safe sub-tree delegation to containers
-- newer features like Pressure Stall Information
-
-Even if the kernel supports a hybrid configuration where some controllers are managed by cgroup v1
-and some others by cgroup v2, Kubernetes supports only the same cgroup version to manage all the
-controllers.
-
-If systemd doesn't use cgroup v2 by default, you can configure the system to use it by adding
-`systemd.unified_cgroup_hierarchy=1` to the kernel command line.
-
-```shell
-# This example is for a Linux OS that uses the DNF package manager
-# Your system might use a different method for setting the command line
-# that the Linux kernel uses.
-sudo dnf install -y grubby && \
-  sudo grubby \
-  --update-kernel=ALL \
-  --args="systemd.unified_cgroup_hierarchy=1"
-```
-
-If you change the command line for the kernel, you must reboot the node before your
-change takes effect.
-
-There should not be any noticeable difference in the user experience when switching to cgroup v2, unless
-users are accessing the cgroup file system directly, either on the node or from within the containers.
-
-In order to use it, cgroup v2 must be supported by the CRI runtime as well.

 ### Migrating to the `systemd` driver in kubeadm managed clusters

@@ -197,6 +212,9 @@ To use the `systemd` cgroup driver in `/etc/containerd/config.toml` with `runc`,
   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
     SystemdCgroup = true
 ```
+
+`systemd` cgroup driver is recommended to set if using [cgroupv2](/docs/concepts/architecture/cgroups).
+
 {{< note >}}
 If you installed containerd from a package (for example, RPM or `.deb`), you may find
 that the CRI integration plugin is disabled by default.

From b26ae49d9a0b8724913cfa42942353457a0a3c63 Mon Sep 17 00:00:00 2001
From: Kensei Nakada
Date: Sat, 6 Aug 2022 15:29:27 +0900
Subject: [PATCH 33/77] Update the doc for minDomains to graduate minDomains to
 beta

---
 .../scheduling-eviction/topology-spread-constraints.md | 7 +++----
 .../command-line-tools-reference/feature-gates.md | 3 ++-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
index 77f4d1ea55362..f3d2892e90bea 100644
--- a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
+++ b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
@@ -60,7 +60,7 @@ spec:
   # Configure a topology spread constraint
   topologySpreadConstraints:
     - maxSkew: <integer>
-      minDomains: <integer> # optional; alpha since v1.24
+      minDomains: <integer> # optional; beta since v1.25
      topologyKey: <string>
      whenUnsatisfiable: <string>
      labelSelector: <object>
@@ -93,9 +93,8 @@ your cluster. Those fields are:
  nodes match the node selector.

  {{< note >}}
-  The `minDomains` field is an alpha field added in 1.24. You have to enable the
-  `MinDomainsInPodToplogySpread` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-  in order to use it.
+  The `minDomains` field is a beta field and enabled by default in 1.25. You can disable it by disabling the
+  `MinDomainsInPodTopologySpread` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
  {{< /note >}}

- The value of `minDomains` must be greater than 0, when specified.
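As a sketch of the constraint documented above (the labels and image are placeholders), a Pod that asks the scheduler to spread matching Pods across at least three zones could set `minDomains` like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spread-example
  labels:
    app: demo
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      minDomains: 3  # must be greater than 0; only honored with whenUnsatisfiable: DoNotSchedule
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: demo
  containers:
    - name: app
      image: registry.k8s.io/pause:3.8
```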
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 8262abf7dd89c..f50863e1280a0 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -148,7 +148,8 @@ different Kubernetes components. | `MemoryManager` | `false` | Alpha | 1.21 | 1.21 | | `MemoryManager` | `true` | Beta | 1.22 | | | `MemoryQoS` | `false` | Alpha | 1.22 | | -| `MinDomainsInPodTopologySpread` | `false` | Alpha | 1.24 | | +| `MinDomainsInPodTopologySpread` | `false` | Alpha | 1.24 | 1.24 | +| `MinDomainsInPodTopologySpread` | `true` | Beta | 1.25 | | | `MixedProtocolLBService` | `false` | Alpha | 1.20 | 1.23 | | `MixedProtocolLBService` | `true` | Beta | 1.24 | | | `NetworkPolicyEndPort` | `false` | Alpha | 1.21 | 1.21 | From f5153aa41d875da517ca64868d53c5e005cf20ca Mon Sep 17 00:00:00 2001 From: Divyen Patel Date: Fri, 29 Jul 2022 13:16:07 -0700 Subject: [PATCH 34/77] update migration requirement for in-tree vSphere volumes Co-authored-by: Tim Bannister --- content/en/docs/concepts/storage/volumes.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/content/en/docs/concepts/storage/volumes.md b/content/en/docs/concepts/storage/volumes.md index fd994c97fa7fb..1dba0781664f6 100644 --- a/content/en/docs/concepts/storage/volumes.md +++ b/content/en/docs/concepts/storage/volumes.md @@ -996,20 +996,22 @@ For more information, see the [vSphere volume](https://github.com/kubernetes/exa {{< feature-state for_k8s_version="v1.19" state="beta" >}} -The `CSIMigration` feature for `vsphereVolume`, when enabled, redirects all plugin operations -from the existing in-tree plugin to the `csi.vsphere.vmware.com` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver. In order to use this feature, the +The `CSIMigrationvSphere` feature for `vsphereVolume` is enabled by default as of Kubernetes v1.25. +All plugin operations from the in-tree `vspherevolume` will be redirected to the `csi.vsphere.vmware.com` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver unless `CSIMigrationvSphere` feature gate is disabled. + + [vSphere CSI driver](https://github.com/kubernetes-sigs/vsphere-csi-driver) -must be installed on the cluster and the `CSIMigration` and `CSIMigrationvSphere` -[feature gates](/docs/reference/command-line-tools-reference/feature-gates/) must be enabled. -You can find additional advice on how to migrate in VMware's +must be installed on the cluster. You can find additional advice on how to migrate in-tree `vsphereVolume` in VMware's documentation page [Migrating In-Tree vSphere Volumes to vSphere Container Storage Plug-in](https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-968D421F-D464-4E22-8127-6CB9FF54423F.html). -Kubernetes v{{< skew currentVersion >}} requires that you are using vSphere 7.0u2 or later -in order to migrate to the out-of-tree CSI driver. +As of Kubernetes v1.25, vSphere releases less than 7.0u2 are not supported for the +(deprecated) in-tree vSphere storage driver. You must run vSphere 7.0u2 or later +in order to either continue using the deprecated driver, or to migrate to +the replacement CSI driver. + If you are running a version of Kubernetes other than v{{< skew currentVersion >}}, consult the documentation for that version of Kubernetes. 
-If you are running Kubernetes v{{< skew currentVersion >}} and an older version of vSphere, -consider upgrading to at least vSphere 7.0u2. + {{< note >}} The following StorageClass parameters from the built-in `vsphereVolume` plugin are not supported by the vSphere CSI driver: From c4658531ded500939752c3f0f9528d6bc1042121 Mon Sep 17 00:00:00 2001 From: Ravi Gudimetla Date: Sun, 7 Aug 2022 10:50:55 -0400 Subject: [PATCH 35/77] Update content/en/docs/reference/command-line-tools-reference/feature-gates.md Co-authored-by: Qiming Teng --- .../reference/command-line-tools-reference/feature-gates.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 6777161ad047d..c1922c7fe8ebe 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -118,9 +118,6 @@ different Kubernetes components. | `HonorPVReclaimPolicy` | `false` | Alpha | 1.23 | | | `HPAContainerMetrics` | `false` | Alpha | 1.20 | | | `HPAScaleToZero` | `false` | Alpha | 1.16 | | -| `IdentifyPodOS` | `false` | Alpha | 1.23 | 1.23 | -| `IdentifyPodOS` | `true` | Beta | 1.24 | | -| `IdentifyPodOS` | `true` | GA | 1.25 | | | `InTreePluginAWSUnregister` | `false` | Alpha | 1.21 | | | `InTreePluginAzureDiskUnregister` | `false` | Alpha | 1.21 | | | `InTreePluginAzureFileUnregister` | `false` | Alpha | 1.21 | | @@ -364,6 +361,9 @@ different Kubernetes components. | `HugePages` | `true` | GA | 1.14 | - | | `HyperVContainer` | `false` | Alpha | 1.10 | 1.19 | | `HyperVContainer` | `false` | Deprecated | 1.20 | - | +| `IdentifyPodOS` | `false` | Alpha | 1.23 | 1.23 | +| `IdentifyPodOS` | `true` | Beta | 1.24 | 1.24 | +| `IdentifyPodOS` | `true` | GA | 1.25 | - | | `IPv6DualStack` | `false` | Alpha | 1.15 | 1.20 | | `IPv6DualStack` | `true` | Beta | 1.21 | 1.22 | | `IPv6DualStack` | `true` | GA | 1.23 | - | From c8bedc86a375b6bd33e4b171d2441bc830d1a423 Mon Sep 17 00:00:00 2001 From: Ravi Gudimetla Date: Sun, 7 Aug 2022 10:51:07 -0400 Subject: [PATCH 36/77] Update content/en/docs/concepts/windows/user-guide.md Co-authored-by: Tim Bannister --- content/en/docs/concepts/windows/user-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/concepts/windows/user-guide.md b/content/en/docs/concepts/windows/user-guide.md index c40f6e7e68843..9e64057000bf2 100644 --- a/content/en/docs/concepts/windows/user-guide.md +++ b/content/en/docs/concepts/windows/user-guide.md @@ -158,7 +158,7 @@ schedule Linux and Windows workloads to their respective OS-specific nodes. The recommended approach is outlined below, with one of its main goals being that this approach should not break compatibility for existing Linux workloads. -Starting from 1.25, please set `.spec.os.name` for a Pod to indicate the operating system +Starting from 1.25, you can (and should) set `.spec.os.name` for each Pod, to indicate the operating system that the containers in that Pod are designed for. For Pods that run Linux containers, set `.spec.os.name` to `linux`. For Pods that run Windows containers, set `.spec.os.name` to `windows`. From b3fbe713742c41e9045ce9a58b4e0bf78eda3faf Mon Sep 17 00:00:00 2001 From: Mengjiao Liu Date: Fri, 5 Aug 2022 14:44:13 +0800 Subject: [PATCH 37/77] Update docs for setting Sysctls for a Pod to support setting sysctls with slashes. 
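To illustrate what this change documents (the Pod name and image are placeholders), a Pod can declare a safe sysctl using the slash separator:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-demo
spec:
  securityContext:
    sysctls:
      - name: kernel/shm_rmid_forced  # slash form; equivalent to kernel.shm_rmid_forced
        value: "1"
  containers:
    - name: app
      image: registry.k8s.io/pause:3.8
```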
---
 content/en/docs/tasks/administer-cluster/sysctl-cluster.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/tasks/administer-cluster/sysctl-cluster.md b/content/en/docs/tasks/administer-cluster/sysctl-cluster.md
index b07c8e0689ff2..1a560cd8e65a7 100644
--- a/content/en/docs/tasks/administer-cluster/sysctl-cluster.md
+++ b/content/en/docs/tasks/administer-cluster/sysctl-cluster.md
@@ -15,14 +15,13 @@ interface.

 {{< note >}}
 Starting from Kubernetes version 1.23, the kubelet supports the use of either `/` or `.`
-as separators for sysctl names.
+as separators for sysctl names.
+Starting from Kubernetes version 1.25, setting Sysctls for a Pod supports setting sysctls with slashes.
 For example, you can represent the same sysctl name as `kernel.shm_rmid_forced` using a
 period as the separator, or as `kernel/shm_rmid_forced` using a slash as a separator.
 For more sysctl parameter conversion method details, please refer to
 the page [sysctl.d(5)](https://man7.org/linux/man-pages/man5/sysctl.d.5.html) from the Linux man-pages project.
-Setting Sysctls for a Pod and PodSecurityPolicy features do not yet support
-setting sysctls with slashes.
 {{< /note >}}

 ## {{% heading "prerequisites" %}}

From cce49fe7e65b316bd188095d6b0a5c3a87f95098 Mon Sep 17 00:00:00 2001
From: Alex Wang
Date: Tue, 9 Aug 2022 10:59:16 +0800
Subject: [PATCH 38/77] add doc for MatchLabelKeysInPodTopologySpread

Signed-off-by: Alex Wang
---
 .../topology-spread-constraints.md | 22 +++++++++++++++++++
 .../feature-gates.md | 3 +++
 2 files changed, 25 insertions(+)

diff --git a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
index 77f4d1ea55362..2a56eafdf866c 100644
--- a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
+++ b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
@@ -64,6 +64,7 @@ spec:
      topologyKey: <string>
      whenUnsatisfiable: <string>
      labelSelector: <object>
+      matchLabelKeys: <list> # optional; alpha since v1.25
      ### other Pod fields go here
 ```

@@ -123,6 +124,27 @@ your cluster. Those fields are:
   See [Label Selectors](/docs/concepts/overview/working-with-objects/labels/#label-selectors)
   for more details.

+- **matchLabelKeys** is a list of pod label keys to select the pods over which
+  spreading will be calculated. The keys are used to look up values from the pod labels;
+  those key-value labels are ANDed with `labelSelector` to select the group of existing
+  pods over which spreading will be calculated for the incoming pod. Keys that don't
+  exist in the pod labels will be ignored. A null or empty list means only match
+  against the `labelSelector`.
+
+  With `matchLabelKeys`, users don't need to update the `pod.spec` between different
+  revisions. The controller/operator just needs to set different values to the same
+  `label` key for different revisions. The scheduler will assume the values
+  automatically based on `matchLabelKeys`. For example, if users use Deployment, they
+  can use the label keyed with `pod-template-hash`, which is added automatically by the
+  Deployment controller, to distinguish between different revisions in a single Deployment.
+
+  ```yaml
+  topologySpreadConstraints:
+  - maxSkew: 1
+    topologyKey: kubernetes.io/hostname
+    whenUnsatisfiable: DoNotSchedule
+    matchLabelKeys:
+      - app
+      - pod-template-hash
+  ```
+
+  {{< note >}}
+  The `matchLabelKeys` field is an alpha field added in 1.25.
You have to enable the + `MatchLabelKeysInPodTopologySpread` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) + in order to use it. + {{< /note >}} + When a Pod defines more than one `topologySpreadConstraint`, those constraints are combined using a logical AND operation: the kube-scheduler looks for a node for the incoming Pod that satisfies all the configured constraints. diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 0771736f4fa4c..b155cb05ce4d8 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -146,6 +146,7 @@ different Kubernetes components. | `LocalStorageCapacityIsolationFSQuotaMonitoring` | `false` | Alpha | 1.15 | | | `LogarithmicScaleDown` | `false` | Alpha | 1.21 | 1.21 | | `LogarithmicScaleDown` | `true` | Beta | 1.22 | | +| `MatchLabelKeysInPodTopologySpread` | `false` | Alpha | 1.25 | | | `MaxUnavailableStatefulSet` | `false` | Alpha | 1.24 | | | `MemoryManager` | `false` | Alpha | 1.21 | 1.21 | | `MemoryManager` | `true` | Beta | 1.22 | | @@ -987,6 +988,8 @@ Each feature gate is designed for enabling/disabling a specific feature: filesystem walk for better performance and accuracy. - `LogarithmicScaleDown`: Enable semi-random selection of pods to evict on controller scaledown based on logarithmic bucketing of pod timestamps. +- `MatchLabelKeysInPodTopologySpread`: Enable the `matchLabelKeys` field for + [Pod topology spread constraints](/docs/concepts/scheduling-eviction/topology-spread-constraints/). - `MaxUnavailableStatefulSet`: Enables setting the `maxUnavailable` field for the [rolling update strategy](/docs/concepts/workloads/controllers/statefulset/#rolling-updates) of a StatefulSet. The field specifies the maximum number of Pods From 287dff788fb115af32a771805316ce534b065371 Mon Sep 17 00:00:00 2001 From: Antonio Ojea Date: Tue, 9 Aug 2022 00:59:09 +0200 Subject: [PATCH 39/77] set ServiceIPStaticSubrange to beta --- content/en/docs/concepts/services-networking/service.md | 2 +- .../reference/command-line-tools-reference/feature-gates.md | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/content/en/docs/concepts/services-networking/service.md b/content/en/docs/concepts/services-networking/service.md index eda3b6b9b33d8..4be77163db3f1 100644 --- a/content/en/docs/concepts/services-networking/service.md +++ b/content/en/docs/concepts/services-networking/service.md @@ -1325,7 +1325,7 @@ IP addresses that are no longer used by any Services. #### IP address ranges for `type: ClusterIP` Services {#service-ip-static-sub-range} -{{< feature-state for_k8s_version="v1.24" state="alpha" >}} +{{< feature-state for_k8s_version="v1.25" state="beta" >}} However, there is a problem with this `ClusterIP` allocation strategy, because a user can also [choose their own address for the service](#choosing-your-own-ip-address). This could result in a conflict if the internal allocator selects the same IP address diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index f50863e1280a0..44ec32bd96914 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -182,7 +182,8 @@ different Kubernetes components. 
| `ServerSideFieldValidation` | `false` | Alpha | 1.23 | - |
| `ServiceInternalTrafficPolicy` | `false` | Alpha | 1.21 | 1.21 |
| `ServiceInternalTrafficPolicy` | `true` | Beta | 1.22 | |
-| `ServiceIPStaticSubrange` | `false` | Alpha | 1.24 | |
+| `ServiceIPStaticSubrange` | `false` | Alpha | 1.24 | 1.24 |
+| `ServiceIPStaticSubrange` | `true` | Beta | 1.25 | |
| `SizeMemoryBackedVolumes` | `false` | Alpha | 1.20 | 1.21 |
| `SizeMemoryBackedVolumes` | `true` | Beta | 1.22 | |
| `StatefulSetAutoDeletePVC` | `false` | Alpha | 1.22 | |

From 19096a8f47a761712080830f42a08d0c27dcb471 Mon Sep 17 00:00:00 2001
From: Antonio Ojea
Date: Tue, 9 Aug 2022 13:06:03 +0200
Subject: [PATCH 40/77] ServiceIPStaticSubrange en: Service concept mentions it
 is enabled by default

---
 content/en/docs/concepts/services-networking/service.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/concepts/services-networking/service.md b/content/en/docs/concepts/services-networking/service.md
index 4be77163db3f1..db0b32baa27b0 100644
--- a/content/en/docs/concepts/services-networking/service.md
+++ b/content/en/docs/concepts/services-networking/service.md
@@ -1331,9 +1331,8 @@ can also [choose their own address for the service](#choosing-your-own-ip-addres
 This could result in a conflict if the internal allocator selects the same IP address
 for another Service.

-If you enable the `ServiceIPStaticSubrange`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/),
-the allocation strategy divides the `ClusterIP` range into two bands, based on
+The `ServiceIPStaticSubrange`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled by default in v1.25 and later, using an allocation strategy that divides the `ClusterIP` range into two bands, based on
 the size of the configured `service-cluster-ip-range` by using the following formula
 `min(max(16, cidrSize / 16), 256)`, described as _never less than 16 or
 more than 256, with a graduated step function between them_. Dynamic IP allocations will be preferentially

From 7b75c2f3165023d0d79028523cc7be6d672e7e50 Mon Sep 17 00:00:00 2001
From: Arpit Singh
Date: Sat, 30 Jul 2022 00:55:07 -0700
Subject: [PATCH 41/77] Add docs for KEP-3327 Add CPUManager policy option to
 align CPUs by Socket instead of by NUMA node

---
 .../cpu-management-policies.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/content/en/docs/tasks/administer-cluster/cpu-management-policies.md b/content/en/docs/tasks/administer-cluster/cpu-management-policies.md
index 0a9183f276861..6400f40957e99 100644
--- a/content/en/docs/tasks/administer-cluster/cpu-management-policies.md
+++ b/content/en/docs/tasks/administer-cluster/cpu-management-policies.md
@@ -256,6 +256,7 @@ You will still have to enable each option using the `CPUManagerPolicyOptions` ku
 The following policy options exist for the static `CPUManager` policy:
 * `full-pcpus-only` (beta, visible by default)
 * `distribute-cpus-across-numa` (alpha, hidden by default)
+* `align-by-socket` (alpha, hidden by default)

 If the `full-pcpus-only` policy option is specified, the static policy will always allocate full physical cores.
 By default, without this option, the static policy allocates CPUs using a topology-aware best-fit allocation.
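For reference, these policy options are set through the kubelet configuration; the following is a sketch only, and the option shown is illustrative:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  full-pcpus-only: "true"  # values in this map are strings
```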
@@ -279,6 +280,19 @@ By distributing CPUs evenly across NUMA nodes, application developers can more
 easily ensure that no single worker suffers from NUMA effects more than any
 other, improving the overall performance of these types of applications.

+If the `align-by-socket` policy option is specified, CPUs will be considered
+aligned at the socket boundary when deciding how to allocate CPUs to a
+container. By default, the `CPUManager` aligns CPU allocations at the NUMA
+boundary, which could result in performance degradation if CPUs need to be
+pulled from more than one NUMA node to satisfy the allocation. Although it
+tries to ensure that all CPUs are allocated from the _minimum_ number of NUMA
+nodes, there is no guarantee that those NUMA nodes will be on the same socket.
+By directing the `CPUManager` to explicitly align CPUs at the socket boundary
+rather than the NUMA boundary, we are able to avoid such issues. Note, this
+policy option is not compatible with `TopologyManager` `single-numa-node`
+policy and does not apply to hardware where the number of sockets is greater
+than number of NUMA nodes.
+
 The `full-pcpus-only` option can be enabled by adding `full-pcpus-only=true` to
 the CPUManager policy options.
 Likewise, the `distribute-cpus-across-numa` option can be enabled by adding
@@ -286,3 +300,6 @@ Likewise, the `distribute-cpus-across-numa` option can be enabled by adding
 When both are set, they are "additive" in the sense that CPUs will be
 distributed across NUMA nodes in chunks of full-pcpus rather than individual
 cores.
+The `align-by-socket` policy option can be enabled by adding `align-by-socket=true`
+to the `CPUManager` policy options. It is also additive to the `full-pcpus-only`
+and `distribute-cpus-across-numa` policy options.

From 29d9fa5a5f6cfac4364d59df83dd6d044675bff0 Mon Sep 17 00:00:00 2001
From: Rey Lejano
Date: Tue, 9 Aug 2022 15:05:04 -0700
Subject: [PATCH 42/77] update docs for NetworkPolicy port range to GA for 1.25

---
 .../docs/concepts/services-networking/network-policies.md | 8 ++------
 .../command-line-tools-reference/feature-gates.md | 5 +++--
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/content/en/docs/concepts/services-networking/network-policies.md b/content/en/docs/concepts/services-networking/network-policies.md
index 9a97abfb5b1bf..b156e0fd881c6 100644
--- a/content/en/docs/concepts/services-networking/network-policies.md
+++ b/content/en/docs/concepts/services-networking/network-policies.md
@@ -193,9 +193,9 @@ When the feature gate is enabled, you can set the `protocol` field of a NetworkP
 You must be using a {{< glossary_tooltip text="CNI" term_id="cni" >}} plugin that supports SCTP protocol NetworkPolicies.
 {{< /note >}}

-## Targeting a range of Ports
+## Targeting a range of ports

-{{< feature-state for_k8s_version="v1.22" state="beta" >}}
+{{< feature-state for_k8s_version="v1.25" state="stable" >}}

 When writing a NetworkPolicy, you can target a range of ports instead of a single port.

@@ -228,10 +228,6 @@ with any IP within the range `10.0.0.0/24` over TCP, provided that the target
 port is between the range 32000 and 32768.

 The following restrictions apply when using this field:
-* As a beta feature, this is enabled by default. To disable the `endPort` field
-at a cluster level, you (or your cluster administrator) need to disable the
-`NetworkPolicyEndPort` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-for the API server with `--feature-gates=NetworkPolicyEndPort=false,…`.
* The `endPort` field must be equal to or greater than the `port` field. * `endPort` can only be defined if `port` is also defined. * Both ports must be numeric. diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index f50863e1280a0..7914a07bf1872 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -152,8 +152,6 @@ different Kubernetes components. | `MinDomainsInPodTopologySpread` | `true` | Beta | 1.25 | | | `MixedProtocolLBService` | `false` | Alpha | 1.20 | 1.23 | | `MixedProtocolLBService` | `true` | Beta | 1.24 | | -| `NetworkPolicyEndPort` | `false` | Alpha | 1.21 | 1.21 | -| `NetworkPolicyEndPort` | `true` | Beta | 1.22 | | | `NetworkPolicyStatus` | `false` | Alpha | 1.24 | | | `NodeSwap` | `false` | Alpha | 1.22 | | | `NodeOutOfServiceVolumeDetach` | `false` | Alpha | 1.24 | | @@ -395,6 +393,9 @@ different Kubernetes components. | `MountPropagation` | `true` | GA | 1.12 | - | | `NamespaceDefaultLabelName` | `true` | Beta | 1.21 | 1.21 | | `NamespaceDefaultLabelName` | `true` | GA | 1.22 | - | +| `NetworkPolicyEndPort` | `false` | Alpha | 1.21 | 1.21 | +| `NetworkPolicyEndPort` | `true` | Beta | 1.22 | | +| `NetworkPolicyEndPort` | `true` | GA | 1.25 | - | | `NodeDisruptionExclusion` | `false` | Alpha | 1.16 | 1.18 | | `NodeDisruptionExclusion` | `true` | Beta | 1.19 | 1.20 | | `NodeDisruptionExclusion` | `true` | GA | 1.21 | - | From 9dee6a04913f27241d3de6301fa97bba6db00b27 Mon Sep 17 00:00:00 2001 From: David Porter Date: Mon, 8 Aug 2022 12:39:08 -0700 Subject: [PATCH 43/77] Apply suggestions from code review Co-authored-by: Shannon Kularathna Signed-off-by: David Porter --- .../en/docs/concepts/architecture/cgroups.md | 140 +++++++++--------- .../container-runtimes.md | 63 ++++---- 2 files changed, 95 insertions(+), 108 deletions(-) diff --git a/content/en/docs/concepts/architecture/cgroups.md b/content/en/docs/concepts/architecture/cgroups.md index 0cccf8c9dd84e..b571d2484d57c 100644 --- a/content/en/docs/concepts/architecture/cgroups.md +++ b/content/en/docs/concepts/architecture/cgroups.md @@ -1,5 +1,5 @@ --- -title: Cgroup V2 +title: About cgroup v2 content_type: concept weight: 50 --- @@ -7,126 +7,118 @@ weight: 50 On Linux, {{< glossary_tooltip text="control groups" term_id="cgroup" >}} -are used to constrain resources that are allocated to processes. +constrain resources that are allocated to processes. -{{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the -underlying container runtime need to interface with control groups to enforce -[resource mangement for pods and -containers](/docs/concepts/configuration/manage-resources-containers/) and set -resources such as cpu/memory requests and limits. +The {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the +underlying container runtime need to interface with cgroups to enforce +[resource mangement for pods and containers](/docs/concepts/configuration/manage-resources-containers/) which +includes cpu/memory requests and limits for containerized workloads. -There are two versions of cgroups in linux: cgroupv1 and cgroupv2. Cgroupv2 is -the new generation of the cgroup API. +There are two versions of cgroups in Linux: cgroup v1 and cgroup v2. cgroup v2 is +the new generation of the `cgroup` API. -## Cgroup version 2 {#cgroup-v2} +## What is cgroup v2? 
{#cgroup-v2} {{< feature-state for_k8s_version="v1.25" state="stable" >}} -Cgroup v2 is the next version of the cgroup Linux API. Cgroup v2 provides a -unified control system, which provides enhanced resource management +cgroup v2 is the next version of the Linux `cgroup` API. cgroup v2 provides a +unified control system with enhanced resource management capabilities. -The new version offers several improvements over cgroup v1, some of these improvements are: +cgroup v2 offers several improvements over cgroup v1, such as the following: -- cleaner and easier to use API with a unified hierarchy -- safe sub-tree delegation to containers -- newer features like Pressure Stall Information -- enhanced accounting and isolation across multiple resources - - accounting for network memory +- Single unified hierarchy design in API +- Safer sub-tree delegation to containers +- Newer features like [Pressure Stall Information](https://www.kernel.org/doc/html/latest/accounting/psi.html) +- Enhanced resource allocation management and isolation across multiple resources + - Unified accounting for different types of memory allocations (network memory, kernel memory, etc) + - Accounting for non-immediate resource changes such as page cache write backs - -Some kubernetes features exclusively rely on on cgroupv2 for enhanced resource +Some Kubernetes features exclusively use cgroup v2 for enhanced resource management and isolation. For example, the [MemoryQoS](/blog/2021/11/26/qos-memory-resources/) feature improves memory QoS -and relies on cgroupv2 primitives. New upcoming resource management -capabilities in kubelet will depend on cgroupv2 as well. +and relies on cgroup v2 primitives. + +## Using cgroup v2 {#using-cgroupv2} -## Using cgroupv2 +The recommended way to use cgroup v2 is to use a Linux distribution that +enables and uses cgroup v2 by default. -To use cgroupv2, it is recommended to use a Linux distribution which enables -cgroupv2 out of the box. Most new modern linux distributions have switched over -to cgroupv2 by default. +To check if your distribution uses cgroup v2, refer to [Identify cgroup version on Linux nodes](#check-cgroup-version). -To check if your distribution is using cgroupv2, follow the steps [below](#check-cgroup-version). +### Requirements -To use cgroupv2 the following requirements must be met: +cgroup v2 has the following requirements: -* OS distribution enables cgroupv2 -* Linux Kernel version is >= 5.8 -* Container runtime supports cgroupv2 - * [containerd](https://containerd.io/) since 1.4 - * [cri-o](https://cri-o.io/) since 1.20 -* Kubelet and container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver) +* OS distribution enables cgroup v2 +* Linux Kernel version is 5.8 or later +* Container runtime supports cgroup v2. 
For example: + * [containerd](https://containerd.io/) v1.4 and later + * [cri-o](https://cri-o.io/) v1.20 and later +* The kubelet and the container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver) -### Linux Distribution cgroupv2 support +### Linux Distribution cgroup v2 support -Many Linux Distributions have already switched over to use cgroupv2 by default, for example: +For a list of Linux distributions that use cgroup v2, refer to the [cgroup v2 documentation](https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md) -* Container Optimized OS M97 +* Container Optimized OS (since M97) * Ubuntu (since 21.10, 22.04+ recommended) -* Debian GNU/Linux (since Debian 11 buster) +* Debian GNU/Linux (since Debian 11 bullseye) * Fedora (since 31) * Arch Linux (since April 2021) * RHEL and RHEL-like distributions (since 9) -To check if your distribution is using cgroupv2, refer to your distribution's -documentation or follow the steps [below](#check-cgroup-version) to verify the -configuration. +To check if your distribution is using cgroup v2, refer to your distribution's +documentation or follow the instructions in [Identify the cgroup version on Linux nodes](#check-cgroup-version). -You can also enable cgroupv2 manually on your Linux distribution by modifying -the kernel boot arguments in the GRUB command line, and setting -`systemd.unified_cgroup_hierarchy=1`, however it's recommended to use a -distribution that already enables cgroupv2 by default. +You can also enable cgroup v2 manually on your Linux distribution by modifying +the kernel cmdline boot arguments. If your distribution uses GRUB, +`systemd.unified_cgroup_hierarchy=1` should be added in `GRUB_CMDLINE_LINUX` +under `/etc/default/grub`, followed by `sudo update-grub`. However, the +recommended approach is to use a distribution that already enables cgroup v2 by +default. +### Migrating to cgroup v2 {#migrating-cgroupv2} -### Migrating to cgroupv2 +To migrate to cgroup v2, ensure that you meet the [requirements](#requirements), then upgrade +to a kernel version that enables cgroup v2 by default. -To migrate to cgroupv2, update to a newer kernel version that enables cgroupv2 -by default, ensure your container runtime supports cgroupv2, and configure -kubelet and container runtime are configured to use the [systemd cgroup -driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver). - -Kubelet will automatically detect that the OS is running on cgroupv2 and will -perform accordingly, no additional configuration is required. +The kubelet automatically detects that the OS is running on cgroup v2 and +performs accordingly with no additional configuration required. There should not be any noticeable difference in the user experience when switching to cgroup v2, unless users are accessing the cgroup file system directly, either on the node or from within the containers. -Cgroup V2 uses a new API as compared to cgroup V1, so if there are any +cgroup v2 uses a different API than cgroup v1, so if there are any applications that directly access the cgroup file system, they need to be -updated to newer versions that support cgroupv2. For example: - -* Some third party monitoring and security agents may be dependent on cgroup filesystem. 
- Update them to the latest versions that support cgroupv2 -* If you are running [cAdvisor](https://github.com/google/cadvisor) as a - daemonset for monitoring pods and containers, update it to latest version (v0.45.0) -* If you use JDK (Java workload), prefer to use JDK 11.0.16 and later or JDK 15 - and later, which [fully support - cgroupv2](https://bugs.openjdk.org/browse/JDK-8230305) +updated to newer versions that support cgroup v2. For example: +* Some third-party monitoring and security agents may depend on the cgroup filesystem. + Update these agents to versions that support cgroup v2. +* If you run [cAdvisor](https://github.com/google/cadvisor) as a stand-alone + DaemonSet for monitoring pods and containers, update it to v0.43.0 or later. +* If you use JDK, prefer to use JDK 11.0.16 and later or JDK 15 and later, which [fully support cgroup v2](https://bugs.openjdk.org/browse/JDK-8230305). -## Identifying cgroup version used on Linux Nodes {#check-cgroup-version} +## Identify the cgroup version on Linux Nodes {#check-cgroup-version} -The cgroup version is dependent on the Linux distribution being used and the +The cgroup version depends on on the Linux distribution being used and the default cgroup version configured on the OS. To check which cgroup version your -OS Distro is using, you can run the `stat -fc %T /sys/fs/cgroup/` command on -the node and check if the output is `cgroup2fs`: +distribution uses, run the `stat -fc %T /sys/fs/cgroup/` command on +the node: ```shell -# On a cgroupv2 node: -$ stat -fc %T /sys/fs/cgroup/ -cgroup2fs - -# On a cgroupv1 node: -$ stat -fc %T /sys/fs/cgroup/ -tmpfs +stat -fc %T /sys/fs/cgroup/ ``` +For cgroup v2, the output is `cgroup2fs`. + +For cgroup v1, the output is `tmpfs.` + ## {{% heading "whatsnext" %}} - Learn more about [cgroups](https://man7.org/linux/man-pages/man7/cgroups.7.html) diff --git a/content/en/docs/setup/production-environment/container-runtimes.md b/content/en/docs/setup/production-environment/container-runtimes.md index 82f08adec7e12..d3f3e4ee0f4f5 100644 --- a/content/en/docs/setup/production-environment/container-runtimes.md +++ b/content/en/docs/setup/production-environment/container-runtimes.md @@ -89,12 +89,11 @@ are used to constrain resources that are allocated to processes. Both {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the underlying container runtime need to interface with control groups to enforce -[resource mangement for pods and -containers](/docs/concepts/configuration/manage-resources-containers/) and set +[resource management for pods and containers](/docs/concepts/configuration/manage-resources-containers/) and set resources such as cpu/memory requests and limits. To interface with control -groups, kubelet and container runtime need to use a "cgroup driver". It's -critical that both kubelet and the container runtime cgroup driver match and -are configured the same. +groups, the kubelet and the container runtime need to use a *cgroup driver*. +It's critical that the kubelet and the container runtime uses the same cgroup +driver and are configured the same. There are two cgroup drivers available: @@ -103,15 +102,15 @@ There are two cgroup drivers available: ### cgroupfs driver {#cgroupfs-cgroup-driver} -The `cgroupfs` driver is the default cgroup driver in kubelet. When `cgroupfs` -driver is used, kubelet and the container runtime will directly interface with +The `cgroupfs` driver is the default cgroup driver in the kubelet. 
When the `cgroupfs` +driver is used, the kubelet and the container runtime directly interface with the cgroup filesystem to configure cgroups. -The `cgroupfs` is **not** recommended to be used when -[systemd](https://www.freedesktop.org/wiki/Software/systemd/) is choosen as the -init system since systemd expects there to only be a single cgroup manager on -the system. Additionally, if [cgroupv2](/docs/concepts/architecture/cgroups) is -used, it's also recommended to use the `systemd` cgroup driver instead of +The `cgroupfs` driver is **not** recommended when +[systemd](https://www.freedesktop.org/wiki/Software/systemd/) is the +init system because systemd expects a single cgroup manager on +the system. Additionally, if you use [cgroup v2](/docs/concepts/architecture/cgroups) +, use the `systemd` cgroup driver instead of `cgroupfs`. ### systemd cgroup driver {#systemd-cgroup-driver} @@ -120,39 +119,35 @@ When [systemd](https://www.freedesktop.org/wiki/Software/systemd/) is chosen as system for a Linux distribution, the init process generates and consumes a root control group (`cgroup`) and acts as a cgroup manager. -Systemd has a tight integration with cgroups and allocates a cgroup per systemd -unit. As a result, when using `systemd` as the init system, but `cgroupfs` -driver, there will be two different cpu managers on the system which is -undesirable. +systemd has a tight integration with cgroups and allocates a cgroup per systemd +unit. As a result, if you use `systemd` as the init system with the `cgroupfs` +driver, the system gets two different cgroup managers. -A single cgroup manager simplifies the view of what resources are being -allocated and will by default have a more consistent view of the available and -in-use resources. When there are two cgroup managers on a system, you end up -with two views of those resources. In the field, people have reported cases -where nodes that are configured to use `cgroupfs` for the kubelet and container -runtime, but `systemd` for the rest of the processes, become unstable under -resource pressure. Changing the settings such that your container runtime and -kubelet use `systemd` as the cgroup driver stabilized the system. +Two cgroup managers result in two views of the available and in-use resources in +the system. In some cases, nodes that are configured to use `cgroupfs` for the +kubelet and container runtime, but use `systemd` for the rest of the processes become +unstable under resource pressure. -Additionally, if your OS distribution is using [cgroupv2](/docs/concepts/architecture/cgroups), it is highly -recommended to use the `systemd` cgroup driver. +The approach to mitigate this instability is to use `systemd` as the cgroup driver for +the kubelet and the container runtime when systemd is the selected init system. -To set `systemd` as the cgroup driver edit the +To set `systemd` as the cgroup driver, edit the [`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/) option of `cgroupDriver` and set it to `systemd`. For example: ```yaml apiVersion: kubelet.config.k8s.io/v1beta1 kind: KubeletConfiguration -... rest of config ... +... cgroupDriver: systemd ``` -If kubelet is configured with `systemd` as cgroupDriver, the container runtime -must also be configured to use the `systemd` as the cgroup driver. If using -containerd, it can be configured to use systemd cgroup driver as described -[here](#containerd-systemd). [CRI-O](#cri-o) already defaults to systemd cgroup -driver. 
For other container runtimes, refer to their specific documentation. +If you configure `systemd` as the cgroup driver for the kubelet, you must also +configure `systemd` as the cgroup driver for the container runtime. Refer to +the documentation for your container runtime for instructions. For example: + +* [containerd](#containerd-systemd) +* [CRI-O](#cri-o) {{< caution >}} Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation. @@ -213,7 +208,7 @@ To use the `systemd` cgroup driver in `/etc/containerd/config.toml` with `runc`, SystemdCgroup = true ``` -`systemd` cgroup driver is recommended to set if using [cgroupv2](/docs/concepts/architecture/cgroups). +The `systemd` cgroup driver is recommended if you use [cgroup v2](/docs/concepts/architecture/cgroups). {{< note >}} If you installed containerd from a package (for example, RPM or `.deb`), you may find From b7ae62fc273f177d4da628085104bea798b8b79a Mon Sep 17 00:00:00 2001 From: Deep Debroy Date: Thu, 4 Aug 2022 07:19:20 -0700 Subject: [PATCH 44/77] Docs for PodHasNetwork condition Signed-off-by: Deep Debroy --- .../concepts/workloads/pods/pod-lifecycle.md | 39 ++++++++++++++++++- .../feature-gates.md | 8 ++-- 2 files changed, 43 insertions(+), 4 deletions(-) diff --git a/content/en/docs/concepts/workloads/pods/pod-lifecycle.md b/content/en/docs/concepts/workloads/pods/pod-lifecycle.md index 596d835d73f5e..fd7fc41a7bfe3 100644 --- a/content/en/docs/concepts/workloads/pods/pod-lifecycle.md +++ b/content/en/docs/concepts/workloads/pods/pod-lifecycle.md @@ -154,9 +154,12 @@ without any problems, the kubelet resets the restart backoff timer for that cont A Pod has a PodStatus, which has an array of [PodConditions](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podcondition-v1-core) -through which the Pod has or has not passed: +through which the Pod has or has not passed. Kubelet manages the following +PodConditions: * `PodScheduled`: the Pod has been scheduled to a node. +* `PodHasNetwork`: (alpha feature; must be [enabled explicitly](#pod-has-network)) the + Pod sandbox has been successfully created and networking configured. * `ContainersReady`: all containers in the Pod are ready. * `Initialized`: all [init containers](/docs/concepts/workloads/pods/init-containers/) have completed successfully. @@ -231,6 +234,40 @@ when both the following statements apply: When a Pod's containers are Ready but at least one custom condition is missing or `False`, the kubelet sets the Pod's [condition](#pod-conditions) to `ContainersReady`. +### Pod network readiness {#pod-has-network} + +{{< feature-state for_k8s_version="v1.25" state="alpha" >}} + +After a Pod gets scheduled on a node, it needs to be admitted by the Kubelet and +have any volumes mounted. Once these phases are complete, the Kubelet works with +a container runtime (using {{< glossary_tooltip term_id="cri" >}}) to set up a +runtime sandbox and configure networking for the Pod. If the +`PodHasNetworkCondition` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled, +Kubelet reports whether a pod has reached this initialization milestone through +the `PodHasNetwork` condition in the `status.conditions` field of a Pod. + +The `PodHasNetwork` condition is set to `False` by the Kubelet when it detects a +Pod does not have a runtime sandbox with networking configured. 
This occurs in
+the following scenarios:
+* Early in the lifecycle of the Pod, when the kubelet has not yet begun to set up a sandbox for the Pod using the container runtime.
+* Later in the lifecycle of the Pod, when the Pod sandbox has been destroyed due
+  to either:
+  * the node rebooting, without the Pod getting evicted
+  * the Pod sandbox virtual machine rebooting (for container runtimes that use
+    virtual machines for isolation), which then requires creating a new sandbox and
+    fresh container network configuration
+
+The `PodHasNetwork` condition is set to `True` by the kubelet after the
+successful completion of sandbox creation and network configuration for the Pod
+by the runtime plugin. The kubelet can start pulling container images and create
+containers after the `PodHasNetwork` condition has been set to `True`.
+
+
 ## Container probes
 
 A _probe_ is a diagnostic

+For a Pod with init containers, the kubelet sets the `Initialized` condition to
+`True` after the init containers have successfully completed (which happens
+after successful sandbox creation and network configuration by the runtime
+plugin). For a Pod without init containers, the kubelet sets the `Initialized`
+condition to `True` before sandbox creation and network configuration starts.

diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index 0771736f4fa4c..50369dcb9c59d 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -165,6 +165,7 @@ different Kubernetes components.
 | `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
 | `PodDeletionCost` | `false` | Alpha | 1.21 | 1.21 |
 | `PodDeletionCost` | `true` | Beta | 1.22 | |
+| `PodHasNetworkCondition` | `false` | Alpha | 1.25 | |
 | `PodSecurity` | `false` | Alpha | 1.22 | 1.22 |
 | `PodSecurity` | `true` | Beta | 1.23 | |
 | `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
@@ -1026,13 +1027,14 @@ Each feature gate is designed for enabling/disabling a specific feature:
   feature which allows users to influence ReplicaSet downscaling order.
 - `PersistentLocalVolumes`: Enable the usage of `local` volume type in Pods.
   Pod affinity has to be specified if requesting a `local` volume.
-- `PodAndContainerStatsFromCRI`: Configure the kubelet to gather container and
-  pod stats from the CRI container runtime rather than gathering them from cAdvisor.
-- `PodDisruptionBudget`: Enable the [PodDisruptionBudget](/docs/tasks/run-application/configure-pdb/) feature.
 - `PodAffinityNamespaceSelector`: Enable the
   [Pod Affinity Namespace Selector](/docs/concepts/scheduling-eviction/assign-pod-node/#namespace-selector)
   and [CrossNamespacePodAffinity](/docs/concepts/policy/resource-quotas/#cross-namespace-pod-affinity-quota)
   quota scope features.
+- `PodAndContainerStatsFromCRI`: Configure the kubelet to gather container and
+  pod stats from the CRI container runtime rather than gathering them from cAdvisor.
+- `PodDisruptionBudget`: Enable the [PodDisruptionBudget](/docs/tasks/run-application/configure-pdb/) feature.
+- `PodHasNetworkCondition`: Enable the kubelet to mark the [PodHasNetwork](/docs/concepts/workloads/pods/pod-lifecycle/#pod-has-network) condition on pods.
 - `PodOverhead`: Enable the [PodOverhead](/docs/concepts/scheduling-eviction/pod-overhead/)
   feature to account for pod overheads.
- `PodPriority`: Enable the descheduling and preemption of Pods based on their

From 40a3c6810bdb3c2ebf56e89b5f4e81036c5b745c Mon Sep 17 00:00:00 2001
From: Antonio Ojea
Date: Sun, 14 Aug 2022 12:04:17 +0200
Subject: [PATCH 45/77] Update
 content/en/docs/concepts/services-networking/service.md

Co-authored-by: Qiming Teng
---
 content/en/docs/concepts/services-networking/service.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/en/docs/concepts/services-networking/service.md b/content/en/docs/concepts/services-networking/service.md
index db0b32baa27b0..3f98ee2823b3f 100644
--- a/content/en/docs/concepts/services-networking/service.md
+++ b/content/en/docs/concepts/services-networking/service.md
@@ -1332,7 +1332,8 @@ This could result in a conflict if the internal allocator selects the same IP ad
 for another Service.
 
 The `ServiceIPStaticSubrange`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled by default in v1.25 and later, using an allocation strategy that divides the `ClusterIP` range into two bands, based on
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled by default in v1.25
+and later, using an allocation strategy that divides the `ClusterIP` range into two bands, based on
 the size of the configured `service-cluster-ip-range` by using the following formula
 `min(max(16, cidrSize / 16), 256)`, described as _never less than 16 or more than
 256, with a graduated step function between them_. Dynamic IP allocations will be preferentially

From a63eeb780e507917ebdf2e9ac2dbe196f96a5e4b Mon Sep 17 00:00:00 2001
From: Tim Bannister
Date: Sun, 14 Aug 2022 20:16:45 +0100
Subject: [PATCH 46/77] Fix bad merge committed in error

---
 .../setup-tools/kubeadm/implementation-details.md | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md b/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
index a8d0492e29294..e8d594e3db7f7 100644
--- a/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
+++ b/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
@@ -371,12 +371,7 @@ static Pod manifest file for creating a local etcd instance running in a Pod wit
 
 Please note that:
 
-<<<<<<< HEAD
-1. The etcd image will be pulled from `registry.k8s.io` by default. See [using custom images](/docs/reference/setup-tools/kubeadm/kubeadm-init/#custom-images) for customizing the image repository
-2. in case of kubeadm is executed in the `--dry-run` mode, the etcd static Pod manifest is written in a temporary folder
-3. Static Pod manifest generation for local etcd can be invoked individually with the [`kubeadm init phase etcd local`](/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-etcd) command
-=======
-1. The etcd image will be pulled from `k8s.gcr.io` by default. See
+1. The etcd container image will be pulled from `registry.k8s.io` by default. See
    [using custom images](/docs/reference/setup-tools/kubeadm/kubeadm-init/#custom-images)
    for customizing the image repository
 2. In case of kubeadm is executed in the `--dry-run` mode, the etcd static Pod manifest is written
    in a temporary folder.
 3. Static Pod manifest generation for local etcd can be invoked individually with the
    [`kubeadm init phase etcd local`](/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-etcd) command.
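For orientation, the phase command referenced in item 3 can also be exercised on its own. The following is an illustrative sketch rather than part of the patch: the configuration file path is hypothetical, and only the standard `--config` flag for passing a kubeadm configuration file is assumed:

```bash
# Sketch: generate only the local etcd static Pod manifest, without running
# the rest of "kubeadm init". kubeadm-config.yaml is a hypothetical path.
sudo kubeadm init phase etcd local --config kubeadm-config.yaml

# By default, kubeadm writes static Pod manifests under /etc/kubernetes/manifests
ls /etc/kubernetes/manifests/etcd.yaml
```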
->>>>>>> upstream/main
 
 ### Wait for the control plane to come up

From 6b3caddc717ec23cf59dde439015b1765560e1c9 Mon Sep 17 00:00:00 2001
From: Tim Bannister
Date: Sun, 14 Aug 2022 20:17:54 +0100
Subject: [PATCH 47/77] =?UTF-8?q?Improve=20=E2=80=9CGenerate=20static=20Po?=
 =?UTF-8?q?d=20manifest=20for=20local=20etcd=E2=80=9C?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Extra tidying
---
 .../setup-tools/kubeadm/implementation-details.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md b/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
index e8d594e3db7f7..97db9829db537 100644
--- a/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
+++ b/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
@@ -362,7 +362,7 @@ The static Pod manifest for the scheduler is not affected by parameters provided
 
 ### Generate static Pod manifest for local etcd
 
-If the user specified an external etcd this step will be skipped, otherwise kubeadm generates a
+If you specified an external etcd, this step will be skipped; otherwise kubeadm generates a
 static Pod manifest file for creating a local etcd instance running in a Pod with following attributes:
 
 - listen on `localhost:2379` and use `HostNetwork=true`
@@ -373,10 +373,10 @@ Please note that:
 
 1. The etcd container image will be pulled from `registry.k8s.io` by default. See
    [using custom images](/docs/reference/setup-tools/kubeadm/kubeadm-init/#custom-images)
-   for customizing the image repository
-2. In case of kubeadm is executed in the `--dry-run` mode, the etcd static Pod manifest is written
-   in a temporary folder.
-3. Static Pod manifest generation for local etcd can be invoked individually with the
+   for customizing the image repository.
+2. If you run kubeadm in `--dry-run` mode, the etcd static Pod manifest is written
+   into a temporary folder.
+3. You can directly invoke static Pod manifest generation for local etcd, using the
    [`kubeadm init phase etcd local`](/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-etcd)
    command.

From d0e8a08ab1c1d551b19f65faacd6b79c6ff3e2d0 Mon Sep 17 00:00:00 2001
From: kerthcet
Date: Mon, 15 Aug 2022 10:37:52 +0800
Subject: [PATCH 48/77] update the documents of podTopologySpread

Signed-off-by: kerthcet
---
 .../topology-spread-constraints.md | 25 +++++++++++--------
 .../feature-gates.md               |  4 +--
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
index 77f4d1ea55362..4ac02385f3024 100644
--- a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
+++ b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
@@ -48,7 +48,8 @@ Pod topology spread constraints offer you a declarative way to configure that.
 
 ## `topologySpreadConstraints` field
 
-The Pod API includes a field, `spec.topologySpreadConstraints`. Here is an example:
+The Pod API includes a field, `spec.topologySpreadConstraints`. The usage of this field looks like
+the following:
 
```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  ### other Pod fields go here
```
 
-You can read more about this field by running `kubectl explain Pod.spec.topologySpreadConstraints`.
+You can read more about this field by running `kubectl explain Pod.spec.topologySpreadConstraints` or refer to the
+documents in [scheduling](/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling).
 
 ### Spread constraint definition
 
@@ -82,9 +84,9 @@ your cluster. Those fields are:
 
   - if you select `whenUnsatisfiable: DoNotSchedule`, then `maxSkew` defines the
     maximum permitted difference between the number of matching pods in the target
     topology and the _global minimum_
-    (the minimum number of pods that match the label selector in a topology domain).
-    For example, if you have 3 zones with 2, 4 and 5 matching pods respectively,
-    then the global minimum is 2 and `maxSkew` is compared relative to that number.
+    (the minimum number of matching pods in an eligible domain or zero if the number of eligible domains is less than MinDomains).
+    For example, if you have 3 zones with 2, 2 and 1 matching pods respectively,
+    and `MaxSkew` is set to 1, then the global minimum is 1.
  - if you select `whenUnsatisfiable: ScheduleAnyway`, the scheduler gives higher
    precedence to topologies that would help reduce the skew.
@@ -108,10 +110,13 @@ your cluster. Those fields are:
    `minDomains`, this value has no effect on scheduling.
  - If you do not specify `minDomains`, the constraint behaves as if `minDomains` is 1.

-- **topologyKey** is the key of [node labels](#node-labels). If two Nodes are labelled
-  with this key and have identical values for that label, the scheduler treats both
-  Nodes as being in the same topology. The scheduler tries to place a balanced number
-  of Pods into each topology domain.
+- **topologyKey** is the key of [node labels](#node-labels). Nodes that have a label with this key
+  and identical values are considered to be in the same topology.
+  We consider each `<key, value>` as a "bucket", and try to put a balanced number
+  of pods into each bucket.
+  We define a domain as a particular instance of a topology.
+  Also, we define an eligible domain as a domain whose nodes meet the requirements of
+  `nodeAffinityPolicy` and `nodeTaintsPolicy`.

- **whenUnsatisfiable** indicates how to deal with a Pod if it doesn't satisfy the spread constraint:
  - `DoNotSchedule` (default) tells the scheduler not to schedule it.
@@ -556,7 +561,7 @@ section of the enhancement proposal about Pod topology spread constraints.
  cluster. This could lead to a problem in autoscaled clusters, when a node pool (or
  node group) is scaled to zero nodes, and you're expecting the cluster to scale up,
  because, in this case, those topology domains won't be considered until there is
-  at least one node in them.
+  at least one node in them. 
  You can work around this by using an cluster autoscaling tool that is aware of
  Pod topology spread constraints and is also aware of the overall set of topology
  domains.

diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index 0771736f4fa4c..42843c1af9441 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -995,8 +995,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
  NUMA topology.
- `MemoryQoS`: Enable memory protection and usage throttle on pod / container
  using cgroup v2 memory controller.
-- `MinDomainsInPodTopologySpread`: Enable `minDomains` in Pod
-  [topology spread constraints](/docs/concepts/scheduling-eviction/topology-spread-constraints/).
+- `MinDomainsInPodTopologySpread`: Enable `minDomains` in
+  [Pod topology spread constraints](/docs/concepts/scheduling-eviction/topology-spread-constraints/).
- `MixedProtocolLBService`: Enable using different protocols in the same `LoadBalancer` type
  Service instance.
- `MountContainers`: Enable using utility containers on host as the volume mounter.

From c47a02571329a46ec1b359f463ae42ba9a694415 Mon Sep 17 00:00:00 2001
From: Michal Wozniak
Date: Mon, 15 Aug 2022 13:22:37 +0200
Subject: [PATCH 49/77] Add docs for KEP-3329 Retriable and non-retriable Pod
 failures for Jobs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Code review remarks and suggested commit updates are co-authored

Co-authored-by: Tim Bannister
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
Co-authored-by: Paola Cortés <51036950+cortespao@users.noreply.github.com>

# Conflicts:
#	content/en/docs/reference/command-line-tools-reference/feature-gates.md
---
 .../concepts/workloads/controllers/job.md     |  86 +++++++++++
 .../concepts/workloads/pods/disruptions.md    |  29 ++++
 .../feature-gates.md                          |   4 +
 .../en/docs/tasks/job/pod-failure-policy.md   | 139 ++++++++++++++++++
 .../job-pod-failure-policy-example.yaml       |  28 ++++
 .../job-pod-failure-policy-failjob.yaml       |  25 ++++
 .../job-pod-failure-policy-ignore.yaml        |  23 +++
 7 files changed, 334 insertions(+)
 create mode 100644 content/en/docs/tasks/job/pod-failure-policy.md
 create mode 100644 content/en/examples/controllers/job-pod-failure-policy-example.yaml
 create mode 100644 content/en/examples/controllers/job-pod-failure-policy-failjob.yaml
 create mode 100644 content/en/examples/controllers/job-pod-failure-policy-ignore.yaml

diff --git a/content/en/docs/concepts/workloads/controllers/job.md b/content/en/docs/concepts/workloads/controllers/job.md
index cb1b72fd03dfe..639a31136cbe0 100644
--- a/content/en/docs/concepts/workloads/controllers/job.md
+++ b/content/en/docs/concepts/workloads/controllers/job.md
@@ -695,6 +695,90 @@ The new Job itself will have a different uid from `a8f3d00d-c6d2-11e5-9f87-42010
 `manualSelector: true` tells the system that you
 know what you are doing and to allow this mismatch.
 
+### Pod failure policy {#pod-failure-policy}
+
+{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+
+{{< note >}}
+You can only configure a Pod failure policy for a Job if you have the
+`JobPodFailurePolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+enabled in your cluster. Additionally, it is recommended
+to enable the `PodDisruptionConditions` feature gate in order to be able to detect and handle
+Pod disruption conditions in the Pod failure policy (see also:
+[Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)). Both feature gates are
+available in Kubernetes v1.25.
+{{< /note >}}
+
+A Pod failure policy, defined with the `.spec.podFailurePolicy` field, enables
+your cluster to handle Pod failures based on the container exit codes and the
+Pod conditions.
+
+In some situations, you may want to have better control over handling Pod
+failures than the control provided by the default policy, which is based on the
+Job's [`.spec.backoffLimit`](#pod-backoff-failure-policy). These are some
+examples of use cases:
+* To optimize costs of running workloads by avoiding unnecessary Pod restarts,
+  you can terminate a Job as soon as one of its Pods fails with an exit code
+  indicating a software bug.
+* To guarantee that your Job finishes even if there are disruptions, you can
+  ignore Pod failures caused by disruptions (such as {{< glossary_tooltip text="preemption" term_id="preemption" >}},
+  {{< glossary_tooltip text="API-initiated eviction" term_id="api-eviction" >}}
+  or {{< glossary_tooltip text="taint" term_id="taint" >}}-based eviction) so
+  that they don't count towards the `.spec.backoffLimit` limit of retries.
+
+You can configure a Pod failure policy, in the `.spec.podFailurePolicy` field,
+to meet the above use cases. This policy can handle Pod failures based on the
+container exit codes and the Pod conditions.
+
+Here is a manifest for a Job that defines a `podFailurePolicy`:
+
+{{< codenew file="/controllers/job-pod-failure-policy-example.yaml" >}}
+
+In the example above, the first rule of the Pod failure policy specifies that
+the Job should be marked failed if the `main` container fails with the 42 exit
+code. The following are the rules for the `main` container specifically:
+
+- an exit code of 0 means that the container succeeded
+- an exit code of 42 means that the **entire Job** failed
+- any other exit code represents that the container failed, and hence the entire
+  Pod. The Pod will be re-created if the total number of restarts is
+  below `backoffLimit`. If the `backoffLimit` is reached, the **entire Job** fails.
+
+{{< note >}}
+Because the Pod template specifies a `restartPolicy: Never`,
+the kubelet does not restart the `main` container in that particular Pod.
+{{< /note >}}
+
+The second rule of the Pod failure policy, specifying the `Ignore` action for
+failed Pods with condition `DisruptionTarget`, excludes Pod disruptions from
+being counted towards the `.spec.backoffLimit` limit of retries.
+
+{{< note >}}
+If the Job fails, either by the Pod failure policy or the Pod backoff
+failure policy, and the Job is running multiple Pods, Kubernetes terminates all
+the Pods in that Job that are still Pending or Running.
+{{< /note >}}
+
+These are some requirements and semantics of the API (see also the sketch after
+this list):
+- if you want to use a `.spec.podFailurePolicy` field for a Job, you must
+  also define that Job's pod template with `.spec.restartPolicy` set to `Never`.
+- the Pod failure policy rules you specify under `spec.podFailurePolicy.rules`
+  are evaluated in order. Once a rule matches a Pod failure, the remaining rules
+  are ignored. When no rule matches the Pod failure, the default
+  handling applies.
+- you may want to restrict a rule to a specific container by specifying its name
+  in `spec.podFailurePolicy.rules[*].containerName`. When not specified, the rule
+  applies to all containers. When specified, it should match one of the container
+  or `initContainer` names in the Pod template.
+- you may specify the action taken when a Pod failure policy rule is matched by
+  `spec.podFailurePolicy.rules[*].action`. Possible values are:
+  - `FailJob`: use to indicate that the Pod's job should be marked as Failed and
+    all running Pods should be terminated.
+  - `Ignore`: use to indicate that the counter towards the `.spec.backoffLimit`
+    should not be incremented and a replacement Pod should be created.
+  - `Count`: use to indicate that the Pod should be handled in the default way.
+    The counter towards the `.spec.backoffLimit` should be incremented.
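To make the ordering semantics above concrete, here is a minimal, hypothetical `podFailurePolicy` fragment. It is not one of the manifests added by this patch, and the exit codes 1 and 2 are invented "retriable" codes used purely for illustration:

```yaml
podFailurePolicy:
  rules:
  # Evaluated first: treat known retriable exit codes in the default way,
  # incrementing the counter towards .spec.backoffLimit.
  - action: Count
    onExitCodes:
      containerName: main  # optional; restricts the rule to this container
      operator: In         # one of: In, NotIn
      values: [1, 2]       # hypothetical retriable exit codes
  # Evaluated only if the rule above did not match: any other failing exit
  # code of the main container marks the entire Job as failed immediately.
  - action: FailJob
    onExitCodes:
      containerName: main
      operator: NotIn
      values: [1, 2]
```

With this fragment, a failed `main` container exiting with code 1 or 2 is retried subject to `.spec.backoffLimit`, while any other non-zero exit code terminates the whole Job.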
+
 ### Job tracking with finalizers
 
 {{< feature-state for_k8s_version="v1.23" state="beta" >}}
@@ -783,3 +867,5 @@ object, but maintains complete control over what Pods are created and how work i
 * Read about [`CronJob`](/docs/concepts/workloads/controllers/cron-jobs/), which you
   can use to define a series of Jobs that will run based on a schedule, similar to
   the UNIX tool `cron`.
+* Practice how to configure handling of retriable and non-retriable pod failures
+  using `podFailurePolicy`, based on the step-by-step [examples](/docs/tasks/job/pod-failure-policy/).

diff --git a/content/en/docs/concepts/workloads/pods/disruptions.md b/content/en/docs/concepts/workloads/pods/disruptions.md
index 055fc0a65d160..a9e1a93e0d608 100644
--- a/content/en/docs/concepts/workloads/pods/disruptions.md
+++ b/content/en/docs/concepts/workloads/pods/disruptions.md
@@ -227,6 +227,35 @@ can happen, according to:
 - the type of controller
 - the cluster's resource capacity
 
+## Pod disruption conditions {#pod-disruption-conditions}
+
+{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+
+{{< note >}}
+In order to use this behavior, you must enable `PodDisruptionConditions`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+in your cluster.
+{{< /note >}}
+
+When enabled, a dedicated Pod `DisruptionTarget` condition is added to indicate
+an imminent disruption of a Pod. The `reason` field of the condition additionally
+indicates one of the following reasons for the Pod termination:
+- `PreemptionByKubeScheduler`: Pod preempted by kube-scheduler to accommodate a Pod with higher priority. For more information, see [Pod priority preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/).
+- `DeletionByTaintManager`: Pod deleted by taint manager due to NoExecute taint, see more [here](/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-based-evictions).
+- `EvictionByEvictionAPI`: Pod evicted by [Eviction API](/docs/concepts/scheduling-eviction/api-eviction/).
+- `DeletionByPodGC`: an orphaned Pod deleted by [PodGC](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).
+
+{{< note >}}
+A Pod disruption might be interrupted. The control plane might re-attempt to
+continue the disruption of the same Pod, but it is not guaranteed. As a result,
+the `DisruptionTarget` condition might be added to Pod, but the Pod might not be
+deleted. In such a situation, after some time, the
+Pod disruption condition will be cleared.
+{{< /note >}}
+
+When using a Job, you may want to use these Pod disruption conditions you defined in your
+[Pod failure policy](/docs/concepts/workloads/controllers/job#pod-failure-policy).
 
 ## Separating Cluster Owner and Application Owner Roles
 
 Often, it is useful to think of the Cluster Manager

diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index dfcee748bd027..e73cc6d51b7a1 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -380,6 +380,7 @@ different Kubernetes components.
| `IngressClassNamespacedParams` | `true` | GA | 1.23 | - |
 | `Initializers` | `false` | Alpha | 1.7 | 1.13 |
 | `Initializers` | - | Deprecated | 1.14 | - |
+| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | - |
 | `KubeletConfigFile` | `false` | Alpha | 1.8 | 1.9 |
 | `KubeletConfigFile` | - | Deprecated | 1.10 | - |
 | `KubeletPluginsWatcher` | `false` | Alpha | 1.11 | 1.11 |
@@ -416,6 +417,7 @@ different Kubernetes components.
 | `PodDisruptionBudget` | `false` | Alpha | 1.3 | 1.4 |
 | `PodDisruptionBudget` | `true` | Beta | 1.5 | 1.20 |
 | `PodDisruptionBudget` | `true` | GA | 1.21 | - |
+| `PodDisruptionConditions` | `false` | Alpha | 1.25 | - |
 | `PodOverhead` | `false` | Alpha | 1.16 | 1.17 |
 | `PodOverhead` | `true` | Beta | 1.18 | 1.23 |
 | `PodOverhead` | `true` | GA | 1.24 | - |
@@ -947,6 +949,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
   support for IPv6.
 - `JobMutableNodeSchedulingDirectives`: Allows updating node scheduling directives in
   the pod template of [Job](/docs/concepts/workloads/controllers/job).
+- `JobPodFailurePolicy`: Allow users to specify handling of pod failures based on container exit codes and pod conditions.
 - `JobReadyPods`: Enables tracking the number of Pods that have a `Ready`
   [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions).
   The count of `Ready` pods is recorded in the
@@ -1039,6 +1042,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
 - `PodAndContainerStatsFromCRI`: Configure the kubelet to gather container and
   pod stats from the CRI container runtime rather than gathering them from cAdvisor.
 - `PodDisruptionBudget`: Enable the [PodDisruptionBudget](/docs/tasks/run-application/configure-pdb/) feature.
+- `PodDisruptionConditions`: Enables support for appending a dedicated pod condition indicating that the pod is being deleted due to a disruption.
+- `PodHasNetworkCondition`: Enable the kubelet to mark the [PodHasNetwork](/docs/concepts/workloads/pods/pod-lifecycle/#pod-has-network) condition on pods.
 - `PodOverhead`: Enable the [PodOverhead](/docs/concepts/scheduling-eviction/pod-overhead/)
   feature to account for pod overheads.

diff --git a/content/en/docs/tasks/job/pod-failure-policy.md b/content/en/docs/tasks/job/pod-failure-policy.md
new file mode 100644
index 0000000000000..3ba337fcea3bd
--- /dev/null
+++ b/content/en/docs/tasks/job/pod-failure-policy.md
@@ -0,0 +1,139 @@
+---
+title: Handling retriable and non-retriable pod failures with Pod failure policy
+content_type: task
+min-kubernetes-server-version: v1.25
+weight: 60
+---
+
+{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+
+
+
+This document shows you how to use the
+[Pod failure policy](/docs/concepts/workloads/controllers/job#pod-failure-policy),
+in combination with the default
+[Pod backoff failure policy](/docs/concepts/workloads/controllers/job#pod-backoff-failure-policy),
+to improve the control over the handling of container- or Pod-level failure
+within a {{< glossary_tooltip text="Job" term_id="job" >}}.
+
+The definition of Pod failure policy may help you to better utilize the computational
+resources by avoiding unnecessary Pod retries. This policy also lets you avoid Job
+failures due to Pod disruptions (such as {{< glossary_tooltip text="preemption" term_id="preemption" >}},
+{{< glossary_tooltip text="API-initiated eviction" term_id="api-eviction" >}}
+or {{< glossary_tooltip text="taint" term_id="taint" >}}-based eviction).
## {{% heading "prerequisites" %}}
 
+You should already be familiar with the basic use of [Job](/docs/concepts/workloads/controllers/job/).
+
+{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
+
+
+
+{{< note >}}
+As the features are in Alpha, prepare the Kubernetes cluster with the two
+[feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
+enabled: `JobPodFailurePolicy` and `PodDisruptionConditions`.
+{{< /note >}}
+
+## Using Pod failure policy to avoid unnecessary Pod retries
+
+With the following example, you can learn how to use Pod failure policy to
+avoid unnecessary Pod restarts when a Pod failure indicates a non-retriable
+software bug.
+
+First, create a Job based on the config:
+
+{{< codenew file="/controllers/job-pod-failure-policy-failjob.yaml" >}}
+
+by running:
+
+```sh
+kubectl create -f job-pod-failure-policy-failjob.yaml
+```
+
+After around 30s the entire Job should be terminated. Inspect the status of the Job by running:
+```sh
+kubectl get jobs -l job-name=job-pod-failure-policy-failjob -o yaml
+```
+
+In the Job status, you can see a Job `Failed` condition with the field `reason`
+equal to `PodFailurePolicy`. Additionally, the `message` field contains
+more detailed information about the Job termination, such as:
+`Container main for pod default/job-pod-failure-policy-failjob-8ckj8 failed with exit code 42 matching FailJob rule at index 0`.
+
+For comparison, if the Pod failure policy was disabled, it would take 6 retries
+of the Pod, taking at least 2 minutes.
+
+### Clean up
+
+Delete the Job you created:
+```sh
+kubectl delete jobs/job-pod-failure-policy-failjob
+```
+The cluster automatically cleans up the Pods.
+
+## Using Pod failure policy to ignore Pod disruptions
+
+With the following example, you can learn how to use Pod failure policy to
+ignore Pod disruptions from incrementing the Pod retry counter towards the
+`.spec.backoffLimit` limit.
+
+{{< caution >}}
+Timing is important for this example, so you may want to read the steps before
+execution. In order to trigger a Pod disruption it is important to drain the
+node while the Pod is running on it (within 90s since the Pod is scheduled).
+{{< /caution >}}
+
+1. Create a Job based on the config:
+
+{{< codenew file="/controllers/job-pod-failure-policy-ignore.yaml" >}}
+
+by running:
+
+```sh
+kubectl create -f job-pod-failure-policy-ignore.yaml
+```
+
+2. Run this command to check the `nodeName` the Pod is scheduled to:
+
+```sh
+nodeName=$(kubectl get pods -l job-name=job-pod-failure-policy-ignore -o jsonpath='{.items[0].spec.nodeName}')
+```
+
+3. Drain the node to evict the Pod before it completes (within 90s):
+```sh
+kubectl drain nodes/$nodeName --ignore-daemonsets --grace-period=0
+```
+
+4. Inspect the `.status.failed` field to check that the counter for the Job is not incremented:
+```sh
+kubectl get jobs -l job-name=job-pod-failure-policy-ignore -o yaml
+```
+
+5. Uncordon the node:
+```sh
+kubectl uncordon nodes/$nodeName
+```
+
+The Job resumes and succeeds.
+
+For comparison, if the Pod failure policy was disabled, the Pod disruption would
+result in terminating the entire Job (as the `.spec.backoffLimit` is set to 0).
+
+### Cleaning up
+
+Delete the Job you created:
+```sh
+kubectl delete jobs/job-pod-failure-policy-ignore
+```
+The cluster automatically cleans up the Pods.
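If you repeat the experiment, you can also inspect the condition that the eviction adds to the Pod. Between steps 3 and 4, while the evicted Pod object still exists (it may already be gone if you wait too long), a query along these lines — a sketch using standard kubectl jsonpath filtering, not a command from the task itself — prints the `DisruptionTarget` condition and its `reason`:

```sh
# Print each Pod of the Job together with the reason of its DisruptionTarget
# condition (empty if the condition is not present)
kubectl get pods -l job-name=job-pod-failure-policy-ignore \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DisruptionTarget")].reason}{"\n"}{end}'
```

Because `kubectl drain` uses the eviction API, the expected `reason` is `EvictionByEvictionAPI`, matching the reasons listed for the `DisruptionTarget` condition in the disruptions documentation.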
## Alternatives
 
+You could rely solely on the
+[Pod backoff failure policy](/docs/concepts/workloads/controllers/job#pod-backoff-failure-policy),
+by specifying the Job's `.spec.backoffLimit` field. However, in many situations
+it is problematic to find a balance between setting a low value for `.spec.backoffLimit`
+to avoid unnecessary Pod retries, yet high enough to make sure the Job would
+not be terminated by Pod disruptions.

diff --git a/content/en/examples/controllers/job-pod-failure-policy-example.yaml b/content/en/examples/controllers/job-pod-failure-policy-example.yaml
new file mode 100644
index 0000000000000..f75d4d6bb14cf
--- /dev/null
+++ b/content/en/examples/controllers/job-pod-failure-policy-example.yaml
@@ -0,0 +1,28 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: job-pod-failure-policy-example
+spec:
+  completions: 12
+  parallelism: 3
+  template:
+    spec:
+      restartPolicy: Never
+      containers:
+      - name: main
+        image: docker.io/library/bash:5
+        command: ["bash"]        # example command simulating a bug which triggers the FailJob action
+        args:
+        - -c
+        - echo "Hello world!" && sleep 5 && exit 42
+  backoffLimit: 6
+  podFailurePolicy:
+    rules:
+    - action: FailJob
+      onExitCodes:
+        containerName: main      # optional
+        operator: In             # one of: In, NotIn
+        values: [42]
+    - action: Ignore             # one of: Ignore, FailJob, Count
+      onPodConditions:
+      - type: DisruptionTarget   # indicates Pod disruption

diff --git a/content/en/examples/controllers/job-pod-failure-policy-failjob.yaml b/content/en/examples/controllers/job-pod-failure-policy-failjob.yaml
new file mode 100644
index 0000000000000..a83abe84c1c5a
--- /dev/null
+++ b/content/en/examples/controllers/job-pod-failure-policy-failjob.yaml
@@ -0,0 +1,25 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: job-pod-failure-policy-failjob
+spec:
+  completions: 8
+  parallelism: 2
+  template:
+    spec:
+      restartPolicy: Never
+      containers:
+      - name: main
+        image: docker.io/library/bash:5
+        command: ["bash"]
+        args:
+        - -c
+        - echo "Hello world! I'm going to exit with 42 to simulate a software bug." && sleep 30 && exit 42
+  backoffLimit: 6
+  podFailurePolicy:
+    rules:
+    - action: FailJob
+      onExitCodes:
+        containerName: main
+        operator: In
+        values: [42]

diff --git a/content/en/examples/controllers/job-pod-failure-policy-ignore.yaml b/content/en/examples/controllers/job-pod-failure-policy-ignore.yaml
new file mode 100644
index 0000000000000..9747644ff2cf2
--- /dev/null
+++ b/content/en/examples/controllers/job-pod-failure-policy-ignore.yaml
@@ -0,0 +1,23 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: job-pod-failure-policy-ignore
+spec:
+  completions: 4
+  parallelism: 2
+  template:
+    spec:
+      restartPolicy: Never
+      containers:
+      - name: main
+        image: docker.io/library/bash:5
+        command: ["bash"]
+        args:
+        - -c
+        - echo "Hello world! I'm going to exit with 0 (success)."
&& sleep 90 && exit 0
+  backoffLimit: 0
+  podFailurePolicy:
+    rules:
+    - action: Ignore
+      onPodConditions:
+      - type: DisruptionTarget

From 449ef99fe393b222e68504e19a9600f3e6a34e88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Wo=C5=BAniak?=
Date: Fri, 12 Aug 2022 20:18:51 +0200
Subject: [PATCH 50/77] Update content/en/docs/tasks/job/pod-failure-policy.md

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
Co-authored-by: Tim Bannister
---
 .../concepts/workloads/pods/disruptions.md    | 27 ++++++++++++-------
 .../en/docs/tasks/job/pod-failure-policy.md   | 10 +++----
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/content/en/docs/concepts/workloads/pods/disruptions.md b/content/en/docs/concepts/workloads/pods/disruptions.md
index a9e1a93e0d608..09122df8fa59b 100644
--- a/content/en/docs/concepts/workloads/pods/disruptions.md
+++ b/content/en/docs/concepts/workloads/pods/disruptions.md
@@ -232,28 +232,37 @@ can happen, according to:
 
 {{< feature-state for_k8s_version="v1.25" state="alpha" >}}
 
 {{< note >}}
-In order to use this behavior, you must enable `PodDisruptionConditions`
+In order to use this behavior, you must enable the `PodDisruptionConditions`
 [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
 in your cluster.
 {{< /note >}}
 
-When enabled, a dedicated Pod `DisruptionTarget` condition is added to indicate
-an imminent disruption of a Pod. The `reason` field of the condition additionally
+When enabled, a dedicated Pod `DisruptionTarget` [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions) is added to indicate
+that the Pod is about to be deleted due to a {{< glossary_tooltip text="disruption" term_id="disruption" >}}.
+The `reason` field of the condition additionally
 indicates one of the following reasons for the Pod termination:
-- `PreemptionByKubeScheduler`: Pod preempted by kube-scheduler to accommodate a Pod with higher priority. For more information, see [Pod priority preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/).
-- `DeletionByTaintManager`: Pod deleted by taint manager due to NoExecute taint, see more [here](/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-based-evictions).
-- `EvictionByEvictionAPI`: Pod evicted by [Eviction API](/docs/concepts/scheduling-eviction/api-eviction/).
-- `DeletionByPodGC`: an orphaned Pod deleted by [PodGC](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).
+
+`PreemptionByKubeScheduler`
+: Pod has been {{< glossary_tooltip text="preempted" term_id="preemption" >}} by a scheduler in order to accommodate a new Pod with a higher priority. For more information, see [Pod priority preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/).
+
+`DeletionByTaintManager`
+: Pod is due to be deleted by Taint Manager due to a `NoExecute` taint that the Pod does not tolerate; see {{< glossary_tooltip text="taint" term_id="taint" >}}-based evictions.
+
+`EvictionByEvictionAPI`
+: Pod has been marked for {{< glossary_tooltip text="API-initiated eviction" term_id="api-eviction" >}}.
+
+`DeletionByPodGC`
+: An orphaned Pod deleted by [Pod garbage collection](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).
 
 {{< note >}}
 A Pod disruption might be interrupted. The control plane might re-attempt to
 continue the disruption of the same Pod, but it is not guaranteed. As a result,
-the `DisruptionTarget` condition might be added to Pod, but the Pod might not be
+the `DisruptionTarget` condition might be added to a Pod, but that Pod might then not actually be
 deleted. In such a situation, after some time, the
 Pod disruption condition will be cleared.
{{< /note >}}
 
-When using a Job, you may want to use these Pod disruption conditions you defined in your
+When using a Job (or CronJob), you may want to use these Pod disruption conditions as part of your Job's
 [Pod failure policy](/docs/concepts/workloads/controllers/job#pod-failure-policy).
 
 ## Separating Cluster Owner and Application Owner Roles

diff --git a/content/en/docs/tasks/job/pod-failure-policy.md b/content/en/docs/tasks/job/pod-failure-policy.md
index 3ba337fcea3bd..f6243f73ef947 100644
--- a/content/en/docs/tasks/job/pod-failure-policy.md
+++ b/content/en/docs/tasks/job/pod-failure-policy.md
@@ -16,11 +16,11 @@ in combination with the default
 to improve the control over the handling of container- or Pod-level failure
 within a {{< glossary_tooltip text="Job" term_id="job" >}}.
 
-The definition of Pod failure policy may help you to better utilize the computational
-resources by avoiding unnecessary Pod retries. This policy also lets you avoid Job
-failures due to Pod disruptions (such as {{< glossary_tooltip text="preemption" term_id="preemption" >}},
-{{< glossary_tooltip text="API-initiated eviction" term_id="api-eviction" >}}
-or {{< glossary_tooltip text="taint" term_id="taint" >}}-based eviction).
+The definition of Pod failure policy may help you to:
+* better utilize the computational resources by avoiding unnecessary Pod retries.
+* avoid Job failures due to Pod disruptions (such as {{< glossary_tooltip text="preemption" term_id="preemption" >}},
+{{< glossary_tooltip text="API-initiated eviction" term_id="api-eviction" >}}
+or {{< glossary_tooltip text="taint" term_id="taint" >}}-based eviction).

From 391ed0294a2e69485e43626ff62c0adcaa993ebf Mon Sep 17 00:00:00 2001
From: kerthcet
Date: Mon, 15 Aug 2022 20:13:58 +0800
Subject: [PATCH 51/77] revert the change of remove newline

Signed-off-by: kerthcet
---
 .../scheduling-eviction/topology-spread-constraints.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
index 4ac02385f3024..8413f68b91297 100644
--- a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
+++ b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
@@ -68,8 +68,8 @@ spec:
   ### other Pod fields go here
 ```
 
-You can read more about this field by running `kubectl explain Pod.spec.topologySpreadConstraints` or refer to the
-documents in [scheduling](/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling).
+You can read more about this field by running `kubectl explain Pod.spec.topologySpreadConstraints` or
+refer to the [scheduling](/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling) section of the API reference for Pod.
 
 ### Spread constraint definition
 
@@ -562,6 +562,7 @@ section of the enhancement proposal about Pod topology spread constraints.
  node group) is scaled to zero nodes, and you're expecting the cluster to scale up,
  because, in this case, those topology domains won't be considered until there is
  at least one node in them.
+
From 08a5a5bbc4b12526dc4804b30ac4f240230044d8 Mon Sep 17 00:00:00 2001 From: Rey Lejano Date: Mon, 15 Aug 2022 08:43:04 -0700 Subject: [PATCH 52/77] Update content/en/docs/reference/command-line-tools-reference/feature-gates.md Co-authored-by: Ricardo Katz --- .../reference/command-line-tools-reference/feature-gates.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 7914a07bf1872..d483298cff193 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -394,7 +394,7 @@ different Kubernetes components. | `NamespaceDefaultLabelName` | `true` | Beta | 1.21 | 1.21 | | `NamespaceDefaultLabelName` | `true` | GA | 1.22 | - | | `NetworkPolicyEndPort` | `false` | Alpha | 1.21 | 1.21 | -| `NetworkPolicyEndPort` | `true` | Beta | 1.22 | | +| `NetworkPolicyEndPort` | `true` | Beta | 1.22 | 1.24 | | `NetworkPolicyEndPort` | `true` | GA | 1.25 | - | | `NodeDisruptionExclusion` | `false` | Alpha | 1.16 | 1.18 | | `NodeDisruptionExclusion` | `true` | Beta | 1.19 | 1.20 | From d0a4713d18afc643dc80d4d216cd7db5ccaaa183 Mon Sep 17 00:00:00 2001 From: jinxu Date: Mon, 15 Aug 2022 13:10:17 -0700 Subject: [PATCH 53/77] Update LocalStorageCapacityIsolation GA update feature GA --- .../configuration/manage-resources-containers.md | 9 ++------- .../command-line-tools-reference/feature-gates.md | 5 +++-- 2 files changed, 5 insertions(+), 9 deletions(-) diff --git a/content/en/docs/concepts/configuration/manage-resources-containers.md b/content/en/docs/concepts/configuration/manage-resources-containers.md index 9428da09e3124..ab73213cb26f6 100644 --- a/content/en/docs/concepts/configuration/manage-resources-containers.md +++ b/content/en/docs/concepts/configuration/manage-resources-containers.md @@ -236,7 +236,7 @@ directly or from your monitoring tools. ## Local ephemeral storage -{{< feature-state for_k8s_version="v1.10" state="beta" >}} +{{< feature-state for_k8s_version="v1.25" state="stable" >}} Nodes have local ephemeral storage, backed by locally-attached writeable devices or, sometimes, by RAM. @@ -306,12 +306,7 @@ as you like. {{< /tabs >}} The kubelet can measure how much local storage it is using. It does this provided -that: - -- the `LocalStorageCapacityIsolation` - [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) - is enabled (the feature is on by default), and -- you have set up the node using one of the supported configurations +that you have set up the node using one of the supported configurations for local ephemeral storage. If you have a different configuration, then the kubelet does not apply resource diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index ab0900d7edb56..9360adecaaca9 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -137,8 +137,6 @@ different Kubernetes components. 
| `KubeletPodResources` | `true` | Beta | 1.15 | | | `KubeletPodResourcesGetAllocatable` | `false` | Alpha | 1.21 | 1.22 | | `KubeletPodResourcesGetAllocatable` | `true` | Beta | 1.23 | | -| `LocalStorageCapacityIsolation` | `false` | Alpha | 1.7 | 1.9 | -| `LocalStorageCapacityIsolation` | `true` | Beta | 1.10 | | | `LocalStorageCapacityIsolationFSQuotaMonitoring` | `false` | Alpha | 1.15 | | | `LogarithmicScaleDown` | `false` | Alpha | 1.21 | 1.21 | | `LogarithmicScaleDown` | `true` | Beta | 1.22 | | @@ -387,6 +385,9 @@ different Kubernetes components. | `LegacyNodeRoleBehavior` | `true` | Beta | 1.19 | 1.20 | | `LegacyNodeRoleBehavior` | `false` | GA | 1.21 | - | | `LegacyServiceAccountTokenNoAutoGeneration` | `true` | Beta | 1.24 | | +| `LocalStorageCapacityIsolation` | `false` | Alpha | 1.7 | 1.9 | +| `LocalStorageCapacityIsolation` | `true` | Beta | 1.10 | 1.24 | +| `LocalStorageCapacityIsolation` | `true` | GA | 1.25 | - | | `MountContainers` | `false` | Alpha | 1.9 | 1.16 | | `MountContainers` | `false` | Deprecated | 1.17 | - | | `MountPropagation` | `false` | Alpha | 1.8 | 1.9 | From b167938367223810880c815bddf2c68c8cf03d3e Mon Sep 17 00:00:00 2001 From: Jordan Liggitt Date: Thu, 5 May 2022 11:10:28 -0400 Subject: [PATCH 54/77] Scrub PSP docs for 1.25 --- .../concepts/security/pod-security-policy.md | 775 +----------------- .../security/pod-security-standards.md | 6 - .../docs/contribute/style/write-new-topic.md | 4 +- .../admission-controllers.md | 19 +- .../access-authn-authz/authorization.md | 2 - .../psp-to-pod-security-standards.md | 2 +- .../reference/glossary/pod-security-policy.md | 3 +- .../administer-cluster/sysctl-cluster.md | 52 -- .../security-context.md | 2 +- .../configure-pod-container/static-pod.md | 2 +- .../en/docs/tutorials/security/apparmor.md | 45 +- 11 files changed, 26 insertions(+), 886 deletions(-) diff --git a/content/en/docs/concepts/security/pod-security-policy.md b/content/en/docs/concepts/security/pod-security-policy.md index c296f31cc2c74..5b4b692a42e4d 100644 --- a/content/en/docs/concepts/security/pod-security-policy.md +++ b/content/en/docs/concepts/security/pod-security-policy.md @@ -1,6 +1,6 @@ --- reviewers: -- pweil- +- liggitt - tallclair title: Pod Security Policies content_type: concept @@ -11,770 +11,19 @@ weight: 30 {{< feature-state for_k8s_version="v1.21" state="deprecated" >}} -{{< caution >}} -PodSecurityPolicy is deprecated as of Kubernetes v1.21, and **will be removed in v1.25**. We recommend migrating to -[Pod Security Admission](/docs/concepts/security/pod-security-admission/), or a 3rd party admission plugin. -For a migration guide, see [Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller](/docs/tasks/configure-pod-container/migrate-from-psp/). -For more information on the deprecation, -see [PodSecurityPolicy Deprecation: Past, Present, and Future](/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/). -{{< /caution >}} - -Pod Security Policies enable fine-grained authorization of pod creation and -updates. - - - -## What is a Pod Security Policy? - -A _Pod Security Policy_ is a cluster-level resource that controls security -sensitive aspects of the pod specification. The -[PodSecurityPolicy](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podsecuritypolicy-v1beta1-policy) objects -define a set of conditions that a pod must run with in order to be accepted into -the system, as well as defaults for the related fields. 
They allow an -administrator to control the following: - -| Control Aspect | Field Names | -| ----------------------------------------------------| ------------------------------------------- | -| Running of privileged containers | [`privileged`](#privileged) | -| Usage of host namespaces | [`hostPID`, `hostIPC`](#host-namespaces) | -| Usage of host networking and ports | [`hostNetwork`, `hostPorts`](#host-namespaces) | -| Usage of volume types | [`volumes`](#volumes-and-file-systems) | -| Usage of the host filesystem | [`allowedHostPaths`](#volumes-and-file-systems) | -| Allow specific FlexVolume drivers | [`allowedFlexVolumes`](#flexvolume-drivers) | -| Allocating an FSGroup that owns the pod's volumes | [`fsGroup`](#volumes-and-file-systems) | -| Requiring the use of a read only root file system | [`readOnlyRootFilesystem`](#volumes-and-file-systems) | -| The user and group IDs of the container | [`runAsUser`, `runAsGroup`, `supplementalGroups`](#users-and-groups) | -| Restricting escalation to root privileges | [`allowPrivilegeEscalation`, `defaultAllowPrivilegeEscalation`](#privilege-escalation) | -| Linux capabilities | [`defaultAddCapabilities`, `requiredDropCapabilities`, `allowedCapabilities`](#capabilities) | -| The SELinux context of the container | [`seLinux`](#selinux) | -| The Allowed Proc Mount types for the container | [`allowedProcMountTypes`](#allowedprocmounttypes) | -| The AppArmor profile used by containers | [annotations](#apparmor) | -| The seccomp profile used by containers | [annotations](#seccomp) | -| The sysctl profile used by containers | [`forbiddenSysctls`,`allowedUnsafeSysctls`](#sysctl) | - - -## Enabling Pod Security Policies - -Pod security policy control is implemented as an optional -[admission controller](/docs/reference/access-authn-authz/admission-controllers/#podsecuritypolicy). -PodSecurityPolicies are enforced by -[enabling the admission controller](/docs/reference/access-authn-authz/admission-controllers/#how-do-i-turn-on-an-admission-control-plug-in), -but doing so without authorizing any policies **will prevent any pods from being created** in the -cluster. - -Since the pod security policy API (`policy/v1beta1/podsecuritypolicy`) is -enabled independently of the admission controller, for existing clusters it is -recommended that policies are added and authorized before enabling the admission -controller. - -## Authorizing Policies - -When a PodSecurityPolicy resource is created, it does nothing. In order to use -it, the requesting user or target pod's -[service account](/docs/tasks/configure-pod-container/configure-service-account/) -must be authorized to use the policy, by allowing the `use` verb on the policy. - -Most Kubernetes pods are not created directly by users. Instead, they are -typically created indirectly as part of a -[Deployment](/docs/concepts/workloads/controllers/deployment/), -[ReplicaSet](/docs/concepts/workloads/controllers/replicaset/), or other -templated controller via the controller manager. Granting the controller access -to the policy would grant access for *all* pods created by that controller, -so the preferred method for authorizing policies is to grant access to the -pod's service account (see [example](#run-another-pod)). - -### Via RBAC - -[RBAC](/docs/reference/access-authn-authz/rbac/) is a standard Kubernetes -authorization mode, and can easily be used to authorize use of policies. - -First, a `Role` or `ClusterRole` needs to grant access to `use` the desired -policies. 
The rules to grant access look like this: - -```yaml -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: -rules: -- apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - -``` - -Then the `(Cluster)Role` is bound to the authorized user(s): - -```yaml -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRoleBinding -metadata: - name: -roleRef: - kind: ClusterRole - name: - apiGroup: rbac.authorization.k8s.io -subjects: -# Authorize all service accounts in a namespace (recommended): -- kind: Group - apiGroup: rbac.authorization.k8s.io - name: system:serviceaccounts: -# Authorize specific service accounts (not recommended): -- kind: ServiceAccount - name: - namespace: -# Authorize specific users (not recommended): -- kind: User - apiGroup: rbac.authorization.k8s.io - name: -``` - -If a `RoleBinding` (not a `ClusterRoleBinding`) is used, it will only grant -usage for pods being run in the same namespace as the binding. This can be -paired with system groups to grant access to all pods run in the namespace: - -```yaml -# Authorize all service accounts in a namespace: -- kind: Group - apiGroup: rbac.authorization.k8s.io - name: system:serviceaccounts -# Or equivalently, all authenticated users in a namespace: -- kind: Group - apiGroup: rbac.authorization.k8s.io - name: system:authenticated -``` - -For more examples of RBAC bindings, see -[RoleBinding examples](/docs/reference/access-authn-authz/rbac#role-binding-examples). -For a complete example of authorizing a PodSecurityPolicy, see [below](#example). - -### Recommended Practice - -PodSecurityPolicy is being replaced by a new, simplified `PodSecurity` -{{< glossary_tooltip text="admission controller" term_id="admission-controller" >}}. -For more details on this change, see -[PodSecurityPolicy Deprecation: Past, Present, and Future](/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/). -Follow these guidelines to simplify migration from PodSecurityPolicy to the -new admission controller: - -1. Limit your PodSecurityPolicies to the policies defined by the - [Pod Security Standards](/docs/concepts/security/pod-security-standards): - - - {{< example file="policy/privileged-psp.yaml" >}}Privileged{{< /example >}} - - {{< example file="policy/baseline-psp.yaml" >}}Baseline{{< /example >}} - - {{< example file="policy/restricted-psp.yaml" >}}Restricted{{< /example >}} - -1. Only bind PSPs to entire namespaces, by using the `system:serviceaccounts:` group - (where `` is the target namespace). For example: - - ```yaml - apiVersion: rbac.authorization.k8s.io/v1 - # This cluster role binding allows all pods in the "development" namespace to use the baseline PSP. - kind: ClusterRoleBinding - metadata: - name: psp-baseline-namespaces - roleRef: - kind: ClusterRole - name: psp-baseline - apiGroup: rbac.authorization.k8s.io - subjects: - - kind: Group - name: system:serviceaccounts:development - apiGroup: rbac.authorization.k8s.io - - kind: Group - name: system:serviceaccounts:canary - apiGroup: rbac.authorization.k8s.io - ``` - -### Troubleshooting - -- The [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/) - must be run against the secured API port and must not have superuser permissions. See - [Controlling Access to the Kubernetes API](/docs/concepts/security/controlling-access) - to learn about API server access controls. 
- If the controller manager connected through the trusted API port (also known as the - `localhost` listener), requests would bypass authentication and authorization modules; - all PodSecurityPolicy objects would be allowed, and users would be able to create grant - themselves the ability to create privileged containers. - - For more details on configuring controller manager authorization, see - [Controller Roles](/docs/reference/access-authn-authz/rbac/#controller-roles). - -## Policy Order - -In addition to restricting pod creation and update, pod security policies can -also be used to provide default values for many of the fields that it -controls. When multiple policies are available, the pod security policy -controller selects policies according to the following criteria: - -1. PodSecurityPolicies which allow the pod as-is, without changing defaults or - mutating the pod, are preferred. The order of these non-mutating - PodSecurityPolicies doesn't matter. -2. If the pod must be defaulted or mutated, the first PodSecurityPolicy - (ordered by name) to allow the pod is selected. - -When a Pod is validated against a PodSecurityPolicy, [a `kubernetes.io/psp` annotation](/docs/reference/labels-annotations-taints/#kubernetes-io-psp) -is added to the Pod, with the name of the PodSecurityPolicy as the annotation value. - {{< note >}} -During update operations (during which mutations to pod specs are disallowed) -only non-mutating PodSecurityPolicies are used to validate the pod. -{{< /note >}} - -## Example - -This example assumes you have a running cluster with the PodSecurityPolicy -admission controller enabled and you have cluster admin privileges. - -### Set up - -Set up a namespace and a service account to act as for this example. We'll use -this service account to mock a non-admin user. - -```shell -kubectl create namespace psp-example -kubectl create serviceaccount -n psp-example fake-user -kubectl create rolebinding -n psp-example fake-editor --clusterrole=edit --serviceaccount=psp-example:fake-user -``` +PodSecurityPolicy was [deprecated](/blog/2021/04/08/kubernetes-1-21-release-announcement/#podsecuritypolicy-deprecation) +in Kubernetes v1.21, and removed from Kubernetes in v1.25. +Instead of using PodSecurityPolicy, you can enforce similar restrictions on Pods using +either or both: -To make it clear which user we're acting as and save some typing, create 2 -aliases: +- [Pod Security Admission](/docs/concepts/security/pod-security-admission/) +- a 3rd party admission plugin, that you deploy and configure yourself -```shell -alias kubectl-admin='kubectl -n psp-example' -alias kubectl-user='kubectl --as=system:serviceaccount:psp-example:fake-user -n psp-example' -``` - -### Create a policy and a pod - -This is a policy that prevents the creation of privileged pods. -The name of a PodSecurityPolicy object must be a valid -[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names). - -{{< codenew file="policy/example-psp.yaml" >}} - -And create it with kubectl: - -```shell -kubectl-admin create -f https://k8s.io/examples/policy/example-psp.yaml -``` - -Now, as the unprivileged user, try to create a simple pod: - -```shell -kubectl-user create -f- <}} -This is not the recommended way! See the [next section](#run-another-pod) -for the preferred approach. +If you are not running Kubernetes v{{< skew currentVersion >}}, check the documentation for +your version of Kubernetes. 
{{< /note >}} - -```shell -kubectl-admin create role psp:unprivileged \ - --verb=use \ - --resource=podsecuritypolicy \ - --resource-name=example -``` - -``` -role "psp:unprivileged" created -``` - -```shell -kubectl-admin create rolebinding fake-user:psp:unprivileged \ - --role=psp:unprivileged \ - --serviceaccount=psp-example:fake-user -``` - -``` -rolebinding "fake-user:psp:unprivileged" created -``` - -```shell -kubectl-user auth can-i use podsecuritypolicy/example -``` - -``` -yes -``` - -Now retry creating the pod: - -```shell -kubectl-user create -f- <}} - -This is an example of a restrictive policy that requires users to run as an -unprivileged user, blocks possible escalations to root, and requires use of -several security mechanisms. - -{{< codenew file="policy/restricted-psp.yaml" >}} - -See [Pod Security Standards](/docs/concepts/security/pod-security-standards/#policy-instantiation) -for more examples. - -## Policy Reference - -### Privileged - -**Privileged** - determines if any container in a pod can enable privileged mode. -By default a container is not allowed to access any devices on the host, but a -"privileged" container is given access to all devices on the host. This allows -the container nearly all the same access as processes running on the host. -This is useful for containers that want to use linux capabilities like -manipulating the network stack and accessing devices. - -### Host namespaces - -**HostPID** - Controls whether the pod containers can share the host process ID -namespace. Note that when paired with ptrace this can be used to escalate -privileges outside of the container (ptrace is forbidden by default). - -**HostIPC** - Controls whether the pod containers can share the host IPC -namespace. - -**HostNetwork** - Controls whether the pod may use the node network -namespace. Doing so gives the pod access to the loopback device, services -listening on localhost, and could be used to snoop on network activity of other -pods on the same node. - -**HostPorts** - Provides a list of ranges of allowable ports in the host -network namespace. Defined as a list of `HostPortRange`, with `min`(inclusive) -and `max`(inclusive). Defaults to no allowed host ports. - -### Volumes and file systems - -**Volumes** - Provides a list of allowed volume types. The allowable values -correspond to the volume sources that are defined when creating a volume. For -the complete list of volume types, see [Types of -Volumes](/docs/concepts/storage/volumes/#types-of-volumes). Additionally, -`*` may be used to allow all volume types. - -The **recommended minimum set** of allowed volumes for new PSPs are: - -- `configMap` -- `downwardAPI` -- `emptyDir` -- `persistentVolumeClaim` -- `secret` -- `projected` - -{{< warning >}} -PodSecurityPolicy does not limit the types of `PersistentVolume` objects that -may be referenced by a `PersistentVolumeClaim`, and hostPath type -`PersistentVolumes` do not support read-only access mode. Only trusted users -should be granted permission to create `PersistentVolume` objects. -{{< /warning >}} - -**FSGroup** - Controls the supplemental group applied to some volumes. - -- *MustRunAs* - Requires at least one `range` to be specified. Uses the - minimum value of the first range as the default. Validates against all ranges. -- *MayRunAs* - Requires at least one `range` to be specified. Allows - `FSGroups` to be left unset without providing a default. Validates against - all ranges if `FSGroups` is set. -- *RunAsAny* - No default provided. 
Allows any `fsGroup` ID to be specified. - -**AllowedHostPaths** - This specifies a list of host paths that are allowed -to be used by hostPath volumes. An empty list means there is no restriction on -host paths used. This is defined as a list of objects with a single `pathPrefix` -field, which allows hostPath volumes to mount a path that begins with an -allowed prefix, and a `readOnly` field indicating it must be mounted read-only. -For example: - -```yaml - allowedHostPaths: - # This allows "/foo", "/foo/", "/foo/bar" etc., but - # disallows "/fool", "/etc/foo" etc. - # "/foo/../" is never valid. - - pathPrefix: "/foo" - readOnly: true # only allow read-only mounts -``` - -{{< warning >}} -There are many ways a container with unrestricted access to the host -filesystem can escalate privileges, including reading data from other -containers, and abusing the credentials of system services, such as Kubelet. - -Writeable hostPath directory volumes allow containers to write -to the filesystem in ways that let them traverse the host filesystem outside the `pathPrefix`. -`readOnly: true`, available in Kubernetes 1.11+, must be used on **all** `allowedHostPaths` -to effectively limit access to the specified `pathPrefix`. -{{< /warning >}} - -**ReadOnlyRootFilesystem** - Requires that containers must run with a read-only -root filesystem (i.e. no writable layer). - -### FlexVolume drivers - -This specifies a list of FlexVolume drivers that are allowed to be used -by flexvolume. An empty list or nil means there is no restriction on the drivers. -Please make sure [`volumes`](#volumes-and-file-systems) field contains the -`flexVolume` volume type; no FlexVolume driver is allowed otherwise. - -For example: - -```yaml -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: allow-flex-volumes -spec: - # ... other spec fields - volumes: - - flexVolume - allowedFlexVolumes: - - driver: example/lvm - - driver: example/cifs -``` - -### Users and groups - -**RunAsUser** - Controls which user ID the containers are run with. - -- *MustRunAs* - Requires at least one `range` to be specified. Uses the - minimum value of the first range as the default. Validates against all ranges. -- *MustRunAsNonRoot* - Requires that the pod be submitted with a non-zero - `runAsUser` or have the `USER` directive defined (using a numeric UID) in the - image. Pods which have specified neither `runAsNonRoot` nor `runAsUser` settings - will be mutated to set `runAsNonRoot=true`, thus requiring a defined non-zero - numeric `USER` directive in the container. No default provided. Setting - `allowPrivilegeEscalation=false` is strongly recommended with this strategy. -- *RunAsAny* - No default provided. Allows any `runAsUser` to be specified. - -**RunAsGroup** - Controls which primary group ID the containers are run with. - -- *MustRunAs* - Requires at least one `range` to be specified. Uses the - minimum value of the first range as the default. Validates against all ranges. -- *MayRunAs* - Does not require that RunAsGroup be specified. However, when RunAsGroup - is specified, they have to fall in the defined range. -- *RunAsAny* - No default provided. Allows any `runAsGroup` to be specified. - - -**SupplementalGroups** - Controls which group IDs containers add. - -- *MustRunAs* - Requires at least one `range` to be specified. Uses the - minimum value of the first range as the default. Validates against all ranges. -- *MayRunAs* - Requires at least one `range` to be specified. 
Allows - `supplementalGroups` to be left unset without providing a default. - Validates against all ranges if `supplementalGroups` is set. -- *RunAsAny* - No default provided. Allows any `supplementalGroups` to be - specified. - -### Privilege Escalation - -These options control the `allowPrivilegeEscalation` container option. This bool -directly controls whether the -[`no_new_privs`](https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt) -flag gets set on the container process. This flag will prevent `setuid` binaries -from changing the effective user ID, and prevent files from enabling extra -capabilities (e.g. it will prevent the use of the `ping` tool). This behavior is -required to effectively enforce `MustRunAsNonRoot`. - -**AllowPrivilegeEscalation** - Gates whether or not a user is allowed to set the -security context of a container to `allowPrivilegeEscalation=true`. This -defaults to allowed so as to not break setuid binaries. Setting it to `false` -ensures that no child process of a container can gain more privileges than its parent. - -**DefaultAllowPrivilegeEscalation** - Sets the default for the -`allowPrivilegeEscalation` option. The default behavior without this is to allow -privilege escalation so as to not break setuid binaries. If that behavior is not -desired, this field can be used to default to disallow, while still permitting -pods to request `allowPrivilegeEscalation` explicitly. - -### Capabilities - -Linux capabilities provide a finer grained breakdown of the privileges -traditionally associated with the superuser. Some of these capabilities can be -used to escalate privileges or for container breakout, and may be restricted by -the PodSecurityPolicy. For more details on Linux capabilities, see -[capabilities(7)](http://man7.org/linux/man-pages/man7/capabilities.7.html). - -The following fields take a list of capabilities, specified as the capability -name in ALL_CAPS without the `CAP_` prefix. - -**AllowedCapabilities** - Provides a list of capabilities that are allowed to be added -to a container. The default set of capabilities are implicitly allowed. The -empty set means that no additional capabilities may be added beyond the default -set. `*` can be used to allow all capabilities. - -**RequiredDropCapabilities** - The capabilities which must be dropped from -containers. These capabilities are removed from the default set, and must not be -added. Capabilities listed in `RequiredDropCapabilities` must not be included in -`AllowedCapabilities` or `DefaultAddCapabilities`. - -**DefaultAddCapabilities** - The capabilities which are added to containers by -default, in addition to the runtime defaults. See the -documentation for your container runtime for information on working with Linux capabilities. - -### SELinux - -- *MustRunAs* - Requires `seLinuxOptions` to be configured. Uses -`seLinuxOptions` as the default. Validates against `seLinuxOptions`. -- *RunAsAny* - No default provided. Allows any `seLinuxOptions` to be -specified. - -### AllowedProcMountTypes - -`allowedProcMountTypes` is a list of allowed ProcMountTypes. -Empty or nil indicates that only the `DefaultProcMountType` may be used. - -`DefaultProcMount` uses the container runtime defaults for readonly and masked -paths for /proc. Most container runtimes mask certain paths in /proc to avoid -accidental security exposure of special devices or information. This is denoted -as the string `Default`. 
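As a minimal sketch (the policy name is hypothetical, and other required PodSecurityPolicy
fields such as `seLinux`, `runAsUser`, `fsGroup` and `supplementalGroups` are omitted for
brevity), a policy that only permits the default proc mount could look like:

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default-proc-mount-only   # hypothetical name
spec:
  # Only the runtime's default (masked) /proc is permitted;
  # pods that request an unmasked /proc are rejected.
  allowedProcMountTypes:
  - Default
```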
- -The only other ProcMountType is `UnmaskedProcMount`, which bypasses the -default masking behavior of the container runtime and ensures the newly -created /proc the container stays intact with no modifications. This is -denoted as the string `Unmasked`. - -### AppArmor - -Controlled via annotations on the PodSecurityPolicy. Refer to the -[AppArmor documentation](/docs/tutorials/security/apparmor/#podsecuritypolicy-annotations). - -### Seccomp - -As of Kubernetes v1.19, you can use the `seccompProfile` field in the -`securityContext` of Pods or containers to -[control use of seccomp profiles](/docs/tutorials/security/seccomp/). -In prior versions, seccomp was controlled by adding annotations to a Pod. The -same PodSecurityPolicies can be used with either version to enforce how these -fields or annotations are applied. - -**seccomp.security.alpha.kubernetes.io/defaultProfileName** - Annotation that -specifies the default seccomp profile to apply to containers. Possible values -are: - -- `unconfined` - Seccomp is not applied to the container processes (this is the - default in Kubernetes), if no alternative is provided. -- `runtime/default` - The default container runtime profile is used. -- `docker/default` - The Docker default seccomp profile is used. Deprecated as - of Kubernetes 1.11. Use `runtime/default` instead. -- `localhost/` - Specify a profile as a file on the node located at - `/`, where `` is defined via the - `--seccomp-profile-root` flag on the Kubelet. If the `--seccomp-profile-root` - flag is not defined, the default path will be used, which is - `/seccomp` where `` is specified by the `--root-dir` flag. - - {{< note >}} - The `--seccomp-profile-root` flag is deprecated since Kubernetes - v1.19. Users are encouraged to use the default path. - {{< /note >}} - -**seccomp.security.alpha.kubernetes.io/allowedProfileNames** - Annotation that -specifies which values are allowed for the pod seccomp annotations. Specified as -a comma-delimited list of allowed values. Possible values are those listed -above, plus `*` to allow all profiles. Absence of this annotation means that the -default cannot be changed. - -### Sysctl - -By default, all safe sysctls are allowed. - -- `forbiddenSysctls` - excludes specific sysctls. You can forbid a combination - of safe and unsafe sysctls in the list. To forbid setting any sysctls, use - `*` on its own. -- `allowedUnsafeSysctls` - allows specific sysctls that had been disallowed by - the default list, so long as these are not listed in `forbiddenSysctls`. - -Refer to the [Sysctl documentation](/docs/tasks/administer-cluster/sysctl-cluster/#podsecuritypolicy). - -## {{% heading "whatsnext" %}} - -- See [PodSecurityPolicy Deprecation: Past, Present, and Future](/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/) - to learn about the future of pod security policy. - -- See [Pod Security Standards](/docs/concepts/security/pod-security-standards/) - for policy recommendations. - -- Refer to [PodSecurityPolicy reference](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podsecuritypolicy-v1beta1-policy) - for the API details. - diff --git a/content/en/docs/concepts/security/pod-security-standards.md b/content/en/docs/concepts/security/pod-security-standards.md index d60f3b0ae1eaa..3c018f0f1fe8d 100644 --- a/content/en/docs/concepts/security/pod-security-standards.md +++ b/content/en/docs/concepts/security/pod-security-standards.md @@ -451,12 +451,6 @@ of individual policies are not defined here. 
- {{< example file="security/podsecurity-baseline.yaml" >}}Baseline namespace{{< /example >}} - {{< example file="security/podsecurity-restricted.yaml" >}}Restricted namespace{{< /example >}} -[**PodSecurityPolicy**](/docs/concepts/security/pod-security-policy/) (Deprecated) - -- {{< example file="policy/privileged-psp.yaml" >}}Privileged{{< /example >}} -- {{< example file="policy/baseline-psp.yaml" >}}Baseline{{< /example >}} -- {{< example file="policy/restricted-psp.yaml" >}}Restricted{{< /example >}} - ### Alternatives {{% thirdparty-content %}} diff --git a/content/en/docs/contribute/style/write-new-topic.md b/content/en/docs/contribute/style/write-new-topic.md index 7cac1aa6b7fd5..9c39463db98d5 100644 --- a/content/en/docs/contribute/style/write-new-topic.md +++ b/content/en/docs/contribute/style/write-new-topic.md @@ -105,8 +105,8 @@ following cases (not an exhaustive list): [FlexVolume](/docs/concepts/storage/volumes#flexvolume) implementation. - The code is an incomplete example because its purpose is to highlight a portion of a larger file. For example, when describing ways to - customize the [PodSecurityPolicy](/docs/tasks/administer-cluster/sysctl-cluster/#podsecuritypolicy) - for some reasons, you can provide a short snippet directly in your topic file. + customize a [RoleBinding](/docs/reference/access-authn-authz/rbac/#role-binding-examples), + you can provide a short snippet directly in your topic file. - The code is not meant for users to try out due to other reasons. For example, when describing how a new attribute should be added to a resource using the `kubectl edit` command, you can provide a short example that includes only diff --git a/content/en/docs/reference/access-authn-authz/admission-controllers.md b/content/en/docs/reference/access-authn-authz/admission-controllers.md index f03b04f8e3908..6cf89e3845dec 100644 --- a/content/en/docs/reference/access-authn-authz/admission-controllers.md +++ b/content/en/docs/reference/access-authn-authz/admission-controllers.md @@ -195,7 +195,7 @@ have access to the host PID namespace. The DenyEscalatingExec admission plugin is deprecated. -Use of a policy-based admission plugin (like [PodSecurityPolicy](#podsecuritypolicy) or a custom admission plugin) +Use of a policy-based admission plugin (like [`PodSecurity`](#podsecurity) or a custom admission plugin) which can be targeted at specific users or Namespaces and also protects against creation of overly privileged Pods is recommended instead. @@ -208,7 +208,7 @@ This admission controller will intercept all requests to exec a command in a pod This functionality has been merged into [DenyEscalatingExec](#denyescalatingexec). The DenyExecOnPrivileged admission plugin is deprecated. -Use of a policy-based admission plugin (like [PodSecurityPolicy](#podsecuritypolicy) or a custom admission plugin) +Use of a policy-based admission plugin (like [PodSecurity](#podsecurity) or a custom admission plugin) which can be targeted at specific users or Namespaces and also protects against creation of overly privileged Pods is recommended instead. @@ -661,23 +661,16 @@ admission plugin, which allows preventing pods from running on specifically tain {{< feature-state for_k8s_version="v1.23" state="beta" >}} -This is the replacement for the deprecated [PodSecurityPolicy](#podsecuritypolicy) admission controller -defined in the next section. 
This admission controller acts on creation and modification of the pod and +This admission controller acts on creation and modification of the pod and determines if it should be admitted based on the requested security context and the [Pod Security Standards](/docs/concepts/security/pod-security-standards/). See the [Pod Security Admission documentation](/docs/concepts/security/pod-security-admission/) for more information. -### PodSecurityPolicy {#podsecuritypolicy} - -{{< feature-state for_k8s_version="v1.21" state="deprecated" >}} - -This admission controller acts on creation and modification of the pod and determines if it should be admitted -based on the requested security context and the available Pod Security Policies. - -See also the [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/) documentation -for more information. +Versions of Kubernetes prior to 1.25 included an admission controller for +the beta `PodSecurityPolicy` API; the Pod Security admission controller +provides similar, but not identical, security enforcement. ### PodTolerationRestriction {#podtolerationrestriction} diff --git a/content/en/docs/reference/access-authn-authz/authorization.md b/content/en/docs/reference/access-authn-authz/authorization.md index ea6147fcbad96..9a77c7ac21124 100644 --- a/content/en/docs/reference/access-authn-authz/authorization.md +++ b/content/en/docs/reference/access-authn-authz/authorization.md @@ -80,8 +80,6 @@ The `get`, `list` and `watch` verbs can all return the full details of a resourc Kubernetes sometimes checks authorization for additional permissions using specialized verbs. For example: -* [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/) - * `use` verb on `podsecuritypolicies` resources in the `policy` API group. * [RBAC](/docs/reference/access-authn-authz/rbac/#privilege-escalation-prevention-and-bootstrapping) * `bind` and `escalate` verbs on `roles` and `clusterroles` resources in the `rbac.authorization.k8s.io` API group. * [Authentication](/docs/reference/access-authn-authz/authentication/) diff --git a/content/en/docs/reference/access-authn-authz/psp-to-pod-security-standards.md b/content/en/docs/reference/access-authn-authz/psp-to-pod-security-standards.md index 82394b363afd0..34e40fca43611 100644 --- a/content/en/docs/reference/access-authn-authz/psp-to-pod-security-standards.md +++ b/content/en/docs/reference/access-authn-authz/psp-to-pod-security-standards.md @@ -9,7 +9,7 @@ weight: 95 The tables below enumerate the configuration parameters on -[PodSecurityPolicy](/docs/concepts/security/pod-security-policy/) objects, whether the field mutates +`PodSecurityPolicy` objects, whether the field mutates and/or validates pods, and how the configuration values map to the [Pod Security Standards](/docs/concepts/security/pod-security-standards/). diff --git a/content/en/docs/reference/glossary/pod-security-policy.md b/content/en/docs/reference/glossary/pod-security-policy.md index ff178bb7cc474..c358b92fc879a 100644 --- a/content/en/docs/reference/glossary/pod-security-policy.md +++ b/content/en/docs/reference/glossary/pod-security-policy.md @@ -17,4 +17,5 @@ tags: A cluster-level resource that controls security sensitive aspects of the Pod specification. The `PodSecurityPolicy` objects define a set of conditions that a Pod must run with in order to be accepted into the system, as well as defaults for the related fields. Pod Security Policy control is implemented as an optional admission controller. 
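As a sketch of the Pod Security admission controller mentioned above, enforcing the Baseline
[Pod Security Standard](/docs/concepts/security/pod-security-standards/) only requires labels
on a namespace; the namespace name here is an example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app   # example name
  labels:
    # The Pod Security admission controller reads these labels and rejects
    # Pods in this namespace that do not meet the Baseline standard.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
```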
-PodSecurityPolicy is deprecated as of Kubernetes v1.21, and will be removed in v1.25. We recommend migrating to [Pod Security Admission](/docs/concepts/security/pod-security-admission/), or a 3rd party admission plugin. +PodSecurityPolicy was deprecated as of Kubernetes v1.21, and removed in v1.25. +As an alternative, use [Pod Security Admission](/docs/concepts/security/pod-security-admission/) or a 3rd party admission plugin. diff --git a/content/en/docs/tasks/administer-cluster/sysctl-cluster.md b/content/en/docs/tasks/administer-cluster/sysctl-cluster.md index 1a560cd8e65a7..367901b390e40 100644 --- a/content/en/docs/tasks/administer-cluster/sysctl-cluster.md +++ b/content/en/docs/tasks/administer-cluster/sysctl-cluster.md @@ -175,55 +175,3 @@ is recommended to use [_taints and toleration_ feature](/docs/reference/generated/kubectl/kubectl-commands/#taint) or [taints on nodes](/docs/concepts/scheduling-eviction/taint-and-toleration/) to schedule those pods onto the right nodes. - -## PodSecurityPolicy - -{{< feature-state for_k8s_version="v1.21" state="deprecated" >}} - -You can further control which sysctls can be set in pods by specifying lists of -sysctls or sysctl patterns in the `forbiddenSysctls` and/or -`allowedUnsafeSysctls` fields of the PodSecurityPolicy. A sysctl pattern ends -with a `*` character, such as `kernel.*`. A `*` character on its own matches -all sysctls. - -By default, all safe sysctls are allowed. - -Both `forbiddenSysctls` and `allowedUnsafeSysctls` are lists of plain sysctl names -or sysctl patterns (which end with `*`). The string `*` matches all sysctls. - -The `forbiddenSysctls` field excludes specific sysctls. You can forbid a -combination of safe and unsafe sysctls in the list. To forbid setting any -sysctls, use `*` on its own. - -If you specify any unsafe sysctl in the `allowedUnsafeSysctls` field and it is -not present in the `forbiddenSysctls` field, that sysctl can be used in Pods -using this PodSecurityPolicy. To allow all unsafe sysctls in the -PodSecurityPolicy to be set, use `*` on its own. - -Do not configure these two fields such that there is overlap, meaning that a -given sysctl is both allowed and forbidden. - -{{< warning >}} -If you allow unsafe sysctls via the `allowedUnsafeSysctls` field -in a PodSecurityPolicy, any pod using such a sysctl will fail to start -if the sysctl is not allowed via the `--allowed-unsafe-sysctls` kubelet -flag as well on that node. -{{< /warning >}} - -This example allows unsafe sysctls prefixed with `kernel.msg` to be set and -disallows setting of the `kernel.shm_rmid_forced` sysctl. - -```yaml -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: sysctl-psp -spec: - allowedUnsafeSysctls: - - kernel.msg* - forbiddenSysctls: - - kernel.shm_rmid_forced - ... 
-``` - - diff --git a/content/en/docs/tasks/configure-pod-container/security-context.md b/content/en/docs/tasks/configure-pod-container/security-context.md index 80d942b724e52..4053862de4800 100644 --- a/content/en/docs/tasks/configure-pod-container/security-context.md +++ b/content/en/docs/tasks/configure-pod-container/security-context.md @@ -484,7 +484,7 @@ kubectl delete pod security-context-demo-4 * [Tuning Docker with the newest security enhancements](https://github.com/containerd/containerd/blob/main/docs/cri/config.md) * [Security Contexts design document](https://git.k8s.io/design-proposals-archive/auth/security_context.md) * [Ownership Management design document](https://git.k8s.io/design-proposals-archive/storage/volume-ownership-management.md) -* [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/) +* [PodSecurity Admission](/docs/concepts/security/pod-security-admission/) * [AllowPrivilegeEscalation design document](https://git.k8s.io/design-proposals-archive/auth/no-new-privs.md) * For more information about security mechanisms in Linux, see diff --git a/content/en/docs/tasks/configure-pod-container/static-pod.md b/content/en/docs/tasks/configure-pod-container/static-pod.md index e9e9a6c356db9..e2eab5088e096 100644 --- a/content/en/docs/tasks/configure-pod-container/static-pod.md +++ b/content/en/docs/tasks/configure-pod-container/static-pod.md @@ -182,7 +182,7 @@ static-web 1/1 Running 0 2m ``` {{< note >}} -Make sure the kubelet has permission to create the mirror Pod in the API server. If not, the creation request is rejected by the API server. See [Pod Security admission](/docs/concepts/security/pod-security-admission) and [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/). +Make sure the kubelet has permission to create the mirror Pod in the API server. If not, the creation request is rejected by the API server. {{< /note >}} {{< glossary_tooltip term_id="label" text="Labels" >}} from the static Pod are diff --git a/content/en/docs/tutorials/security/apparmor.md b/content/en/docs/tutorials/security/apparmor.md index 5af4d86668cec..99338809a1e29 100644 --- a/content/en/docs/tutorials/security/apparmor.md +++ b/content/en/docs/tutorials/security/apparmor.md @@ -346,33 +346,6 @@ class of profiles) on the node, and use a [node selector](/docs/concepts/scheduling-eviction/assign-pod-node/) to ensure the Pod is run on a node with the required profile. -### Restricting profiles with the PodSecurityPolicy - -{{< note >}} -PodSecurityPolicy is deprecated in Kubernetes v1.21, and will be removed in v1.25. -See [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/) documentation for more information. -{{< /note >}} - -If the PodSecurityPolicy extension is enabled, cluster-wide AppArmor restrictions can be applied. To -enable the PodSecurityPolicy, the following flag must be set on the `apiserver`: - -``` ---enable-admission-plugins=PodSecurityPolicy[,others...] -``` - -The AppArmor options can be specified as annotations on the PodSecurityPolicy: - -```yaml -apparmor.security.beta.kubernetes.io/defaultProfileName: -apparmor.security.beta.kubernetes.io/allowedProfileNames: [,others...] -``` - -The default profile name option specifies the profile to apply to containers by default when none is -specified. The allowed profile names option specifies a list of profiles that Pod containers are -allowed to be run with. If both options are provided, the default must be allowed. The profiles are -specified in the same format as on containers. 
See the [API Reference](#api-reference) for the full
-specification.
-
 ### Disabling AppArmor
 
 If you do not want AppArmor to be available on your cluster, it can be disabled by a command-line flag:
 
@@ -421,7 +394,7 @@ Specifying the profile a container will run with:
 ### Profile Reference
 
 - `runtime/default`: Refers to the default runtime profile.
-  - Equivalent to not specifying a profile (without a PodSecurityPolicy default), except it still
+  - Equivalent to not specifying a profile, except it still
     requires AppArmor to be enabled.
   - In practice, many container runtimes use the same OCI default profile, defined here:
     https://github.com/containers/common/blob/main/pkg/apparmor/apparmor_linux_template.go
@@ -432,22 +405,6 @@ Specifying the profile a container will run with:
 
 Any other profile reference format is invalid.
 
-### PodSecurityPolicy Annotations
-
-Specifying the default profile to apply to containers when none is provided:
-
-* **key**: `apparmor.security.beta.kubernetes.io/defaultProfileName`
-* **value**: a profile reference, described above
-
-Specifying the list of profiles Pod containers is allowed to specify:
-
-* **key**: `apparmor.security.beta.kubernetes.io/allowedProfileNames`
-* **value**: a comma-separated list of profile references (described above)
-  - Although an escaped comma is a legal character in a profile name, it cannot be explicitly
-    allowed here.
-
-
 ## {{% heading "whatsnext" %}}

From 5db7ddf2dae9df37ec60be5a8eb179d9baf0b22e Mon Sep 17 00:00:00 2001
From: Jiawei Wang
Date: Tue, 16 Aug 2022 04:38:47 +0000
Subject: [PATCH 55/77] Update CSI migration feature status, and remove docs
 for unsupported plugins

---
 .../concepts/storage/persistent-volumes.md    |  51 +++---
 .../docs/concepts/storage/storage-classes.md  | 149 ------------------
 content/en/docs/concepts/storage/volumes.md   | 147 ++++-------------
 .../feature-gates.md                          |  29 ++--
 4 files changed, 75 insertions(+), 301 deletions(-)

diff --git a/content/en/docs/concepts/storage/persistent-volumes.md b/content/en/docs/concepts/storage/persistent-volumes.md
index 07521f42eec68..9506461dfa097 100644
--- a/content/en/docs/concepts/storage/persistent-volumes.md
+++ b/content/en/docs/concepts/storage/persistent-volumes.md
@@ -9,7 +9,7 @@ title: Persistent Volumes
 feature:
   title: Storage orchestration
  description: >
-    Automatically mount the storage system of your choice, whether from local storage, a public cloud provider such as GCP or AWS, or a network storage system such as NFS, iSCSI, Gluster, Ceph, Cinder, or Flocker.
+    Automatically mount the storage system of your choice, whether from local storage, a public cloud provider such as AWS or GCP, or a network storage system such as NFS, iSCSI, Ceph, or Cinder.
 content_type: concept
 weight: 20
 ---
@@ -238,10 +238,9 @@ Source:
 Events:            <none>
 ```
 
-Enabling the `CSIMigration` feature for a specific in-tree volume plugin will remove
-the `kubernetes.io/pv-controller` finalizer, while adding the `external-provisioner.volume.kubernetes.io/finalizer`
-finalizer. Similarly, disabling `CSIMigration` will remove the `external-provisioner.volume.kubernetes.io/finalizer`
-finalizer, while adding the `kubernetes.io/pv-controller` finalizer.
+When the `CSIMigration{provider}` feature flag is enabled for a specific in-tree volume plugin,
+the `kubernetes.io/pv-controller` finalizer is replaced by the
+`external-provisioner.volume.kubernetes.io/finalizer` finalizer.
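For illustration, a PersistentVolume that is managed by an external CSI provisioner might
then look like this (a sketch; the driver name and volume handle are example values):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv   # example name
  finalizers:
    # Replaces kubernetes.io/pv-controller once an external CSI
    # provisioner is responsible for deleting this volume.
    - external-provisioner.volume.kubernetes.io/finalizer
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  csi:
    driver: ebs.csi.aws.com               # example CSI driver
    volumeHandle: vol-0123456789abcdef0   # example volume ID
```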
### Reserving a PersistentVolume @@ -413,14 +412,9 @@ Kubernetes does not support shrinking a PVC to less than its current size. PersistentVolume types are implemented as plugins. Kubernetes currently supports the following plugins: -* [`awsElasticBlockStore`](/docs/concepts/storage/volumes/#awselasticblockstore) - AWS Elastic Block Store (EBS) -* [`azureDisk`](/docs/concepts/storage/volumes/#azuredisk) - Azure Disk -* [`azureFile`](/docs/concepts/storage/volumes/#azurefile) - Azure File * [`cephfs`](/docs/concepts/storage/volumes/#cephfs) - CephFS volume * [`csi`](/docs/concepts/storage/volumes/#csi) - Container Storage Interface (CSI) * [`fc`](/docs/concepts/storage/volumes/#fc) - Fibre Channel (FC) storage -* [`gcePersistentDisk`](/docs/concepts/storage/volumes/#gcepersistentdisk) - GCE Persistent Disk -* [`glusterfs`](/docs/concepts/storage/volumes/#glusterfs) - Glusterfs volume * [`hostPath`](/docs/concepts/storage/volumes/#hostpath) - HostPath volume (for single node testing only; WILL NOT WORK in a multi-node cluster; consider using `local` volume instead) @@ -428,29 +422,41 @@ PersistentVolume types are implemented as plugins. Kubernetes currently supports * [`local`](/docs/concepts/storage/volumes/#local) - local storage devices mounted on nodes. * [`nfs`](/docs/concepts/storage/volumes/#nfs) - Network File System (NFS) storage -* [`portworxVolume`](/docs/concepts/storage/volumes/#portworxvolume) - Portworx volume * [`rbd`](/docs/concepts/storage/volumes/#rbd) - Rados Block Device (RBD) volume -* [`vsphereVolume`](/docs/concepts/storage/volumes/#vspherevolume) - vSphere VMDK volume The following types of PersistentVolume are deprecated. This means that support is still available but will be removed in a future Kubernetes release. +* [`awsElasticBlockStore`](/docs/concepts/storage/volumes/#awselasticblockstore) - AWS Elastic Block Store (EBS) + (**deprecated** in v1.17) +* [`azureDisk`](/docs/concepts/storage/volumes/#azuredisk) - Azure Disk + (**deprecated** in v1.19) +* [`azureFile`](/docs/concepts/storage/volumes/#azurefile) - Azure File + (**deprecated** in v1.21) * [`cinder`](/docs/concepts/storage/volumes/#cinder) - Cinder (OpenStack block storage) (**deprecated** in v1.18) * [`flexVolume`](/docs/concepts/storage/volumes/#flexvolume) - FlexVolume (**deprecated** in v1.23) -* [`flocker`](/docs/concepts/storage/volumes/#flocker) - Flocker storage - (**deprecated** in v1.22) -* [`quobyte`](/docs/concepts/storage/volumes/#quobyte) - Quobyte volume - (**deprecated** in v1.22) -* [`storageos`](/docs/concepts/storage/volumes/#storageos) - StorageOS volume - (**deprecated** in v1.22) +* [`gcePersistentDisk`](/docs/concepts/storage/volumes/#gcepersistentdisk) - GCE Persistent Disk + (**deprecated** in v1.17) +* [`glusterfs`](/docs/concepts/storage/volumes/#glusterfs) - Glusterfs volume + (**deprecated** in v1.25) +* [`portworxVolume`](/docs/concepts/storage/volumes/#portworxvolume) - Portworx volume + (**deprecated** in v1.25) +* [`vsphereVolume`](/docs/concepts/storage/volumes/#vspherevolume) - vSphere VMDK volume + (**deprecated** in v1.19) Older versions of Kubernetes also supported the following in-tree PersistentVolume types: * `photonPersistentDisk` - Photon controller persistent disk. 
- (**not available** after v1.15) + (**not available** starting v1.15) * [`scaleIO`](/docs/concepts/storage/volumes/#scaleio) - ScaleIO volume - (**not available** after v1.21) + (**not available** starting v1.21) +* [`flocker`](/docs/concepts/storage/volumes/#flocker) - Flocker storage + (**not available** starting v1.25) +* [`quobyte`](/docs/concepts/storage/volumes/#quobyte) - Quobyte volume + (**not available** starting v1.25) +* [`storageos`](/docs/concepts/storage/volumes/#storageos) - StorageOS volume + (**not available** starting v1.25) ## Persistent Volumes @@ -562,17 +568,14 @@ If the access modes are specified as ReadWriteOncePod, the volume is constrained | CSI | depends on the driver | depends on the driver | depends on the driver | depends on the driver | | FC | ✓ | ✓ | - | - | | FlexVolume | ✓ | ✓ | depends on the driver | - | -| Flocker | ✓ | - | - | - | | GCEPersistentDisk | ✓ | ✓ | - | - | | Glusterfs | ✓ | ✓ | ✓ | - | | HostPath | ✓ | - | - | - | | iSCSI | ✓ | ✓ | - | - | -| Quobyte | ✓ | ✓ | ✓ | - | | NFS | ✓ | ✓ | ✓ | - | | RBD | ✓ | ✓ | - | - | | VsphereVolume | ✓ | - | - (works when Pods are collocated) | - | | PortworxVolume | ✓ | - | ✓ | - | - | -| StorageOS | ✓ | - | - | - | ### Class @@ -616,9 +619,7 @@ The following volume types support mount options: * `glusterfs` * `iscsi` * `nfs` -* `quobyte` (**deprecated** in v1.22) * `rbd` -* `storageos` (**deprecated** in v1.22) * `vsphereVolume` Mount options are not validated. If a mount option is invalid, the mount fails. diff --git a/content/en/docs/concepts/storage/storage-classes.md b/content/en/docs/concepts/storage/storage-classes.md index 8fda0b2ff3f3e..113be2a5f6a11 100644 --- a/content/en/docs/concepts/storage/storage-classes.md +++ b/content/en/docs/concepts/storage/storage-classes.md @@ -71,17 +71,13 @@ for provisioning PVs. This field must be specified. | Cinder | ✓ | [OpenStack Cinder](#openstack-cinder)| | FC | - | - | | FlexVolume | - | - | -| Flocker | ✓ | - | | GCEPersistentDisk | ✓ | [GCE PD](#gce-pd) | | Glusterfs | ✓ | [Glusterfs](#glusterfs) | | iSCSI | - | - | -| Quobyte | ✓ | [Quobyte](#quobyte) | | NFS | - | [NFS](#nfs) | | RBD | ✓ | [Ceph RBD](#ceph-rbd) | | VsphereVolume | ✓ | [vSphere](#vsphere) | | PortworxVolume | ✓ | [Portworx Volume](#portworx-volume) | -| ScaleIO | ✓ | [ScaleIO](#scaleio) | -| StorageOS | ✓ | [StorageOS](#storageos) | | Local | - | [Local](#local) | You are not restricted to specifying the "internal" provisioners @@ -599,61 +595,6 @@ parameters: set `imageFormat` to "2". Currently supported features are `layering` only. Default is "", and no features are turned on. -### Quobyte - -{{< feature-state for_k8s_version="v1.22" state="deprecated" >}} - -The Quobyte in-tree storage plugin is deprecated, an -[example](https://github.com/quobyte/quobyte-csi/blob/master/example/StorageClass.yaml) -`StorageClass` for the out-of-tree Quobyte plugin can be found at the Quobyte CSI repository. - -```yaml -apiVersion: storage.k8s.io/v1 -kind: StorageClass -metadata: - name: slow -provisioner: kubernetes.io/quobyte -parameters: - quobyteAPIServer: "http://138.68.74.142:7860" - registry: "138.68.74.142:7861" - adminSecretName: "quobyte-admin-secret" - adminSecretNamespace: "kube-system" - user: "root" - group: "root" - quobyteConfig: "BASE" - quobyteTenant: "DEFAULT" -``` - -* `quobyteAPIServer`: API Server of Quobyte in the format - `"http(s)://api-server:7860"` -* `registry`: Quobyte registry to use to mount the volume. 
You can specify the - registry as ``:`` pair or if you want to specify multiple - registries, put a comma between them. - ``:,:,:``. - The host can be an IP address or if you have a working DNS you can also - provide the DNS names. -* `adminSecretNamespace`: The namespace for `adminSecretName`. - Default is "default". -* `adminSecretName`: secret that holds information about the Quobyte user and - the password to authenticate against the API server. The provided secret - must have type "kubernetes.io/quobyte" and the keys `user` and `password`, - for example: - - ```shell - kubectl create secret generic quobyte-admin-secret \ - --type="kubernetes.io/quobyte" --from-literal=user='admin' --from-literal=password='opensesame' \ - --namespace=kube-system - ``` - -* `user`: maps all access to this user. Default is "root". -* `group`: maps all access to this group. Default is "nfsnobody". -* `quobyteConfig`: use the specified configuration to create the volume. You - can create a new configuration or modify an existing one with the Web - console or the quobyte CLI. Default is "BASE". -* `quobyteTenant`: use the specified tenant ID to create/delete the volume. - This Quobyte tenant has to be already present in Quobyte. - Default is "DEFAULT". - ### Azure Disk #### Azure Unmanaged Disk storage class {#azure-unmanaged-disk-storage-class} @@ -782,96 +723,6 @@ parameters: to false, `true/false` (default `false`). A string is expected here i.e. `"true"` and not `true`. -### ScaleIO - -```yaml -apiVersion: storage.k8s.io/v1 -kind: StorageClass -metadata: - name: slow -provisioner: kubernetes.io/scaleio -parameters: - gateway: https://192.168.99.200:443/api - system: scaleio - protectionDomain: pd0 - storagePool: sp1 - storageMode: ThinProvisioned - secretRef: sio-secret - readOnly: "false" - fsType: xfs -``` - -* `provisioner`: attribute is set to `kubernetes.io/scaleio` -* `gateway`: address to a ScaleIO API gateway (required) -* `system`: the name of the ScaleIO system (required) -* `protectionDomain`: the name of the ScaleIO protection domain (required) -* `storagePool`: the name of the volume storage pool (required) -* `storageMode`: the storage provision mode: `ThinProvisioned` (default) or - `ThickProvisioned` -* `secretRef`: reference to a configured Secret object (required) -* `readOnly`: specifies the access mode to the mounted volume (default false) -* `fsType`: the file system to use for the volume (default ext4) - -The ScaleIO Kubernetes volume plugin requires a configured Secret object. -The secret must be created with type `kubernetes.io/scaleio` and use the same -namespace value as that of the PVC where it is referenced -as shown in the following command: - -```shell -kubectl create secret generic sio-secret --type="kubernetes.io/scaleio" \ ---from-literal=username=sioadmin --from-literal=password=d2NABDNjMA== \ ---namespace=default -``` - -### StorageOS - -```yaml -apiVersion: storage.k8s.io/v1 -kind: StorageClass -metadata: - name: fast -provisioner: kubernetes.io/storageos -parameters: - pool: default - description: Kubernetes volume - fsType: ext4 - adminSecretNamespace: default - adminSecretName: storageos-secret -``` - -* `pool`: The name of the StorageOS distributed capacity pool to provision the - volume from. Uses the `default` pool which is normally present if not specified. -* `description`: The description to assign to volumes that were created dynamically. 
- All volume descriptions will be the same for the storage class, but different - storage classes can be used to allow descriptions for different use cases. - Defaults to `Kubernetes volume`. -* `fsType`: The default filesystem type to request. Note that user-defined rules - within StorageOS may override this value. Defaults to `ext4`. -* `adminSecretNamespace`: The namespace where the API configuration secret is - located. Required if adminSecretName set. -* `adminSecretName`: The name of the secret to use for obtaining the StorageOS - API credentials. If not specified, default values will be attempted. - -The StorageOS Kubernetes volume plugin can use a Secret object to specify an -endpoint and credentials to access the StorageOS API. This is only required when -the defaults have been changed. -The secret must be created with type `kubernetes.io/storageos` as shown in the -following command: - -```shell -kubectl create secret generic storageos-secret \ ---type="kubernetes.io/storageos" \ ---from-literal=apiAddress=tcp://localhost:5705 \ ---from-literal=apiUsername=storageos \ ---from-literal=apiPassword=storageos \ ---namespace=default -``` - -Secrets used for dynamically provisioned volumes may be created in any namespace -and referenced with the `adminSecretNamespace` parameter. Secrets used by -pre-provisioned volumes must be created in the same namespace as the PVC that -references it. - ### Local {{< feature-state for_k8s_version="v1.14" state="stable" >}} diff --git a/content/en/docs/concepts/storage/volumes.md b/content/en/docs/concepts/storage/volumes.md index 25a0e818231a6..999d7aad07e4b 100644 --- a/content/en/docs/concepts/storage/volumes.md +++ b/content/en/docs/concepts/storage/volumes.md @@ -121,14 +121,13 @@ If the EBS volume is partitioned, you can supply the optional field `partition: #### AWS EBS CSI migration -{{< feature-state for_k8s_version="v1.17" state="beta" >}} +{{< feature-state for_k8s_version="v1.25" state="stable" >}} The `CSIMigration` feature for `awsElasticBlockStore`, when enabled, redirects all plugin operations from the existing in-tree plugin to the `ebs.csi.aws.com` Container Storage Interface (CSI) driver. In order to use this feature, the [AWS EBS CSI driver](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) -must be installed on the cluster and the `CSIMigration` and `CSIMigrationAWS` -beta features must be enabled. +must be installed on the cluster. #### AWS EBS CSI migration complete @@ -153,7 +152,7 @@ The `CSIMigration` feature for `azureDisk`, when enabled, redirects all plugin o from the existing in-tree plugin to the `disk.csi.azure.com` Container Storage Interface (CSI) Driver. In order to use this feature, the [Azure Disk CSI Driver](https://github.com/kubernetes-sigs/azuredisk-csi-driver) -must be installed on the cluster and the `CSIMigration` feature must be enabled. +must be installed on the cluster. #### azureDisk CSI migration complete @@ -179,7 +178,7 @@ The `CSIMigration` feature for `azureFile`, when enabled, redirects all plugin o from the existing in-tree plugin to the `file.csi.azure.com` Container Storage Interface (CSI) Driver. In order to use this feature, the [Azure File CSI Driver](https://github.com/kubernetes-sigs/azurefile-csi-driver) -must be installed on the cluster and the `CSIMigration` and `CSIMigrationAzureFile` +must be installed on the cluster and the `CSIMigrationAzureFile` [feature gates](/docs/reference/command-line-tools-reference/feature-gates/) must be enabled. 
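As a sketch of enabling such a migration feature gate (it also needs to be enabled on the
kube-controller-manager, for example through its `--feature-gates` command line flag), the
kubelet configuration file could contain:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Assumed to match the gate named above; adjust per component and version.
  CSIMigrationAzureFile: true
```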
Azure File CSI driver does not support using same volume with different fsgroups. If
@@ -382,24 +381,6 @@ beforehand so that Kubernetes hosts can access them.
 See the [fibre channel example](https://github.com/kubernetes/examples/tree/master/staging/volumes/fibre_channel)
 for more details.
 
-### flocker (deprecated) {#flocker}
-
-[Flocker](https://github.com/ClusterHQ/flocker) is an open-source, clustered
-container data volume manager. Flocker provides management
-and orchestration of data volumes backed by a variety of storage backends.
-
-A `flocker` volume allows a Flocker dataset to be mounted into a Pod. If the
-dataset does not already exist in Flocker, it needs to be first created with the Flocker
-CLI or by using the Flocker API. If the dataset already exists it will be
-reattached by Flocker to the node that the pod is scheduled. This means data
-can be shared between pods as required.
-
-{{< note >}}
-You must have your own Flocker installation running before you can use it.
-{{< /note >}}
-
-See the [Flocker example](https://github.com/kubernetes/examples/tree/master/staging/volumes/flocker) for more details.
-
 ### gcePersistentDisk (deprecated) {#gcepersistentdisk}
 
 {{< feature-state for_k8s_version="v1.17" state="deprecated" >}}
@@ -507,14 +488,13 @@ spec:
 
 #### GCE CSI migration
 
-{{< feature-state for_k8s_version="v1.17" state="beta" >}}
+{{< feature-state for_k8s_version="v1.25" state="stable" >}}
 
 The `CSIMigration` feature for GCE PD, when enabled, redirects all plugin operations
 from the existing in-tree plugin to the `pd.csi.storage.gke.io` Container
 Storage Interface (CSI) Driver. In order to use this feature, the [GCE PD CSI
 Driver](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver)
-must be installed on the cluster and the `CSIMigration` and `CSIMigrationGCE`
-beta features must be enabled.
+must be installed on the cluster.
 
 #### GCE CSI migration complete
 
@@ -554,7 +534,9 @@ spec:
     revision: "22f1d8406d464b0c0874075539c1f2e96c253775"
 ```
 
-### glusterfs
+### glusterfs (deprecated)
+
+{{< feature-state for_k8s_version="v1.25" state="deprecated" >}}
 
 A `glusterfs` volume allows a [Glusterfs](https://www.gluster.org) (an open
 source networked filesystem) volume to be mounted into your Pod. Unlike
@@ -796,7 +778,9 @@ iSCSI volume) without knowing the details of the particular cloud environment.
 See the information about [PersistentVolumes](/docs/concepts/storage/persistent-volumes/) for
 more details.
 
-### portworxVolume {#portworxvolume}
+### portworxVolume (deprecated) {#portworxvolume}
+
+{{< feature-state for_k8s_version="v1.25" state="deprecated" >}}
 
 A `portworxVolume` is an elastic block storage layer that runs hyperconverged with
 Kubernetes. [Portworx](https://portworx.com/use-case/kubernetes-storage/) fingerprints storage
@@ -834,25 +818,22 @@ before using it in the Pod.
 
 For more details, see the [Portworx volume](https://github.com/kubernetes/examples/tree/master/staging/volumes/portworx/README.md) examples.
 
+#### Portworx CSI migration
+{{< feature-state for_k8s_version="v1.25" state="beta" >}}
+
+The `CSIMigration` feature for Portworx was added in Kubernetes 1.23, but is disabled by
+default because it was in alpha state at that time.
+The feature has been in beta since v1.25, but it is still turned off by default.
+It redirects all plugin operations from the existing in-tree plugin to the
+`pxd.portworx.com` Container Storage Interface (CSI) Driver.
+[Portworx CSI Driver](https://docs.portworx.com/portworx-install-with-kubernetes/storage-operations/csi/) +must be installed on the cluster. +To enable the feature, set `CSIMigrationPortworx=true` in kube-controller-manager and kubelet. + ### projected A projected volume maps several existing volume sources into the same directory. For more details, see [projected volumes](/docs/concepts/storage/projected-volumes/). -### quobyte (deprecated) {#quobyte} - -A `quobyte` volume allows an existing [Quobyte](https://www.quobyte.com) volume to -be mounted into your Pod. - -{{< note >}} -You must have your own Quobyte setup and running with the volumes -created before you can use it. -{{< /note >}} - -Quobyte supports the {{< glossary_tooltip text="Container Storage Interface" term_id="csi" >}}. -CSI is the recommended plugin to use Quobyte volumes inside Kubernetes. Quobyte's -GitHub project has [instructions](https://github.com/quobyte/quobyte-csi#quobyte-csi) for deploying Quobyte using CSI, along with examples. - ### rbd An `rbd` volume allows a @@ -884,9 +865,10 @@ operations from the existing in-tree plugin to the `rbd.csi.ceph.com` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver. In order to use this feature, the [Ceph CSI driver](https://github.com/ceph/ceph-csi) -must be installed on the cluster and the `CSIMigration` and `csiMigrationRBD` -[feature gates](/docs/reference/command-line-tools-reference/feature-gates/) -must be enabled. +must be installed on the cluster and the `CSIMigrationRBD` +[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) +must be enabled. (Note that the `csiMigrationRBD` flag has been removed and +replaced with `CSIMigrationRBD` in release v1.24) {{< note >}} @@ -926,61 +908,6 @@ receive Secret updates. For more details, see [Configuring Secrets](/docs/concepts/configuration/secret/). -### storageOS (deprecated) {#storageos} - -A `storageos` volume allows an existing [StorageOS](https://www.storageos.com) -volume to mount into your Pod. - -StorageOS runs as a container within your Kubernetes environment, making local -or attached storage accessible from any node within the Kubernetes cluster. -Data can be replicated to protect against node failure. Thin provisioning and -compression can improve utilization and reduce cost. - -At its core, StorageOS provides block storage to containers, accessible from a file system. - -The StorageOS Container requires 64-bit Linux and has no additional dependencies. -A free developer license is available. - -{{< caution >}} -You must run the StorageOS container on each node that wants to -access StorageOS volumes or that will contribute storage capacity to the pool. -For installation instructions, consult the -[StorageOS documentation](https://docs.storageos.com). -{{< /caution >}} - -The following example is a Pod configuration with StorageOS: - -```yaml -apiVersion: v1 -kind: Pod -metadata: - labels: - name: redis - role: master - name: test-storageos-redis -spec: - containers: - - name: master - image: kubernetes/redis:v1 - env: - - name: MASTER - value: "true" - ports: - - containerPort: 6379 - volumeMounts: - - mountPath: /redis-master-data - name: redis-data - volumes: - - name: redis-data - storageos: - # The `redis-vol01` volume must already exist within StorageOS in the `default` namespace. 
- volumeName: redis-vol01 - fsType: ext4 -``` - -For more information about StorageOS, dynamic provisioning, and PersistentVolumeClaims, see the -[StorageOS examples](https://github.com/kubernetes/examples/blob/master/volumes/storageos). - ### vsphereVolume (deprecated) {#vspherevolume} {{< note >}} @@ -1001,8 +928,8 @@ All plugin operations from the in-tree `vspherevolume` will be redirected to the [vSphere CSI driver](https://github.com/kubernetes-sigs/vsphere-csi-driver) -must be installed on the cluster. You can find additional advice on how to migrate in-tree `vsphereVolume` in VMware's -documentation page [Migrating In-Tree vSphere Volumes to vSphere Container Storage Plug-in](https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-968D421F-D464-4E22-8127-6CB9FF54423F.html). +must be installed on the cluster. You can find additional advice on how to migrate in-tree `vsphereVolume` in VMware's documentation page +[Migrating In-Tree vSphere Volumes to vSphere Container Storage Plug-in](https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-968D421F-D464-4E22-8127-6CB9FF54423F.html). As of Kubernetes v1.25, vSphere releases less than 7.0u2 are not supported for the (deprecated) in-tree vSphere storage driver. You must run vSphere 7.0u2 or later @@ -1034,16 +961,6 @@ but new volumes created by the vSphere CSI driver will not be honoring these par To turn off the `vsphereVolume` plugin from being loaded by the controller manager and the kubelet, you need to set `InTreePluginvSphereUnregister` feature flag to `true`. You must install a `csi.vsphere.vmware.com` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver on all worker nodes. -#### Portworx CSI migration -{{< feature-state for_k8s_version="v1.23" state="alpha" >}} - -The `CSIMigration` feature for Portworx has been added but disabled by default in Kubernetes 1.23 since it's in alpha state. -It redirects all plugin operations from the existing in-tree plugin to the -`pxd.portworx.com` Container Storage Interface (CSI) Driver. -[Portworx CSI Driver](https://docs.portworx.com/portworx-install-with-kubernetes/storage-operations/csi/) -must be installed on the cluster. -To enable the feature, set `CSIMigrationPortworx=true` in kube-controller-manager and kubelet. - ## Using subPath {#using-subpath} Sometimes, it is useful to share one volume for multiple uses in a single pod. @@ -1281,9 +1198,9 @@ For more details, refer to the deployment guide of the CSI plugin you wish to de #### Migrating to CSI drivers from in-tree plugins -{{< feature-state for_k8s_version="v1.17" state="beta" >}} +{{< feature-state for_k8s_version="v1.25" state="stable" >}} -The `CSIMigration` feature, when enabled, directs operations against existing in-tree +The `CSIMigration` feature directs operations against existing in-tree plugins to corresponding CSI plugins (which are expected to be installed and configured). 
As a result, operators do not have to make any configuration changes to existing Storage Classes, PersistentVolumes or PersistentVolumeClaims @@ -1303,7 +1220,7 @@ The following in-tree plugins support persistent storage on Windows nodes: * [`gcePersistentDisk`](#gcepersistentdisk) * [`vsphereVolume`](#vspherevolume) -### flexVolume +### flexVolume (deprecated) {{< feature-state for_k8s_version="v1.23" state="deprecated" >}} diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 9e0695bbc85af..361393167b997 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -70,21 +70,15 @@ different Kubernetes components. | `CPUManagerPolicyBetaOptions` | `true` | Beta | 1.23 | | | `CPUManagerPolicyOptions` | `false` | Alpha | 1.22 | 1.22 | | `CPUManagerPolicyOptions` | `true` | Beta | 1.23 | | -| `CSIMigration` | `false` | Alpha | 1.14 | 1.16 | -| `CSIMigration` | `true` | Beta | 1.17 | | -| `CSIMigrationAWS` | `false` | Alpha | 1.14 | 1.16 | -| `CSIMigrationAWS` | `false` | Beta | 1.17 | 1.22 | -| `CSIMigrationAWS` | `true` | Beta | 1.23 | | -| `CSIMigrationAzureFile` | `false` | Alpha | 1.15 | 1.19 | +| `CSIMigrationAzureFile` | `false` | Alpha | 1.15 | 1.20 | | `CSIMigrationAzureFile` | `false` | Beta | 1.21 | 1.23 | | `CSIMigrationAzureFile` | `true` | Beta | 1.24 | | -| `CSIMigrationGCE` | `false` | Alpha | 1.14 | 1.16 | -| `CSIMigrationGCE` | `false` | Beta | 1.17 | 1.22 | -| `CSIMigrationGCE` | `true` | Beta | 1.23 | | -| `CSIMigrationvSphere` | `false` | Beta | 1.19 | | -| `CSIMigrationPortworx` | `false` | Alpha | 1.23 | | +| `CSIMigrationPortworx` | `false` | Alpha | 1.23 | 1.24 | | `CSIMigrationPortworx` | `false` | Beta | 1.25 | | -| `csiMigrationRBD` | `false` | Alpha | 1.23 | | +| `CSIMigrationRBD` | `false` | Alpha | 1.23 | | +| `CSIMigrationvSphere` | `false` | Alpha | 1.18 | 1.18 | +| `CSIMigrationvSphere` | `false` | Beta | 1.19 | 1.24 | +| `CSIMigrationvSphere` | `true` | Beta | 1.25 | | | `CSINodeExpandSecret` | `false` | Alpha | 1.25 | | | `CSIVolumeHealth` | `false` | Alpha | 1.21 | | | `ContextualLogging` | `false` | Alpha | 1.24 | | @@ -245,6 +239,13 @@ different Kubernetes components. | `CSIInlineVolume` | `false` | Alpha | 1.15 | 1.15 | | `CSIInlineVolume` | `true` | Beta | 1.16 | 1.24 | | `CSIInlineVolume` | `true` | GA | 1.25 | - | +| `CSIMigration` | `false` | Alpha | 1.14 | 1.16 | +| `CSIMigration` | `true` | Beta | 1.17 | 1.24 | +| `CSIMigration` | `true` | GA | 1.25 | - | +| `CSIMigrationAWS` | `false` | Alpha | 1.14 | 1.16 | +| `CSIMigrationAWS` | `false` | Beta | 1.17 | 1.22 | +| `CSIMigrationAWS` | `true` | Beta | 1.23 | 1.24 | +| `CSIMigrationAWS` | `true` | GA | 1.25 | - | | `CSIMigrationAWSComplete` | `false` | Alpha | 1.17 | 1.20 | | `CSIMigrationAWSComplete` | - | Deprecated | 1.21 | - | | `CSIMigrationAzureDisk` | `false` | Alpha | 1.15 | 1.18 | @@ -255,6 +256,10 @@ different Kubernetes components. 
| `CSIMigrationAzureDiskComplete` | - | Deprecated | 1.21 | - |
| `CSIMigrationAzureFileComplete` | `false` | Alpha | 1.17 | 1.20 |
| `CSIMigrationAzureFileComplete` | - | Deprecated | 1.21 | - |
+| `CSIMigrationGCE` | `false` | Alpha | 1.14 | 1.16 |
+| `CSIMigrationGCE` | `false` | Beta | 1.17 | 1.22 |
+| `CSIMigrationGCE` | `true` | Beta | 1.23 | 1.24 |
+| `CSIMigrationGCE` | `true` | GA | 1.25 | - |
| `CSIMigrationGCEComplete` | `false` | Alpha | 1.17 | 1.20 |
| `CSIMigrationGCEComplete` | - | Deprecated | 1.21 | - |
| `CSIMigrationOpenStack` | `false` | Alpha | 1.14 | 1.17 |

From 5aa74d45b5cb1a1a0c5155c5bb41e3767ccd9c80 Mon Sep 17 00:00:00 2001
From: kerthcet
Date: Tue, 16 Aug 2022 15:10:16 +0800
Subject: [PATCH 56/77] address comments

Signed-off-by: kerthcet
---
 .../scheduling-eviction/topology-spread-constraints.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
index 8413f68b91297..2d7de26f6f06f 100644
--- a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
+++ b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md
@@ -112,9 +112,8 @@ your cluster. Those fields are:

- **topologyKey** is the key of [node labels](#node-labels). Nodes that have a label
  with this key and identical values are considered to be in the same topology.
-  We consider each <key, value> as a "bucket", and try to put balanced number
-  of pods into each bucket.
-  We define a domain as a particular instance of a topology.
+  We call each instance of a topology (in other words, a <key, value> pair) a domain. The scheduler
+  will try to put a balanced number of pods into each domain.
  Also, we define an eligible domain as a domain whose nodes meet the requirements of
  nodeAffinityPolicy and nodeTaintsPolicy.

From 77df8a9fb5282e5c342e634b530f369e3cf1cfa1 Mon Sep 17 00:00:00 2001
From: Jan Safranek
Date: Wed, 3 Aug 2022 14:09:18 +0200
Subject: [PATCH 57/77] Add documentation for SELinuxMountReadWriteOncePod
 alpha feature

Co-authored-by: Tim Bannister
---
 .../feature-gates.md | 3 ++
 .../security-context.md | 37 +++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index 8262abf7dd89c..4c088dab026bc 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -1106,6 +1106,9 @@ Each feature gate is designed for enabling/disabling a specific feature:
  The seccomp profile is specified in the `securityContext` of a Pod and/or a Container.
- `SelectorIndex`: Allows label and field-based indexes in the API server watch cache to accelerate list operations.
+- `SELinuxMountReadWriteOncePod`: Allows the kubelet to mount volumes for a Pod directly with the
+  right SELinux label instead of applying the SELinux label recursively on every file on the
+  volume.
- `ServerSideApply`: Enables the [Server Side Apply (SSA)](/docs/reference/using-api/server-side-apply/) feature on the API Server.
- `ServerSideFieldValidation`: Enables server-side field validation.
This means the validation
diff --git a/content/en/docs/tasks/configure-pod-container/security-context.md b/content/en/docs/tasks/configure-pod-container/security-context.md
index 80d942b724e52..73ff19912dbea 100644
--- a/content/en/docs/tasks/configure-pod-container/security-context.md
+++ b/content/en/docs/tasks/configure-pod-container/security-context.md
@@ -444,6 +444,43 @@ securityContext:
 To assign SELinux labels, the SELinux security module must be loaded on the host operating system.
 {{< /note >}}

+### Efficient SELinux volume relabeling
+
+{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+
+By default, the container runtime recursively assigns an SELinux label to all
+files on all Pod volumes. To speed up this process, Kubernetes can change the
+SELinux label of a volume instantly by using a mount option
+`-o context=
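(Editorial aside, not part of the patch above: to make the new section concrete, here is a hedged sketch of the kind of Pod this alpha feature targets: one that sets `seLinuxOptions` explicitly and mounts a volume whose PersistentVolumeClaim requests the `ReadWriteOncePod` access mode. All names are placeholders, and the `SELinuxMountReadWriteOncePod` feature gate is assumed to be enabled on the kubelet.)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selinux-mount-demo            # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      # An explicit, complete label lets the kubelet mount the volume with
      # -o context=... instead of relabeling every file on it.
      level: "s0:c123,c456"
  containers:
    - name: app
      image: nginx                    # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: selinux-demo-pvc   # assumed to request ReadWriteOncePod
```

With such a Pod, mounting the volume can skip the recursive relabeling walk entirely, which matters most for volumes holding many small files.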