diff --git a/content/en/docs/concepts/architecture/garbage-collection.md b/content/en/docs/concepts/architecture/garbage-collection.md
index 4c61c968ba054..ffc648014e47a 100644
--- a/content/en/docs/concepts/architecture/garbage-collection.md
+++ b/content/en/docs/concepts/architecture/garbage-collection.md
@@ -141,7 +141,7 @@ until disk usage reaches the `LowThresholdPercent` value.
 
 {{< feature-state feature_gate_name="ImageMaximumGCAge" >}}
 
-As an alpha feature, you can specify the maximum time a local image can be unused for,
+As a beta feature, you can specify the maximum time a local image can be unused for,
 regardless of disk usage. This is a kubelet setting that you configure for each node.
 
 To configure the setting, enable the `ImageMaximumGCAge`
@@ -151,6 +151,13 @@ and also set a value for the `ImageMaximumGCAge` field in the kubelet configurat
 The value is specified as a Kubernetes _duration_; for example, you can
 set the configuration field to `3d12h`, which means 3 days and 12 hours.
 
+{{< note >}}
+This feature does not track image usage across kubelet restarts. If the kubelet
+is restarted, the tracked image age is reset, causing the kubelet to wait the full
+`ImageMaximumGCAge` duration before qualifying images for garbage collection
+based on image age.
+{{< /note >}}
+
 ### Container garbage collection {#container-image-garbage-collection}
 
 The kubelet garbage collects unused containers based on the following variables,
diff --git a/content/en/docs/concepts/architecture/nodes.md b/content/en/docs/concepts/architecture/nodes.md
index d03a232b4ddbe..636dfac5c878c 100644
--- a/content/en/docs/concepts/architecture/nodes.md
+++ b/content/en/docs/concepts/architecture/nodes.md
@@ -516,14 +516,44 @@ During a non-graceful shutdown, Pods are terminated in the two phases:
 recovered since the user was the one who originally added the taint.
 {{< /note >}}
 
+### Forced storage detach on timeout {#storage-force-detach-on-timeout}
+
+In any situation where a pod deletion has not succeeded for 6 minutes, Kubernetes will
+force detach volumes being unmounted if the node is unhealthy at that instant. Any
+workload still running on the node that uses a force-detached volume will cause a
+violation of the
+[CSI specification](https://github.com/container-storage-interface/spec/blob/master/spec.md#controllerunpublishvolume),
+which states that `ControllerUnpublishVolume` "**must** be called after all
+`NodeUnstageVolume` and `NodeUnpublishVolume` on the volume are called and succeed".
+In such circumstances, volumes on the node in question might encounter data corruption.
+
+The forced storage detach behavior is optional; users might opt to use the "Non-graceful
+node shutdown" feature instead.
+
+Force storage detach on timeout can be disabled by setting the `disable-force-detach-on-timeout`
+config field in `kube-controller-manager`. Disabling the force detach on timeout feature means
+that a volume that is hosted on a node that is unhealthy for more than 6 minutes will not have
+its associated
+[VolumeAttachment](/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/)
+deleted.
+
+After this setting has been applied, unhealthy Pods still attached to volumes must be recovered
+via the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure mentioned above.
+
+{{< note >}}
+- Caution must be taken while using the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure.
+- Deviation from the steps documented above can result in data corruption.
+{{< /note >}}
+
 ## Swap memory management {#swap-memory}
 
 {{< feature-state feature_gate_name="NodeSwap" >}}
 
 To enable swap on a node, the `NodeSwap` feature gate must be enabled on
-the kubelet, and the `--fail-swap-on` command line flag or `failSwapOn`
+the kubelet (default is true), and the `--fail-swap-on` command line flag or `failSwapOn`
 [configuration setting](/docs/reference/config-api/kubelet-config.v1beta1/)
-must be set to false.
+must be set to false.
+To allow Pods to utilize swap, `swapBehavior` should not be set to `NoSwap` (which is the default behavior) in the kubelet config.
 
 {{< warning >}}
 When the memory swap feature is turned on, Kubernetes data such as the content
@@ -535,17 +565,16 @@ specify how a node will use swap memory. For example,
 
 ```yaml
 memorySwap:
-  swapBehavior: UnlimitedSwap
+  swapBehavior: LimitedSwap
 ```
 
-- `UnlimitedSwap` (default): Kubernetes workloads can use as much swap memory as they
-  request, up to the system limit.
+- `NoSwap` (default): Kubernetes workloads will not use swap.
 - `LimitedSwap`: The utilization of swap memory by Kubernetes workloads is subject to limitations.
   Only Pods of Burstable QoS are permitted to employ swap.
 
 If configuration for `memorySwap` is not specified and the feature gate is
 enabled, by default the kubelet will apply the same behaviour as the
-`UnlimitedSwap` setting.
+`NoSwap` setting.
 
 With `LimitedSwap`, Pods that do not fall under the Burstable QoS classification (i.e.
 `BestEffort`/`Guaranteed` Qos Pods) are prohibited from utilizing swap memory.
diff --git a/content/en/docs/concepts/cluster-administration/logging.md b/content/en/docs/concepts/cluster-administration/logging.md
index ceffd28b15582..5cc9429578bf0 100644
--- a/content/en/docs/concepts/cluster-administration/logging.md
+++ b/content/en/docs/concepts/cluster-administration/logging.md
@@ -108,6 +108,15 @@ using the [kubelet configuration file](/docs/tasks/administer-cluster/kubelet-co
 These settings let you configure the maximum size for each log file and the maximum number of
 files allowed for each container respectively.
 
+To keep log rotation efficient in clusters where workloads generate a large volume of
+logs, the kubelet also lets you tune how the logs are rotated: how many concurrent log
+rotations can be performed, and the interval at which the logs are monitored and rotated
+as required.
+You can configure two kubelet [configuration settings](/docs/reference/config-api/kubelet-config.v1beta1/),
+`containerLogMaxWorkers` and `containerLogMonitorInterval`, using the
+[kubelet configuration file](/docs/tasks/administer-cluster/kubelet-config-file/).
+
+
 When you run [`kubectl logs`](/docs/reference/generated/kubectl/kubectl-commands#logs) as in
 the basic logging example, the kubelet on the node handles the request and
 reads directly from the log file. The kubelet returns the content of the log file.
@@ -148,7 +157,7 @@ If systemd is not present, the kubelet and container runtime write to `.log` fil
 run the kubelet via a helper tool, `kube-log-runner`, and use that tool to redirect
 kubelet logs to a directory that you choose.
 
-The kubelet always directs your container runtime to write logs into directories within
+By default, the kubelet directs your container runtime to write logs into directories within
 `/var/log/pods`.
 
 For more information on `kube-log-runner`, read [System Logs](/docs/concepts/cluster-administration/system-logs/#klog).
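+As an illustrative sketch (not one of the documented sample files), the rotation settings
+described above could be combined in a kubelet configuration file as follows; the values
+shown are assumptions that you would tune for your own cluster:
+
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+# Rotate a container log file once it reaches 10 MiB, keeping at most 5 files.
+containerLogMaxSize: 10Mi
+containerLogMaxFiles: 5
+# Allow up to 4 log rotations to run concurrently, checking for files
+# that need rotation every 10 seconds.
+containerLogMaxWorkers: 4
+containerLogMonitorInterval: "10s"
+```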
@@ -166,7 +175,7 @@ If you want to have logs written elsewhere, you can indirectly
 run the kubelet via a helper tool, `kube-log-runner`, and use that tool to redirect
 kubelet logs to a directory that you choose.
 
-However, the kubelet always directs your container runtime to write logs within the
+However, by default, the kubelet directs your container runtime to write logs within the
 directory `C:\var\log\pods`.
 
 For more information on `kube-log-runner`, read [System Logs](/docs/concepts/cluster-administration/system-logs/#klog).
@@ -180,6 +189,22 @@ the `/var/log` directory, bypassing the default logging mechanism (the component
 do not write to the systemd journal). You can use Kubernetes' storage mechanisms to
 map persistent storage into the container that runs the component.
 
+The kubelet allows changing the pod logs directory from the default `/var/log/pods`
+to a custom path. This adjustment can be made by configuring the `podLogsDir`
+parameter in the kubelet's configuration file.
+
+{{< caution >}}
+The default location `/var/log/pods` has been in use for an extended period and
+certain processes might implicitly assume this path. Therefore, altering this
+parameter must be approached with caution and at your own risk.
+
+Another caveat to keep in mind is that the kubelet supports the location being on the same
+disk as `/var`. Otherwise, if the logs are on a separate filesystem from `/var`,
+then the kubelet will not track that filesystem's usage, potentially leading to issues if
+it fills up.
+
+{{< /caution >}}
+
 For details about etcd and its logs, view the [etcd documentation](https://etcd.io/docs/).
 Again, you can use Kubernetes' storage mechanisms to map persistent storage into
 the container that runs the component.
@@ -200,7 +225,7 @@ as your responsibility.
 
 ## Cluster-level logging architectures
 
-While Kubernetes does not provide a native solution for cluster-level logging, there are
+While Kubernetes does not provide a native solution for cluster-level logging, there are
 several common approaches you can consider. Here are some options:
 
 * Use a node-level logging agent that runs on every node.
diff --git a/content/en/docs/concepts/cluster-administration/system-logs.md b/content/en/docs/concepts/cluster-administration/system-logs.md
index 9fed93fc75dca..6af3effcdd5dc 100644
--- a/content/en/docs/concepts/cluster-administration/system-logs.md
+++ b/content/en/docs/concepts/cluster-administration/system-logs.md
@@ -122,7 +122,7 @@ second line.}
 
 ### Contextual Logging
 
-{{< feature-state for_k8s_version="v1.24" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.30" state="beta" >}}
 
 Contextual logging builds on top of structured logging. It is primarily about
 how developers use logging calls: code based on that concept is more flexible
@@ -133,8 +133,9 @@ If developers use additional functions like `WithValues` or `WithName` in their
 components, then log entries contain additional information that gets passed
 into functions by their caller.
 
-Currently this is gated behind the `StructuredLogging` feature gate and
-disabled by default. The infrastructure for this was added in 1.24 without
+For Kubernetes {{< skew currentVersion >}}, this is gated behind the `ContextualLogging`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) and is
+enabled by default. The infrastructure for this was added in 1.24 without
 modifying components.
The [`component-base/logs/example`](https://github.com/kubernetes/kubernetes/blob/v1.24.0-beta.0/staging/src/k8s.io/component-base/logs/example/cmd/logger.go) command demonstrates how to use the new logging calls and how a component @@ -147,14 +148,14 @@ $ go run . --help --feature-gates mapStringBool A set of key=value pairs that describe feature gates for alpha/experimental features. Options are: AllAlpha=true|false (ALPHA - default=false) AllBeta=true|false (BETA - default=false) - ContextualLogging=true|false (ALPHA - default=false) + ContextualLogging=true|false (BETA - default=true) $ go run . --feature-gates ContextualLogging=true ... -I0404 18:00:02.916429 451895 logger.go:94] "example/myname: runtime" foo="bar" duration="1m0s" -I0404 18:00:02.916447 451895 logger.go:95] "example: another runtime" foo="bar" duration="1m0s" +I0222 15:13:31.645988 197901 example.go:54] "runtime" logger="example.myname" foo="bar" duration="1m0s" +I0222 15:13:31.646007 197901 example.go:55] "another runtime" logger="example" foo="bar" duration="1h0m0s" duration="1m0s" ``` -The `example` prefix and `foo="bar"` were added by the caller of the function +The `logger` key and `foo="bar"` were added by the caller of the function which logs the `runtime` message and `duration="1m0s"` value, without having to modify that function. @@ -165,8 +166,8 @@ is not in the log output anymore: ```console $ go run . --feature-gates ContextualLogging=false ... -I0404 18:03:31.171945 452150 logger.go:94] "runtime" duration="1m0s" -I0404 18:03:31.171962 452150 logger.go:95] "another runtime" duration="1m0s" +I0222 15:14:40.497333 198174 example.go:54] "runtime" duration="1m0s" +I0222 15:14:40.497346 198174 example.go:55] "another runtime" duration="1h0m0s" duration="1m0s" ``` ### JSON log format @@ -244,11 +245,11 @@ To help with debugging issues on nodes, Kubernetes v1.27 introduced a feature th running on the node. To use the feature, ensure that the `NodeLogQuery` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled for that node, and that the kubelet configuration options `enableSystemLogHandler` and `enableSystemLogQuery` are both set to true. On Linux -we assume that service logs are available via journald. On Windows we assume that service logs are available -in the application log provider. On both operating systems, logs are also available by reading files within +the assumption is that service logs are available via journald. On Windows the assumption is that service logs are +available in the application log provider. On both operating systems, logs are also available by reading files within `/var/log/`. -Provided you are authorized to interact with node objects, you can try out this alpha feature on all your nodes or +Provided you are authorized to interact with node objects, you can try out this feature on all your nodes or just a subset. 
Here is an example to retrieve the kubelet service logs from a node:

```shell
kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet"
```
@@ -293,4 +294,4 @@ kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet&patter
 * Read about [Contextual Logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging)
 * Read about [deprecation of klog flags](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components)
 * Read about the [Conventions for logging severity](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md)
-
+* Read about [Log Query](https://kep.k8s.io/2258)
diff --git a/content/en/docs/concepts/configuration/configmap.md b/content/en/docs/concepts/configuration/configmap.md
index b8b3e09e219a7..49ece878fc8c5 100644
--- a/content/en/docs/concepts/configuration/configmap.md
+++ b/content/en/docs/concepts/configuration/configmap.md
@@ -208,6 +208,42 @@ ConfigMaps consumed as environment variables are not updated automatically and r
 A container using a ConfigMap as a [subPath](/docs/concepts/storage/volumes#using-subpath) volume mount
 will not receive ConfigMap updates.
 {{< /note >}}
 
+### Using ConfigMaps as environment variables
+
+To use a ConfigMap in an {{< glossary_tooltip text="environment variable" term_id="container-env-variables" >}}
+in a Pod:
+
+1. For each container in your Pod specification, add an environment variable
+   for each ConfigMap key that you want to use to the
+   `env[].valueFrom.configMapKeyRef` field.
+1. Modify your image and/or command line so that the program looks for values
+   in the specified environment variables.
+
+This is an example of defining a ConfigMap as a pod environment variable:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: env-configmap
+spec:
+  containers:
+  - name: envars-test-container
+    image: nginx
+    env:
+    - name: CONFIGMAP_USERNAME
+      valueFrom:
+        configMapKeyRef:
+          name: myconfigmap
+          key: username
+```
+
+The range of characters allowed for environment
+variable names in pods is [restricted](/docs/tasks/inject-data-application/define-environment-variable-container/#using-environment-variables-inside-of-your-config).
+If any keys do not meet the rules, those keys are not made available to your container, though
+the Pod is allowed to start.
+
 ## Immutable ConfigMaps {#configmap-immutable}
 
 {{< feature-state for_k8s_version="v1.21" state="stable" >}}
diff --git a/content/en/docs/concepts/configuration/secret.md b/content/en/docs/concepts/configuration/secret.md
index 65f3a56f6ab02..e27f86cc0ebe8 100644
--- a/content/en/docs/concepts/configuration/secret.md
+++ b/content/en/docs/concepts/configuration/secret.md
@@ -567,25 +567,10 @@ in a Pod:
 For instructions, refer to
 [Define container environment variables using Secret data](/docs/tasks/inject-data-application/distribute-credentials-secure/#define-container-environment-variables-using-secret-data).
 
-#### Invalid environment variables {#restriction-env-from-invalid}
-
-If your environment variable definitions in your Pod specification are
-considered to be invalid environment variable names, those keys aren't made
-available to your container. The Pod is allowed to start.
-
-Kubernetes adds an Event with the reason set to `InvalidVariableNames` and a
-message that lists the skipped invalid keys.
-The following example shows a Pod that refers to a Secret named `mysecret`,
-where `mysecret` contains 2 invalid keys: `1badkey` and `2alsobad`.
-
-```shell
-kubectl get events
-```
-
-The output is similar to:
-
-```
-LASTSEEN   FIRSTSEEN   COUNT     NAME            KIND      SUBOBJECT   TYPE      REASON
-0s         0s          1         dapi-test-pod   Pod                   Warning   InvalidEnvironmentVariableNames   kubelet, 127.0.0.1      Keys [1badkey, 2alsobad] from the EnvFrom secret default/mysecret were skipped since they are considered invalid environment variable names.
-```
+The range of characters allowed for environment variable
+names in pods is [restricted](/docs/tasks/inject-data-application/define-environment-variable-container/#using-environment-variables-inside-of-your-config).
+If any keys do not meet the rules, those keys are not made available to your container, though
+the Pod is allowed to start.
 
 ### Container image pull Secrets {#using-imagepullsecrets}
diff --git a/content/en/docs/concepts/containers/container-lifecycle-hooks.md b/content/en/docs/concepts/containers/container-lifecycle-hooks.md
index 9cb9fa367779d..21d6acc1d6286 100644
--- a/content/en/docs/concepts/containers/container-lifecycle-hooks.md
+++ b/content/en/docs/concepts/containers/container-lifecycle-hooks.md
@@ -56,8 +56,7 @@ There are three types of hook handlers that can be implemented for Containers:
   Resources consumed by the command are counted against the Container.
 * HTTP - Executes an HTTP request against a specific endpoint on the Container.
 * Sleep - Pauses the container for a specified duration.
-  The "Sleep" action is available when the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-  `PodLifecycleSleepAction` is enabled.
+  This is a beta-level feature, enabled by default through the `PodLifecycleSleepAction` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
 
 ### Hook handler execution
diff --git a/content/en/docs/concepts/extend-kubernetes/api-extension/custom-resources.md b/content/en/docs/concepts/extend-kubernetes/api-extension/custom-resources.md
index d4042bfd792a5..5e8ac0a935c24 100644
--- a/content/en/docs/concepts/extend-kubernetes/api-extension/custom-resources.md
+++ b/content/en/docs/concepts/extend-kubernetes/api-extension/custom-resources.md
@@ -295,6 +295,50 @@ When you add a custom resource, you can access it using:
   (generating one is an advanced undertaking, but some projects may provide a client along with the CRD or AA).
 
+## Custom resource field selectors
+
+[Field Selectors](/docs/concepts/overview/working-with-objects/field-selectors/)
+let clients select custom resources based on the value of one or more resource
+fields.
+
+All custom resources support the `metadata.name` and `metadata.namespace` field
+selectors.
+
+Fields declared in a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}}
+may also be used with field selectors when included in the `spec.versions[*].selectableFields` field of the
+{{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}}.
+
+### Selectable fields for custom resources {#crd-selectable-fields}
+
+{{< feature-state feature_gate_name="CustomResourceFieldSelectors" >}}
+
+You need to enable the `CustomResourceFieldSelectors`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to
+use this behavior, which then applies to all CustomResourceDefinitions in your
+cluster.
+
+The `spec.versions[*].selectableFields` field of a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}} may be used to
+declare which other fields in a custom resource may be used in field selectors.
+The following example adds the `.spec.color` and `.spec.size` fields as
+selectable fields.
+
+{{% code_sample file="customresourcedefinition/shirt-resource-definition.yaml" %}}
+
+Field selectors can then be used to get only resources with a `color` of `blue`:
+
+```shell
+kubectl get shirts.stable.example.com --field-selector spec.color=blue
+```
+
+The output should be:
+
+```
+NAME       COLOR   SIZE
+example1   blue    S
+example2   blue    M
+```
+
 ## {{% heading "whatsnext" %}}
 
 * Learn how to [Extend the Kubernetes API with the aggregation layer](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/).
diff --git a/content/en/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins.md b/content/en/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins.md
index cae75f99e334a..089d0c418381c 100644
--- a/content/en/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins.md
+++ b/content/en/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins.md
@@ -54,19 +54,6 @@ that plugin or [networking provider](/docs/concepts/cluster-administration/netwo
 
 ## Network Plugin Requirements
 
-For plugin developers and users who regularly build or deploy Kubernetes, the plugin may also need
-specific configuration to support kube-proxy. The iptables proxy depends on iptables, and the
-plugin may need to ensure that container traffic is made available to iptables. For example, if
-the plugin connects containers to a Linux bridge, the plugin must set the
-`net/bridge/bridge-nf-call-iptables` sysctl to `1` to ensure that the iptables proxy functions
-correctly. If the plugin does not use a Linux bridge, but uses something like Open vSwitch or
-some other mechanism instead, it should ensure container traffic is appropriately routed for the
-proxy.
-
-By default, if no kubelet network plugin is specified, the `noop` plugin is used, which sets
-`net/bridge/bridge-nf-call-iptables=1` to ensure simple configurations (like Docker with a bridge)
-work correctly with the iptables proxy.
-
 ### Loopback CNI
 
 In addition to the CNI plugin installed on the nodes for implementing the Kubernetes network
diff --git a/content/en/docs/concepts/overview/kubernetes-api.md b/content/en/docs/concepts/overview/kubernetes-api.md
index ce703f67accb3..c8e5840ad0bf2 100644
--- a/content/en/docs/concepts/overview/kubernetes-api.md
+++ b/content/en/docs/concepts/overview/kubernetes-api.md
@@ -71,22 +71,22 @@ separate endpoint for each group version.
 
 ### Aggregated discovery
 
-{{< feature-state state="beta" for_k8s_version="v1.27" >}}
+{{< feature-state feature_gate_name="AggregatedDiscoveryEndpoint" >}}
 
-Kubernetes offers beta support for aggregated discovery, publishing
+Kubernetes offers stable support for _aggregated discovery_, publishing
 all resources supported by a cluster through two endpoints (`/api` and
 `/apis`). Requesting this endpoint drastically reduces the number of
 requests sent to fetch the discovery data from the cluster. You can access the
 data by requesting the respective endpoints with an `Accept` header indicating
 the aggregated discovery resource:
-`Accept: application/json;v=v2beta1;g=apidiscovery.k8s.io;as=APIGroupDiscoveryList`.
+`Accept: application/json;v=v2;g=apidiscovery.k8s.io;as=APIGroupDiscoveryList`.
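+As a hedged illustration, assuming you can reach the API server directly, and with
+`$APISERVER`, `$TOKEN`, and the `ca.crt` path as placeholders you must supply for your
+own cluster, a raw aggregated discovery request might look like:
+
+```shell
+# Request the aggregated discovery document for all API groups.
+curl -sS --cacert ca.crt "$APISERVER/apis" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Accept: application/json;v=v2;g=apidiscovery.k8s.io;as=APIGroupDiscoveryList'
+```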
Without indicating the resource type using the `Accept` header, the default response for the `/api` and `/apis` endpoint is an unaggregated discovery document. -The [discovery document](https://github.com/kubernetes/kubernetes/blob/release-{{< skew currentVersion >}}/api/discovery/aggregated_v2beta1.json) +The [discovery document](https://github.com/kubernetes/kubernetes/blob/release-{{< skew currentVersion >}}/api/discovery/aggregated_v2.json) for the built-in resources can be found in the Kubernetes GitHub repository. This Github document can be used as a reference of the base set of the available resources if a Kubernetes cluster is not available to query. @@ -282,30 +282,6 @@ packages that define the API objects. Kubernetes stores the serialized state of objects by writing them into {{< glossary_tooltip term_id="etcd" >}}. -## API Discovery - -A list of all group versions supported by a cluster is published at -the `/api` and `/apis` endpoints. Each group version also advertises -the list of resources supported via `/apis//` (for -example: `/apis/rbac.authorization.k8s.io/v1alpha1`). These endpoints -are used by kubectl to fetch the list of resources supported by a -cluster. - -### Aggregated Discovery - -{{< feature-state feature_gate_name="AggregatedDiscoveryEndpoint" >}} - -Kubernetes offers beta support for aggregated discovery, publishing -all resources supported by a cluster through two endpoints (`/api` and -`/apis`) compared to one for every group version. Requesting this -endpoint drastically reduces the number of requests sent to fetch the -discovery for the average Kubernetes cluster. This may be accessed by -requesting the respective endpoints with an Accept header indicating -the aggregated discovery resource: -`Accept: application/json;v=v2beta1;g=apidiscovery.k8s.io;as=APIGroupDiscoveryList`. - -The endpoint also supports ETag and protobuf encoding. - ## API groups and versioning To make it easier to eliminate fields or restructure resource representations, diff --git a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md index fdc5408a9c5d0..f68f078f42b8f 100644 --- a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md +++ b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md @@ -9,13 +9,16 @@ weight: 65 -{{< feature-state for_k8s_version="v1.27" state="alpha" >}} +{{< feature-state feature_gate_name="DynamicResourceAllocation" >}} Dynamic resource allocation is an API for requesting and sharing resources between pods and containers inside a pod. It is a generalization of the persistent volumes API for generic resources. Third-party resource drivers are -responsible for tracking and allocating resources. Different kinds of -resources support arbitrary parameters for defining requirements and +responsible for tracking and allocating resources, with additional support +provided by Kubernetes via _structured parameters_ (introduced in Kubernetes 1.30). +When a driver uses structured parameters, Kubernetes handles scheduling +and resource allocation without having to communicate with the driver. +Different kinds of resources support arbitrary parameters for defining requirements and initialization. ## {{% heading "prerequisites" %}} @@ -56,11 +59,39 @@ PodSchedulingContext to coordinate pod scheduling when ResourceClaims need to be allocated for a Pod. 
+ResourceSlice +: Used with structured parameters to publish information about resources + that are available in the cluster. + +ResourceClaimParameters +: Contain the parameters for a ResourceClaim which influence scheduling, + in a format that is understood by Kubernetes (the "structured parameter + model"). Additional parameters may be embedded in an opaque + extension, for use by the vendor driver when setting up the underlying + resource. + +ResourceClassParameters +: Similar to ResourceClaimParameters, the ResourceClassParameters provides + a type for ResourceClass parameters which is understood by Kubernetes. + Parameters for ResourceClass and ResourceClaim are stored in separate objects, typically using the type defined by a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CRD" >}} that was created when installing a resource driver. +The developer of a resource driver decides whether they want to handle these +parameters in their own external controller or instead rely on Kubernetes to +handle them through the use of structured parameters. A +custom controller provides more flexibility, but cluster autoscaling is not +going to work reliably for node-local resources. Structured parameters enable +cluster autoscaling, but might not satisfy all use-cases. + +When a driver uses structured parameters, it is still possible to let the +end-user specify parameters with vendor-specific CRDs. When doing so, the +driver needs to translate those +custom parameters into the in-tree types. Alternatively, a driver may also +document how to use the in-tree types directly. + The `core/v1` `PodSpec` defines ResourceClaims that are needed for a Pod in a `resourceClaims` field. Entries in that list reference either a ResourceClaim or a ResourceClaimTemplate. When referencing a ResourceClaim, all Pods using @@ -129,8 +160,11 @@ spec: ## Scheduling +### Without structured parameters + In contrast to native resources (CPU, RAM) and extended resources (managed by a -device plugin, advertised by kubelet), the scheduler has no knowledge of what +device plugin, advertised by kubelet), without structured parameters +the scheduler has no knowledge of what dynamic resources are available in a cluster or how they could be split up to satisfy the requirements of a specific ResourceClaim. Resource drivers are responsible for that. They mark ResourceClaims as "allocated" once resources @@ -172,6 +206,27 @@ ResourceClaims, and thus scheduling the next pod gets delayed. {{< /note >}} +### With structured parameters + +When a driver uses structured parameters, the scheduler takes over the +responsibility of allocating resources to a ResourceClaim whenever a pod needs +them. It does so by retrieving the full list of available resources from +ResourceSlice objects, tracking which of those resources have already been +allocated to existing ResourceClaims, and then selecting from those resources +that remain. The exact resources selected are subject to the constraints +provided in any ResourceClaimParameters or ResourceClassParameters associated +with the ResourceClaim. + +The chosen resource is recorded in the ResourceClaim status together with any +vendor-specific parameters, so when a pod is about to start on a node, the +resource driver on the node has all the information it needs to prepare the +resource. + +By using structured parameters, the scheduler is able to reach a decision +without communicating with any DRA resource drivers. 
It is also able to +schedule multiple pods quickly by keeping information about ResourceClaim +allocations in memory and writing this information to the ResourceClaim objects +in the background while concurrently binding the pod to a node. ## Monitoring resources @@ -193,7 +248,13 @@ was not enabled in the scheduler at the time when the Pod got scheduled detects this and tries to make the Pod runnable by triggering allocation and/or reserving the required ResourceClaims. -However, it is better to avoid this because a Pod that is assigned to a node +{{< note >}} + +This only works with resource drivers that don't use structured parameters. + +{{< /note >}} + +It is better to avoid bypassing the scheduler because a Pod that is assigned to a node blocks normal resources (RAM, CPU) that then cannot be used for other Pods while the Pod is stuck. To make a Pod run on a specific node while still going through the normal scheduling flow, create the Pod with a node selector that @@ -255,4 +316,5 @@ be installed. Please refer to the driver's documentation for details. ## {{% heading "whatsnext" %}} - For more information on the design, see the -[Dynamic Resource Allocation KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md). +[Dynamic Resource Allocation KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md) + and the [Structured Parameters KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters). diff --git a/content/en/docs/concepts/scheduling-eviction/node-pressure-eviction.md b/content/en/docs/concepts/scheduling-eviction/node-pressure-eviction.md index fcadea652ea2e..76f930721ef6e 100644 --- a/content/en/docs/concepts/scheduling-eviction/node-pressure-eviction.md +++ b/content/en/docs/concepts/scheduling-eviction/node-pressure-eviction.md @@ -169,6 +169,7 @@ The kubelet has the following default hard eviction thresholds: - `nodefs.available<10%` - `imagefs.available<15%` - `nodefs.inodesFree<5%` (Linux nodes) +- `imagefs.inodesFree<5%` (Linux nodes) These default values of hard eviction thresholds will only be set if none of the parameters is changed. If you change the value of any parameter, diff --git a/content/en/docs/concepts/scheduling-eviction/pod-scheduling-readiness.md b/content/en/docs/concepts/scheduling-eviction/pod-scheduling-readiness.md index e895ffd5fb5bc..9b6f98066b2a3 100644 --- a/content/en/docs/concepts/scheduling-eviction/pod-scheduling-readiness.md +++ b/content/en/docs/concepts/scheduling-eviction/pod-scheduling-readiness.md @@ -6,7 +6,7 @@ weight: 40 -{{< feature-state for_k8s_version="v1.27" state="beta" >}} +{{< feature-state for_k8s_version="v1.30" state="stable" >}} Pods were considered ready for scheduling once created. Kubernetes scheduler does its due diligence to find nodes to place all pending Pods. However, in a @@ -89,9 +89,7 @@ The metric `scheduler_pending_pods` comes with a new label `"gated"` to distingu has been tried scheduling but claimed as unschedulable, or explicitly marked as not ready for scheduling. You can use `scheduler_pending_pods{queue="gated"}` to check the metric result. -## Mutable Pod Scheduling Directives - -{{< feature-state for_k8s_version="v1.27" state="beta" >}} +## Mutable Pod scheduling directives You can mutate scheduling directives of Pods while they have scheduling gates, with certain constraints. 
At a high level, you can only tighten the scheduling directives of a Pod. In other words, the updated diff --git a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md index 6ebddabd8adc4..d82dcdd065e1b 100644 --- a/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md +++ b/content/en/docs/concepts/scheduling-eviction/topology-spread-constraints.md @@ -60,7 +60,7 @@ spec: # Configure a topology spread constraint topologySpreadConstraints: - maxSkew: - minDomains: # optional; beta since v1.25 + minDomains: # optional topologyKey: whenUnsatisfiable: labelSelector: @@ -96,11 +96,11 @@ your cluster. Those fields are: A domain is a particular instance of a topology. An eligible domain is a domain whose nodes match the node selector. + {{< note >}} - The `MinDomainsInPodTopologySpread` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) - enables `minDomains` for pod topology spread. Starting from v1.28, - the `MinDomainsInPodTopologySpread` gate - is enabled by default. In older Kubernetes clusters it might be explicitly + Before Kubernetes v1.30, the `minDomains` field was only available if the + `MinDomainsInPodTopologySpread` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) + was enabled (default since v1.28). In older Kubernetes clusters it might be explicitly disabled or the field might not be available. {{< /note >}} diff --git a/content/en/docs/concepts/security/cloud-native-security.md b/content/en/docs/concepts/security/cloud-native-security.md index 778dba0c3836e..d8f4ccdd7b998 100644 --- a/content/en/docs/concepts/security/cloud-native-security.md +++ b/content/en/docs/concepts/security/cloud-native-security.md @@ -143,7 +143,7 @@ To protect your compute at runtime, you can: Pods with different trust contexts are run on separate sets of nodes. 1. Use a {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} that provides security restrictions. -1. On Linux nodes, use a Linux security module such as [AppArmor](/docs/tutorials/security/apparmor/) (beta) +1. On Linux nodes, use a Linux security module such as [AppArmor](/docs/tutorials/security/apparmor/) or [seccomp](/docs/tutorials/security/seccomp/). ### Runtime protection: storage {#protection-runtime-storage} @@ -223,4 +223,3 @@ logs are both tamper-proof and confidential. 
* [Network policies](/docs/concepts/services-networking/network-policies/) for Pods * [Pod security standards](/docs/concepts/security/pod-security-standards/) * [RuntimeClasses](/docs/concepts/containers/runtime-class) - diff --git a/content/en/docs/concepts/security/pod-security-admission.md b/content/en/docs/concepts/security/pod-security-admission.md index 95a2d6e3f4b88..c4caa6905d38d 100644 --- a/content/en/docs/concepts/security/pod-security-admission.md +++ b/content/en/docs/concepts/security/pod-security-admission.md @@ -121,7 +121,7 @@ current policy level: - Any metadata updates **except** changes to the seccomp or AppArmor annotations: - `seccomp.security.alpha.kubernetes.io/pod` (deprecated) - `container.seccomp.security.alpha.kubernetes.io/*` (deprecated) - - `container.apparmor.security.beta.kubernetes.io/*` + - `container.apparmor.security.beta.kubernetes.io/*` (deprecated) - Valid updates to `.spec.activeDeadlineSeconds` - Valid updates to `.spec.tolerations` diff --git a/content/en/docs/concepts/security/pod-security-standards.md b/content/en/docs/concepts/security/pod-security-standards.md index 9757e581598a2..fb9cab9d15c8c 100644 --- a/content/en/docs/concepts/security/pod-security-standards.md +++ b/content/en/docs/concepts/security/pod-security-standards.md @@ -170,8 +170,21 @@ fail validation. AppArmor -
		<td>
-			<p>On supported hosts, the <code>runtime/default</code> AppArmor profile is applied by default. The baseline policy should prevent overriding or disabling the default AppArmor profile, or restrict overrides to an allowed set of profiles.</p>
+			<p>On supported hosts, the <code>RuntimeDefault</code> AppArmor profile is applied by default. The baseline policy should prevent overriding or disabling the default AppArmor profile, or restrict overrides to an allowed set of profiles.</p>
+			<p><strong>Restricted Fields</strong></p>
+			<ul>
+				<li><code>spec.securityContext.appArmorProfile.type</code></li>
+				<li><code>spec.containers[*].securityContext.appArmorProfile.type</code></li>
+				<li><code>spec.initContainers[*].securityContext.appArmorProfile.type</code></li>
+				<li><code>spec.ephemeralContainers[*].securityContext.appArmorProfile.type</code></li>
+			</ul>
+			<p><strong>Allowed Values</strong></p>
+			<ul>
+				<li>Undefined/nil</li>
+				<li><code>RuntimeDefault</code></li>
+				<li><code>Localhost</code></li>
+			</ul>
+			<ul>
 				<li><code>metadata.annotations["container.apparmor.security.beta.kubernetes.io/*"]</code></li>
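+As a minimal sketch of the fields listed above (the Pod name and image are illustrative
+assumptions, not part of the policy definition), a baseline-compliant Pod could pin the
+default AppArmor profile like this:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: apparmor-baseline-demo
+spec:
+  securityContext:
+    appArmorProfile:
+      # RuntimeDefault keeps the container runtime's default AppArmor profile.
+      type: RuntimeDefault
+  containers:
+  - name: demo
+    image: registry.k8s.io/pause:3.9
+```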
@@ -532,4 +545,3 @@ kernel. This allows for workloads requiring heightened permissions to still be i
 Additionally, the protection of sandboxed workloads is highly dependent on the method of
 sandboxing. As such, no single recommended profile is recommended for all sandboxed workloads.
-
diff --git a/content/en/docs/concepts/security/security-checklist.md b/content/en/docs/concepts/security/security-checklist.md
index 6987a6b92aaf1..e78b8da0c5dff 100644
--- a/content/en/docs/concepts/security/security-checklist.md
+++ b/content/en/docs/concepts/security/security-checklist.md
@@ -177,10 +177,10 @@ Seccomp is only available on Linux nodes.
 
 #### AppArmor
 
-[AppArmor](https://apparmor.net/) is a Linux kernel security module that can
+[AppArmor](/docs/tutorials/security/apparmor/) is a Linux kernel security module that can
 provide an easy way to implement Mandatory Access Control (MAC) and better
-auditing through system logs. To [enable AppArmor in Kubernetes](/docs/tutorials/security/apparmor/),
-at least version 1.4 is required. Like seccomp, AppArmor is also configured
+auditing through system logs. A default AppArmor profile is enforced on nodes that support it, or a custom profile can be configured.
+Like seccomp, AppArmor is also configured
 through profiles, where each profile is either running in enforcing mode, which
 blocks access to disallowed resources or complain mode, which only reports
 violations. AppArmor profiles are enforced on a per-container basis, with an
diff --git a/content/en/docs/concepts/services-networking/service.md b/content/en/docs/concepts/services-networking/service.md
index 1e32c22448e22..63cc7e5e4d0e4 100644
--- a/content/en/docs/concepts/services-networking/service.md
+++ b/content/en/docs/concepts/services-networking/service.md
@@ -622,6 +622,16 @@ You can integrate with [Gateway](https://gateway-api.sigs.k8s.io/) rather than S
 can define your own (provider specific) annotations on the Service that specify the equivalent detail.
 {{< /note >}}
 
+#### Node liveness impact on load balancer traffic
+
+Load balancer health checks are critical to modern applications. They are used to
+determine which server (virtual machine, or IP address) the load balancer should
+dispatch traffic to. The Kubernetes APIs do not define how health checks have to be
+implemented for Kubernetes-managed load balancers; instead, the cloud providers
+(and the people implementing integration code) decide on the behavior. Load
+balancer health checks are extensively used within the context of supporting the
+`externalTrafficPolicy` field for Services.
+
 #### Load balancers with mixed protocol types
 
 {{< feature-state feature_gate_name="MixedProtocolLBService" >}}
@@ -675,7 +685,7 @@ Unprefixed names are reserved for end-users.
 
 {{< feature-state feature_gate_name="LoadBalancerIPMode" >}}
 
-Starting as Alpha in Kubernetes 1.29,
+As a Beta feature in Kubernetes 1.30,
 a [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
 named `LoadBalancerIPMode` allows you to set the `.status.loadBalancer.ingress.ipMode`
 for a Service with `type` set to `LoadBalancer`.
@@ -980,6 +990,35 @@ to control how Kubernetes routes traffic to healthy (“ready”) backends.
 
 See [Traffic Policies](/docs/reference/networking/virtual-ips/#traffic-policies) for more
 details.
 
+### Traffic distribution
+
+{{< feature-state feature_gate_name="ServiceTrafficDistribution" >}}
+
+The `.spec.trafficDistribution` field provides another way to influence traffic
+routing within a Kubernetes Service. While traffic policies focus on strict
+semantic guarantees, traffic distribution allows you to express _preferences_
+(such as routing to topologically closer endpoints). This can help optimize for
+performance, cost, or reliability. This optional field can be used if you have
+enabled the `ServiceTrafficDistribution` [feature
+gate](/docs/reference/command-line-tools-reference/feature-gates/) for your
+cluster and all of its nodes. In Kubernetes {{< skew currentVersion >}}, the
+following field value is supported:
+
+`PreferClose`
+: Indicates a preference for routing traffic to endpoints that are topologically
+  proximate to the client. The interpretation of "topologically proximate" may
+  vary across implementations and could encompass endpoints within the same
+  node, rack, zone, or even region. Setting this value gives implementations
+  permission to make different tradeoffs, e.g. optimizing for proximity rather
+  than equal distribution of load. Users should not set this value if such
+  tradeoffs are not acceptable.
+
+If the field is not set, the implementation will apply its default routing strategy.
+
+See [Traffic
+Distribution](/docs/reference/networking/virtual-ips/#traffic-distribution) for
+more details.
+
 ### Session stickiness
 
 If you want to make sure that connections from a particular client are passed to
diff --git a/content/en/docs/concepts/services-networking/topology-aware-routing.md b/content/en/docs/concepts/services-networking/topology-aware-routing.md
index 7092946e85b65..05e93549ebff8 100644
--- a/content/en/docs/concepts/services-networking/topology-aware-routing.md
+++ b/content/en/docs/concepts/services-networking/topology-aware-routing.md
@@ -198,3 +198,8 @@ yet cover some relevant and plausible situations.
 ## {{% heading "whatsnext" %}}
 
 * Follow the [Connecting Applications with Services](/docs/tutorials/services/connect-applications-service/) tutorial
+* Learn about the
+  [trafficDistribution](/docs/concepts/services-networking/service/#traffic-distribution)
+  field, which is closely related to the `service.kubernetes.io/topology-mode`
+  annotation and provides flexible options for traffic routing within
+  Kubernetes.
diff --git a/content/en/docs/concepts/storage/persistent-volumes.md b/content/en/docs/concepts/storage/persistent-volumes.md
index cb0f123423d64..65041f5587d3a 100644
--- a/content/en/docs/concepts/storage/persistent-volumes.md
+++ b/content/en/docs/concepts/storage/persistent-volumes.md
@@ -509,30 +509,33 @@ PersistentVolume types are implemented as plugins. Kubernetes currently supports
 mounted on nodes.
 * [`nfs`](/docs/concepts/storage/volumes/#nfs) - Network File System (NFS) storage
 
-The following types of PersistentVolume are deprecated.
-This means that support is still available but will be removed in a future Kubernetes release.
+The following types of PersistentVolume are deprecated but still available.
+If you are using one of these volume types, except for `flexVolume`, `cephfs`, and `rbd`,
+install the corresponding CSI driver.
+
+* [`awsElasticBlockStore`](/docs/concepts/storage/volumes/#awselasticblockstore) - AWS Elastic Block Store (EBS)
+  (**migration on by default** starting v1.23)
+* [`azureDisk`](/docs/concepts/storage/volumes/#azuredisk) - Azure Disk
+  (**migration on by default** starting v1.23)
 * [`azureFile`](/docs/concepts/storage/volumes/#azurefile) - Azure File
-  (**deprecated** in v1.21)
+  (**migration on by default** starting v1.24)
+* [`cephfs`](/docs/concepts/storage/volumes/#cephfs) - CephFS volume
+  (**deprecated** starting v1.28, no migration plan, support will be removed in a future release)
+* [`cinder`](/docs/concepts/storage/volumes/#cinder) - Cinder (OpenStack block storage)
+  (**migration on by default** starting v1.21)
 * [`flexVolume`](/docs/concepts/storage/volumes/#flexvolume) - FlexVolume
-  (**deprecated** in v1.23)
+  (**deprecated** starting v1.23, no migration plan and no plan to remove support)
+* [`gcePersistentDisk`](/docs/concepts/storage/volumes/#gcepersistentdisk) - GCE Persistent Disk
+  (**migration on by default** starting v1.23)
 * [`portworxVolume`](/docs/concepts/storage/volumes/#portworxvolume) - Portworx volume
-  (**deprecated** in v1.25)
-* [`vsphereVolume`](/docs/concepts/storage/volumes/#vspherevolume) - vSphere VMDK volume
-  (**deprecated** in v1.19)
-* [`cephfs`](/docs/concepts/storage/volumes/#cephfs) - CephFS volume
-  (**deprecated** in v1.28)
+  (**deprecated** starting v1.25)
 * [`rbd`](/docs/concepts/storage/volumes/#rbd) - Rados Block Device (RBD) volume
-  (**deprecated** in v1.28)
+  (**deprecated** starting v1.28, no migration plan, support will be removed in a future release)
+* [`vsphereVolume`](/docs/concepts/storage/volumes/#vspherevolume) - vSphere VMDK volume
+  (**migration on by default** starting v1.25)
 
 Older versions of Kubernetes also supported the following in-tree PersistentVolume types:
 
-* [`awsElasticBlockStore`](/docs/concepts/storage/volumes/#awselasticblockstore) - AWS Elastic Block Store (EBS)
-  (**not available** in v1.27)
-* [`azureDisk`](/docs/concepts/storage/volumes/#azuredisk) - Azure Disk
-  (**not available** in v1.27)
-* [`cinder`](/docs/concepts/storage/volumes/#cinder) - Cinder (OpenStack block storage)
-  (**not available** in v1.26)
 * `photonPersistentDisk` - Photon controller persistent disk.
   (**not available** starting v1.15)
 * `scaleIO` - ScaleIO volume.
diff --git a/content/en/docs/concepts/storage/volumes.md b/content/en/docs/concepts/storage/volumes.md
index cd5fc1ad24009..a673ecf27b9cc 100644
--- a/content/en/docs/concepts/storage/volumes.md
+++ b/content/en/docs/concepts/storage/volumes.md
@@ -65,12 +65,14 @@ a different volume.
 
 Kubernetes supports several types of volumes.
 
-### awsElasticBlockStore (removed) {#awselasticblockstore}
+### awsElasticBlockStore (deprecated) {#awselasticblockstore}
 
-Kubernetes {{< skew currentVersion >}} does not include a `awsElasticBlockStore` volume type.
+In Kubernetes {{< skew currentVersion >}}, all operations for the in-tree `awsElasticBlockStore` type
+are redirected to the `ebs.csi.aws.com` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver.
+
 The AWSElasticBlockStore in-tree storage driver was deprecated in the Kubernetes v1.19 release
 and then removed entirely in the v1.27 release.
 
 The Kubernetes project suggests that you use the [AWS EBS](https://github.com/kubernetes-sigs/aws-ebs-csi-driver)
 third party storage driver instead.
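+For example, a StorageClass that provisions volumes through the EBS CSI driver might
+look like the following sketch; the class name and the `gp3` volume type are
+illustrative assumptions:
+
+```yaml
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: ebs-csi-example
+# The CSI driver that replaces the in-tree awsElasticBlockStore plugin.
+provisioner: ebs.csi.aws.com
+volumeBindingMode: WaitForFirstConsumer
+parameters:
+  type: gp3
+```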
-### azureDisk (removed) {#azuredisk}
+### azureDisk (deprecated) {#azuredisk}
 
-Kubernetes {{< skew currentVersion >}} does not include a `azureDisk` volume type.
+In Kubernetes {{< skew currentVersion >}}, all operations for the in-tree `azureDisk` type
+are redirected to the `disk.csi.azure.com` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver.
 
 The AzureDisk in-tree storage driver was deprecated in the Kubernetes v1.19 release
 and then removed entirely in the v1.27 release.
@@ -121,7 +124,7 @@ Azure File CSI driver does not support using same volume with different fsgroups
 To disable the `azureFile` storage plugin from being loaded by the controller
 manager and the kubelet, set the `InTreePluginAzureFileUnregister` flag to `true`.
 
-### cephfs
+### cephfs (deprecated) {#cephfs}
 
 {{< feature-state for_k8s_version="v1.28" state="deprecated" >}}
 
 {{< note >}}
@@ -142,12 +145,13 @@ You must have your own Ceph server running with the share exported before you ca
 See the [CephFS example](https://github.com/kubernetes/examples/tree/master/volumes/cephfs/) for more details.
 
-### cinder (removed) {#cinder}
+### cinder (deprecated) {#cinder}
 
-Kubernetes {{< skew currentVersion >}} does not include a `cinder` volume type.
+In Kubernetes {{< skew currentVersion >}}, all operations for the in-tree `cinder` type
+are redirected to the `cinder.csi.openstack.org` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver.
 
 The OpenStack Cinder in-tree storage driver was deprecated in the Kubernetes v1.11 release
 and then removed entirely in the v1.26 release.
@@ -298,9 +302,10 @@ beforehand so that Kubernetes hosts can access them.
 
 See the [fibre channel example](https://github.com/kubernetes/examples/tree/master/staging/volumes/fibre_channel)
 for more details.
 
-### gcePersistentDisk (removed) {#gcepersistentdisk}
-
-Kubernetes {{< skew currentVersion >}} does not include a `gcePersistentDisk` volume type.
+### gcePersistentDisk (deprecated) {#gcepersistentdisk}
+
+In Kubernetes {{< skew currentVersion >}}, all operations for the in-tree `gcePersistentDisk` type
+are redirected to the `pd.csi.storage.gke.io` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver.
 
 The `gcePersistentDisk` in-tree storage driver was deprecated in the Kubernetes v1.17 release
 and then removed entirely in the v1.28 release.
@@ -1225,7 +1230,66 @@ in `containers[*].volumeMounts`. Its values are:
   (unmounted) by the containers on termination.
 {{< /warning >}}
 
+## Read-only mounts
+
+A mount can be made read-only by setting the `.spec.containers[].volumeMounts[].readOnly`
+field to `true`.
+This does not make the volume itself read-only, but that specific container will
+not be able to write to it.
+Other containers in the Pod may mount the same volume as read-write.
+
+On Linux, read-only mounts are not recursively read-only by default.
+For example, consider a Pod which mounts the host's `/mnt` as a `hostPath` volume. If
+there is another filesystem mounted read-write on `/mnt/` (such as tmpfs,
+NFS, or USB storage), the volume mounted into the container(s) will also have a writeable
+`/mnt/`, even if the mount itself was specified as read-only.
+
+### Recursive read-only mounts
+
+{{< feature-state feature_gate_name="RecursiveReadOnlyMounts" >}}
+
+Recursive read-only mounts can be enabled by setting the
+`RecursiveReadOnlyMounts` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+for kubelet and kube-apiserver, and setting the `.spec.containers[].volumeMounts[].recursiveReadOnly`
+field for a pod.
+ +The allowed values are: + +* `Disabled` (default): no effect. + +* `Enabled`: makes the mount recursively read-only. + Needs all the following requirements to be satisfied: + * `readOnly` is set to `true` + * `mountPropagation` is unset, or, set to `None` + * The host is running with Linux kernel v5.12 or later + * The [CRI-level](/docs/concepts/architecture/cri) container runtime supports recursive read-only mounts + * The OCI-level container runtime supports recursive read-only mounts. + It will fail if any of these is not true. + +* `IfPossible`: attempts to apply `Enabled`, and falls back to `Disabled` + if the feature is not supported by the kernel or the runtime class. + +Example: +{{% code_sample file="storage/rro.yaml" %}} + +When this property is recognized by kubelet and kube-apiserver, +the `.status.containerStatuses[].volumeMounts[].recursiveReadOnly` field is set to either +`Enabled` or `Disabled`. + + +#### Implementations {#implementations-rro} + +{{% thirdparty-content %}} + +The following container runtimes are known to support recursive read-only mounts. + +CRI-level: +- [containerd](https://containerd.io/), since v2.0 +OCI-level: +- [runc](https://runc.io/), since v1.1 +- [crun](https://github.com/containers/crun), since v1.8.6 ## {{% heading "whatsnext" %}} diff --git a/content/en/docs/concepts/workloads/controllers/job.md b/content/en/docs/concepts/workloads/controllers/job.md index c91f856e7f9df..6a9a916aed412 100644 --- a/content/en/docs/concepts/workloads/controllers/job.md +++ b/content/en/docs/concepts/workloads/controllers/job.md @@ -553,6 +553,62 @@ terminating Pods only once these Pods reach the terminal `Failed` phase. This be to `podReplacementPolicy: Failed`. For more information, see [Pod replacement policy](#pod-replacement-policy). {{< /note >}} +## Success policy {#success-policy} + +{{< feature-state feature_gate_name="JobSuccessPolicy" >}} + +{{< note >}} +You can only configure a success policy for an Indexed Job if you have the +`JobSuccessPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) +enabled in your cluster. +{{< /note >}} + +When creating an Indexed Job, you can define when a Job can be declared as succeeded using a `.spec.successPolicy`, +based on the pods that succeeded. + +By default, a Job succeeds when the number of succeeded Pods equals `.spec.completions`. +These are some situations where you might want additional control for declaring a Job succeeded: + +* When running simulations with different parameters, + you might not need all the simulations to succeed for the overall Job to be successful. +* When following a leader-worker pattern, only the success of the leader determines the success or + failure of a Job. Examples of this are frameworks like MPI and PyTorch etc. + +You can configure a success policy, in the `.spec.successPolicy` field, +to meet the above use cases. This policy can handle Job success based on the +succeeded pods. After the Job meets the success policy, the job controller terminates the lingering Pods. +A success policy is defined by rules. Each rule can take one of the following forms: + +* When you specify the `succeededIndexes` only, + once all indexes specified in the `succeededIndexes` succeed, the job controller marks the Job as succeeded. + The `succeededIndexes` must be a list of intervals between 0 and `.spec.completions-1`. 
+* When you specify the `succeededCount` only, + once the number of succeeded indexes reaches the `succeededCount`, the job controller marks the Job as succeeded. +* When you specify both `succeededIndexes` and `succeededCount`, + once the number of succeeded indexes from the subset of indexes specified in the `succeededIndexes` reaches the `succeededCount`, + the job controller marks the Job as succeeded. + +Note that when you specify multiple rules in the `.spec.succeessPolicy.rules`, +the job controller evaluates the rules in order. Once the Job meets a rule, the job controller ignores remaining rules. + +Here is a manifest for a Job with `successPolicy`: + +{{% code_sample file="/controllers/job-success-policy.yaml" %}} + +In the example above, both `succeededIndexes` and `succeededCount` have been specified. +Therefore, the job controller will mark the Job as succeeded and terminate the lingering Pods +when either of the specified indexes, 0, 2, or 3, succeed. +The Job that meets the success policy gets the `SuccessCriteriaMet` condition. +After the removal of the lingering Pods is issued, the Job gets the `Complete` condition. + +Note that the `succeededIndexes` is represented as intervals separated by a hyphen. +The number are listed in represented by the first and last element of the series, separated by a hyphen. + +{{< note >}} +When you specify both a success policy and some terminating policies such as `.spec.backoffLimit` and `.spec.podFailurePolicy`, +once the Job meets either policy, the job controller respects the terminating policy and ignores the success policy. +{{< /note >}} + ## Job termination and cleanup When a Job completes, no more Pods are created, but the Pods are [usually](#pod-backoff-failure-policy) not deleted either. @@ -1009,6 +1065,50 @@ status: terminating: 3 # three Pods are terminating and have not yet reached the Failed phase ``` +### Delegation of managing a Job object to external controller + +{{< feature-state feature_gate_name="JobManagedBy" >}} + +{{< note >}} +You can only set the `managedBy` field on Jobs if you enable the `JobManagedBy` +[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) +(disabled by default). +{{< /note >}} + +This feature allows you to disable the built-in Job controller, for a specific +Job, and delegate reconciliation of the Job to an external controller. + +You indicate the controller that reconciles the Job by setting a custom value +for the `spec.managedBy` field - any value +other than `kubernetes.io/job-controller`. The value of the field is immutable. + +{{< note >}} +When using this feature, make sure the controller indicated by the field is +installed, otherwise the Job may not be reconciled at all. +{{< /note >}} + +{{< note >}} +When developing an external Job controller be aware that your controller needs +to operate in a fashion conformant with the definitions of the API spec and +status fields of the Job object. + +Please review these in detail in the [Job API](/docs/reference/kubernetes-api/workload-resources/job-v1/). +We also recommend that you run the e2e conformance tests for the Job object to +verify your implementation. + +Finally, when developing an external Job controller make sure it does not use the +`batch.kubernetes.io/job-tracking` finalizer, reserved for the built-in controller. 
+{{< /note >}}
+
+{{< warning >}}
+If you are considering disabling the `JobManagedBy` feature gate, or
+downgrading the cluster to a version without the feature gate enabled, check if
+there are jobs with a custom value of the `spec.managedBy` field. If there
+are such jobs, there is a risk that they might be reconciled by two controllers
+after the operation: the built-in Job controller and the external controller
+indicated by the field value.
+{{< /warning >}}
+
 ## Alternatives

 ### Bare Pods

diff --git a/content/en/docs/concepts/workloads/pods/downward-api.md b/content/en/docs/concepts/workloads/pods/downward-api.md
index e084c92abd8be..aac5314da2649 100644
--- a/content/en/docs/concepts/workloads/pods/downward-api.md
+++ b/content/en/docs/concepts/workloads/pods/downward-api.md
@@ -77,7 +77,6 @@ The following information is available through environment variables

 `status.hostIPs`
 : the IP addresses is a dual-stack version of `status.hostIP`, the first is always the same as
   `status.hostIP`.
-  The field is available if you enable the `PodHostIPs` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).

 `status.podIP`
 : the pod's primary IP address (usually, its IPv4 address)
diff --git a/content/en/docs/concepts/workloads/pods/init-containers.md b/content/en/docs/concepts/workloads/pods/init-containers.md
index 480d4cee80b9e..54143f8ea69e2 100644
--- a/content/en/docs/concepts/workloads/pods/init-containers.md
+++ b/content/en/docs/concepts/workloads/pods/init-containers.md
@@ -331,8 +331,10 @@ for resource usage apply:

 Quota and limits are applied based on the effective Pod request and
 limit.

-Pod level control groups (cgroups) are based on the effective Pod request and
-limit, the same as the scheduler.
+### Init containers and Linux cgroups {#cgroups}
+
+On Linux, resource allocations for Pod level control groups (cgroups) are based on the effective Pod
+request and limit, the same as the scheduler.

 {{< comment >}}
 This section also present under [sidecar containers](/docs/concepts/workloads/pods/sidecar-containers/) page.
diff --git a/content/en/docs/concepts/workloads/pods/sidecar-containers.md b/content/en/docs/concepts/workloads/pods/sidecar-containers.md
index 0ad25781ff8e8..76eca5cf5913f 100644
--- a/content/en/docs/concepts/workloads/pods/sidecar-containers.md
+++ b/content/en/docs/concepts/workloads/pods/sidecar-containers.md
@@ -9,21 +9,43 @@ weight: 50

 Sidecar containers are the secondary containers that run along with the main
 application container within the same {{< glossary_tooltip text="Pod" term_id="pod" >}}.
-These containers are used to enhance or to extend the functionality of the main application
-container by providing additional services, or functionality such as logging, monitoring,
+These containers are used to enhance or to extend the functionality of the primary _app
+container_ by providing additional services or functionality such as logging, monitoring,
 security, or data synchronization, without directly altering the primary application code.

+Typically, you only have one app container in a Pod. For example, if you have a web
+application that requires a local webserver, the local webserver is a sidecar and the
+web application itself is the app container.
+
-## Enabling sidecar containers
+## Sidecar containers in Kubernetes {#pod-sidecar-containers}
+
+Kubernetes implements sidecar containers as a special case of
+[init containers](/docs/concepts/workloads/pods/init-containers/); sidecar containers remain
+running after Pod startup. This document uses the term _regular init containers_ to clearly
+refer to containers that only run during Pod startup.
+
+Provided that your cluster has the `SidecarContainers`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) enabled
+(the feature is active by default since Kubernetes v1.29), you can specify a `restartPolicy`
+for containers listed in a Pod's `initContainers` field.
+These restartable _sidecar_ containers are independent of other init containers and of
+the main application container(s) within the same pod.
+These can be started, stopped, or restarted without affecting the main application container
+and other init containers.
+
+You can also run a Pod with multiple containers that are not marked as init or sidecar
+containers. This is appropriate if the containers within the Pod are required for the
+Pod to work overall, but you don't need to control which containers start or stop first.
+You could also do this if you need to support older versions of Kubernetes that don't
+support a container-level `restartPolicy` field.
+
+### Example application {#sidecar-example}
+
+Here's an example of a Deployment with two containers, one of which is a sidecar:

-Enabled by default with Kubernetes 1.29, a
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) named
-`SidecarContainers` allows you to specify a `restartPolicy` for containers listed in a
-Pod's `initContainers` field. These restartable _sidecar_ containers are independent with
-other [init containers](/docs/concepts/workloads/pods/init-containers/) and main
-application container within the same pod. These can be started, stopped, or restarted
-without affecting the main application container and other init containers.
+{{% code_sample language="yaml" file="application/deployment-sidecar.yaml" %}}

 ## Sidecar containers and Pod lifecycle

@@ -35,8 +57,8 @@ If a `readinessProbe` is specified for this init container, its result will be u
 to determine the `ready` state of the Pod.

 Since these containers are defined as init containers, they benefit from the same
-ordering and sequential guarantees as other init containers, allowing them to
-be mixed with other init containers into complex Pod initialization flows.
+ordering and sequential guarantees as regular init containers, allowing you to mix
+sidecar containers with regular init containers for complex Pod initialization flows.

 Compared to regular init containers, sidecars defined within `initContainers` continue to
 run after they have started. This is important when there is more than one entry inside
@@ -46,30 +68,28 @@ next init container from the ordered `.spec.initContainers` list. That status
 either becomes true because there is a process running in the
 container and no startup probe defined, or as a result of its `startupProbe` succeeding.

-Here's an example of a Deployment with two containers, one of which is a sidecar:
-
-{{% code_sample language="yaml" file="application/deployment-sidecar.yaml" %}}
+### Jobs with sidecar containers

-This feature is also useful for running Jobs with sidecars, as the sidecar
-container will not prevent the Job from completing after the main container
-has finished.
+If you define a Job that uses sidecar containers defined as Kubernetes-style init containers,
+the sidecar container in each Pod does not prevent the Job from completing after the
+main container has finished.

 Here's an example of a Job with two containers, one of which is a sidecar:

 {{% code_sample language="yaml" file="application/job/job-sidecar.yaml" %}}

-## Differences from regular containers
+## Differences from application containers

-Sidecar containers run alongside regular containers in the same pod. However, they do not
+Sidecar containers run alongside _app containers_ in the same pod. However, they do not
 execute the primary application logic; instead, they provide supporting functionality to the
 main application.

 Sidecar containers have their own independent lifecycles. They can be started, stopped,
-and restarted independently of regular containers. This means you can update, scale, or
+and restarted independently of app containers. This means you can update, scale, or
 maintain sidecar containers without affecting the primary application.

 Sidecar containers share the same network and storage namespaces with the primary
-container This co-location allows them to interact closely and share resources.
+container. This co-location allows them to interact closely and share resources.

 ## Differences from init containers

@@ -112,8 +132,10 @@ for resource usage apply:

 Quota and limits are applied based on the effective Pod request and
 limit.

-Pod level control groups (cgroups) are based on the effective Pod request and
-limit, the same as the scheduler.
+### Sidecar containers and Linux cgroups {#cgroups}
+
+On Linux, resource allocations for Pod level control groups (cgroups) are based on the effective Pod
+request and limit, the same as the scheduler.

 ## {{% heading "whatsnext" %}}

diff --git a/content/en/docs/concepts/workloads/pods/user-namespaces.md b/content/en/docs/concepts/workloads/pods/user-namespaces.md
index 410b3c90524d2..4b9a1da89e49b 100644
--- a/content/en/docs/concepts/workloads/pods/user-namespaces.md
+++ b/content/en/docs/concepts/workloads/pods/user-namespaces.md
@@ -7,7 +7,7 @@ min-kubernetes-server-version: v1.25
 ---


-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.30" state="beta" >}}

 This page explains how user namespaces are used in Kubernetes pods. A user
 namespace isolates the user running inside the container from the one
@@ -46,7 +46,26 @@ tmpfs, Secrets use a tmpfs, etc.)

 Some popular filesystems that support idmap mounts in Linux 6.3 are: btrfs,
 ext4, xfs, fat, tmpfs, overlayfs.

-In addition, support is needed in the
+In addition, the container runtime and its underlying OCI runtime must support
+user namespaces. The following OCI runtimes offer support:
+
+* [crun](https://github.com/containers/crun) version 1.9 or greater (version 1.13+ is recommended).
+
+
+{{< note >}}
+Many OCI runtimes do not include the support needed for using user namespaces in
+Linux pods. If you use a managed Kubernetes offering, or have downloaded it from packages
+and set it up, it's likely that nodes in your cluster use a runtime that doesn't
+include this support. For example, the most widely used OCI runtime is `runc`,
+and version `1.1.z` of runc doesn't support all the features needed by the
+Kubernetes implementation of user namespaces.
+
+If there is a newer release of runc than 1.1 available for use, check its
+documentation and release notes for compatibility (look for idmap mounts support
+in particular, because that is the missing feature).
+{{< /note >}}
+
+In addition, you need to use a CRI
 {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
 to use this feature with Kubernetes pods:

@@ -137,20 +156,67 @@ use, see `man 7 user_namespaces`.

 ## Set up a node to support user namespaces

-It is recommended that the host's files and host's processes use UIDs/GIDs in
-the range of 0-65535.
+By default, the kubelet assigns pods UIDs/GIDs above the range 0-65535, based on
+the assumption that the host's files and processes use UIDs/GIDs within this
+range, which is standard for most Linux distributions. This approach prevents
+any overlap between the UIDs/GIDs of the host and those of the pods.
+
+Avoiding the overlap is important to mitigate the impact of vulnerabilities such
+as [CVE-2021-25741][CVE-2021-25741], where a pod can potentially read arbitrary
+files in the host. If the UIDs/GIDs of the pod and the host don't overlap, what a
+pod can do is limited: the pod UID/GID won't match the host's
+file owner/group.
+
+The kubelet can use a custom range for user IDs and group IDs for pods. To
+configure a custom range, the node needs to have:
+
+ * A user `kubelet` in the system (you cannot use any other username here)
+ * The binary `getsubids` installed (part of [shadow-utils][shadow-utils]) and
+   in the `PATH` for the kubelet binary.
+ * A configuration of subordinate UIDs/GIDs for the `kubelet` user (see
+   [`man 5 subuid`](https://man7.org/linux/man-pages/man5/subuid.5.html) and
+   [`man 5 subgid`](https://man7.org/linux/man-pages/man5/subgid.5.html)).
+
+This setting only gathers the UID/GID range configuration and does not change
+the user executing the `kubelet`.
+
+You must follow some constraints for the subordinate ID range that you assign
+to the `kubelet` user:
+
+* The subordinate user ID, which starts the UID range for Pods, **must** be a
+  multiple of 65536 and must also be greater than or equal to 65536. In other
+  words, you cannot use any ID from the range 0-65535 for Pods; the kubelet
+  imposes this restriction to make it difficult to create an accidentally insecure
+  configuration.
+
+* The subordinate ID count must be a multiple of 65536.
+
+* The subordinate ID count must be at least `65536 x <maxPods>` where `<maxPods>`
+  is the maximum number of pods that can run on the node.
+
+* You must assign the same range for both user IDs and for group IDs. It doesn't
+  matter if other users have user ID ranges that don't align with the group ID
+  ranges.
+
+* None of the assigned ranges should overlap with any other assignment.
+
+* The subordinate configuration must be only one line. In other words, you can't
+  have multiple ranges.

-The kubelet will assign UIDs/GIDs higher than that to pods. Therefore, to
-guarantee as much isolation as possible, the UIDs/GIDs used by the host's files
-and host's processes should be in the range 0-65535.
+For example, you could define `/etc/subuid` and `/etc/subgid` to both have
+these entries for the `kubelet` user:

-Note that this recommendation is important to mitigate the impact of CVEs like
-[CVE-2021-25741][CVE-2021-25741], where a pod can potentially read arbitrary
-files in the hosts.
If the UIDs/GIDs of the pod and the host don't overlap, it
-is limited what a pod would be able to do: the pod UID/GID won't match the
-host's file owner/group.
+```
+# The format is
+# name:firstID:count of IDs
+# where
+# - firstID is 65536 (the minimum value possible)
+# - count of IDs is 110 (default limit for number of pods) * 65536
+kubelet:65536:7208960
+```

 [CVE-2021-25741]: https://github.com/kubernetes/kubernetes/issues/104980
+[shadow-utils]: https://github.com/shadow-maint/shadow

 ## Integration with Pod security admission checks

diff --git a/content/en/docs/reference/access-authn-authz/admission-controllers.md b/content/en/docs/reference/access-authn-authz/admission-controllers.md
index 2ce4f659da04c..9f6f31e3b641e 100644
--- a/content/en/docs/reference/access-authn-authz/admission-controllers.md
+++ b/content/en/docs/reference/access-authn-authz/admission-controllers.md
@@ -792,49 +792,6 @@ defined in the corresponding RuntimeClass. See also
 [Pod Overhead](/docs/concepts/scheduling-eviction/pod-overhead/)
 for more information.

-### SecurityContextDeny {#securitycontextdeny}
-
-**Type**: Validating.
-
-{{< feature-state for_k8s_version="v1.27" state="deprecated" >}}
-
-{{< caution >}}
-The Kubernetes project recommends that you **do not use** the
-`SecurityContextDeny` admission controller.
-
-The `SecurityContextDeny` admission controller plugin is deprecated and disabled
-by default. It will be removed in a future version. If you choose to enable the
-`SecurityContextDeny` admission controller plugin, you must enable the
-`SecurityContextDeny` feature gate as well.
-
-The `SecurityContextDeny` admission plugin is deprecated because it is outdated
-and incomplete; it may be unusable or not do what you would expect. As
-implemented, this plugin is unable to restrict all security-sensitive attributes
-of the Pod API. For example, the `privileged` and `ephemeralContainers` fields
-were never restricted by this plugin.
-
-The [Pod Security Admission](/docs/concepts/security/pod-security-admission/)
-plugin enforcing the [Pod Security Standards](/docs/concepts/security/pod-security-standards/)
-`Restricted` profile captures what this plugin was trying to achieve in a better
-and up-to-date way.
-{{< /caution >}}
-
-This admission controller will deny any Pod that attempts to set the following
-[SecurityContext](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context)
-fields:
-- `.spec.securityContext.supplementalGroups`
-- `.spec.securityContext.seLinuxOptions`
-- `.spec.securityContext.runAsUser`
-- `.spec.securityContext.fsGroup`
-- `.spec.(init)Containers[*].securityContext.seLinuxOptions`
-- `.spec.(init)Containers[*].securityContext.runAsUser`
-
-For more historical context on this plugin, see
-[The birth of PodSecurityPolicy](/blog/2022/08/23/podsecuritypolicy-the-historical-context/#the-birth-of-podsecuritypolicy)
-from the Kubernetes blog article about PodSecurityPolicy and its removal. The
-article details the PodSecurityPolicy historical context and the birth of the
-`securityContext` field for Pods.
-
 ### ServiceAccount {#serviceaccount}

 **Type**: Mutating and Validating.
diff --git a/content/en/docs/reference/access-authn-authz/authentication.md b/content/en/docs/reference/access-authn-authz/authentication.md
index a814b94397326..c7a0b0e3c15af 100644
--- a/content/en/docs/reference/access-authn-authz/authentication.md
+++ b/content/en/docs/reference/access-authn-authz/authentication.md
@@ -329,19 +329,42 @@ To enable the plugin, configure the following flags on the API server:
 | `--oidc-ca-file` | The path to the certificate for the CA that signed your identity provider's web certificate. Defaults to the host's root CAs. | `/etc/kubernetes/ssl/kc-ca.pem` | No |
 | `--oidc-signing-algs` | The signing algorithms accepted. Default is "RS256". | `RS512` | No |

-##### Using Authentication Configuration
+##### Authentication configuration from a file {#using-authentication-configuration}

-{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
+{{< feature-state feature_gate_name="StructuredAuthenticationConfiguration" >}}

 JWT Authenticator is an authenticator to authenticate Kubernetes users using JWT compliant tokens. The authenticator will attempt to parse a raw ID token, verify it's been signed by the configured issuer. The public key to verify the signature is discovered from the issuer's public endpoint using OIDC discovery.

-The API server can be configured to use a JWT authenticator via the `--authentication-config` flag. This flag takes a path to a file containing the `AuthenticationConfiguration`. An example configuration is provided below.
-To use this config, the `StructuredAuthenticationConfiguration` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-has to be enabled.
+The minimum valid JWT payload must contain the following claims:
+```yaml
+{
+  "iss": "https://example.com",   // must match the issuer.url
+  "aud": ["my-app"],              // at least one of the entries in issuer.audiences must match the "aud" claim in presented JWTs.
+  "exp": 1234567890,              // token expiration as Unix time (the number of seconds elapsed since January 1, 1970 UTC)
+  "<username-claim>": "user"      // this is the username claim configured in the claimMappings.username.claim or claimMappings.username.expression
+}
+```
+
+The configuration file approach allows you to configure multiple JWT authenticators, each with a unique `issuer.url` and `issuer.discoveryURL`. The configuration file even allows you to specify [CEL](/docs/reference/using-api/cel/)
+expressions to map claims to user attributes, and to validate claims and user information. The API server also automatically reloads the authenticators when the configuration file is modified. You can use
+the `apiserver_authentication_config_controller_automatic_reload_last_timestamp_seconds` metric to monitor the last time the configuration was reloaded by the API server.
+
+You must specify the path to the authentication configuration using the `--authentication-config` flag on the API server. If you want to use command line flags instead of the configuration file, those will continue to work as-is.
+To access new capabilities such as configuring multiple authenticators and setting multiple audiences for an issuer, switch to using the configuration file.
+
+For Kubernetes v{{< skew currentVersion >}}, the structured authentication configuration file format
+is beta-level, and the mechanism for using that configuration is also beta.
Provided you didn't specifically +disable the `StructuredAuthenticationConfiguration` +[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) for your cluster, +you can turn on structured authentication by specifying the `--authentication-config` command line +argument to the kube-apiserver. An example of the structured authentication configuration file is shown below. {{< note >}} -When the feature is enabled, setting both `--authentication-config` and any of the `--oidc-*` flags will result in an error. If you want to use the feature, you have to remove the `--oidc-*` flags and use the configuration file instead. +If you specify `--authentication-config` along with any of the `--oidc-*` command line arguments, this is +a misconfiguration. In this situation, the API server reports an error and then immediately exits. +If you want to switch to using structured authentication configuration, you have to remove the `--oidc-*` +command line arguments, and use the configuration file instead. {{< /note >}} ```yaml @@ -350,18 +373,37 @@ When the feature is enabled, setting both `--authentication-config` and any of t # CAUTION: this is an example configuration. # Do not use this for your own cluster! # -apiVersion: apiserver.config.k8s.io/v1alpha1 +apiVersion: apiserver.config.k8s.io/v1beta1 kind: AuthenticationConfiguration # list of authenticators to authenticate Kubernetes users using JWT compliant tokens. +# the maximum number of allowed authenticators is 64. jwt: - issuer: + # url must be unique across all authenticators. + # url must not conflict with issuer configured in --service-account-issuer. url: https://example.com # Same as --oidc-issuer-url. + # discoveryURL, if specified, overrides the URL used to fetch discovery + # information instead of using "{url}/.well-known/openid-configuration". + # The exact value specified is used, so "/.well-known/openid-configuration" + # must be included in discoveryURL if needed. + # + # The "issuer" field in the fetched discovery information must match the "issuer.url" field + # in the AuthenticationConfiguration and will be used to validate the "iss" claim in the presented JWT. + # This is for scenarios where the well-known and jwks endpoints are hosted at a different + # location than the issuer (such as locally in the cluster). + # discoveryURL must be different from url if specified and must be unique across all authenticators. + discoveryURL: https://discovery.example.com/.well-known/openid-configuration # PEM encoded CA certificates used to validate the connection when fetching # discovery information. If not set, the system verifier will be used. # Same value as the content of the file referenced by the --oidc-ca-file flag. - certificateAuthority: + certificateAuthority: + # audiences is the set of acceptable audiences the JWT must be issued to. + # At least one of the entries must match the "aud" claim in presented JWTs. audiences: - my-app # Same as --oidc-client-id. + - my-other-app + # this is required to be set to "MatchAny" when multiple audiences are specified. + audienceMatchPolicy: MatchAny # rules applied to validate token claims to authenticate users. claimValidationRules: # Same as --oidc-required-claim key=value. @@ -387,6 +429,13 @@ jwt: prefix: "" # Mutually exclusive with username.claim and username.prefix. # expression is a CEL expression that evaluates to a string. + # + # 1. 
If username.expression uses 'claims.email', then 'claims.email_verified' must be used in + # username.expression or extra[*].valueExpression or claimValidationRules[*].expression. + # An example claim validation rule expression that matches the validation automatically + # applied when username.claim is set to 'email' is 'claims.?email_verified.orValue(true)'. + # 2. If the username asserted based on username.expression is the empty string, the authentication + # request will fail. expression: 'claims.username + ":external-user"' # groups represents an option for the groups attribute. groups: @@ -446,7 +495,7 @@ jwt: {{< tabs name="example_configuration" >}} {{% tab name="Valid token" %}} ```yaml - apiVersion: apiserver.config.k8s.io/v1alpha1 + apiVersion: apiserver.config.k8s.io/v1beta1 kind: AuthenticationConfiguration jwt: - issuer: @@ -506,7 +555,7 @@ jwt: {{% /tab %}} {{% tab name="Fails claim validation" %}} ```yaml - apiVersion: apiserver.config.k8s.io/v1alpha1 + apiVersion: apiserver.config.k8s.io/v1beta1 kind: AuthenticationConfiguration jwt: - issuer: @@ -554,7 +603,7 @@ jwt: {{% /tab %}} {{% tab name="Fails user validation" %}} ```yaml - apiVersion: apiserver.config.k8s.io/v1alpha1 + apiVersion: apiserver.config.k8s.io/v1beta1 kind: AuthenticationConfiguration jwt: - issuer: @@ -618,12 +667,10 @@ jwt: {{% /tab %}} {{< /tabs >}} -Importantly, the API server is not an OAuth2 client, rather it can only be -configured to trust a single issuer. This allows the use of public providers, -such as Google, without trusting credentials issued to third parties. Admins who -wish to utilize multiple OAuth clients should explore providers which support the -`azp` (authorized party) claim, a mechanism for allowing one client to issue -tokens on behalf of another. +###### Limitations + +1. Distributed claims do not work via [CEL](/docs/reference/using-api/cel/) expressions. +1. Egress selector configuration is not supported for calls to `issuer.url` and `issuer.discoveryURL`. Kubernetes does not provide an OpenID Connect Identity Provider. You can use an existing public OpenID Connect Identity Provider (such as Google, or @@ -635,9 +682,15 @@ Tremolo Security's [OpenUnison](https://openunison.github.io/). For an identity provider to work with Kubernetes it must: -1. Support [OpenID connect discovery](https://openid.net/specs/openid-connect-discovery-1_0.html); not all do. -1. Run in TLS with non-obsolete ciphers -1. Have a CA signed certificate (even if the CA is not a commercial CA or is self signed) +1. Support [OpenID connect discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) + + The public key to verify the signature is discovered from the issuer's public endpoint using OIDC discovery. + If you're using the authentication configuration file, the identity provider doesn't need to publicly expose the discovery endpoint. + You can host the discovery endpoint at a different location than the issuer (such as locally in the cluster) and specify the + `issuer.discoveryURL` in the configuration file. + +2. Run in TLS with non-obsolete ciphers +3. Have a CA signed certificate (even if the CA is not a commercial CA or is self signed) A note about requirement #3 above, requiring a CA signed certificate. 
If you deploy your own identity provider (as opposed to one of the cloud providers like Google or Microsoft) you MUST

diff --git a/content/en/docs/reference/access-authn-authz/authorization.md b/content/en/docs/reference/access-authn-authz/authorization.md
index 621cc9773b474..189c35bf109ad 100644
--- a/content/en/docs/reference/access-authn-authz/authorization.md
+++ b/content/en/docs/reference/access-authn-authz/authorization.md
@@ -211,33 +211,31 @@ so an earlier module has higher priority to allow or deny a request.

 ## Configuring the API Server using an Authorization Config File

-{{< feature-state state="alpha" for_k8s_version="v1.29" >}}
+{{< feature-state feature_gate_name="StructuredAuthorizationConfiguration" >}}

 The Kubernetes API server's authorizer chain can be configured using a
 configuration file.

-You specify the path to that authorization configuration using the
-`--authorization-config` command line argument. This feature enables
-creation of authorization chains with multiple webhooks with well-defined
-parameters that validate requests in a certain order and enables fine grained
-control - such as explicit Deny on failures. An example configuration with
-all possible values is provided below.
+This feature enables the creation of authorization chains with multiple webhooks with well-defined parameters that validate requests in a particular order and allows fine-grained control, such as explicit Deny on failures. The configuration file approach even allows you to specify [CEL](/docs/reference/using-api/cel/) rules to pre-filter requests before they are dispatched to webhooks, helping you to prevent unnecessary invocations. The API server also automatically reloads the authorizer chain when the configuration file is modified. An example configuration with all possible values is provided below.

-In order to customise the authorizer chain, you need to enable the
-`StructuredAuthorizationConfiguration` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
+You must specify the path to the authorization configuration using the `--authorization-config` command line argument. If you want to keep using command line flags instead of a configuration file, those will continue to work as-is. To gain access to new authorization webhook capabilities like multiple webhooks, failure policy, and pre-filter rules, switch to putting options in an `--authorization-config` file.

-Note: When the feature is enabled, setting both `--authorization-config` and
+Starting with Kubernetes v{{< skew currentVersion >}}, the configuration file format is
+beta-level, and only requires specifying `--authorization-config` since the `StructuredAuthorizationConfiguration` feature gate is enabled by default.

+{{< caution >}}
+When the feature is enabled, setting both `--authorization-config` and
 configuring an authorization webhook using the `--authorization-mode` and
 `--authorization-webhook-*` command line flags is not allowed. If done, there
 will be an error and API Server would exit right away.

-{{< caution >}}
-While the feature is in Alpha/Beta, there is no change if you want to keep on
-using command line flags. When the feature goes Beta, the feature flag would
-be turned on by default. The feature flag would be removed when feature goes GA.
+The authorization configuration file is reloaded when a file change event is observed, and also on a 1-minute poll interval. All non-webhook authorizer types must remain unchanged in the file on reload: Node and RBAC
+authorizers can be reordered, but cannot be added or removed.

 When configuring the authorizer chain using a config file, make sure all the
-apiserver nodes have the file. Also, take a note of the apiserver configuration
+apiserver nodes have the file. Take a note of the apiserver configuration
 when upgrading/downgrading the clusters. For example, if upgrading to v1.29+
 clusters and using the config file, you would need to make sure the config file
 exists before upgrading the cluster. When downgrading to v1.28, you would need
@@ -248,9 +246,8 @@ to add the flags back to their bootstrap mechanism.
 #
 # DO NOT USE THE CONFIG AS IS. THIS IS AN EXAMPLE.
 #
-apiVersion: apiserver.config.k8s.io/v1alpha1
+apiVersion: apiserver.config.k8s.io/v1beta1
 kind: AuthorizationConfiguration
-# authorizers are defined in order of precedence
 authorizers:
   - type: Webhook
     # Name used to describe the authorizer
@@ -283,7 +280,7 @@ authorizers:
       # MatchConditionSubjectAccessReviewVersion specifies the SubjectAccessReview
       # version the CEL expressions are evaluated against
      # Valid values: v1
-      # Required only if matchConditions are specified, no default value
+      # Required, no default value
      matchConditionSubjectAccessReviewVersion: v1
      # Controls the authorization decision when a webhook request fails to
      # complete or returns a malformed response or errors evaluating
diff --git a/content/en/docs/reference/access-authn-authz/extensible-admission-controllers.md b/content/en/docs/reference/access-authn-authz/extensible-admission-controllers.md
index 091a638d963e0..9323651c5798e 100644
--- a/content/en/docs/reference/access-authn-authz/extensible-admission-controllers.md
+++ b/content/en/docs/reference/access-authn-authz/extensible-admission-controllers.md
@@ -721,7 +721,7 @@ The `matchPolicy` for an admission webhooks defaults to `Equivalent`.

 ### Matching requests: `matchConditions`

-{{< feature-state state="beta" for_k8s_version="v1.28" >}}
+{{< feature-state feature_gate_name="AdmissionWebhookMatchConditions" >}}

 You can define _match conditions_ for webhooks if you need fine-grained request filtering. These
 conditions are useful if you find that match rules, `objectSelectors` and `namespaceSelectors` still
diff --git a/content/en/docs/reference/access-authn-authz/service-accounts-admin.md b/content/en/docs/reference/access-authn-authz/service-accounts-admin.md
index 92e631fc20897..0f16071251a52 100644
--- a/content/en/docs/reference/access-authn-authz/service-accounts-admin.md
+++ b/content/en/docs/reference/access-authn-authz/service-accounts-admin.md
@@ -60,6 +60,102 @@ for a number of reasons:
   without many constraints and have namespaced names, such configuration is
   usually portable.

+## Bound service account tokens
+
+ServiceAccount tokens can be bound to API objects that exist in the kube-apiserver.
+This can be used to tie the validity of a token to the existence of another API object.
+Supported object types are as follows:
+
+* Pod (used for projected volume mounts, see below)
+* Secret (can be used to allow revoking a token by deleting the Secret)
+* Node (in v1.30, creating new node-bound tokens is alpha, using existing node-bound tokens is beta)
+
+When a token is bound to an object, the object's `metadata.name` and `metadata.uid` are
+stored as extra 'private claims' in the issued JWT.
+
+When a bound token is presented to the kube-apiserver, the service account authenticator
+will extract and verify these claims.
+If the referenced object no longer exists (or its `metadata.uid` does not match),
+the request will not be authenticated.
+
+### Additional metadata in Pod bound tokens
+
+{{< feature-state feature_gate_name="ServiceAccountTokenPodNodeInfo" >}}
+
+When a service account token is bound to a Pod object, additional metadata is also
+embedded into the token that indicates the value of the bound pod's `spec.nodeName` field,
+and the uid of that Node, if available.
+
+This node information is **not** verified by the kube-apiserver when the token is used for authentication.
+It is included so integrators do not have to fetch Pod or Node API objects to check the associated Node name
+and uid when inspecting a JWT.
+
+### Verifying and inspecting private claims
+
+The `TokenReview` API can be used to verify and extract private claims from a token:
+
+1. Assume you have a pod named `test-pod` and a service account named `my-sa`.
+2. Create a token that is bound to this Pod:
+
+```shell
+kubectl create token my-sa --bound-object-kind="Pod" --bound-object-name="test-pod"
+```
+
+3. Copy this token into a new file named `tokenreview.yaml`:
+
+```yaml
+apiVersion: authentication.k8s.io/v1
+kind: TokenReview
+spec:
+  token: 
+```
+
+4. Submit this resource to the apiserver for review:
+
+```shell
+kubectl create -o yaml -f tokenreview.yaml # we use '-o yaml' so we can inspect the output
+```
+
+You should see output similar to the following:
+
+```yaml
+apiVersion: authentication.k8s.io/v1
+kind: TokenReview
+metadata:
+  creationTimestamp: null
+spec:
+  token: 
+status:
+  audiences:
+  - https://kubernetes.default.svc.cluster.local
+  authenticated: true
+  user:
+    extra:
+      authentication.kubernetes.io/credential-id:
+      - JTI=7ee52be0-9045-4653-aa5e-0da57b8dccdc
+      authentication.kubernetes.io/node-name:
+      - kind-control-plane
+      authentication.kubernetes.io/node-uid:
+      - 497e9d9a-47aa-4930-b0f6-9f2fb574c8c6
+      authentication.kubernetes.io/pod-name:
+      - test-pod
+      authentication.kubernetes.io/pod-uid:
+      - e87dbbd6-3d7e-45db-aafb-72b24627dff5
+    groups:
+    - system:serviceaccounts
+    - system:serviceaccounts:default
+    - system:authenticated
+    uid: f8b4161b-2e2b-11e9-86b7-2afc33b31a7e
+    username: system:serviceaccount:default:my-sa
+```
+
+{{< note >}}
+Despite using `kubectl create -f` to create this resource, and defining it similarly to
+other resource types in Kubernetes, TokenReview is a special type and the kube-apiserver
+does not actually persist the TokenReview object into etcd.
+Hence `kubectl get tokenreview` is not a valid command.
+{{< /note >}} + ## Bound service account token volume mechanism {#bound-service-account-token-volume} {{< feature-state feature_gate_name="BoundServiceAccountTokenVolume" >}} diff --git a/content/en/docs/reference/access-authn-authz/validating-admission-policy.md b/content/en/docs/reference/access-authn-authz/validating-admission-policy.md index f7f705aa9f59e..2d0ae273442ad 100644 --- a/content/en/docs/reference/access-authn-authz/validating-admission-policy.md +++ b/content/en/docs/reference/access-authn-authz/validating-admission-policy.md @@ -9,7 +9,7 @@ content_type: concept -{{< feature-state state="beta" for_k8s_version="v1.28" >}} +{{< feature-state state="stable" for_k8s_version="v1.30" >}} This page provides an overview of Validating Admission Policy. diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/admission-webhook-match-conditions.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/admission-webhook-match-conditions.md index f213c21af4568..95395364754c2 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates/admission-webhook-match-conditions.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/admission-webhook-match-conditions.md @@ -12,7 +12,11 @@ stages: toVersion: "1.27" - stage: beta defaultValue: true - fromVersion: "1.28" + fromVersion: "1.28" + toVersion: "1.29" + - stage: stable + defaultValue: true + fromVersion: "1.30" --- Enable [match conditions](/docs/reference/access-authn-authz/extensible-admission-controllers/#matching-requests-matchconditions) on mutating & validating admission webhooks. diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/aggregated-discovery-endpoint.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/aggregated-discovery-endpoint.md index 1b82dec7618e3..047217ceedda8 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates/aggregated-discovery-endpoint.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/aggregated-discovery-endpoint.md @@ -13,6 +13,10 @@ stages: - stage: beta defaultValue: true fromVersion: "1.27" + toVersion: "1.29" + - stage: stable + defaultValue: true + fromVersion: "1.30" --- Enable a single HTTP endpoint `/discovery/` which supports native HTTP caching with ETags containing all APIResources known to the API server. 
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/api-self-subject-review.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/api-self-subject-review.md index fd65f9919f866..dc706fb51af52 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates/api-self-subject-review.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/api-self-subject-review.md @@ -1,4 +1,5 @@ --- +# Removed from Kubernetes title: APISelfSubjectReview content_type: feature_gate _build: diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/cloud-dual-stack-node-ips.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/cloud-dual-stack-node-ips.md index 4a850e6135557..970fc6ae8e3d6 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates/cloud-dual-stack-node-ips.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/cloud-dual-stack-node-ips.md @@ -13,6 +13,11 @@ stages: - stage: beta defaultValue: true fromVersion: "1.29" + toVersion: "1.29" + - stage: stable + defaultValue: true + fromVersion: "1.30" + --- Enables dual-stack `kubelet --node-ip` with external cloud providers. See [Configure IPv4/IPv6 dual-stack](/docs/concepts/services-networking/dual-stack/#configure-ipv4-ipv6-dual-stack) diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/contextual-logging.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/contextual-logging.md index 9ae5102d64a3e..6416383f005d6 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates/contextual-logging.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/contextual-logging.md @@ -9,6 +9,9 @@ stages: - stage: alpha defaultValue: false fromVersion: "1.24" + - stage: beta + defaultValue: true + fromVersion: "1.30" --- -When you enable this feature gate, Kubernetes components that support - contextual logging add extra detail to log output. +Enables extra details in log output of Kubernetes components that support +contextual logging. 
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/crd-validation-ratcheting.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/crd-validation-ratcheting.md index 165c3f1b5c35f..915929fe1197e 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates/crd-validation-ratcheting.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/crd-validation-ratcheting.md @@ -9,6 +9,10 @@ stages: - stage: alpha defaultValue: false fromVersion: "1.28" + toVersion: "1.29" + - stage: beta + defaultValue: true + fromVersion: "1.30" --- Enable updates to custom resources to contain violations of their OpenAPI schema if the offending portions of the resource diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/custom-resource-field-selectors.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/custom-resource-field-selectors.md new file mode 100644 index 0000000000000..5ef021173f8de --- /dev/null +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/custom-resource-field-selectors.md @@ -0,0 +1,16 @@ +--- +title: CustomResourceFieldSelectors +content_type: feature_gate +_build: + list: never + render: false + +stages: + - stage: alpha + defaultValue: false + fromVersion: "1.30" +--- + +Enable `selectableFields` in the +{{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}} API to allow filtering +of custom resource **list**, **watch** and **deletecollection** requests. diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/hpa-container-metrics.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/hpa-container-metrics.md index 0beb5c474dfdd..84d076cb4b6a6 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates/hpa-container-metrics.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/hpa-container-metrics.md @@ -13,6 +13,10 @@ stages: - stage: beta defaultValue: true fromVersion: "1.27" + toVersion: "1.29" + - stage: stable + defaultValue: true + fromVersion: "1.30" --- -Enable the `HorizontalPodAutoscaler` to scale based on -metrics from individual containers in target pods. +Allow {{< glossary_tooltip text="HorizontalPodAutoscalers" term_id="horizontal-pod-autoscaler" >}} +to scale based on metrics from individual containers within target pods. \ No newline at end of file diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/image-maximum-gc-age.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/image-maximum-gc-age.md index 5860765283dbd..10a6b2334e9a8 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates/image-maximum-gc-age.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/image-maximum-gc-age.md @@ -8,6 +8,10 @@ _build: stages: - stage: alpha defaultValue: false - fromVersion: "1.29" + fromVersion: "1.29" + toVersion: "1.29" + - stage: beta + defaultValue: true + fromVersion: "1.30" --- Enables the kubelet configuration field `imageMaximumGCAge`, allowing an administrator to specify the age after which an image will be garbage collected. 
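
As a hedged sketch, enabling this behaviour through a kubelet configuration file
might look like the following (the duration value is illustrative; the explicit
`featureGates` entry is redundant once the gate is on by default in beta):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ImageMaximumGCAge: true
# a Kubernetes duration; 3d12h means 3 days and 12 hours
imageMaximumGCAge: 3d12h
```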
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/job-managed-by.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/job-managed-by.md
new file mode 100644
index 0000000000000..38733b6de66ff
--- /dev/null
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/job-managed-by.md
@@ -0,0 +1,14 @@
+---
+title: JobManagedBy
+content_type: feature_gate
+
+_build:
+  list: never
+  render: false
+
+stages:
+  - stage: alpha
+    defaultValue: false
+    fromVersion: "1.30"
+---
+Allows delegating reconciliation of a Job object to an external controller.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/job-success-policy.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/job-success-policy.md
new file mode 100644
index 0000000000000..601680357ccc9
--- /dev/null
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/job-success-policy.md
@@ -0,0 +1,14 @@
+---
+title: JobSuccessPolicy
+content_type: feature_gate
+
+_build:
+  list: never
+  render: false
+
+stages:
+  - stage: alpha
+    defaultValue: false
+    fromVersion: "1.30"
+---
+Allow users to specify when a Job can be declared as succeeded based on the set of succeeded pods.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/kube-proxy-draining-terminating-nodes.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/kube-proxy-draining-terminating-nodes.md
index e9628a85998eb..d0c1b00f8f385 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/kube-proxy-draining-terminating-nodes.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/kube-proxy-draining-terminating-nodes.md
@@ -9,6 +9,10 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.28"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.30"
---
Implement connection draining for
terminating nodes for `externalTrafficPolicy: Cluster` services.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/legacy-service-account-token-clean-up.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/legacy-service-account-token-clean-up.md
index 698e25067f9a4..f22aaae479dff 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/legacy-service-account-token-clean-up.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/legacy-service-account-token-clean-up.md
@@ -13,6 +13,10 @@ stages:
   - stage: beta
     defaultValue: true
     fromVersion: "1.29"
+    toVersion: "1.29"
+  - stage: stable
+    defaultValue: true
+    fromVersion: "1.30"
---
Enable cleaning up Secret-based
[service account tokens](/docs/concepts/security/service-accounts/#get-a-token)
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/load-balancer-ip-mode.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/load-balancer-ip-mode.md
index 1a46538eb7dec..6b87fd3abff38 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/load-balancer-ip-mode.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/load-balancer-ip-mode.md
@@ -9,6 +9,10 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.29"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.30"
---
Allows setting `ipMode` for Services where `type` is set to `LoadBalancer`.
See [Specifying IPMode of load balancer status](/docs/concepts/services-networking/service/#load-balancer-ip-mode)
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/min-domains-in-pod-topology-spread.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/min-domains-in-pod-topology-spread.md
index a971222564b77..ae8a3f7f383ad 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/min-domains-in-pod-topology-spread.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/min-domains-in-pod-topology-spread.md
@@ -17,6 +17,10 @@ stages:
   - stage: beta
     defaultValue: true
     fromVersion: "1.27"
+    toVersion: "1.29"
+  - stage: stable
+    defaultValue: true
+    fromVersion: "1.30"
---
Enable `minDomains` in
[Pod topology spread constraints](/docs/concepts/scheduling-eviction/topology-spread-constraints/).
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/name-generation-retries.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/name-generation-retries.md
new file mode 100644
index 0000000000000..6f654f89faa52
--- /dev/null
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/name-generation-retries.md
@@ -0,0 +1,19 @@
+---
+title: NameGenerationRetries
+content_type: feature_gate
+
+_build:
+  list: never
+  render: false
+
+stages:
+  - stage: alpha
+    defaultValue: false
+    fromVersion: "1.30"
+
+---
+Enables retrying of object creation when the
+{{< glossary_tooltip text="API server" term_id="kube-apiserver" >}}
+is expected to generate a [name](/docs/concepts/overview/working-with-objects/names/#names).
+When this feature is enabled, requests using `generateName` are retried automatically in case the
+control plane detects a name conflict with an existing object, up to a limit of 8 total attempts.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/new-volume-manager-reconstruction.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/new-volume-manager-reconstruction.md
index f9242f0050912..d58e828c56cbc 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/new-volume-manager-reconstruction.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/new-volume-manager-reconstruction.md
@@ -13,16 +13,15 @@ stages:
   - stage: beta
     defaultValue: true
     fromVersion: "1.28"
+    toVersion: "1.29"
+  - stage: stable
+    defaultValue: true
+    fromVersion: "1.30"
---
Enables improved discovery of mounted volumes during kubelet
-startup. Since this code has been significantly refactored, we allow to opt-out in case kubelet
-gets stuck at the startup or is not unmounting volumes from terminated Pods. Note that this
-refactoring was behind `SELinuxMountReadWriteOncePod` alpha feature gate in Kubernetes 1.25.
-
-
-Before Kubernetes v1.25, the kubelet used different default behavior for discovering mounted
-volumes during the kubelet startup. If you disable this feature gate (it's enabled by default), you select
-the legacy discovery behavior.
+startup. Since the associated code had been significantly refactored, Kubernetes versions 1.25 to 1.29
+allowed you to opt out in case the kubelet got stuck at startup, or did not unmount volumes
+from terminated Pods.

-In Kubernetes v1.25 and v1.26, this behavior toggle was part of the `SELinuxMountReadWriteOncePod`
-feature gate. 
+This refactoring was behind the `SELinuxMountReadWriteOncePod` feature gate in Kubernetes
+releases 1.25 and 1.26.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/node-log-query.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/node-log-query.md
index efdc9bdc45ba1..6f6ce6ee9d3ac 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/node-log-query.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/node-log-query.md
@@ -9,5 +9,9 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.27"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: false
+    fromVersion: "1.30"
---
Enables querying logs of node services using the `/logs` endpoint.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-host-ips.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-host-ips.md
index 81e919aa6f069..0f39a10790f3c 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-host-ips.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-host-ips.md
@@ -13,6 +13,10 @@ stages:
   - stage: beta
     defaultValue: true
     fromVersion: "1.29"
+    toVersion: "1.29"
+  - stage: stable
+    defaultValue: true
+    fromVersion: "1.30"
---
Enable the `status.hostIPs` field for pods and the {{< glossary_tooltip term_id="downward-api" text="downward API" >}}.
The field lets you expose host IP addresses to workloads.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-lifecycle-sleep-action.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-lifecycle-sleep-action.md
index 42509131ebaa8..bb5ede9ce1079 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-lifecycle-sleep-action.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-lifecycle-sleep-action.md
@@ -9,5 +9,9 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.29"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.30"
---
Enables the `sleep` action in Container lifecycle hooks.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-scheduling-readiness.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-scheduling-readiness.md
index 8b03ffb2daef1..24951cfc8294d 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-scheduling-readiness.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/pod-scheduling-readiness.md
@@ -13,5 +13,9 @@ stages:
   - stage: beta
     defaultValue: true
     fromVersion: "1.27"
+    toVersion: "1.29"
+  - stage: stable
+    defaultValue: true
+    fromVersion: "1.30"
---
Enable setting `schedulingGates` field to control a Pod's [scheduling readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness).
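
For reference, a minimal sketch of a Pod created with a scheduling gate (the gate
name is an illustrative placeholder); the Pod stays unscheduled until the gate is
removed from `spec.schedulingGates`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod
spec:
  schedulingGates:
  - name: example.com/some-gate   # placeholder; remove it to let the Pod be scheduled
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
```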
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/port-forward-websockets.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/port-forward-websockets.md new file mode 100644 index 0000000000000..fb541f9f0ae15 --- /dev/null +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/port-forward-websockets.md @@ -0,0 +1,15 @@ +--- +title: PortForwardWebsockets +content_type: feature_gate +_build: + list: never + render: false + +stages: + - stage: alpha + defaultValue: false + fromVersion: "1.30" +--- +Allow WebSocket streaming of the +portforward sub-protocol (`port-forward`) from clients requesting +version v2 (`v2.portforward.k8s.io`) of the sub-protocol. diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/recursive-read-only-mounts.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/recursive-read-only-mounts.md new file mode 100644 index 0000000000000..3ecca217d9d3c --- /dev/null +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/recursive-read-only-mounts.md @@ -0,0 +1,14 @@ +--- +title: RecursiveReadOnlyMounts +content_type: feature_gate +_build: + list: never + render: false + +stages: + - stage: alpha + defaultValue: false + fromVersion: "1.30" +--- +Enables support for recursive read-only mounts. +For more details, see [read-only mounts](/docs/concepts/storage/volumes/#read-only-mounts). diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/relaxed-environment-variable-validation.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/relaxed-environment-variable-validation.md new file mode 100644 index 0000000000000..862ae57214bae --- /dev/null +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/relaxed-environment-variable-validation.md @@ -0,0 +1,13 @@ +--- +title: RelaxedEnvironmentVariableValidation +content_type: feature_gate +_build: + list: never + render: false + +stages: + - stage: alpha + defaultValue: false + fromVersion: "1.30" +--- +Allow almost all printable ASCII characters in environment variables. diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/remove-self-link.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/remove-self-link.md index ff8b45a51e7a6..0e7a492be3057 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates/remove-self-link.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/remove-self-link.md @@ -1,4 +1,5 @@ --- +removed: true title: RemoveSelfLink content_type: feature_gate _build: @@ -17,6 +18,7 @@ stages: - stage: stable defaultValue: true fromVersion: "1.24" + toVersion: "1.29" --- Sets the `.metadata.selfLink` field to blank (empty string) for all objects and collections. 
This field has been deprecated since Kubernetes v1.16.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/selinux-mount.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/selinux-mount.md
new file mode 100644
index 0000000000000..124862976773c
--- /dev/null
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/selinux-mount.md
@@ -0,0 +1,20 @@
+---
+title: SELinuxMount
+content_type: feature_gate
+_build:
+  list: never
+  render: false
+
+stages:
+  - stage: alpha
+    defaultValue: false
+    fromVersion: "1.30"
+---
+Speeds up container startup by allowing kubelet to mount volumes
+for a Pod directly with the correct SELinux label instead of changing each file on the volumes
+recursively.
+It widens the performance improvements behind the `SELinuxMountReadWriteOncePod`
+feature gate by extending the implementation to all volumes.
+
+Enabling the `SELinuxMount` feature gate requires the feature gate `SELinuxMountReadWriteOncePod` to
+be enabled.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-jti.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-jti.md
index f4e9243184872..ab82953ada6da 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-jti.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-jti.md
@@ -9,6 +9,10 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.29"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.30"
 ---
 Controls whether JTIs (UUIDs) are embedded into generated service account tokens,
 and whether these JTIs are recorded into the Kubernetes audit log for future requests made by these tokens.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-node-binding-validation.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-node-binding-validation.md
index fbdff26fd005a..94021587aef52 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-node-binding-validation.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-node-binding-validation.md
@@ -9,6 +9,10 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.29"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.30"
 ---
 Controls whether the apiserver will validate a Node reference in service account tokens.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-pod-node-info.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-pod-node-info.md
index da5410122dd10..86d8940b55ec2 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-pod-node-info.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/service-account-token-pod-node-info.md
@@ -9,6 +9,10 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.29"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.30"
 ---
 Controls whether the apiserver embeds the node name and uid for the associated node when issuing
 service account tokens bound to Pod objects.
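Taken together, the three `ServiceAccountToken*` gates above change what a bound service account token carries. As a rough sketch only (every name, UID, and timestamp below is invented for illustration), the decoded payload of a Pod-bound token issued with these gates enabled might look like:

```json
{
  "aud": ["https://kubernetes.default.svc"],
  "exp": 1714559999,
  "iat": 1714556399,
  "iss": "https://kubernetes.default.svc",
  "jti": "f8a2a3b1-1c0d-4b6e-9e3a-7b1a2c3d4e5f",
  "kubernetes.io": {
    "namespace": "default",
    "node": { "name": "node-1", "uid": "7c9cf3d5-0000-0000-0000-000000000001" },
    "pod": { "name": "my-pod", "uid": "7c9cf3d5-0000-0000-0000-000000000002" },
    "serviceaccount": { "name": "default", "uid": "7c9cf3d5-0000-0000-0000-000000000003" }
  },
  "sub": "system:serviceaccount:default:default"
}
```

The `jti` value is what the audit log records, and the `node` entry is what the node binding validation checks against.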
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/service-traffic-distribution.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/service-traffic-distribution.md
new file mode 100644
index 0000000000000..4c1e6d6c17933
--- /dev/null
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/service-traffic-distribution.md
@@ -0,0 +1,16 @@
+---
+title: ServiceTrafficDistribution
+content_type: feature_gate
+
+_build:
+  list: never
+  render: false
+
+stages:
+- stage: alpha
+  defaultValue: false
+  fromVersion: "1.30"
+---
+Allows usage of the optional `spec.trafficDistribution` field in Services. The
+field offers a way to express preferences for how traffic is distributed to
+Service endpoints.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/stable-load-balancer-node-set.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/stable-load-balancer-node-set.md
index faaa09e420011..e2968c340482d 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/stable-load-balancer-node-set.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/stable-load-balancer-node-set.md
@@ -9,6 +9,10 @@ stages:
   - stage: beta
     defaultValue: true
     fromVersion: "1.27"
+    toVersion: "1.29"
+  - stage: stable
+    defaultValue: true
+    fromVersion: "1.30"
 ---
 Enables fewer load balancer re-configurations by the service controller (KCCM)
 as an effect of changing node state.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/storage-version-migrator.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/storage-version-migrator.md
new file mode 100644
index 0000000000000..01d9bd53b2304
--- /dev/null
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/storage-version-migrator.md
@@ -0,0 +1,14 @@
+---
+title: StorageVersionMigrator
+content_type: feature_gate
+_build:
+  list: never
+  render: false
+
+stages:
+  - stage: alpha
+    defaultValue: false
+    fromVersion: "1.30"
+    toVersion: "1.32"
+---
+Enables storage version migration. See [Migrate Kubernetes Objects Using Storage Version Migration](/docs/tasks/manage-kubernetes-objects/storage-version-migration) for more details.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/structured-authentication-configuration.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/structured-authentication-configuration.md
index 76836a9425872..11c4f11ab09b5 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/structured-authentication-configuration.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/structured-authentication-configuration.md
@@ -9,6 +9,10 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.29"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.30"
 ---
 Enable [structured authentication configuration](/docs/reference/access-authn-authz/authentication/#configuring-the-api-server)
 for the API server.
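For readers who have not seen structured authentication configuration before, a minimal sketch of the file this gate unlocks follows; the issuer URL, audience, and claim mapping are hypothetical, and the authoritative schema lives in the linked API server authentication docs:

```yaml
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://issuer.example.com  # hypothetical OIDC issuer
    audiences:
    - my-cluster                     # hypothetical audience
  claimMappings:
    username:
      claim: sub
      prefix: "oidc:"
```

Because `jwt` is a list, more than one JWT authenticator can be configured at once, which is the main capability the older `--oidc-*` flags lacked.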
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/structured-authorization-configuration.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/structured-authorization-configuration.md
index cad2cbb6415c3..d2f1a47283c6a 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/structured-authorization-configuration.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/structured-authorization-configuration.md
@@ -9,6 +9,10 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.29"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.30"
 ---
 Enable structured authorization configuration, so that cluster administrators
 can specify more than one [authorization webhook](/docs/reference/access-authn-authz/webhook/)
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/translate-stream-close-websocket-requests.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/translate-stream-close-websocket-requests.md
index 08be9d219e2cb..95928403cb9c9 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/translate-stream-close-websocket-requests.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/translate-stream-close-websocket-requests.md
@@ -6,9 +6,13 @@ _build:
   render: false
 
 stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.29"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.30"
 ---
 Allow WebSocket streaming of the
 remote command sub-protocol (`exec`, `cp`, `attach`) from clients requesting
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/user-namespaces-support.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/user-namespaces-support.md
index 0e46c3e3158aa..7cf8240545847 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/user-namespaces-support.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/user-namespaces-support.md
@@ -6,8 +6,12 @@ _build:
   render: false
 
 stages:
-  - stage: alpha
+  - stage: alpha
     defaultValue: false
     fromVersion: "1.28"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: false
+    fromVersion: "1.30"
 ---
 Enable user namespace support for Pods.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/validating-admission-policy.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/validating-admission-policy.md
index 197422115d265..497c04e0a9291 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/validating-admission-policy.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/validating-admission-policy.md
@@ -13,5 +13,9 @@ stages:
   - stage: beta
     defaultValue: false
     fromVersion: "1.28"
+    toVersion: "1.29"
+  - stage: stable
+    defaultValue: true
+    fromVersion: "1.30"
 ---
 Enable [ValidatingAdmissionPolicy](/docs/reference/access-authn-authz/validating-admission-policy/) support for CEL validations to be used in Admission Control.
diff --git a/content/en/docs/reference/command-line-tools-reference/kubelet.md b/content/en/docs/reference/command-line-tools-reference/kubelet.md
index ff53fc63b2fb4..e5ece46bbd0ae 100644
--- a/content/en/docs/reference/command-line-tools-reference/kubelet.md
+++ b/content/en/docs/reference/command-line-tools-reference/kubelet.md
@@ -416,7 +416,7 @@ KubeletPodResourcesGet=true|false (ALPHA - default=false)<br/>
KubeletSeparateDiskGC=true|false (ALPHA - default=false)
KubeletTracing=true|false (BETA - default=true)
LegacyServiceAccountTokenCleanUp=true|false (BETA - default=true)
-LoadBalancerIPMode=true|false (ALPHA - default=false)
+LoadBalancerIPMode=true|false (BETA - default=true)
LocalStorageCapacityIsolationFSQuotaMonitoring=true|false (ALPHA - default=false)
LogarithmicScaleDown=true|false (BETA - default=true)
LoggingAlphaOptions=true|false (ALPHA - default=false)
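The gates in the listing above can also be set through the kubelet configuration file rather than the `--feature-gates` command-line flag, and the kubelet documentation generally steers you toward the configuration file. A minimal sketch, using one kubelet-side gate picked from the list above purely as an example:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Example gate from the listing above; any gate name from the
  # list can be used as a key with a boolean value.
  KubeletSeparateDiskGC: true
```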
diff --git a/content/en/docs/reference/kubectl/kubectl.md b/content/en/docs/reference/kubectl/kubectl.md
index 99b7bc507f579..c09a05dfccc4b 100644
--- a/content/en/docs/reference/kubectl/kubectl.md
+++ b/content/en/docs/reference/kubectl/kubectl.md
@@ -350,6 +350,14 @@ kubectl [flags]
 When set to false, turns off extra HTTP headers detailing invoked kubectl command (Kubernetes version v1.22 or later)
 
+
+KUBECTL_DEBUG_CUSTOM_PROFILE
+
+
+When set to true, the `--custom` flag will be enabled in kubectl debug. This flag is used to customize the pre-defined profiles.
+
+
+
 KUBECTL_EXPLAIN_OPENAPIV3
@@ -366,6 +374,14 @@ kubectl [flags]
 
+
+KUBECTL_PORT_FORWARD_WEBSOCKETS
+
+
+When set to true, the kubectl port-forward command will attempt to stream using the websockets protocol. If the upgrade to websockets fails, the command will fall back to using the current SPDY protocol.
+
+
+
 KUBECTL_REMOTE_COMMAND_WEBSOCKETS
diff --git a/content/en/docs/reference/labels-annotations-taints/_index.md b/content/en/docs/reference/labels-annotations-taints/_index.md
index 839ac6d742aa5..99330979e02a1 100644
--- a/content/en/docs/reference/labels-annotations-taints/_index.md
+++ b/content/en/docs/reference/labels-annotations-taints/_index.md
@@ -300,7 +300,7 @@ which is used by Kustomize and similar third-party tools.
 For example, Kustomize removes objects with this annotation from its final
 build output.
 
-### container.apparmor.security.beta.kubernetes.io/* (beta) {#container-apparmor-security-beta-kubernetes-io}
+### container.apparmor.security.beta.kubernetes.io/* (deprecated) {#container-apparmor-security-beta-kubernetes-io}
 
 Type: Annotation
 
@@ -309,7 +309,7 @@ Example: `container.apparmor.security.beta.kubernetes.io/my-container: my-custom
 Used on: Pods
 
 This annotation allows you to specify the AppArmor security profile for a container within a
-Kubernetes pod.
+Kubernetes pod. As of Kubernetes v1.30, you should set this using the `appArmorProfile` field instead.
 
 To learn more, see the [AppArmor](/docs/tutorials/security/apparmor/) tutorial.
 The tutorial illustrates using AppArmor to restrict a container's abilities and access.
@@ -1106,13 +1106,11 @@ Example: `kubernetes.io/legacy-token-invalid-since: 2023-10-27`
 Used on: Secret
 
 The control plane automatically adds this label to auto-generated Secrets that
-have the type `kubernetes.io/service-account-token`, provided that you have the
-`LegacyServiceAccountTokenCleanUp` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-enabled. Kubernetes {{< skew currentVersion >}} enables that behavior by default.
-This label marks the Secret-based token as invalid for authentication. The value
-of this label records the date (ISO 8601 format, UTC time zone) when the control
-plane detects that the auto-generated Secret has not been used for a specified
-duration (defaults to one year).
+have the type `kubernetes.io/service-account-token`. This label marks the
+Secret-based token as invalid for authentication. The value of this label
+records the date (ISO 8601 format, UTC time zone) when the control plane detects
+that the auto-generated Secret has not been used for a specified duration
+(defaults to one year).
 ### endpointslice.kubernetes.io/managed-by {#endpointslicekubernetesiomanaged-by}
diff --git a/content/en/docs/reference/networking/ports-and-protocols.md b/content/en/docs/reference/networking/ports-and-protocols.md
index 2e716e4d46fd9..388a50f519240 100644
--- a/content/en/docs/reference/networking/ports-and-protocols.md
+++ b/content/en/docs/reference/networking/ports-and-protocols.md
@@ -27,6 +27,7 @@ etcd cluster externally or on custom ports.
 | Protocol | Direction | Port Range  | Purpose               | Used By                 |
 |----------|-----------|-------------|-----------------------|-------------------------|
 | TCP      | Inbound   | 10250       | Kubelet API           | Self, Control plane     |
+| TCP      | Inbound   | 10256       | kube-proxy            | Self, Load balancers    |
 | TCP      | Inbound   | 30000-32767 | NodePort Services†    | All                     |
 
 † Default port range for [NodePort Services](/docs/concepts/services-networking/service/).
diff --git a/content/en/docs/reference/networking/virtual-ips.md b/content/en/docs/reference/networking/virtual-ips.md
index 862458009f81d..fc92c23af5ca0 100644
--- a/content/en/docs/reference/networking/virtual-ips.md
+++ b/content/en/docs/reference/networking/virtual-ips.md
@@ -488,6 +488,67 @@ route to ready node-local endpoints. If the traffic policy is `Local` and there
 are no node-local endpoints, the kube-proxy does not forward any traffic for the
 relevant Service.
 
+If `Cluster` is specified, all nodes are eligible load balancing targets _as long as_
+the node is not being deleted and kube-proxy is healthy. In this mode, load balancer
+health checks are configured to target the service proxy's readiness port and path.
+In the case of kube-proxy this evaluates to: `${NODE_IP}:10256/healthz`. kube-proxy
+will return either an HTTP code 200 or 503. kube-proxy's load balancer health check
+endpoint returns 200 if:
+
+1. kube-proxy is healthy, meaning:
+   - it's able to progress programming the network and isn't timing out while doing
+     so (the timeout is defined to be: **2 × `iptables.syncPeriod`**); and
+2. the node is not being deleted (there is no deletion timestamp set for the Node).
+
+kube-proxy returns 503 and marks the node as not eligible while it is being
+deleted because kube-proxy supports connection draining for terminating nodes.
+A couple of important things occur from the point of view of a Kubernetes-managed
+load balancer when a node _is being_ / _is_ deleted.
+
+While deleting:
+
+* kube-proxy will start failing its readiness probe and essentially mark the
+  node as not eligible for load balancer traffic. The load balancer health
+  check failing causes load balancers which support connection draining to
+  allow existing connections to terminate, and block new connections from
+  establishing.
+
+When deleted:
+
+* The service controller in the Kubernetes cloud controller manager removes the
+  node from the referenced set of eligible targets. Removing any instance from
+  the load balancer's set of backend targets immediately terminates all
+  connections. This is also the reason kube-proxy first fails the health check
+  while the node is deleting.
+
+Kubernetes vendors should note that if a vendor configures the kube-proxy
+readiness probe as a liveness probe, kube-proxy will restart continuously while
+a node is being deleted, until the node has been fully deleted.
+kube-proxy exposes a `/livez` path which, as opposed to the `/healthz` one, does
+**not** consider the Node's deleting state and only its progress programming the
+network.
+`/livez` is therefore the recommended path for anyone looking to define a
+livenessProbe for kube-proxy.
+
+Users deploying kube-proxy can inspect both the readiness / liveness state by
+evaluating the metrics: `proxy_livez_total` / `proxy_healthz_total`. Both
+metrics publish two series, one labeled 200 and one labeled 503.
+
+For `Local` Services, kube-proxy will return 200 if:
+
+1. kube-proxy is healthy/ready, and
+2. has a local endpoint on the node in question.
+
+Node deletion does **not** have an impact on kube-proxy's return
+code as far as load balancer health checks are concerned. The reason for this is
+that draining nodes could end up causing an ingress outage should all endpoints
+simultaneously be running on said nodes.
+
+The Kubernetes project recommends that cloud provider integration code
+configures load balancer health checks that target the service proxy's healthz
+port. If you are using or implementing your own virtual IP implementation
+that people can use instead of kube-proxy, you should set up a similar health
+checking port with logic that matches the kube-proxy implementation.
+
 ### Traffic to terminating endpoints
 
 {{< feature-state for_k8s_version="v1.28" state="stable" >}}
@@ -513,6 +574,94 @@ those terminating Pods. By the time the Pod completes termination, the external
 should have seen the node's health check failing and fully removed the node from
 the backend pool.
 
+## Traffic distribution
+
+The `spec.trafficDistribution` field within a Kubernetes Service allows you to
+express preferences for how traffic should be routed to Service endpoints.
+Implementations like kube-proxy use the `spec.trafficDistribution` field as a
+guideline. The behavior associated with a given preference may subtly differ
+between implementations.
+
+`PreferClose` with kube-proxy
+: For kube-proxy, this means prioritizing sending traffic to endpoints within
+  the same zone as the client. The EndpointSlice controller updates
+  EndpointSlices with `hints` to communicate this preference, which kube-proxy
+  then uses for routing decisions. If a client's zone does not have any
+  available endpoints, traffic will be routed cluster-wide for that client.
+
+In the absence of any value for `trafficDistribution`, the default routing
+strategy for kube-proxy is to distribute traffic to any endpoint in the cluster.
+
+### Comparison with `service.kubernetes.io/topology-mode: Auto`
+
+The `trafficDistribution` field with `PreferClose` and the
+`service.kubernetes.io/topology-mode: Auto` annotation both aim to prioritize
+same-zone traffic. However, there are key differences in their approaches:
+
+* `service.kubernetes.io/topology-mode: Auto`: Attempts to distribute traffic
+  proportionally across zones based on allocatable CPU resources. This heuristic
+  includes safeguards (such as the [fallback
+  behavior](/docs/concepts/services-networking/topology-aware-routing/#three-or-more-endpoints-per-zone)
+  for small numbers of endpoints) and could lead to the feature being disabled
+  in certain scenarios for load-balancing reasons. This approach sacrifices some
+  predictability in favor of potential load balancing.
+
+* `trafficDistribution: PreferClose`: This approach aims to be slightly simpler
+  and more predictable: "If there are endpoints in the zone, they will receive
+  all traffic for that zone; if there are no endpoints in a zone, the traffic
+  will be distributed to other zones". While the approach may offer more
+  predictability, it does mean that you are in control of managing the [potential
+  overload](#considerations-for-using-traffic-distribution-control).
+
+If the `service.kubernetes.io/topology-mode` annotation is set to `Auto`, it
+will take precedence over `trafficDistribution`. (The annotation may be deprecated
+in the future in favor of the `trafficDistribution` field.)
+
+### Interaction with traffic policies
+
+When compared to the `trafficDistribution` field, the traffic policy fields
+(`externalTrafficPolicy` and `internalTrafficPolicy`) are meant to offer
+stricter traffic locality requirements. Here's how `trafficDistribution`
+interacts with them:
+
+* Precedence of traffic policies: For a given Service, if a traffic policy
+  (`externalTrafficPolicy` or `internalTrafficPolicy`) is set to `Local`, it
+  takes precedence over `trafficDistribution: PreferClose` for the corresponding
+  traffic type (external or internal, respectively).
+
+* `trafficDistribution` influence: For a given Service, if a traffic policy
+  (`externalTrafficPolicy` or `internalTrafficPolicy`) is set to `Cluster` (the
+  default), or if the fields are not set, then `trafficDistribution:
+  PreferClose` guides the routing behavior for the corresponding traffic type
+  (external or internal, respectively). This means that an attempt will be made
+  to route traffic to an endpoint that is in the same zone as the client.
+
+### Considerations for using traffic distribution control
+
+* **Increased probability of overloaded endpoints:** The `PreferClose`
+  heuristic will attempt to route traffic to the closest healthy endpoints
+  instead of spreading that traffic evenly across all endpoints. If you do not
+  have a sufficient number of endpoints within a zone, they may become
+  overloaded. This is especially likely if incoming traffic is not
+  proportionally distributed across zones. To mitigate this, consider the
+  following strategies:
+
+  * [Pod Topology Spread
+    Constraints](/docs/concepts/scheduling-eviction/topology-spread-constraints/):
+    Use Pod Topology Spread Constraints to distribute your pods more evenly
+    across zones.
+
+  * Zone-specific Deployments: If you expect to see skewed traffic patterns,
+    create a separate Deployment for each zone. This approach allows the
+    separate workloads to scale independently. There are also workload
+    management addons available from the ecosystem, outside the Kubernetes
+    project itself, that can help here.
+
+* **Implementation-specific behavior:** Each dataplane implementation may handle
+  this field slightly differently. If you're using an implementation other than
+  kube-proxy, refer to the documentation specific to that implementation to
+  understand how this field is being handled.
+
 ## {{% heading "whatsnext" %}}
 
 To learn more about Services,
diff --git a/content/en/docs/reference/node/kubelet-config-directory-merging.md b/content/en/docs/reference/node/kubelet-config-directory-merging.md
new file mode 100644
index 0000000000000..99ed1bc631203
--- /dev/null
+++ b/content/en/docs/reference/node/kubelet-config-directory-merging.md
@@ -0,0 +1,155 @@
+---
+content_type: "reference"
+title: Kubelet Configuration Directory Merging
+weight: 50
+---
+
+When using the kubelet's `--config-dir` flag to specify a drop-in directory for
+configuration, there is some specific behavior on how different types are
+merged.
+
+Here are some examples of how different data types behave during configuration merging:
+
+### Structure fields
+There are two types of structure fields in a YAML structure: singular (or a
+scalar type) and embedded (structures that contain scalar types).
+The configuration merging process handles the overriding of singular and embedded struct fields to create a resulting kubelet configuration.
+
+For instance, you may want a baseline kubelet configuration for all nodes, but you may want to customize the `address` and `authorization` fields.
+This can be done as follows:
+
+Main kubelet configuration file contents:
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+port: 20250
+authorization:
+  mode: Webhook
+  webhook:
+    cacheAuthorizedTTL: "5m"
+    cacheUnauthorizedTTL: "30s"
+serializeImagePulls: false
+address: "192.168.0.1"
+```
+
+Contents of a file in `--config-dir` directory:
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+authorization:
+  mode: AlwaysAllow
+  webhook:
+    cacheAuthorizedTTL: "8m"
+    cacheUnauthorizedTTL: "45s"
+address: "192.168.0.8"
+```
+
+The resulting configuration will be as follows:
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+port: 20250
+serializeImagePulls: false
+authorization:
+  mode: AlwaysAllow
+  webhook:
+    cacheAuthorizedTTL: "8m"
+    cacheUnauthorizedTTL: "45s"
+address: "192.168.0.8"
+```
+
+### Lists
+You can override the slice/list values of the kubelet configuration.
+However, the entire list gets overridden during the merging process.
+For example, you can override the `clusterDNS` list as follows:
+
+Main kubelet configuration file contents:
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+port: 20250
+serializeImagePulls: false
+clusterDNS:
+  - "192.168.0.9"
+  - "192.168.0.8"
+```
+
+Contents of a file in `--config-dir` directory:
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+clusterDNS:
+  - "192.168.0.2"
+  - "192.168.0.3"
+  - "192.168.0.5"
+```
+
+The resulting configuration will be as follows:
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+port: 20250
+serializeImagePulls: false
+clusterDNS:
+  - "192.168.0.2"
+  - "192.168.0.3"
+  - "192.168.0.5"
+```
+
+### Maps, including nested structures
+
+Individual fields in maps, regardless of their value types (boolean, string, etc.), can be selectively overridden.
+However, for `map[string][]string`, the entire list associated with a specific field gets overridden.
+Let's understand this better with an example that uses the `featureGates` and `staticPodURLHeader` fields:
+
+Main kubelet configuration file contents:
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+port: 20250
+serializeImagePulls: false
+featureGates:
+  AllAlpha: false
+  MemoryQoS: true
+staticPodURLHeader:
+  kubelet-api-support:
+    - "Authorization: 234APSDFA"
+    - "X-Custom-Header: 123"
+  custom-static-pod:
+    - "Authorization: 223EWRWER"
+    - "X-Custom-Header: 456"
+```
+
+Contents of a file in `--config-dir` directory:
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+featureGates:
+  MemoryQoS: false
+  KubeletTracing: true
+  DynamicResourceAllocation: true
+staticPodURLHeader:
+  custom-static-pod:
+    - "Authorization: 223EWRWER"
+    - "X-Custom-Header: 345"
+```
+
+The resulting configuration will be as follows:
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+port: 20250
+serializeImagePulls: false
+featureGates:
+  AllAlpha: false
+  MemoryQoS: false
+  KubeletTracing: true
+  DynamicResourceAllocation: true
+staticPodURLHeader:
+  kubelet-api-support:
+    - "Authorization: 234APSDFA"
+    - "X-Custom-Header: 123"
+  custom-static-pod:
+    - "Authorization: 223EWRWER"
+    - "X-Custom-Header: 345"
+```
diff --git a/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md b/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
index 33463afb8cad8..97494f69a5a2e 100644
--- a/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
+++ b/content/en/docs/reference/setup-tools/kubeadm/implementation-details.md
@@ -109,8 +109,6 @@ The user can skip specific preflight checks or all of them with the `--ignore-pr
 - [warning] if firewalld is active
 - [error] if API server bindPort or ports 10250/10251/10252 are used
 - [Error] if `/etc/kubernetes/manifest` folder already exists and it is not empty
-- [Error] if `/proc/sys/net/bridge/bridge-nf-call-iptables` file does not exist/does not contain 1
-- [Error] if advertise address is ipv6 and `/proc/sys/net/bridge/bridge-nf-call-ip6tables` does not exist/does not contain 1.
 - [Error] if swap is on
 - [Error] if `conntrack`, `ip`, `iptables`, `mount`, `nsenter` commands are not present in the command path
 - [warning] if `ebtables`, `ethtool`, `socat`, `tc`, `touch`, `crictl` commands are not present in the command path
diff --git a/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md b/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md
index 0fbeb13e93040..2caa94d6764a1 100644
--- a/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md
+++ b/content/en/docs/reference/setup-tools/kubeadm/kubeadm-init.md
@@ -1,7 +1,4 @@
 ---
-reviewers:
-- luxas
-- jbeda
 title: kubeadm init
 content_type: concept
 weight: 20
@@ -161,6 +158,7 @@ Feature | Default | Alpha | Beta | GA
 `EtcdLearnerMode` | `true` | 1.27 | 1.29 | -
 `PublicKeysECDSA` | `false` | 1.19 | - | -
 `RootlessControlPlane` | `false` | 1.22 | - | -
+`WaitForAllControlPlaneComponents` | `false` | 1.30 | - | -
 {{< /table >}}
 
 {{< note >}}
@@ -184,6 +182,16 @@ for `kube-apiserver`, `kube-controller-manager`, `kube-scheduler` and `etcd` to
 If the flag is not set, those components run as root. You can change the value of
 this feature gate before you upgrade to a newer version of Kubernetes.
+`WaitForAllControlPlaneComponents`
+: With this feature gate enabled, kubeadm will wait for all control plane components (kube-apiserver,
+kube-controller-manager, kube-scheduler) on a control plane node to report status 200 on their `/healthz`
+endpoints. These checks are performed on `https://127.0.0.1:PORT/healthz`, where `PORT` is taken from
+`--secure-port` of a component. If you specify custom `--secure-port` values in the kubeadm configuration,
+they will be respected. Without the feature gate enabled, kubeadm will only wait for the kube-apiserver
+on a control plane node to become ready. The wait process starts right after the kubelet on the host
+is started by kubeadm. You are advised to enable this feature gate if you wish to observe a ready
+state from all control plane components during the `kubeadm init` or `kubeadm join` command execution.
+
 List of deprecated feature gates:
 
 {{< table caption="kubeadm deprecated feature gates" >}}
diff --git a/content/en/docs/setup/production-environment/container-runtimes.md b/content/en/docs/setup/production-environment/container-runtimes.md
index d2dc3497d3c3a..4154042e533cf 100644
--- a/content/en/docs/setup/production-environment/container-runtimes.md
+++ b/content/en/docs/setup/production-environment/container-runtimes.md
@@ -47,50 +47,33 @@ check the documentation for that version.
 
 ## Install and configure prerequisites
 
-The following steps apply common settings for Kubernetes nodes on Linux.
+### Network configuration
 
-You can skip a particular setting if you're certain you don't need it.
+By default, the Linux kernel does not allow IPv4 packets to be routed
+between interfaces. Most Kubernetes cluster networking implementations
+will change this setting (if needed), but some might expect the
+administrator to do it for them. (Some might also expect other sysctl
+parameters to be set, kernel modules to be loaded, etc.; consult the
+documentation for your specific network implementation.)
 
-For more information, see
-[Network Plugin Requirements](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#network-plugin-requirements)
-or the documentation for your specific container runtime.
+### Enable IPv4 packet forwarding {#prerequisite-ipv4-forwarding-optional}
 
-### Forwarding IPv4 and letting iptables see bridged traffic
-
-Execute the below mentioned instructions:
+To manually enable IPv4 packet forwarding:
 
 ```bash
-cat < A subset of the kubelet's configuration parameters may be
@@ -86,46 +96,195 @@ In the above example, this version is `kubelet.config.k8s.io/v1beta1`.
 
 ## Drop-in directory for kubelet configuration files {#kubelet-conf-d}
 
-As of Kubernetes v1.28.0, the kubelet has been extended to support a drop-in configuration directory. The location of it can be specified with
-`--config-dir` flag, and it defaults to `""`, or disabled, by default.
+{{< feature-state for_k8s_version="v1.30" state="beta" >}}
 
-You can only set `--config-dir` if you set the environment variable `KUBELET_CONFIG_DROPIN_DIR_ALPHA` for the kubelet process (the value of that variable does not matter).
-For Kubernetes v{{< skew currentVersion >}}, the kubelet returns an error if you specify `--config-dir` without that variable set, and startup fails.
-You cannot specify the drop-in configuration directory using the kubelet configuration file; only the CLI argument `--config-dir` can set it.
+You can specify a drop-in configuration directory for the kubelet. By default, the kubelet does not look
+for drop-in configuration files anywhere; you must specify a path.
+For example: `--config-dir=/etc/kubernetes/kubelet.conf.d`
+
+For Kubernetes v1.28 to v1.29, you can only specify `--config-dir` if you also set
+the environment variable `KUBELET_CONFIG_DROPIN_DIR_ALPHA` for the kubelet process (the value
+of that variable does not matter).
 
-One can use the kubelet configuration directory in a similar way to the kubelet config file.
 {{< note >}}
-The suffix of a valid kubelet drop-in configuration file must be `.conf`. For instance: `99-kubelet-address.conf`
+The suffix of a valid kubelet drop-in configuration file **must** be `.conf`. For instance: `99-kubelet-address.conf`
 {{< /note >}}
 
-For instance, you may want a baseline kubelet configuration for all nodes, but you may want to customize the `address` field. This can be done as follows:
+The kubelet processes files in its config drop-in directory by sorting the **entire file name** alphanumerically.
+For instance, `00-kubelet.conf` is processed first, and then overridden with a file named `01-kubelet.conf`.
 
-Main kubelet configuration file contents:
-```yaml
-apiVersion: kubelet.config.k8s.io/v1beta1
-kind: KubeletConfiguration
-port: 20250
-serializeImagePulls: false
-evictionHard:
-  memory.available: "200Mi"
-```
+These files may contain partial configurations and might not be valid config files by themselves.
+Validation is only performed on the final resulting configuration structure
+stored internally in the kubelet.
+This offers you flexibility in how you manage and combine kubelet configuration that comes from different sources.
+However, it's important to note that the behavior varies based on the data type of the configuration fields.
 
-Contents of a file in `--config-dir` directory:
-```yaml
-apiVersion: kubelet.config.k8s.io/v1beta1
-kind: KubeletConfiguration
-address: "192.168.0.8"
-```
+Different data types in the kubelet configuration structure merge differently.
+See the [reference
+document](/docs/reference/node/kubelet-config-directory-merging/) for more
+information.
+
+### Kubelet configuration merging order
 
 On startup, the kubelet merges configuration from:
 
-* Command line arguments (lowest precedence).
-* the kubelet configuration
+* Feature gates specified over the command line (lowest precedence).
+* The kubelet configuration.
 * Drop-in configuration files, according to sort order.
-* Feature gates specified over the command line (highest precedence).
+* Command line arguments excluding feature gates (highest precedence).
+
+{{< note >}}
+The config drop-in dir mechanism for the kubelet is similar to, but different from, how the `kubeadm` tool allows you to patch configuration.
+The `kubeadm` tool uses a specific [patching strategy](/docs/setup/production-environment/tools/kubeadm/control-plane-flags/#patches) for its configuration,
+whereas the only patch strategy for kubelet configuration drop-in files is `replace`. The kubelet determines the order of merges based on sorting the **entire file name** alphanumerically,
+and replaces every field present in a higher-priority file.
+{{< /note >}}
+
+## Viewing the kubelet configuration
+
+Since the configuration can now be spread over multiple files with this feature, you can
+inspect the final actuated configuration by following these steps:
+
+1. Start a proxy server using [`kubectl proxy`](/docs/reference/kubectl/generated/kubectl-commands#proxy) in your terminal.
-This produces the same outcome as if you used the [single configuration file](#create-the-config-file) used in the earlier example.
+```bash
+kubectl proxy
+```
+
+Which gives output like:
+
+```bash
+Starting to serve on 127.0.0.1:8001
+
+```
+2. Open another terminal window and use `curl` to fetch the kubelet configuration.
+Replace `<node-name>` with the actual name of your node:
+
+```bash
+curl -X GET http://127.0.0.1:8001/api/v1/nodes/<node-name>/proxy/configz | jq .
+```
+```json
+{
+  "kubeletconfig": {
+    "enableServer": true,
+    "staticPodPath": "/var/run/kubernetes/static-pods",
+    "syncFrequency": "1m0s",
+    "fileCheckFrequency": "20s",
+    "httpCheckFrequency": "20s",
+    "address": "192.168.1.16",
+    "port": 10250,
+    "readOnlyPort": 10255,
+    "tlsCertFile": "/var/lib/kubelet/pki/kubelet.crt",
+    "tlsPrivateKeyFile": "/var/lib/kubelet/pki/kubelet.key",
+    "rotateCertificates": true,
+    "authentication": {
+      "x509": {
+        "clientCAFile": "/var/run/kubernetes/client-ca.crt"
+      },
+      "webhook": {
+        "enabled": true,
+        "cacheTTL": "2m0s"
+      },
+      "anonymous": {
+        "enabled": true
+      }
+    },
+    "authorization": {
+      "mode": "AlwaysAllow",
+      "webhook": {
+        "cacheAuthorizedTTL": "5m0s",
+        "cacheUnauthorizedTTL": "30s"
+      }
+    },
+    "registryPullQPS": 5,
+    "registryBurst": 10,
+    "eventRecordQPS": 50,
+    "eventBurst": 100,
+    "enableDebuggingHandlers": true,
+    "healthzPort": 10248,
+    "healthzBindAddress": "127.0.0.1",
+    "oomScoreAdj": -999,
+    "clusterDomain": "cluster.local",
+    "clusterDNS": [
+      "10.0.0.10"
+    ],
+    "streamingConnectionIdleTimeout": "4h0m0s",
+    "nodeStatusUpdateFrequency": "10s",
+    "nodeStatusReportFrequency": "5m0s",
+    "nodeLeaseDurationSeconds": 40,
+    "imageMinimumGCAge": "2m0s",
+    "imageMaximumGCAge": "0s",
+    "imageGCHighThresholdPercent": 85,
+    "imageGCLowThresholdPercent": 80,
+    "volumeStatsAggPeriod": "1m0s",
+    "cgroupsPerQOS": true,
+    "cgroupDriver": "systemd",
+    "cpuManagerPolicy": "none",
+    "cpuManagerReconcilePeriod": "10s",
+    "memoryManagerPolicy": "None",
+    "topologyManagerPolicy": "none",
+    "topologyManagerScope": "container",
+    "runtimeRequestTimeout": "2m0s",
+    "hairpinMode": "promiscuous-bridge",
+    "maxPods": 110,
+    "podPidsLimit": -1,
+    "resolvConf": "/run/systemd/resolve/resolv.conf",
+    "cpuCFSQuota": true,
+    "cpuCFSQuotaPeriod": "100ms",
+    "nodeStatusMaxImages": 50,
+    "maxOpenFiles": 1000000,
+    "contentType": "application/vnd.kubernetes.protobuf",
+    "kubeAPIQPS": 50,
+    "kubeAPIBurst": 100,
+    "serializeImagePulls": true,
+    "evictionHard": {
+      "imagefs.available": "15%",
+      "memory.available": "100Mi",
+      "nodefs.available": "10%",
+      "nodefs.inodesFree": "5%"
+    },
+    "evictionPressureTransitionPeriod": "1m0s",
+    "enableControllerAttachDetach": true,
+    "makeIPTablesUtilChains": true,
+    "iptablesMasqueradeBit": 14,
+    "iptablesDropBit": 15,
+    "featureGates": {
+      "AllAlpha": false
+    },
+    "failSwapOn": false,
+    "memorySwap": {},
+    "containerLogMaxSize": "10Mi",
+    "containerLogMaxFiles": 5,
+    "configMapAndSecretChangeDetectionStrategy": "Watch",
+    "enforceNodeAllocatable": [
+      "pods"
+    ],
+    "volumePluginDir": "/usr/libexec/kubernetes/kubelet-plugins/volume/exec/",
+    "logging": {
+      "format": "text",
+      "flushFrequency": "5s",
+      "verbosity": 3,
+      "options": {
+        "json": {
+          "infoBufferSize": "0"
+        }
+      }
+    },
+    "enableSystemLogHandler": true,
+    "enableSystemLogQuery": false,
+    "shutdownGracePeriod": "0s",
+    "shutdownGracePeriodCriticalPods": "0s",
+    "enableProfilingHandler": true,
+    "enableDebugFlagsHandler": true,
+    "seccompDefault": false,
+    "memoryThrottlingFactor": 0.9,
+    "registerNode": true,
"localStorageCapacityIsolation": true, + "containerRuntimeEndpoint": "unix:///var/run/crio/crio.sock" + } +} +``` @@ -133,4 +292,6 @@ This produces the same outcome as if you used the [single configuration file](#c - Learn more about kubelet configuration by checking the [`KubeletConfiguration`](/docs/reference/config-api/kubelet-config.v1beta1/) - reference. \ No newline at end of file + reference. +- Learn more about kubelet configuration merging in the + [reference document](/docs/reference/node/kubelet-config-directory-merging.md). \ No newline at end of file diff --git a/content/en/docs/tasks/configure-pod-container/security-context.md b/content/en/docs/tasks/configure-pod-container/security-context.md index 67f40e884ceba..b176d20df51c5 100644 --- a/content/en/docs/tasks/configure-pod-container/security-context.md +++ b/content/en/docs/tasks/configure-pod-container/security-context.md @@ -440,7 +440,17 @@ To assign SELinux labels, the SELinux security module must be loaded on the host ### Efficient SELinux volume relabeling -{{< feature-state for_k8s_version="v1.27" state="beta" >}} +{{< feature-state feature_gate_name="SELinuxMountReadWriteOncePod" >}} + +{{< note >}} +Kubernetes v1.27 introduced an early limited form of this behavior that was only applicable +to volumes (and PersistentVolumeClaims) using the `ReadWriteOncePod` access mode. + +As an alpha feature, you can enable the `SELinuxMount` +[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to widen that +performance improvement to other kinds of PersistentVolumeClaims, as explained in detail +below. +{{< /note >}} By default, the container runtime recursively assigns SELinux label to all files on all Pod volumes. To speed up this process, Kubernetes can change the @@ -451,7 +461,9 @@ To benefit from this speedup, all these conditions must be met: * The [feature gates](/docs/reference/command-line-tools-reference/feature-gates/) `ReadWriteOncePod` and `SELinuxMountReadWriteOncePod` must be enabled. -* Pod must use PersistentVolumeClaim with `accessModes: ["ReadWriteOncePod"]`. +* Pod must use PersistentVolumeClaim with applicable `accessModes` and [feature gates](/docs/reference/command-line-tools-reference/feature-gates/): + * Either the volume has `accessModes: ["ReadWriteOncePod"]`, and feature gate `SELinuxMountReadWriteOncePod` is enabled. + * Or the volume can use any other access modes and both feature gates `SELinuxMountReadWriteOncePod` and `SELinuxMount` must be enabled. * Pod (or all its Containers that use the PersistentVolumeClaim) must have `seLinuxOptions` set. * The corresponding PersistentVolume must be either: @@ -465,13 +477,56 @@ runtime recursively changes the SELinux label for all inodes (files and directo in the volume. The more files and directories in the volume, the longer that relabelling takes. +## Managing access to the `/proc` filesystem {#proc-access} + +{{< feature-state feature_gate_name="ProcMountType" >}} + +For runtimes that follow the OCI runtime specification, containers default to running in a mode where +there are multiple paths that are both masked and read-only. +The result of this is the container has these paths present inside the container's mount namespace, and they can function similarly to if +the container was an isolated host, but the container process cannot write to +them. 
+The list of masked and read-only paths is as follows:
+
+- Masked Paths:
+  - `/proc/asound`
+  - `/proc/acpi`
+  - `/proc/kcore`
+  - `/proc/keys`
+  - `/proc/latency_stats`
+  - `/proc/timer_list`
+  - `/proc/timer_stats`
+  - `/proc/sched_debug`
+  - `/proc/scsi`
+  - `/sys/firmware`
+
+- Read-Only Paths:
+  - `/proc/bus`
+  - `/proc/fs`
+  - `/proc/irq`
+  - `/proc/sys`
+  - `/proc/sysrq-trigger`
+
+
+For some Pods, you might want to bypass that default masking of paths.
+The most common context for wanting this is if you are trying to run nested
+containers (containers inside a container that is itself part of a Kubernetes pod).
+
+The `securityContext` field `procMount` allows a user to request a container's `/proc`
+be `Unmasked`, or be mounted as read-write by the container process. This also
+applies to `/sys/firmware`, which is not in `/proc`.
+
+```yaml
+...
+securityContext:
+  procMount: Unmasked
+```
+
 {{< note >}}
-
-If you are running Kubernetes v1.25, refer to the v1.25 version of this task page:
-[Configure a Security Context for a Pod or Container](https://v1-25.docs.kubernetes.io/docs/tasks/configure-pod-container/security-context/) (v1.25).
-There is an important note in that documentation about a situation where the kubelet
-can lose track of volume labels after restart. This deficiency has been fixed
-in Kubernetes 1.26.
+Setting `procMount` to `Unmasked` requires the `spec.hostUsers` value in the pod
+spec to be `false`. In other words: a container that wishes to have an Unmasked
+`/proc` or unmasked `/sys` must also be in a
+[user namespace](/docs/concepts/workloads/pods/user-namespaces/).
+Kubernetes v1.12 to v1.29 did not enforce that requirement.
 {{< /note >}}
 
 ## Discussion
@@ -520,3 +575,7 @@ kubectl delete pod security-context-demo-4
 * For more information about security mechanisms in Linux, see
   [Overview of Linux Kernel Security Features](https://www.linux.com/learn/overview-linux-kernel-security-features)
   (Note: Some information is out of date)
+* Read about [User Namespaces](/docs/concepts/workloads/pods/user-namespaces/)
+  for Linux pods.
+* [Masked Paths in the OCI Runtime
+  Specification](https://github.com/opencontainers/runtime-spec/blob/f66aad47309/config-linux.md#masked-paths)
\ No newline at end of file
diff --git a/content/en/docs/tasks/configure-pod-container/user-namespaces.md b/content/en/docs/tasks/configure-pod-container/user-namespaces.md
index f25d0f0da39b9..e719389df93e7 100644
--- a/content/en/docs/tasks/configure-pod-container/user-namespaces.md
+++ b/content/en/docs/tasks/configure-pod-container/user-namespaces.md
@@ -7,7 +7,7 @@ min-kubernetes-server-version: v1.25
 ---
 
 
-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.30" state="beta" >}}
 
 This page shows how to configure a user namespace for pods. This allows you to
 isolate the user running inside the container from the one in the host.
@@ -57,10 +57,6 @@ If you have a mixture of nodes and only some of the nodes provide user namespace
 Pods, you also need to ensure that the user namespace Pods are
 [scheduled](/docs/concepts/scheduling-eviction/assign-pod-node/) to suitable nodes.
 
-Please note that **if your container runtime doesn't support user namespaces, the
-`hostUsers` field in the pod spec will be silently ignored and the pod will be
-created without user namespaces.**
-
 ## Run a Pod that uses a user namespace {#create-pod}
 
 
@@ -82,27 +78,42 @@ to `false`. For example:
 
 kubectl attach -it userns bash
 ```
 
-And run the command. The output is similar to this:
+Run this command:
 
-```none
+```shell
 readlink /proc/self/ns/user
+```
+
+The output is similar to:
+
+```shell
 user:[4026531837]
+```
+
+Also run:
+
+```shell
 cat /proc/self/uid_map
-0          0 4294967295
 ```
 
-Then, open a shell in the host and run the same command.
+The output is similar to:
+
+```shell
+0 833617920 65536
+```
+
+Then, open a shell in the host and run the same commands.
+
+The `readlink` command shows the user namespace the process is running in. It
+should be different when it is run on the host and inside the container.
+
+The last number of the `uid_map` file inside the container must be 65536; on the
+host it must be a bigger number.
 
-The output must be different. This means the host and the pod are using a
-different user namespace. When user namespaces are not enabled, the host and the
-pod use the same user namespace.
 
 If you are running the kubelet inside a user namespace, you need to compare the
 output from running the command in the pod to the output of running in the host:
 
-```none
+```shell
 readlink /proc/$pid/ns/user
-user:[4026534732]
 ```
 
 replacing `$pid` with the kubelet PID.
diff --git a/content/en/docs/tasks/debug/debug-cluster/_index.md b/content/en/docs/tasks/debug/debug-cluster/_index.md
index a10b0bdcff7d9..cde6043c0327d 100644
--- a/content/en/docs/tasks/debug/debug-cluster/_index.md
+++ b/content/en/docs/tasks/debug/debug-cluster/_index.md
@@ -203,7 +203,7 @@ status:
     type: PIDPressure
   - lastHeartbeatTime: "2022-02-17T22:20:15Z"
     lastTransitionTime: "2022-02-17T22:15:15Z"
-    message: kubelet is posting ready status. AppArmor enabled
+    message: kubelet is posting ready status
     reason: KubeletReady
     status: "True"
     type: Ready
@@ -330,4 +330,3 @@ This is an incomplete list of things that could go wrong, and how to adjust your
 * Use `crictl` to [debug Kubernetes nodes](/docs/tasks/debug/debug-cluster/crictl/)
 * Get more information about [Kubernetes auditing](/docs/tasks/debug/debug-cluster/audit/)
 * Use `telepresence` to [develop and debug services locally](/docs/tasks/debug/debug-cluster/local-debugging/)
-
diff --git a/content/en/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions.md b/content/en/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions.md
index f2e3e5d1e0ae2..867b4180eed48 100644
--- a/content/en/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions.md
+++ b/content/en/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions.md
@@ -719,12 +719,13 @@ crontab "my-new-cron-object" created
 ```
 
 ### Validation ratcheting
 
-{{< feature-state state="alpha" for_k8s_version="v1.28" >}}
+{{< feature-state feature_gate_name="CRDValidationRatcheting" >}}
 
-You need to enable the `CRDValidationRatcheting`
+If you are using a version of Kubernetes older than v1.30, you need to explicitly
+enable the `CRDValidationRatcheting`
 [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to
 use this behavior, which then applies to all CustomResourceDefinitions in your
-cluster.
+cluster. Provided you have enabled the feature gate, Kubernetes implements
+_validation ratcheting_ for CustomResourceDefinitions. The API server is willing
+to accept updates to resources that
@@ -751,10 +752,12 @@ validations are not supported by ratcheting under the implementation in Kubernet
 - `x-kubernetes-validations`
   For Kubernetes 1.28, [CRD validation rules](#validation-rules) are ignored by
  ratcheting.
  Starting with Alpha 2 in Kubernetes 1.29, `x-kubernetes-validations`
-  are ratcheted.
+  are ratcheted only if they do not refer to `oldSelf`.
 
   Transition Rules are never ratcheted: only errors raised by rules that do not
-  use `oldSelf` will be automatically ratcheted if their values are unchanged.
+  use `oldSelf` will be automatically ratcheted if their values are unchanged.
+
+  To write custom ratcheting logic for CEL expressions, check out [optionalOldSelf](#field-optional-oldself).
 - `x-kubernetes-list-type`
   Errors arising from changing the list type of a subschema will not be
   ratcheted. For example adding `set` onto a list with duplicates will always
@@ -772,8 +775,10 @@ validations are not supported by ratcheting under the implementation in Kubernet
   To remove a previously specified `additionalProperties` validation will not be
   ratcheted.
 - `metadata`
-  Errors arising from changes to fields within an object's `metadata` are not
-  ratcheted.
+  Errors that come from Kubernetes' built-in validation of an object's `metadata`
+  are not ratcheted (such as object name, or characters in a label value).
+  If you specify your own additional rules for the metadata of a custom resource,
+  that additional validation will be ratcheted.
 
 ### Validation rules
 
@@ -1177,10 +1182,11 @@ Setting `fieldPath` is optional.
 
 #### The `optionalOldSelf` field {#field-optional-oldself}
 
-{{< feature-state state="alpha" for_k8s_version="v1.29" >}}
+{{< feature-state feature_gate_name="CRDValidationRatcheting" >}}
 
-The feature [CRDValidationRatcheting](#validation-ratcheting) must be enabled in order to
-make use of this field.
+If your cluster does not have [CRD validation ratcheting](#validation-ratcheting) enabled,
+the CustomResourceDefinition API doesn't include this field, and trying to set it may result
+in an error.
 
 The `optionalOldSelf` field is a boolean field that alters the behavior of
 [Transition Rules](#transition-rules) described below. Normally, a transition
 rule will not evaluate if `oldSelf` cannot be determined:
@@ -1624,6 +1630,96 @@ my-new-cron-object   * * * * *   1          7s
 
 The `NAME` column is implicit and does not need to be defined in the CustomResourceDefinition.
 {{< /note >}}
 
+### Field selectors
+
+[Field Selectors](/docs/concepts/overview/working-with-objects/field-selectors/)
+let clients select custom resources based on the value of one or more resource
+fields.
+
+All custom resources support the `metadata.name` and `metadata.namespace` field
+selectors.
+
+Fields declared in a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}}
+may also be used with field selectors when included in the `spec.versions[*].selectableFields` field of the
+{{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}}.
+
+#### Selectable fields for custom resources {#crd-selectable-fields}
+
+{{< feature-state feature_gate_name="CustomResourceFieldSelectors" >}}
+
+You need to enable the `CustomResourceFieldSelectors`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to
+use this behavior, which then applies to all CustomResourceDefinitions in your
+cluster.
+
+The `spec.versions[*].selectableFields` field of a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}} may be used to
+declare which other fields in a custom resource may be used in field selectors.
+The following example adds the `.spec.color` and `.spec.size` fields as
+selectable fields.
+
+Save the CustomResourceDefinition to `shirt-resource-definition.yaml`:
+
+{{% code_sample file="customresourcedefinition/shirt-resource-definition.yaml" %}}
+
+Create the CustomResourceDefinition:
+
+```shell
+kubectl apply -f https://k8s.io/examples/customresourcedefinition/shirt-resource-definition.yaml
+```
+
+Define some Shirts by editing `shirt-resources.yaml`; for example:
+
+{{% code_sample file="customresourcedefinition/shirt-resources.yaml" %}}
+
+Create the custom resources:
+
+```shell
+kubectl apply -f https://k8s.io/examples/customresourcedefinition/shirt-resources.yaml
+```
+
+Get all the resources:
+
+```shell
+kubectl get shirts.stable.example.com
+```
+
+The output is:
+
+```
+NAME       COLOR   SIZE
+example1   blue    S
+example2   blue    M
+example3   green   M
+```
+
+Fetch blue shirts (retrieve Shirts with a `color` of `blue`):
+
+```shell
+kubectl get shirts.stable.example.com --field-selector spec.color=blue
+```
+
+Should output:
+
+```
+NAME       COLOR   SIZE
+example1   blue    S
+example2   blue    M
+```
+
+Get only resources with a `color` of `green` and a `size` of `M`:
+
+```shell
+kubectl get shirts.stable.example.com --field-selector spec.color=green,spec.size=M
+```
+
+Should output:
+
+```
+NAME       COLOR   SIZE
+example3   green   M
+```
+
 #### Priority
 
 Each column includes a `priority` field. Currently, the priority
diff --git a/content/en/docs/tasks/inject-data-application/define-environment-variable-container.md b/content/en/docs/tasks/inject-data-application/define-environment-variable-container.md
index 73182d625f0d9..2b10561f4097f 100644
--- a/content/en/docs/tasks/inject-data-application/define-environment-variable-container.md
+++ b/content/en/docs/tasks/inject-data-application/define-environment-variable-container.md
@@ -102,6 +102,11 @@ Honorable`, and `Kubernetes`, respectively. The environment variable `MESSAGE`
 combines the set of all these environment variables and then uses it as a CLI
 argument passed to the `env-print-demo` container.
 
+Environment variable names consist of letters, numbers, underscores,
+dots, or hyphens, but the first character cannot be a digit.
+If the `RelaxedEnvironmentVariableValidation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled,
+all [printable ASCII characters](https://www.ascii-code.com/characters/printable-characters) except "=" may be used for environment variable names.
+
 ```yaml
 apiVersion: v1
 kind: Pod
diff --git a/content/en/docs/tasks/manage-kubernetes-objects/storage-version-migration.md b/content/en/docs/tasks/manage-kubernetes-objects/storage-version-migration.md
new file mode 100644
index 0000000000000..b60ef782ee998
--- /dev/null
+++ b/content/en/docs/tasks/manage-kubernetes-objects/storage-version-migration.md
@@ -0,0 +1,313 @@
+---
+title: Migrate Kubernetes Objects Using Storage Version Migration
+
+reviewers:
+  - deads2k
+  - jpbetz
+  - enj
+  - nilekhc
+
+content_type: task
+min-kubernetes-server-version: v1.30
+weight: 60
+---
+
+
+{{< feature-state feature_gate_name="StorageVersionMigrator" >}}
+
+Kubernetes relies on API data being actively re-written, to support some
+maintenance activities related to at rest storage. Two prominent examples are
+the versioned schema of stored resources (that is, the preferred storage schema
+changing from v1 to v2 for a given resource) and encryption at rest
+(that is, rewriting stale data based on a change in how the data should be encrypted).
+
+## {{% heading "prerequisites" %}}
+
+Install [`kubectl`](/docs/tasks/tools/#kubectl).
+
+{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
+
+
+
+
+## Re-encrypt Kubernetes secrets using storage version migration
+
+- To begin with, [configure a KMS provider](/docs/tasks/administer-cluster/kms-provider/)
+  to encrypt data at rest in etcd using the following encryption configuration.
+  ```yaml
+  kind: EncryptionConfiguration
+  apiVersion: apiserver.config.k8s.io/v1
+  resources:
+    - resources:
+        - secrets
+      providers:
+        - aescbc:
+            keys:
+              - name: key1
+                secret: c2VjcmV0IGlzIHNlY3VyZQ==
+  ```
+  Make sure to enable automatic reload of the encryption
+  configuration file by setting `--encryption-provider-config-automatic-reload` to true.
+- Create a Secret using kubectl.
+  ```shell
+  kubectl create secret generic my-secret --from-literal=key1=supersecret
+  ```
+- [Verify](/docs/tasks/administer-cluster/kms-provider/#verifying-that-the-data-is-encrypted)
+  the serialized data for that Secret object is prefixed with `k8s:enc:aescbc:v1:key1`.
+- Update the encryption configuration file as follows to rotate the encryption key.
+  ```yaml
+  kind: EncryptionConfiguration
+  apiVersion: apiserver.config.k8s.io/v1
+  resources:
+    - resources:
+        - secrets
+      providers:
+        - aescbc:
+            keys:
+              - name: key2
+                secret: c2VjcmV0IGlzIHNlY3VyZSwgaXMgaXQ/
+        - aescbc:
+            keys:
+              - name: key1
+                secret: c2VjcmV0IGlzIHNlY3VyZQ==
+  ```
+- To ensure that the previously created secret `my-secret` is re-encrypted
+  with the new key `key2`, you will use _Storage Version Migration_.
+- Create a StorageVersionMigration manifest named `migrate-secret.yaml` as follows:
+  ```yaml
+  kind: StorageVersionMigration
+  apiVersion: storagemigration.k8s.io/v1alpha1
+  metadata:
+    name: secrets-migration
+  spec:
+    resource:
+      group: ""
+      version: v1
+      resource: secrets
+  ```
+  Create the object using _kubectl_ as follows:
+  ```shell
+  kubectl apply -f migrate-secret.yaml
+  ```
+- Monitor migration of Secrets by checking the `.status` of the StorageVersionMigration.
+  A successful migration should have its
+  `Succeeded` condition set to "True". Get the StorageVersionMigration object
+  as follows:
+  ```shell
+  kubectl get storageversionmigration.storagemigration.k8s.io/secrets-migration -o yaml
+  ```
+
+  The output is similar to:
+  ```yaml
+  kind: StorageVersionMigration
+  apiVersion: storagemigration.k8s.io/v1alpha1
+  metadata:
+    name: secrets-migration
+    uid: 628f6922-a9cb-4514-b076-12d3c178967c
+    resourceVersion: '90'
+    creationTimestamp: '2024-03-12T20:29:45Z'
+  spec:
+    resource:
+      group: ""
+      version: v1
+      resource: secrets
+  status:
+    conditions:
+      - type: Running
+        status: 'False'
+        lastUpdateTime: '2024-03-12T20:29:46Z'
+        reason: StorageVersionMigrationInProgress
+      - type: Succeeded
+        status: 'True'
+        lastUpdateTime: '2024-03-12T20:29:46Z'
+        reason: StorageVersionMigrationSucceeded
+    resourceVersion: '84'
+  ```
+- [Verify](/docs/tasks/administer-cluster/kms-provider/#verifying-that-the-data-is-encrypted)
+  the stored secret is now prefixed with `k8s:enc:aescbc:v1:key2`.
+
+## Update the preferred storage schema of a CRD
+
+Consider a scenario where a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}}
+(CRD) is created to serve custom resources (CRs), with its v1 schema set as the
+preferred storage version. When it's time to introduce v2 of the CRD, it can be
+added for serving only, with a conversion
+webhook. This enables a smoother transition where users can create CRs using
+either the v1 or v2 schema, with the webhook in place to perform the necessary
+schema conversion between them. Before setting v2 as the preferred storage schema
+version, it's important to ensure that all existing CRs stored as v1 are migrated to v2.
+This migration can be achieved through _Storage Version Migration_.
+
+- Create a manifest for the CRD, named `test-crd.yaml`, as follows:
+  ```yaml
+  apiVersion: apiextensions.k8s.io/v1
+  kind: CustomResourceDefinition
+  metadata:
+    name: selfierequests.stable.example.com
+  spec:
+    group: stable.example.com
+    names:
+      plural: selfierequests
+      singular: selfierequest
+      kind: SelfieRequest
+      listKind: SelfieRequestList
+    scope: Namespaced
+    versions:
+      - name: v1
+        served: true
+        storage: true
+        schema:
+          openAPIV3Schema:
+            type: object
+            properties:
+              hostPort:
+                type: string
+    conversion:
+      strategy: Webhook
+      webhook:
+        clientConfig:
+          url: https://127.0.0.1:9443/crdconvert
+          caBundle: <CA_BUNDLE>
+        conversionReviewVersions:
+          - v1
+          - v2
+  ```
+  Create the CRD using kubectl:
+  ```shell
+  kubectl apply -f test-crd.yaml
+  ```
+- Create a manifest for an example SelfieRequest. Name the manifest `cr1.yaml` and use these contents:
+  ```yaml
+  apiVersion: stable.example.com/v1
+  kind: SelfieRequest
+  metadata:
+    name: cr1
+    namespace: default
+  ```
+  Create the CR using kubectl:
+  ```shell
+  kubectl apply -f cr1.yaml
+  ```
+- Verify that the CR is written and stored as v1 by getting the object from etcd.
+  ```shell
+  ETCDCTL_API=3 etcdctl get /kubernetes.io/stable.example.com/selfierequests/default/cr1 [...] | hexdump -C
+  ```
+  where `[...]` contains the additional arguments for connecting to the etcd server.
+- Update the CRD `test-crd.yaml` so that v2 is served and becomes the storage
+  version, while v1 remains served only, as follows:
+  ```yaml
+  apiVersion: apiextensions.k8s.io/v1
+  kind: CustomResourceDefinition
+  metadata:
+    name: selfierequests.stable.example.com
+  spec:
+    group: stable.example.com
+    names:
+      plural: selfierequests
+      singular: selfierequest
+      kind: SelfieRequest
+      listKind: SelfieRequestList
+    scope: Namespaced
+    versions:
+      - name: v2
+        served: true
+        storage: true
+        schema:
+          openAPIV3Schema:
+            type: object
+            properties:
+              host:
+                type: string
+              port:
+                type: string
+      - name: v1
+        served: true
+        storage: false
+        schema:
+          openAPIV3Schema:
+            type: object
+            properties:
+              hostPort:
+                type: string
+    conversion:
+      strategy: Webhook
+      webhook:
+        clientConfig:
+          url: 'https://127.0.0.1:9443/crdconvert'
+          caBundle: <CA_BUNDLE>
+        conversionReviewVersions:
+          - v1
+          - v2
+  ```
+  Update the CRD using kubectl:
+  ```shell
+  kubectl apply -f test-crd.yaml
+  ```
+- Create a CR manifest named `cr2.yaml` as follows:
+  ```yaml
+  apiVersion: stable.example.com/v2
+  kind: SelfieRequest
+  metadata:
+    name: cr2
+    namespace: default
+  ```
+- Create the CR using kubectl:
+  ```shell
+  kubectl apply -f cr2.yaml
+  ```
+- Verify that the CR is written and stored as v2 by getting the object from etcd.
+  ```shell
+  ETCDCTL_API=3 etcdctl get /kubernetes.io/stable.example.com/selfierequests/default/cr2 [...] | hexdump -C
+  ```
+  where `[...]` contains the additional arguments for connecting to the etcd server.
+- Create a StorageVersionMigration manifest named `migrate-crd.yaml`, with the contents as follows:
+  ```yaml
+  kind: StorageVersionMigration
+  apiVersion: storagemigration.k8s.io/v1alpha1
+  metadata:
+    name: crdsvm
+  spec:
+    resource:
+      group: stable.example.com
+      version: v1
+      resource: selfierequests
+  ```
+  Create the object using _kubectl_ as follows:
+  ```shell
+  kubectl apply -f migrate-crd.yaml
+  ```
+- Monitor migration of the custom resources by checking the `.status` of the
+  StorageVersionMigration. A successful migration should have its `Succeeded`
+  condition set to "True" in the status field. Get the migration resource
+  as follows:
+  ```shell
+  kubectl get storageversionmigration.storagemigration.k8s.io/crdsvm -o yaml
+  ```
+
+  The output is similar to:
+  ```yaml
+  kind: StorageVersionMigration
+  apiVersion: storagemigration.k8s.io/v1alpha1
+  metadata:
+    name: crdsvm
+    uid: 13062fe4-32d7-47cc-9528-5067fa0c6ac8
+    resourceVersion: '111'
+    creationTimestamp: '2024-03-12T22:40:01Z'
+  spec:
+    resource:
+      group: stable.example.com
+      version: v1
+      resource: selfierequests
+  status:
+    conditions:
+      - type: Running
+        status: 'False'
+        lastUpdateTime: '2024-03-12T22:40:03Z'
+        reason: StorageVersionMigrationInProgress
+      - type: Succeeded
+        status: 'True'
+        lastUpdateTime: '2024-03-12T22:40:03Z'
+        reason: StorageVersionMigrationSucceeded
+    resourceVersion: '106'
+  ```
+- Verify that the previously created `cr1` is now written and stored as v2 by getting the object from etcd.
+  ```shell
+  ETCDCTL_API=3 etcdctl get /kubernetes.io/stable.example.com/selfierequests/default/cr1 [...] | hexdump -C
+  ```
+  where `[...]` contains the additional arguments for connecting to the etcd server.
diff --git a/content/en/docs/tasks/run-application/horizontal-pod-autoscale.md b/content/en/docs/tasks/run-application/horizontal-pod-autoscale.md
index e3b736a8205d8..f25317fef9a0d 100644
--- a/content/en/docs/tasks/run-application/horizontal-pod-autoscale.md
+++ b/content/en/docs/tasks/run-application/horizontal-pod-autoscale.md
@@ -278,12 +278,12 @@ pod usage is still within acceptable limits.

### Container resource metrics

-{{< feature-state for_k8s_version="v1.27" state="beta" >}}
+{{< feature-state feature_gate_name="HPAContainerMetrics" >}}

The HorizontalPodAutoscaler API also supports a container metric source where the HPA can track the
resource usage of individual containers across a set of Pods, in order to scale the target resource.
This lets you configure scaling thresholds for the containers that matter most in a particular Pod.
-For example, if you have a web application and a logging sidecar, you can scale based on the resource
+For example, if you have a web application and a sidecar container that provides logging, you can scale based on the resource
use of the web application, ignoring the sidecar container and its resource use.
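+
+For instance, an entry in the HorizontalPodAutoscaler's `spec.metrics` along
+these lines (a sketch that assumes the web application runs in a container
+named `application`) scales on that container's CPU usage alone:
+
+```yaml
+metrics:
+- type: ContainerResource
+  containerResource:
+    name: cpu
+    container: application  # only this container's usage is considered
+    target:
+      type: Utilization
+      averageUtilization: 60
+```
+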
If you revise the target resource to have a new Pod specification with a different set of containers,
diff --git a/content/en/docs/tutorials/security/apparmor.md b/content/en/docs/tutorials/security/apparmor.md
index 49e9f641fd3be..c7c51d92365d1 100644
--- a/content/en/docs/tutorials/security/apparmor.md
+++ b/content/en/docs/tutorials/security/apparmor.md
@@ -8,7 +8,7 @@ weight: 30



-{{< feature-state for_k8s_version="v1.4" state="beta" >}}
+{{< feature-state feature_gate_name="AppArmor" >}}

[AppArmor](https://apparmor.net/) is a Linux kernel security module
that supplements the standard Linux user and group based
@@ -54,7 +54,7 @@ Nodes before proceeding:
   Y
   ```

-   The Kubelet verifies that AppArmor is enabled on the host before admitting a pod with AppArmor
+   The kubelet verifies that AppArmor is enabled on the host before admitting a pod with AppArmor
   explicitly configured.

3. Container runtime supports AppArmor -- All common Kubernetes-supported container
@@ -64,7 +64,7 @@ Nodes before proceeding:

4. Profile is loaded -- AppArmor is applied to a Pod by specifying an AppArmor profile that each
   container should be run with. If any of the specified profiles are not loaded in the
-   kernel, the Kubelet will reject the Pod. You can view which profiles are loaded on a
+   kernel, the kubelet will reject the Pod. You can view which profiles are loaded on a
   node by checking the `/sys/kernel/security/apparmor/profiles` file. For example:

   ```shell
@@ -85,25 +85,26 @@ Nodes before proceeding:
## Securing a Pod

{{< note >}}
-AppArmor is currently in beta, so options are specified as annotations. Once support graduates to
-general availability, the annotations will be replaced with first-class fields.
+Prior to Kubernetes v1.30, AppArmor was specified through annotations. Use the documentation version
+selector to view the documentation with this deprecated API.
{{< /note >}}

-AppArmor profiles are specified *per-container*. To specify the AppArmor profile to run a Pod
-container with, add an annotation to the Pod's metadata:
+AppArmor profiles can be specified at the pod level or container level. The container AppArmor
+profile takes precedence over the pod profile.

```yaml
-container.apparmor.security.beta.kubernetes.io/<container_name>: <profile_ref>
+securityContext:
+  appArmorProfile:
+    type: <profile_type>
```

-Where `<container_name>` is the name of the container to apply the profile to, and `<profile_ref>`
-specifies the profile to apply. The `<profile_ref>` can be one of:
+Where `<profile_type>` is one of:

-* `runtime/default` to apply the runtime's default profile
-* `localhost/<profile_name>` to apply the profile loaded on the host with the name `<profile_name>`
-* `unconfined` to indicate that no profiles will be loaded
+* `RuntimeDefault` to use the runtime's default profile
+* `Localhost` to use a profile loaded on the host (see below)
+* `Unconfined` to run without AppArmor

-See the [API Reference](#api-reference) for the full details on the annotation and profile name formats.
+See the [API Reference](#api-reference) for the full details on the AppArmor profile API.

To verify that the profile was applied, you can check that the container's root process is
running with the correct profile by examining its proc attr:
@@ -115,14 +116,14 @@ kubectl exec <pod_name> -- cat /proc/1/attr/current
The output should look something like this:

```
-k8s-apparmor-example-deny-write (enforce)
+cri-containerd.apparmor.d (enforce)
```

## Example

*This example assumes you have already set up a cluster with AppArmor support.*

-First, load the profile you want to use onto your Nodes. This profile denies all file writes:
+First, load the profile you want to use onto your Nodes. This profile blocks all file write operations:

```
#include <tunables/global>
@@ -197,9 +198,11 @@ apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor-2
-  annotations:
-    container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-allow-write
spec:
+  securityContext:
+    appArmorProfile:
+      type: Localhost
+      localhostProfile: k8s-apparmor-example-allow-write
  containers:
  - name: hello
    image: busybox:1.28
@@ -243,7 +246,7 @@ An Event provides the error message with the reason, the specific wording is run

### Setting up Nodes with profiles

-Kubernetes does not currently provide any built-in mechanisms for loading AppArmor profiles onto
+Kubernetes {{< skew currentVersion >}} does not provide any built-in mechanisms for loading AppArmor profiles onto
Nodes. Profiles can be loaded through custom infrastructure or tools like the
[Kubernetes Security Profiles Operator](https://github.com/kubernetes-sigs/security-profiles-operator).

@@ -270,29 +273,31 @@ logs or through `journalctl`. More information is provided in
[AppArmor failures](https://gitlab.com/apparmor/apparmor/wikis/AppArmor_Failures).

-## API Reference
+## Specifying AppArmor confinement
+
+{{< caution >}}
+Prior to Kubernetes v1.30, AppArmor was specified through annotations. Use the documentation version
+selector to view the documentation with this deprecated API.
+{{< /caution >}}

-### Pod Annotation
+### AppArmor profile within security context {#appArmorProfile}

-Specifying the profile a container will run with:
+You can specify the `appArmorProfile` on either a container's `securityContext` or on a Pod's
+`securityContext`. If the profile is set at the pod level, it will be used as the default profile
+for all containers in the pod (including init, sidecar, and ephemeral containers). If both pod and
+container AppArmor profiles are set, the container's profile will be used.

-- **key**: `container.apparmor.security.beta.kubernetes.io/<container_name>`
-  Where `<container_name>` matches the name of a container in the Pod.
-  A separate profile can be specified for each container in the Pod.
-- **value**: a profile reference, described below
+An AppArmor profile has two fields:

-### Profile Reference
+`type` _(required)_ - indicates which kind of AppArmor profile will be applied. Valid options are:
+  - `Localhost` - a profile pre-loaded on the node (specified by `localhostProfile`).
+  - `RuntimeDefault` - the container runtime's default profile.
+  - `Unconfined` - no AppArmor enforcement.

-- `runtime/default`: Refers to the default runtime profile.
-  - Equivalent to not specifying a profile, except it still requires AppArmor to be enabled.
-  - In practice, many container runtimes use the same OCI default profile, defined here:
-    https://github.com/containers/common/blob/main/pkg/apparmor/apparmor_linux_template.go
-- `localhost/<profile_name>`: Refers to a profile loaded on the node (localhost) by name.
-  - The possible profile names are detailed in the
-    [core policy reference](https://gitlab.com/apparmor/apparmor/wikis/AppArmor_Core_Policy_Reference#profile-names-and-attachment-specifications).
-- `unconfined`: This effectively disables AppArmor on the container.
+`localhostProfile` - The name of a profile loaded on the node that should be used.
+The profile must be preconfigured on the node to work.
+This option must be provided if and only if the `type` is `Localhost`.

-Any other profile reference format is invalid.
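+
+As an illustrative sketch (the Pod and container names here are hypothetical),
+a pod-level profile can act as the default while a single container overrides it:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: apparmor-mixed-demo  # hypothetical example name
+spec:
+  securityContext:
+    appArmorProfile:
+      type: RuntimeDefault  # default profile for every container in this Pod
+  containers:
+  - name: confined
+    image: busybox:1.28
+    command: ["sh", "-c", "sleep 3600"]
+  - name: unconfined
+    image: busybox:1.28
+    command: ["sh", "-c", "sleep 3600"]
+    securityContext:
+      appArmorProfile:
+        type: Unconfined  # container-level setting takes precedence over the Pod default
+```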
## {{% heading "whatsnext" %}}
diff --git a/content/en/examples/access/deployment-replicas-policy.yaml b/content/en/examples/access/deployment-replicas-policy.yaml
index e12a8a0961fad..d7d11e9a20ced 100644
--- a/content/en/examples/access/deployment-replicas-policy.yaml
+++ b/content/en/examples/access/deployment-replicas-policy.yaml
@@ -1,4 +1,4 @@
-apiVersion: admissionregistration.k8s.io/v1alpha1
+apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "deploy-replica-policy.example.com"
diff --git a/content/en/examples/access/image-matches-namespace-environment.policy.yaml b/content/en/examples/access/image-matches-namespace-environment.policy.yaml
index 1a3da26898608..cf7508d253091 100644
--- a/content/en/examples/access/image-matches-namespace-environment.policy.yaml
+++ b/content/en/examples/access/image-matches-namespace-environment.policy.yaml
@@ -2,7 +2,7 @@
# Except for "exempt" deployments, or any containers that do not belong to the "example.com" organization (e.g. common sidecars).
# For example, if the namespace has a label of {"environment": "staging"}, all container images must be either staging.example.com/*
# or do not contain "example.com" at all, unless the deployment has {"exempt": "true"} label.
-apiVersion: admissionregistration.k8s.io/v1beta1
+apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "image-matches-namespace-environment.policy.example.com"
diff --git a/content/en/examples/access/validating-admission-policy-audit-annotation.yaml b/content/en/examples/access/validating-admission-policy-audit-annotation.yaml
index 127720b09654d..1c422a825447f 100644
--- a/content/en/examples/access/validating-admission-policy-audit-annotation.yaml
+++ b/content/en/examples/access/validating-admission-policy-audit-annotation.yaml
@@ -1,4 +1,4 @@
-apiVersion: admissionregistration.k8s.io/v1alpha1
+apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"
diff --git a/content/en/examples/access/validating-admission-policy-match-conditions.yaml b/content/en/examples/access/validating-admission-policy-match-conditions.yaml
index 9a49adf15212c..eafebd2c2c274 100644
--- a/content/en/examples/access/validating-admission-policy-match-conditions.yaml
+++ b/content/en/examples/access/validating-admission-policy-match-conditions.yaml
@@ -1,4 +1,4 @@
-apiVersion: admissionregistration.k8s.io/v1alpha1
+apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"
diff --git a/content/en/examples/controllers/job-success-policy.yaml b/content/en/examples/controllers/job-success-policy.yaml
new file mode 100644
index 0000000000000..1f7927b2f34fc
--- /dev/null
+++ b/content/en/examples/controllers/job-success-policy.yaml
@@ -0,0 +1,25 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: job-success
+spec:
+  parallelism: 10
+  completions: 10
+  completionMode: Indexed # Required for the success policy
+  successPolicy:
+    rules:
+      - succeededIndexes: 0,2-3
+        succeededCount: 1
+  template:
+    spec:
+      containers:
+      - name: main
+        image: python
+        command: # Provided that at least one of the Pods with 0, 2, and 3 indexes has succeeded,
+                 # the overall Job is a success.
+          - python3
+          - -c
+          - |
+            import os, sys
+            if os.environ.get("JOB_COMPLETION_INDEX") == "2":
+              sys.exit(0)
+            else:
+              sys.exit(1)
+      restartPolicy: Never # Job Pods must set restartPolicy to Never or OnFailure
diff --git a/content/en/examples/customresourcedefinition/shirt-resource-definition.yaml b/content/en/examples/customresourcedefinition/shirt-resource-definition.yaml
new file mode 100644
index 0000000000000..111d0d74896dc
--- /dev/null
+++ b/content/en/examples/customresourcedefinition/shirt-resource-definition.yaml
@@ -0,0 +1,36 @@
+apiVersion: apiextensions.k8s.io/v1
+kind: CustomResourceDefinition
+metadata:
+  name: shirts.stable.example.com
+spec:
+  group: stable.example.com
+  scope: Namespaced
+  names:
+    plural: shirts
+    singular: shirt
+    kind: Shirt
+  versions:
+  - name: v1
+    served: true
+    storage: true
+    schema:
+      openAPIV3Schema:
+        type: object
+        properties:
+          spec:
+            type: object
+            properties:
+              color:
+                type: string
+              size:
+                type: string
+    selectableFields:
+    - jsonPath: .spec.color
+    - jsonPath: .spec.size
+    additionalPrinterColumns:
+    - jsonPath: .spec.color
+      name: Color
+      type: string
+    - jsonPath: .spec.size
+      name: Size
+      type: string
diff --git a/content/en/examples/customresourcedefinition/shirt-resources.yaml b/content/en/examples/customresourcedefinition/shirt-resources.yaml
new file mode 100644
index 0000000000000..8a123333cc715
--- /dev/null
+++ b/content/en/examples/customresourcedefinition/shirt-resources.yaml
@@ -0,0 +1,24 @@
+---
+apiVersion: stable.example.com/v1
+kind: Shirt
+metadata:
+  name: example1
+spec:
+  color: blue
+  size: S
+---
+apiVersion: stable.example.com/v1
+kind: Shirt
+metadata:
+  name: example2
+spec:
+  color: blue
+  size: M
+---
+apiVersion: stable.example.com/v1
+kind: Shirt
+metadata:
+  name: example3
+spec:
+  color: green
+  size: M
diff --git a/content/en/examples/pods/security/hello-apparmor.yaml b/content/en/examples/pods/security/hello-apparmor.yaml
index 8fe23590be37a..a434db1d15bff 100644
--- a/content/en/examples/pods/security/hello-apparmor.yaml
+++ b/content/en/examples/pods/security/hello-apparmor.yaml
@@ -2,10 +2,11 @@ apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
-  annotations:
-    # Tell Kubernetes to apply the AppArmor profile "k8s-apparmor-example-deny-write".
- container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-deny-write spec: + securityContext: + appArmorProfile: + type: Localhost + localhostProfile: k8s-apparmor-example-deny-write containers: - name: hello image: busybox:1.28 diff --git a/content/en/examples/storage/rro.yaml b/content/en/examples/storage/rro.yaml new file mode 100644 index 0000000000000..1ffc6b038971b --- /dev/null +++ b/content/en/examples/storage/rro.yaml @@ -0,0 +1,28 @@ +apiVersion: v1 +kind: Pod +metadata: + name: rro +spec: + volumes: + - name: mnt + hostPath: + # tmpfs is mounted on /mnt/tmpfs + path: /mnt + containers: + - name: busybox + image: busybox + args: ["sleep", "infinity"] + volumeMounts: + # /mnt-rro/tmpfs is not writable + - name: mnt + mountPath: /mnt-rro + readOnly: true + mountPropagation: None + recursiveReadOnly: Enabled + # /mnt-ro/tmpfs is writable + - name: mnt + mountPath: /mnt-ro + readOnly: true + # /mnt-rw/tmpfs is writable + - name: mnt + mountPath: /mnt-rw diff --git a/content/en/examples/validatingadmissionpolicy/basic-example-binding.yaml b/content/en/examples/validatingadmissionpolicy/basic-example-binding.yaml index 52de1049c253e..9ad4c6a319a05 100644 --- a/content/en/examples/validatingadmissionpolicy/basic-example-binding.yaml +++ b/content/en/examples/validatingadmissionpolicy/basic-example-binding.yaml @@ -1,4 +1,4 @@ -apiVersion: admissionregistration.k8s.io/v1beta1 +apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingAdmissionPolicyBinding metadata: name: "demo-binding-test.example.com" @@ -8,4 +8,4 @@ spec: matchResources: namespaceSelector: matchLabels: - environment: test \ No newline at end of file + environment: test diff --git a/content/en/examples/validatingadmissionpolicy/basic-example-policy.yaml b/content/en/examples/validatingadmissionpolicy/basic-example-policy.yaml index bfdb9ee214184..720839fd480f2 100644 --- a/content/en/examples/validatingadmissionpolicy/basic-example-policy.yaml +++ b/content/en/examples/validatingadmissionpolicy/basic-example-policy.yaml @@ -1,4 +1,4 @@ -apiVersion: admissionregistration.k8s.io/v1beta1 +apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingAdmissionPolicy metadata: name: "demo-policy.example.com" diff --git a/content/en/examples/validatingadmissionpolicy/binding-with-param-prod.yaml b/content/en/examples/validatingadmissionpolicy/binding-with-param-prod.yaml index a2186ee86234c..b0ad21916235f 100644 --- a/content/en/examples/validatingadmissionpolicy/binding-with-param-prod.yaml +++ b/content/en/examples/validatingadmissionpolicy/binding-with-param-prod.yaml @@ -1,4 +1,4 @@ -apiVersion: admissionregistration.k8s.io/v1beta1 +apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingAdmissionPolicyBinding metadata: name: "replicalimit-binding-nontest" diff --git a/content/en/examples/validatingadmissionpolicy/binding-with-param.yaml b/content/en/examples/validatingadmissionpolicy/binding-with-param.yaml index cad7a5b02f4bf..596d21e459f3a 100644 --- a/content/en/examples/validatingadmissionpolicy/binding-with-param.yaml +++ b/content/en/examples/validatingadmissionpolicy/binding-with-param.yaml @@ -1,4 +1,4 @@ -apiVersion: admissionregistration.k8s.io/v1beta1 +apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingAdmissionPolicyBinding metadata: name: "replicalimit-binding-test.example.com" diff --git a/content/en/examples/validatingadmissionpolicy/failure-policy-ignore.yaml b/content/en/examples/validatingadmissionpolicy/failure-policy-ignore.yaml index 
53e3990a1ffff..04fbf2ce9d26b 100644 --- a/content/en/examples/validatingadmissionpolicy/failure-policy-ignore.yaml +++ b/content/en/examples/validatingadmissionpolicy/failure-policy-ignore.yaml @@ -1,4 +1,4 @@ -apiVersion: admissionregistration.k8s.io/v1beta1 +apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingAdmissionPolicy spec: ... diff --git a/content/en/examples/validatingadmissionpolicy/policy-with-param.yaml b/content/en/examples/validatingadmissionpolicy/policy-with-param.yaml index c493115987bd4..03977e275d494 100644 --- a/content/en/examples/validatingadmissionpolicy/policy-with-param.yaml +++ b/content/en/examples/validatingadmissionpolicy/policy-with-param.yaml @@ -1,4 +1,4 @@ -apiVersion: admissionregistration.k8s.io/v1beta1 +apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingAdmissionPolicy metadata: name: "replicalimit-policy.example.com" diff --git a/content/en/examples/validatingadmissionpolicy/typechecking-multiple-match.yaml b/content/en/examples/validatingadmissionpolicy/typechecking-multiple-match.yaml index 77a49d192c558..620fea458cae4 100644 --- a/content/en/examples/validatingadmissionpolicy/typechecking-multiple-match.yaml +++ b/content/en/examples/validatingadmissionpolicy/typechecking-multiple-match.yaml @@ -1,4 +1,4 @@ -apiVersion: admissionregistration.k8s.io/v1beta1 +apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingAdmissionPolicy metadata: name: "replica-policy.example.com" diff --git a/content/en/examples/validatingadmissionpolicy/typechecking.yaml b/content/en/examples/validatingadmissionpolicy/typechecking.yaml index f088420811c54..a44fdc30893fb 100644 --- a/content/en/examples/validatingadmissionpolicy/typechecking.yaml +++ b/content/en/examples/validatingadmissionpolicy/typechecking.yaml @@ -1,4 +1,4 @@ -apiVersion: admissionregistration.k8s.io/v1beta1 +apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingAdmissionPolicy metadata: name: "deploy-replica-policy.example.com" @@ -12,4 +12,4 @@ spec: validations: - expression: "object.replicas > 1" # should be "object.spec.replicas > 1" message: "must be replicated" - reason: Invalid \ No newline at end of file + reason: Invalid diff --git a/data/releases/schedule.yaml b/data/releases/schedule.yaml index 2704d8461be58..022692f89e1bc 100644 --- a/data/releases/schedule.yaml +++ b/data/releases/schedule.yaml @@ -2,6 +2,14 @@ # This file helps to populate the /releases page, and is also parsed to find out the # latest patch version for a minor release. 
schedules: +- release: 1.30 + releaseDate: 2024-04-17 + next: + release: 1.30.1 + cherryPickDeadline: 2024-05-10 + targetDate: 2024-05-15 + maintenanceModeStartDate: 2025-04-28 + endOfLifeDate: 2025-06-28 - release: 1.29 releaseDate: 2023-12-13 next: diff --git a/hugo.toml b/hugo.toml index a7f92d47f57c7..3a16fc84d7457 100644 --- a/hugo.toml +++ b/hugo.toml @@ -142,9 +142,9 @@ time_format_default = "January 02, 2006 at 3:04 PM PST" description = "Production-Grade Container Orchestration" showedit = true -latest = "v1.29" +latest = "v1.30" -version = "v1.29" +version = "v1.30" githubbranch = "main" docsbranch = "main" deprecated = false @@ -184,35 +184,35 @@ js = [ ] [[params.versions]] -version = "v1.29" -githubbranch = "v1.29.0" +version = "v1.30" +githubbranch = "v1.30.0" docsbranch = "main" url = "https://kubernetes.io" +[[params.versions]] +version = "v1.29" +githubbranch = "v1.29.3" +docsbranch = "release-1.29" +url = "https://v1-29.docs.kubernetes.io" + [[params.versions]] version = "v1.28" -githubbranch = "v1.28.4" +githubbranch = "v1.28.8" docsbranch = "release-1.28" url = "https://v1-28.docs.kubernetes.io" [[params.versions]] version = "v1.27" -githubbranch = "v1.27.8" +githubbranch = "v1.27.12" docsbranch = "release-1.27" url = "https://v1-27.docs.kubernetes.io" [[params.versions]] version = "v1.26" -githubbranch = "v1.26.11" +githubbranch = "v1.26.15" docsbranch = "release-1.26" url = "https://v1-26.docs.kubernetes.io" -[[params.versions]] -version = "v1.25" -githubbranch = "v1.25.16" -docsbranch = "release-1.25" -url = "https://v1-25.docs.kubernetes.io" - # User interface configuration [params.ui] # Enable to show the side bar menu in its compact state. diff --git a/static/_redirects b/static/_redirects index fbae658eaf365..ffa1dc1a6df42 100644 --- a/static/_redirects +++ b/static/_redirects @@ -79,6 +79,7 @@ /docs/concepts/jobs/run-to-completion-finite-workloads/ /docs/concepts/workloads/controllers/job/ 301 /id/docs/concepts/jobs/run-to-completion-finite-workloads/ /id/docs/concepts/workloads/controllers/job/ 301 /docs/concepts/nodes/node/ /docs/concepts/architecture/nodes/ 301 +/docs/storage-force-detach-on-timeout/ /docs/concepts/architecture/nodes/#storage-force-detach-on-timeout 302 /docs/concepts/services-networking/connect-applications-service/ /docs/tutorials/services/connect-applications-service/ 301 /docs/concepts/object-metadata/annotations/ /docs/concepts/overview/working-with-objects/annotations/ 301 /docs/concepts/overview/ /docs/concepts/overview/what-is-kubernetes/ 301