diff --git a/keps/prod-readiness/sig-node/2862.yaml b/keps/prod-readiness/sig-node/2862.yaml
index 144d50ceca6..287cda340a3 100644
--- a/keps/prod-readiness/sig-node/2862.yaml
+++ b/keps/prod-readiness/sig-node/2862.yaml
@@ -4,3 +4,5 @@
 kep-number: 2862
 alpha:
   approver: "@jpbetz"
+beta:
+  approver: "@jpbetz"
diff --git a/keps/sig-node/2862-fine-grained-kubelet-authz/README.md b/keps/sig-node/2862-fine-grained-kubelet-authz/README.md
index fdb4234b9e0..f23917ff3c7 100644
--- a/keps/sig-node/2862-fine-grained-kubelet-authz/README.md
+++ b/keps/sig-node/2862-fine-grained-kubelet-authz/README.md
@@ -784,6 +784,11 @@ rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->
+We have designed a fallback mechanism that prevents failed rollouts or rollbacks
+from impacting an already running workload's ability to interact with the kubelet API.
+
+Please see the [Design Details](#design-details) section for more information.
+
 ###### What specific metrics should inform a rollback?
+Increase in failed requests to the kubelet API from workloads.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+We have tested the following upgrade scenarios manually:
+
+| Scenario | Result |
+| -------- | ------ |
+| Upgrade both kubelet and kube-apiserver so that the feature gate is enabled in both. | workloads and kube-apiserver are able to reach kubelet |
+| Upgrade only kubelet to enable the feature gate | workloads and kube-apiserver are able to reach kubelet |
+| Upgrade only kube-apiserver to enable the feature gate | workloads and kube-apiserver are able to reach kubelet |
+
+We have tested the following rollback scenarios manually:
+
+| Scenario | Result |
+| -------- | ------ |
+| Roll back both kubelet and kube-apiserver so that the feature gate is disabled in both. | workloads and kube-apiserver are able to reach kubelet |
+| Roll back only kubelet to disable the feature gate | workloads and kube-apiserver are able to reach kubelet |
+| Roll back only kube-apiserver to disable the feature gate | workloads and kube-apiserver are able to reach kubelet |
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+No.
 ### Monitoring Requirements
@@ -822,6 +846,28 @@ checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
+Users can check if this feature is enabled in kube-apiserver by running the
+following command:
+
+```sh
+kubectl get --raw /metrics | grep kubernetes_feature_enabled | grep KubeletFineGrainedAuthz
+```
+
+Users can check if this feature is enabled in the kubelet by running one of the
+following commands in a pod that is running on the node:
+
+If the read-only port is enabled:
+```sh
+curl http://$MY_NODE_IP:10255/metrics | grep kubernetes_feature_enabled | grep KubeletFineGrainedAuthz
+```
+
+If the read-only port is not enabled:
+```sh
+curl -k https://$MY_NODE_IP:10250/metrics | grep kubernetes_feature_enabled | grep KubeletFineGrainedAuthz
+```
+
+NOTE: for port 10250 the pod will need to have the right RBAC bindings (if RBAC is enabled) to view the metrics.
+
 ###### How can someone using this feature know that it is working for their instance?
+Same SLOs as the kubelet API currently offers.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+No.
+
 ### Dependencies
+This feature only comes into play if the kubelet authorization mode is set to Webhook.
+
 ### Scalability
+If requests to the kubelet API start failing due to authorization issues, users can
+disable the feature gate.
+
+Users can check the Kubernetes audit logs for SubjectAccessReview requests
+created by `system:nodes:*` and check the reason they failed.
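The audit-log check described above can be sketched in Python. This is a minimal sketch and not part of the KEP. It assumes the kube-apiserver audit log is written as one JSON event per line, that the audit policy records SubjectAccessReview objects at the `RequestResponse` level (without that, `responseObject` is absent and the decision cannot be read from the log), and that kubelets authenticate as members of the `system:nodes` group. The function name `denied_node_access_reviews` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def denied_node_access_reviews(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a summary of each denied SubjectAccessReview created by a kubelet.

    Assumes audit.k8s.io/v1 events, one JSON object per line, logged at the
    RequestResponse level so that requestObject/responseObject are present.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON audit events
        if not isinstance(event, dict):
            continue
        # Only completed "create subjectaccessreviews" requests are relevant.
        if event.get("stage") != "ResponseComplete" or event.get("verb") != "create":
            continue
        if (event.get("objectRef") or {}).get("resource") != "subjectaccessreviews":
            continue
        # Keep only requests issued by kubelets (assumed group: system:nodes).
        user = event.get("user") or {}
        if "system:nodes" not in (user.get("groups") or []):
            continue
        response = event.get("responseObject")
        if not isinstance(response, dict):
            continue  # decision not recorded at this audit level
        status = response.get("status") or {}
        if status.get("allowed"):
            continue
        spec = (event.get("requestObject") or {}).get("spec") or {}
        yield {
            "node": user.get("username"),
            "subject": spec.get("user"),
            "attributes": spec.get("resourceAttributes")
            or spec.get("nonResourceAttributes"),
            "reason": status.get("reason", ""),
        }
```

Passing an open audit log file to `denied_node_access_reviews` and printing each result gives the node that asked, the subject it asked about, and the reason the authorizer returned, which is the information the troubleshooting step above relies on.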
+
 ###### What steps should be taken if SLOs are not being met to determine the problem?
+1. Check that the feature gate is enabled in kube-apiserver and kubelet.
+2. Check that the workload has the right permissions. Requests are expected to
+fail if you are using fine-grained subresources but the feature gate is not enabled
+in kubelet.
+3. Check the audit logs for SubjectAccessReview requests created by `system:nodes:*`
+and check the reason these requests failed.
+4. Check kubelet logs.
+
 ## Implementation History
+2024-09-28: [KEP-2862](https://github.com/kubernetes/enhancements/pull/4760) merged as implementable and PRR approved for ALPHA.
+2024-10-17: Alpha Code implementation [PR](https://github.com/kubernetes/kubernetes/pull/126347) merged.
+2024-10-22: Alpha Documentation [PR](https://github.com/kubernetes/website/pull/48412) merged.
+
 ## Drawbacks