Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: KEP-3973: Consider Terminating Pods in replica sets #3974

Closed
wants to merge 2 commits into from

Conversation

kannon92
Copy link
Contributor

@kannon92 kannon92 commented May 1, 2023

  • One-line PR description:
    Users can consider terminating pods in their count of active replicas when controlling rollouts.
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 1, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kannon92
Once this PR has been reviewed and has the lgtm label, please assign kow3ns for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels May 1, 2023
@k8s-ci-robot k8s-ci-robot requested review from kow3ns and soltysh May 1, 2023 20:05
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 1, 2023
@kannon92 kannon92 changed the title KEP-3973: Consider Terminating Pods in replica sets WIP: KEP-3973: Consider Terminating Pods in replica sets May 16, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 16, 2023
should be approved by the remaining approvers and/or the owning SIG (or
SIG Architecture for cross-cutting KEPs).
-->
# KEP-3973: Consider Terminating Pods in Replica Set Active Count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# KEP-3973: Consider Terminating Pods in Replica Set Active Count
# KEP-3973: Consider Terminating Pods in Deployments

this KEP should explore pod management holistically for deployments and transitively for replica sets to correctly account for terminating pods

// as active. If the field is true, then the Deployment controller will include active pods
// to mean running or terminating pods
// +optional
TerminatingAsActive *bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in a KEP earlier I do not think it is useful to have this field on a replica set: #3940 (comment) .

// as active. If the field is true, then the Deployment controller will include active pods
// to mean running or terminating pods
// +optional
TerminatingAsActive *bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update this to an enum to reflect discussions in #3940 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pasting some relevant discussions here for when I come back to this KEP: #3940 (comment)

## Summary

Currently, Deployments start replacement Pods as soon as previously created Pods are marked for terminating. Terminating pods are in a transitory state where they are neither active nor really fully terminated.
This KEP proposes a new field for the Deployment/ReplicaSet controllers that counts terminating
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should describe the expected behaviour at the API level here, rather than an internal implementation that is not certain at this time. At the API level there is no such concept as active replicas.

#### Open Questions on Deployment Controller

The Deployment API is open for discussion. We put the field in Deployment/ReplicaSet because it is related to RolloutStrategy.
It is not clear if `recreate` and/or `rollingupdate` need this API for both rollout options.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a result of #3940 (comment) we can safely say that the feature should be implemented for both of these strategies, so we can put the field into the .spec

The Deployment API is open for discussion. We put the field in Deployment/ReplicaSet because it is related to RolloutStrategy.
It is not clear if `recreate` and/or `rollingupdate` need this API for both rollout options.

Another open question is if we want to include Deployments in the initial release of this feature. There is some discussion about releasing the Job API first and then follow up with Deployment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover from the older KEP?

- TerminatingAsActive
- ActiveUntilTerminal
- DelayPodRecreationUntilTerminal
- ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

choices for the field and values: #3940 (comment)

}
```

### Implementation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation part will be done in a deployment and not in a replica set and is probably not feasible to write it fully here. So IMO the expected behaviours and replicasets transitiions should be discussed mainly here. And then we can decide if some important parts of the implementation should be highlighted.

@dejanzele
Copy link
Contributor

As discussed with @kannon92, I will be taking over this KEP.

/assign dejanzele

@atiratree
Copy link
Member

@dejanzele I had a similar discussion with @kannon92 some time ago about this as well. And I have the KEP mostly written, just need to finish it..

@dejanzele
Copy link
Contributor

@atiratree aha, ok, I am open to helping with implementation if needed

@atiratree
Copy link
Member

@dejanzele Thanks, I will try to revisit the KEP this week. Let's then see how the KEP and the implementation phase will go.

@atiratree
Copy link
Member

@kannon92 @dejanzele I have opened the new version of the KEP in #4357. Reviews would be appreciated.

@kannon92
Copy link
Contributor Author

/close

@k8s-ci-robot
Copy link
Contributor

@kannon92: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants