Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate DRA job configs from a Jinja template #34010

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

bart0sh
Copy link
Contributor

@bart0sh bart0sh commented Dec 19, 2024

  • Implemented job configs generation
  • added make rules to generate and verify generated jobs
  • generated DRA canary jobs

/cc @pohly @kannon92 @SergeyKanzhelev @haircommander

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/config Issues or PRs related to code in /config size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/jobs sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Dec 19, 2024
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 7c3a83c to 2f75bbd Compare December 19, 2024 13:37
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 3 times, most recently from d030275 to c5999e8 Compare December 19, 2024 14:13
# on a kind cluster with containerd updated to a version with CDI support.
#
# Compared to ci-kind-dra, this one enables all DRA-related features.
[ci-kind-dra-all]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it so that we have common settings for normal periodics, normal presubmits, and canary presubmits?

There's still going to be a lot of duplication if we have to have three copies of this section and the ones below.

The same applies to the actual .jinja template. The entries in the periodics and presubmits should be built from a single source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This makes sense. Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Now gen.py generates 3 files: dynamic-resource-allocation-canary.yaml, dynamic-resource-allocation-pull.yaml and dynamic-resource-allocation-ci.yaml from dynamic-resource-allocation.conf and dynamic-resource-allocation.jinja

PTAL.

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 3 times, most recently from 3259e4d to 499379c Compare December 20, 2024 12:38
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 20, 2024
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 499379c to 2e1e253 Compare December 20, 2024 15:08
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 20, 2024
Copy link
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very promising.

How to solve indention was my biggest concern when thinking about how to use Jinja. I am not sure whether this is addressed here (need to check test results).

config/jobs/kubernetes/sig-node/Makefile Outdated Show resolved Hide resolved
testgrid-tab-name: {{job_name}}
description: {{description}}
testgrid-alert-email: {{testgrid_alert_email}}
fork-per-release: "true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Canaries shouldn't get forked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@bart0sh
Copy link
Contributor Author

bart0sh commented Dec 20, 2024

@pohly @kannon92 @SergeyKanzhelev @haircommander

Looks very promising.

Thank you. After fixing review comments, I'm going to remove -pull and -ci yamls from this PR, so we can only test -canary.
It would be great if SIG-Node folks would look at this and confirm that this approach is at least acceptable.

I personally like it. Using it would allow us to

  • have presubmit job for every periodic
  • keep them synchronized
  • easily generate canary jobs for testing purposes (e.g. kubetest2)
  • make less mistakes as job configs are automatically generated
  • do less typing and copypasting :)
    etc.

WDYT guys?

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 2 times, most recently from 4234630 to 28eda1b Compare December 20, 2024 21:33
@bart0sh bart0sh mentioned this pull request Jan 4, 2025
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from cf7fa2d to 2ed5a55 Compare January 4, 2025 12:00
@bart0sh
Copy link
Contributor Author

bart0sh commented Jan 4, 2025

@pohly

Can we make this PR complete (= generates everything) and then merge the generated canary jobs in advance via a second PR?

done: #34070

@bart0sh
Copy link
Contributor Author

bart0sh commented Jan 4, 2025

@pohly BTW, current implementation makes canary job changes a bit awkward. We'll have to either complicate template with {%- if kind == "canary" %} blocks or create PRs without changes in the template and the config, as I did in #34070.

Does this make sense to you?

@pohly
Copy link
Contributor

pohly commented Jan 4, 2025

There's a genuine unit test failure:

hack/generate-jobs.py:79:0: W0311: Bad indentation. Found 25 spaces, expected 24 (bad-indentation)

@pohly
Copy link
Contributor

pohly commented Jan 4, 2025

BTW, current implementation makes canary job changes a bit awkward. We'll have to either complicate template with {%- if kind == "canary" %} blocks or create PRs without changes in the template and the config

If someone modifies the template locally and then submits only the updated canary YAML, it is impossible for others to review or replicate how they where generated. It may also be harder to verify that the changes for the canary jobs then get applied as tested to the actual jobs.

I think I prefer the approach with {%- if kind == "canary" %}. Moving experimental changes from the canary jobs to the production jobs then should consists only of removing those if checks and the else branch.

@elieser1101 can be our first guinea pig user of this approach for producing canary jobs which use kubetest2.

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 2ed5a55 to 1da3e45 Compare January 4, 2025 14:54
config/jobs/kubernetes/sig-node/dra-ci.yaml Outdated Show resolved Hide resolved
config/jobs/kubernetes/sig-node/dra-ci.yaml Outdated Show resolved Hide resolved
config/jobs/kubernetes/sig-node/dra-ci.yaml Outdated Show resolved Hide resolved
make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test"
curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind
kind build node-image --image=dra/node:latest .
trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT
# Which DRA features exist can change over time.
features=( $(grep '"DRA' pkg/features/kube_features.go | sed 's/.*"\(.*\)"/\1/') )
echo "Enabling DRA feature(s): ${features[*]}."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the debug output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your PR they're enabled only for canary jobs as far as I can see. Should I enable them for all kind jobs instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the "all" jobs, right? Keeping them there and only there is fine. It doesn't matter whether they are canary or normal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, makes sense, thanks!

@@ -0,0 +1,120 @@
{%- if header %}{%- if kind == "ci" %}periodics:{%- else %}presubmits:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this {%- if header %} be dropped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you suggest how to do it? I don't like this trick either, but couldn't find a better way to make periodics: and presubmits: appear in a result content only once, at the beginning of the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that is the purpose - that wasn't clear to me. Can you perhaps add a comment to the dra.jinja explaining this trick?

I don't have a better alternative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed the variable and added a comment

config/jobs/kubernetes/sig-node/dra.jinja Outdated Show resolved Hide resolved
curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind
kind build node-image --image=dra/node:latest .
trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT
{%- if kind == "canary" %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this check "canary"?

This seems to overload the meaning of "canary":

  • For canary PR jobs (= dra-canary.yaml).
  • The "all features enabled" CI and presubmit jobs.

Let's use "canary" for "part of dra-canary.yaml" and something else for feature gates. How about "alpha"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with_all_features was set to true only for two canary jobs in your PR.
However, I agree that a separate option would be more appropriate here. How about all_features?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.


echo "Verifying generated jobs"
hack/run-in-python-container.sh \
python3 hack/generate-jobs.py config/jobs/kubernetes/sig-node/*.conf --only-verify
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can start with only working on "our" (= SIG Node) jobs here for now. But in a future PR this should get extended to other generated jobs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fortunately it's just a matter of adding new patterns to the command line. generate-jobs.py already supports this.

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 1da3e45 to 5f4f709 Compare January 7, 2025 09:37
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bart0sh
Once this PR has been reviewed and has the lgtm label, please assign mpherman2 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 2 times, most recently from 5e5c437 to 51f4171 Compare January 7, 2025 11:17
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2025
@bart0sh
Copy link
Contributor Author

bart0sh commented Jan 7, 2025

/hold
until all canary jobs pass for the test PR

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 7, 2025
@bart0sh
Copy link
Contributor Author

bart0sh commented Jan 7, 2025

@pohly Can you look at failing canary jobs and suggest how to fix them? I'm looking as well, but couldn't find a reason yet.

@@ -0,0 +1,129 @@
{# `beginning` variable is set to `true` by the generate-job.py for the first processed job and to `false` for the rest of the jobs #}
{%- if beginning %}{%- if kind == "ci" %}periodics:{%- else %}presubmits:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually like header better: "head" and "tail" is used for the top and bottom of files, but "beginning" could be for anything.

Perhaps set "header" to:

# GENERATED FILE - DO NOT EDIT!
# 
# Instead, modify dra.jinja and run `make generate-jobs`.

and then inject that value here if non-empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. Done. I defined header in the script as it's generic enough. Previously it was defined for the first job in the template, which is harder to understand in my opinion.

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 51f4171 to 7f1008f Compare January 7, 2025 12:52
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 3 times, most recently from 0981b0d to ecd5495 Compare January 7, 2025 13:28
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from ecd5495 to 34b1edd Compare January 7, 2025 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/config Issues or PRs related to code in /config area/jobs area/testgrid cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: PRs - Needs Reviewer
Development

Successfully merging this pull request may close these issues.

4 participants