Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace sample inventory by auto-generated variable references from defaults #11897

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

VannTen
Copy link
Contributor

@VannTen VannTen commented Jan 16, 2025

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

  • Remove sample inventory
  • Generate variables reference from all defaults files

The two mains reasons to do this are:

Which issue(s) this PR fixes:
Fixes #11876

Special notes for your reviewer:

/label ci-extended
@MrFreezeex
@ant31

Does this PR introduce a user-facing change?:

The sample inventory is removed

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. ci-extended Run additional tests labels Jan 16, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VannTen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 16, 2025
@VannTen
Copy link
Contributor Author

VannTen commented Jan 16, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jan 16, 2025
@VannTen VannTen force-pushed the cleanup/remove_sample_inventory branch 5 times, most recently from c3aebda to 09c9998 Compare January 17, 2025 08:54
This is an alternative to the sample inventory, which does not need
manual updating.
Sometimes the change done by pre-commit are not obvious, this should
help.
@VannTen VannTen force-pushed the cleanup/remove_sample_inventory branch from 09c9998 to 008c8c9 Compare January 17, 2025 09:54
@VannTen VannTen marked this pull request as ready for review January 17, 2025 09:58
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 17, 2025
@ant31
Copy link
Contributor

ant31 commented Jan 17, 2025

It makes maintenance easier (no need to sync variables back to the samples) but not for users.
From 'nicely' separated variables by domain (auth / network / gpu...) we get a single ~7000 lines file reference.
It removes the quick "cp -r samples", tweak two vars and ready to go.

The sample was here to show-up the variables that users most likely want to edit (e.g cluster version, oidc...) and not all of them for an easier on boarding / maintenance.

About #10645

Even if some users are smart and store in a custom git repository files only from the inventory/mycluster directory, there are still a lot of files!

How is it an issue to commit ~20 files ? Why is it better to commit 2? The workflow stays the same (commit/push/pull etc...)
Also I would rather have 20 files with 20 vars in them well organized, than one with 400.

@ant31
Copy link
Contributor

ant31 commented Jan 17, 2025

To balance both side (show users subset of things to 'edit' and easier maintenance) something like:

generate-inventory myclusterA --cloud gcp --network cilium --with-gpu --with-oidc --container=containerd

and would create something like:

.myclusterA/group_vars/k8s_cluster/k8s-cluster.yml
.myclusterA/group_vars/k8s_cluster/addons.yml
.myclusterA/group_vars/k8s_cluster/k8s-net-cilium.yml
.myclusterA/group_vars/etcd.yml
.myclusterA/group_vars/all
.myclusterA/group_vars/all/gcp.yml
.myclusterA/group_vars/all/containerd.yml
.myclusterA/group_vars/all/etcd.yml
.myclusterA/group_vars/all/gpu.yml
.myclusterA/group_vars/all/all.yml
.myclusterA/inventory.ini

With all lines 'commented' to show that those are already set in default and can be removed if needed

  1. Later it can help also existing users during upgrades as we have would have more control on var inside inventories:
upgrade-inventory myclusterA 

@VannTen
Copy link
Contributor Author

VannTen commented Jan 17, 2025

It removes the quick "cp -r samples", tweak two vars and ready to go.

Yes, that's the point, you still should only need to add two vars, for all the rest, kubespray defaults should handle it.
Instead of editing the file you're just creating it.

The sample was here to show-up the variables that users most likely want to edit (e.g cluster version, oidc...) and not all of them for an easier on boarding / maintenance.

There is lots of stuff in the inventory/sample, and the majority is setup specific (cloud provider / container engine / network plugin). So while it certainly started that way, IMO is does not reflect the current state.

$ find inventory/sample/ -type f -exec wc -l {} \;
139 inventory/sample/group_vars/all/all.yml
9 inventory/sample/group_vars/all/aws.yml
40 inventory/sample/group_vars/all/azure.yml
59 inventory/sample/group_vars/all/containerd.yml
2 inventory/sample/group_vars/all/coreos.yml
9 inventory/sample/group_vars/all/cri-o.yml
59 inventory/sample/group_vars/all/docker.yml
16 inventory/sample/group_vars/all/etcd.yml
10 inventory/sample/group_vars/all/gcp.yml
22 inventory/sample/group_vars/all/hcloud.yml
17 inventory/sample/group_vars/all/huaweicloud.yml
55 inventory/sample/group_vars/all/oci.yml
116 inventory/sample/group_vars/all/offline.yml
69 inventory/sample/group_vars/all/openstack.yml
24 inventory/sample/group_vars/all/upcloud.yml
32 inventory/sample/group_vars/all/vsphere.yml
38 inventory/sample/group_vars/etcd.yml
283 inventory/sample/group_vars/k8s_cluster/addons.yml
386 inventory/sample/group_vars/k8s_cluster/k8s-cluster.yml
132 inventory/sample/group_vars/k8s_cluster/k8s-net-calico.yml
380 inventory/sample/group_vars/k8s_cluster/k8s-net-cilium.yml
51 inventory/sample/group_vars/k8s_cluster/k8s-net-custom-cni.yml
18 inventory/sample/group_vars/k8s_cluster/k8s-net-flannel.yml
63 inventory/sample/group_vars/k8s_cluster/k8s-net-kube-ovn.yml
73 inventory/sample/group_vars/k8s_cluster/k8s-net-kube-router.yml
6 inventory/sample/group_vars/k8s_cluster/k8s-net-macvlan.yml
64 inventory/sample/group_vars/k8s_cluster/k8s-net-weave.yml
20 inventory/sample/inventory.ini

About #10645

Even if some users are smart and store in a custom git repository files only from the inventory/mycluster directory, there are still a lot of files!

How is it an issue to commit ~20 files ? Why is it better to commit 2? The workflow stays the same (commit/push/pull etc...) Also I would rather have 20 files with 20 vars in them well organized, than one with 400.

The issue is not having 20 files, it's having more than you need and not knowing if you need it.

If you copy the sample inventory, unless you're recording everything in your commit message (which most people don't, and most people don't look at git history either), 6 months after, you won't remember if the variable you have in your inventory is there because you put it deliberately, or because it was in the sample inventory.
(I'm speaking of experience here. When I first started working on the clusters I'm currently managing, the sample inventory was copied for each cluster, and it was a little nightmare to figure out what was needed and what wasn't

generate-inventory
update-inventory

I think this would be very hard to maintain effectively, especially update-inventory (unless we only support unmodified inventory, but franky that would make it a bit useless). How would we ensure we distinguishes between just a copied default and an intentional choices ?

@ant31
Copy link
Contributor

ant31 commented Jan 17, 2025

There is lots of stuff in the inventory/sample, and the majority is setup specific (cloud provider / container engine / network plugin). So while it certainly started that way, IMO is does not reflect the current state.

Then I would rather go in that direction.

The issue is not having 20 files, it's having more than you need and not knowing if you need it.

Then it goes in the wrong direction, as a new user you would have no idea what you need what you don't need because we won't cherry pick few of them anymore (can argue there are too many already). They would just have a list of the 7000 knobs they can tweak.

If you copy the sample inventory, unless you're recording everything in your commit message (which most people don't, and most people don't look at git history either), 6 months after, you won't remember if the variable you have in your inventory is there because you put it deliberately, or because it was in the sample inventory.

Any non-commented value is intentional.

I think this would be very hard to maintain effectively, especially update-inventory (unless we only support unmodified inventory, but franky that would make it a bit useless). How would we ensure we distinguishes between just a copied default and an intentional choices ?

If it's set it's intentional.

Another side effect is that changing a default in a role could break clusters, by copying a subset of default in the inventory, that the user "should" have reviewed it freeze the user configuration to a certain state.

Let's keep providing a curated sample inventory (generated or not), I agree the current one is not optimal and too large, it's still better than none.

@@ -37,5 +37,6 @@ exclude_paths:
- tests/files/custom_cni/cilium.yaml
- venv
- .github
- docs/ansible/variable_reference.yml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep grouping the variables by domain in multiple files.
then let's have people the cherry pick/copy past what they need:

cp docs/ansible/variables/cloud/gcp.yml inventory/mycluster/group_vars/
cp docs/ansible/variables/network/calico.yml inventory/mycluster/group_vars/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, one file per role defaults ? That's mostly redundant with the defaults file themselves, no ?
Especially if we recreate the hierarchy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's already redundant with roles.

not all per role but some hierarchy especially when configuration are exclusive:

  • calico vs weave vs cilium...
  • crio vs docker vs ....
  • gcp vs azure ...

and then maybe some hierarchy but not down to the role:

  • kubernetes/vars.yml (all cluster configs)
  • apps/vars.yaml (all apps)

but can comes later too. need a bit more though on it.

@@ -12,16 +12,16 @@ Heketi provides a CLI that provides users with a means to administer the deploym

## Install

Copy the inventory.yml.sample over to inventory/sample/k8s_heketi_inventory.yml and change it according to your setup.
Copy the inventory.yml.sample over to `<inventory_path>/k8s_heketi_inventory.yml` and change it according to your setup.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/heketi/heketi is abandoned let's drop it in another PR.

@VannTen
Copy link
Contributor Author

VannTen commented Jan 17, 2025 via email

@VannTen
Copy link
Contributor Author

VannTen commented Jan 17, 2025

I've linked this PR in the slack channel in the hopes we can have some users join the discussion, so we can have more info on how the sample inventory is used.

@Nathanael-Mtd
Copy link

I understand that sample is complex and a pain to maintain, I'm ok with that removal for that reason.
But on my side, when I started Kubernetes 4 years ago, it helped me SO MUCH to understand Kubernetes settings and features and was a great resource to start deploying a cluster.

Also I use it to upgrade my clusters by comparing my inventory folder with the sample one.

I don't know how new users will start using Kubespray after that removal, and I'm affraid they will probably find another solutions to deploy clusters if they don't have a tool to generate inventory files with important default values.

@vdveldet
Copy link

Jumped away from the samples inventory a long time ago and retain only the basics in 1 file.
I always use the container version as runtime (so no issues with versions and dependency as such) and load the inventory-dir in the container.

My inventory looks like this :

.
├── group_vars
│   └── all
│       └── configuration.yml
├── host_vars
│   ├── kubnode01
│   │   └── os.yml
│   ├── kubnode02
│   │   └── os.yml
│   ├── kubnode03
│   │   └── os.yml
│   ├── kubnode04
│   │   └── os.yml
│   └── kubnode05
│       └── os.yml
└── hosts.yml

I'm parsing the default values to get most of the information for the variable names and meaning
Basically find . -type d -name defaults will give that info and from there find want I need and trust the defaults.

When I explain the inventory to a colleague, they always came back with the question. "Why is there a separation in k8s_cluster and all in the group_vars" so we removed everything not needed that created this confusion, resulting in configuration.yml

@VannTen
Copy link
Contributor Author

VannTen commented Jan 20, 2025 via email

@Nathanael-Mtd
Copy link

Could you explicit ? Did you use it as documentation, reference, copying the inventory ?

I used as reference and started new cluster by copying and editing the sample folder to a new one to custom my cluster.

Like, something like git diff <tag> <old_tag> -- inventory/sample ? It was my impression that given we miss updating it pretty regularly.

I use a file comparaison tool (an VSCode extension) with sample folder as source and my inventory folder as destination.

The sample should not be 'default values' => that's what role defaults are for. And ideally (and it's seems to work given #11898) there is not much to define unless you need to customize (at which point you need to edit your inventory anyway).

Yeah sure, but I used it to get some "default keys/values" with comments, without making search on the whole codebase to find keys I need to change (I ended up to done that for specific things).

Overall I'm ok with the sample folder removal, because this PR adds generated variables reference file, it will be more reliable.
On my futures cluster upgrades, I will probably use that file as a source for maintaining custom cluster configs, and it will be so better than sample folder comparaison.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. ci-extended Run additional tests cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

not updated sample in inventory for change {kube,system}_master_{cpu,memory,ephemeral-storage,pid}
5 participants