This repository contains several subfolders, one per tool:
- .github: Contains GitHub Actions workflows, which call Terraform and Ansible in their respective folders.
- terraform: Contains Terraform code for cluster, DNS, storage, and network provisioning plus initial configuration (cloud-init).
- ansible: Contains additional configuration management that Terraform is not intended for.
- kubernetes: Contains Kubernetes manifests such as Helm charts and Kustomizations.
The Terraform code should be formatted and tested locally with git hooks. The GitHub Action is the actual planner and executor. The state is stored in Terraform Cloud. The secrets are stored in GitHub (Actions), prefixed with `TF_VAR_*`.
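A minimal pre-commit hook for the local format/test step could look like this (the hook itself is a sketch, not part of this repo):

```sh
#!/bin/sh
# .git/hooks/pre-commit: reject commits with unformatted or invalid Terraform.
set -e
terraform -chdir=terraform fmt -check -recursive
terraform -chdir=terraform validate   # assumes `terraform init` has been run once
```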
Ansible playbooks can be run from your local machine (SSH auth), but they are run from a GitHub Action anyway / as well. The Ansible inventory is created by Terraform during the `terraform apply` step.
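For a local run, something along these lines works (the inventory path and playbook name are assumptions based on the layout above):

```sh
# Run against the inventory that `terraform apply` generated.
ansible-playbook \
  --private-key ~/.ssh/automation.key \
  -i ansible/inventory \
  ansible/playbook.yaml
```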
The Kubernetes distro of choice is k3s. It's installed with Ansible. The actual Kubernetes manifests are deployed/synced with FluxCD.
The FluxCD bootstrap is executed each time the Ansible playbook runs. The operation is idempotent: FluxCD simply "skips" the bootstrap if the cluster was already bootstrapped before.
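The bootstrap boils down to a single CLI call; the flags below are illustrative placeholders, the real values live in the playbook:

```sh
# Re-running this against an already-bootstrapped cluster is a no-op.
flux bootstrap github \
  --owner <github-user> \
  --repository <this-repo> \
  --branch main \
  --path kubernetes
```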
The flow is like this:
- Make changes
- Push to the `main` branch: a GitHub Action is triggered and triggers further modules (Terraform, Ansible or k8s/none)
- Check everything works as intended
- Merge into the `prod` branch: the GitHub Action is triggered and executes the same modules as before (see the git sketch after this list)
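In git terms, a promotion looks roughly like this (branch names as above, everything else is plain git):

```sh
git push origin main    # triggers the dev (main) pipeline
# ...verify the dev environment...
git checkout prod
git merge main
git push origin prod    # triggers the prod pipeline
```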
They require some setup on the GitHub side. For one, two environments should be created: `prod` and `dev` (`main`). The latter provides the ability to test changes before putting them live in the `prod` environment.
Many secrets are required in the process. Most are universal, but some are distinct per environment.
- KUBENODE_SSH_PRIVATE_KEY
- GITHUB_TOKEN (automatically set by GitHub Actions)
- TAILSCALE_AUTH_TOKEN
- TERRAFORM_TOKEN
- TF_VAR_CLOUDFLARE_APITOKEN
- TF_VAR_HCLOUD_TOKEN
- TF_VAR_ROOT_DOMAIN
- TRANSCRYPT_PASSWORD
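The environments and secrets can be scripted with the GitHub CLI; a hedged sketch (owner/repo and values are placeholders):

```sh
# Create (or update) the two environments.
gh api -X PUT repos/<owner>/<repo>/environments/prod
gh api -X PUT repos/<owner>/<repo>/environments/dev

# Universal secret at repository level; a per-environment one set twice.
gh secret set TRANSCRYPT_PASSWORD --body "$TRANSCRYPT_PASSWORD"
gh secret set TF_VAR_ROOT_DOMAIN --env prod --body "example.com"
gh secret set TF_VAR_ROOT_DOMAIN --env dev --body "dev.example.com"
```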
There are three options for deploying changes:
- Run kubectl/helm commands from localhost or a GitHub Action. This is harder to set up initially, but easier to debug. Automatic deletion is unsupported this way.
- Install Flux (or a different GitOps tool) on the cluster and let it reconcile automatically. Automatic deletion is supported, but only for k8s-internal resources.
- k3s supports automatic installation of manifests out of the box.
Option 3 is what this project leverages.
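k3s's auto-deploy mechanism watches a directory on the server node and applies (and re-applies on change) every manifest placed in it; the path below is the k3s default:

```sh
# Files in this directory are applied to the cluster automatically.
sudo cp my-app.yaml /var/lib/rancher/k3s/server/manifests/
```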
```sh
scp -i ~/.ssh/automation.key [email protected]:/etc/kubernetes/admin.conf ./kube-config
```
Use `kubectl --kubeconfig ./kube-config *` from now on.
Secrets are encrypted with transcrypt, using aes-256-cbc as the cipher.
New repositories can be initialized with `./transcrypt`.
Cloned repositories can be initialized with `./transcrypt -c aes-256-cbc -p 'password'`.
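To verify the setup after cloning, transcrypt's own inspection flags are handy:

```sh
./transcrypt --display   # show the configured cipher and (careful!) password
./transcrypt --list      # list the files that are transparently encrypted
```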
Several storage providers were tested and did not fit:
- Rook/Ceph had an internal certificate expiration that was undocumented at the time of writing
- Longhorn relies on UI interaction for multi-disk configurations
Therefore, bare-metal ZFS is deployed where needed (since so far everything operates on a single node). ZFS is deployed and configured via Ansible. Integration with Kubernetes happens via the default Kubernetes `hostPath` volume.
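The pattern boils down to mounting a ZFS dataset and pointing a `hostPath` volume at it; a sketch with illustrative dataset, pool, and pod names:

```sh
# Create a dataset whose mountpoint is consumed as a hostPath volume.
zfs create -o mountpoint=/data/myapp tank/myapp

cat <<'EOF' | kubectl --kubeconfig ./kube-config apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      hostPath:
        path: /data/myapp   # the ZFS dataset mountpoint from above
        type: Directory
EOF
```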
- secrets, sops, ...
- monitoring, logging (grafana, prometheus, openmetrics, fluentd, jaeger-tracing/grafana-tempo)
- take a proper look at rook config (erasure coding, ...)
- https://kubernetes.io/docs/tasks/administer-cluster/dns-horizontal-autoscaling/#enablng-dns-horizontal-autoscaling
- RBAC / access mgmt / proper kubeconfigs
- tailscale vpn between blackhole and pegasus (non-k8s)
- k8s on blackhole
- cilium
- ingress-nginx / traefik
- https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/
- https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/
- GitHub Action
  - Update runner version
  - Update GitHub action versions in `.github/workflows/apply.yaml`
  - Update CLI versions within `apply.yaml`:
    - Terraform
  - Update GitHub action versions in `.github/workflows/destroy.yaml`
- Ansible modules
  - k3s
  - tailscale
  - fluxcd cli
- System -> manual `apt update && apt upgrade -y`
- Kubernetes
  - infra
    - flux
    - ingress-nginx
    - cert-manager
  - apps
    - link-shortener caddy version
    - umami, umami-mariadb -> currently latest, but never pulled again
    - vaultwarden
- infra
  - Terraform
    - Provider
- Secret rotation TBD
I tried it, and it had several hiccups that in sum rendered it unusable.
- two different types of `kustomization.yaml`s: one is the plain Kubernetes one, one is Flux-specific. Feels bad, and it's a hassle to mix them.
- Also, why would you need ("required field") to set a custom reconciliation interval for subparts of your application, when you already have a global one?
- Many Helm charts bring CRDs with them, and in most of those cases you want to instantiate them as well (ClusterIssuer for cert-manager, CephCluster for Rook, ...). But Flux/Kustomization does not support this use case: before the CRDs are added, the framework tries to instantiate the CRs, which is not possible at that point, and the whole deployment fails. The next retry fails the same way, so it's not even that CRDs are deployed on the first try and CRs on the second; it just fails repeatedly. You don't have that problem if you first apply the CRDs (push them to the repo) and the resources in a next step, but that is not an idempotent workflow. (A known workaround is sketched below.)
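The workaround I'm aware of (not what this repo settled on) is to split the CRDs into their own Flux Kustomization and order the CRs after them with `dependsOn`; a sketch assuming hypothetical `crds/` and `app/` paths:

```sh
cat <<'EOF' > kubernetes/app-sync.yaml
# CRDs first; the app Kustomization is only applied once they exist.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: app-crds
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./kubernetes/app/crds
  prune: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: app
  namespace: flux-system
spec:
  dependsOn:
    - name: app-crds   # wait for the CRDs before applying the CRs
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./kubernetes/app
  prune: true
EOF
```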
- Setting values for Helm charts is all well and good, but if you change them, the changes often are not rolled out. My guess on the root cause: the configurations are deployed as ConfigMaps, and updates of their contents don't restart their consumers. That's why Flux/Kustomize introduced configMapGenerators... but they are obviously not used in Helm charts.
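For contrast, a configMapGenerator appends a content hash to the ConfigMap's name, so any change yields a new name and forces its consumers to roll; a minimal sketch with made-up resource names:

```sh
cat <<'EOF' > kustomization.yaml
# Plain-Kubernetes kustomization.yaml (not the Flux-specific kind).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
configMapGenerator:
  - name: app-config
    literals:
      - LOG_LEVEL=debug
# Rendered as e.g. app-config-7g2k9c5b5d; all references are rewritten,
# so changing LOG_LEVEL rolls the Deployment that mounts it.
EOF
```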
Todos from Samba logs:
- weak crypto allowed
- unix password sync is set, but no valid `passwd program` parameter