Supermicro Incus Cluster
Installation
0. Create the IP addresses in the network repo for the following subnets:
- cluster for the server
- mgmt for the server and the IPMI
1. Update the network flake input
nix flake lock --update-input zentralwerk --commit-lock-file
2. Create a provisioner ISO that starts a NixOS live system in the 192.168.40.32/27 subnet and flash it onto a USB stick that permanently stays with the server
nix build .#nixosConfigurations.provisioner.config.system.build.isoImage -L
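The resulting ISO can then be written to the stick, for example with dd (the device path /dev/sdX is a placeholder for the actual USB device):
sudo dd if=result/iso/*.iso of=/dev/sdX bs=4M status=progress conv=fsync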
3. Create the NixOS host
In this guide: hyp{1,2,3}
Generate the config files with the UUIDs and hostIDs. This step could be automated by a script that writes a file containing these public values.
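A minimal sketch of what such a script could generate, assuming the values end up as files under hosts/<hostname>/ (the file layout is an assumption, only the commands matter):
uuidgen > hosts/hyp1/uuid                                                        # machine UUID
head -c4 /dev/urandom | od -A none -t x4 | tr -d ' ' > hosts/hyp1/hostid        # 8 hex chars for networking.hostId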
4. Get the drivers for the network card
- add the result to modules/baremetal.nix
readlink -f /sys/class/net/*/device/driver | xargs -I{} basename {}
TODO: this step could be automated
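The reported driver names then go into the module list so the card is available early during boot (e.g. for remote disk unlocking); a minimal sketch for modules/baremetal.nix, assuming the card reports ixgbe:
{
  # drivers found via the readlink command above (ixgbe is only an example)
  boot.initrd.availableKernelModules = [ "ixgbe" ];
}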
5. Create sops permissions with dummy age key for the host
Add sections to the .sops.yaml
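A hedged sketch of the two sections, assuming one secrets file per host (paths and anchor names are assumptions; the age key is a dummy that gets replaced in step 10):
keys:
  - &hyp1 age1dummykeyreplacedlater
creation_rules:
  - path_regex: hosts/hyp1/secrets\.yaml$
    key_groups:
      - age:
          - *hyp1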
6. Create the passwords.
HOSTNAME=hyp1 nix run .#supermicro-incus-create-passwords
7. Configure IPMI access
PROVISION_IP=192.168.40.33 IPMI_GATEWAY_IP_ADDRESS=10.0.0.254 IPMI_HOSTNAME=hyp1-ipmi.mgmt.zentralwerk.org HOSTNAME=hyp1 nix run .#supermicro-deploy-ipmi
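Behind the wrapper, the BMC LAN settings can also be configured by hand with ipmitool from the provisioner system; a hedged sketch (LAN channel 1, the netmask, and the address are assumptions/placeholders):
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 10.0.0.10          # placeholder: address reserved for hyp1-ipmi in the mgmt subnet
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 10.0.0.254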
8. Find the SSDs in /dev/disk/by-id/, erase the SSDs, partition the disks and install
HOSTNAME=hyp1 nix run .#supermicro-deploy-disko
ssh root@192.168.40.33 reboot
9. Unlock disks
nix run .#supermicro-unlock-hyp1
10. Rotate age keys and enroll passwords with userborn
HOSTNAME=hyp1 nix run .#supermicro-enroll-sops
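What the enrollment boils down to is deriving the host's real age key from its SSH host key and re-encrypting the secrets; a hedged sketch using ssh-to-age (file paths are assumptions):
ssh-keyscan -t ed25519 192.168.40.33 | ssh-to-age      # real age public key of the freshly installed host
# put that key into .sops.yaml in place of the dummy, then re-encrypt:
sops updatekeys hosts/hyp1/secrets.yaml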
Incus Cluster Setup
Start incus on one bootstrapping server (hyp1).
Create a join token with incus cluster add HOSTNAME for each of the other cluster members.
Add this value to the config option c3d2.supermicro-incus-cluster.clusterToken.
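A minimal sketch of how that looks in the host's module (the token string is just a placeholder; whether it is stored in plain text or via sops is up to the repo):
{
  c3d2.supermicro-incus-cluster.clusterToken = "<join token from incus cluster add hyp2>";
}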
Deploy the new members: nix run .#hyp2-nixos-rebuild switch and nix run .#hyp3-nixos-rebuild switch
Stop the incus daemon on hyp1.
Remove the bootstrap server on one of the others: incus cluster remove --force hyp1.
Remove the /var/lib/incus/ files on hyp1.
Add the server back again with incus cluster add HOSTNAME to get a join token.
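Taken together, the hand-over of the bootstrap node looks roughly like this (unit names and exact ordering are assumptions):
# on hyp1
systemctl stop incus.service incus.socket
# on hyp2 or hyp3
incus cluster remove --force hyp1
incus cluster add hyp1        # prints a fresh join token
# on hyp1
rm -rf /var/lib/incus/*
# then redeploy hyp1 with the new token set in clusterToken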
Run nix run .#supermicro-incus-set-core-address to set up the split network (the REST API and the internal cluster API are separated) and restart incus on all nodes.
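Effectively this sets two different listen addresses per member; a hedged sketch of the equivalent manual commands (the concrete addresses and ports are placeholders):
incus config set core.https_address 10.0.0.11:8443           # public REST API
incus config set cluster.https_address 192.168.40.33:8443    # internal cluster traffic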
The ACME certs get renewed daily or when config parameters change.
This does not happen when we use the preseed with ACME settings.
If the ACME certs have not been enrolled yet, change the ACME configuration and enroll the cert:
Run incus config set acme.ca_url https://acme-staging-v02.api.letsencrypt.org/directory followed by incus config set acme.ca_url https://acme-v02.api.letsencrypt.org/directory; the config change triggers the enrollment.
Run nix run .#supermicro-incus-create-local-storage-pool to create the required local storage pools on all members.
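In a cluster, a storage pool is first created as pending on every member and then finalized once; the helper presumably wraps something like this (the pool name local and the zfs driver are assumptions):
incus storage create local zfs --target hyp1
incus storage create local zfs --target hyp2
incus storage create local zfs --target hyp3
incus storage create local zfs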
Creating Cluster Tokens
TODO
Incus Cluster Software Defined Storage Integration
TODO
Bootstrapping services that are outside the cluster
- dns
- vaultwarden
- minio (S3 Bucket for Incus and Helm)
- woodpecker agents
- ingress proxy
- gitea
- auth
TODOs
These services should either be synced between the cluster and the non-cluster servers or run on both to ensure a seamless cluster recovery process.
- S3 syncing from rook and minio for recovery critical services
- dns cache now has an include for local routes from the secondaries
- Automatically populate passwords from SOPS into vaultwarden. The admin will have the passwords saved in a local browser extension.
- Replicated VMs for woodpecker agents on all cluster members
- Replicated VMs for round robin ingress proxy on all cluster members
- gitea ??
- auth ??