Supermicro Incus Cluster

3x Supermicro X11DPi-NT

Installation

0. Create the IP addresses in the network repo for the following subnets:

  • cluster for the server
  • mgmt for the server and the IPMI

1. Update the network flake input

nix flake lock --update-input zentralwerk --commit-lock-file

2. Create a provisioner ISO that starts a NixOS live system in the 192.168.40.32/27 subnet and flash it onto a USB stick that permanently stays with the server

nix build .#nixosConfigurations.provisioner.config.system.build.isoImage -L
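
A minimal sketch for flashing the built image, assuming the USB stick shows up as /dev/sdX (placeholder device name, check with lsblk first):

# the ISO lands under result/iso/ after the build above
sudo dd if=result/iso/*.iso of=/dev/sdX bs=4M status=progress conv=fsync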

3. Create the NixOS host

In this guide: hyp{1,2,3}

Generate the config files with the UUIDs and hostIDs. This step could be automated by a script that writes a file containing these public values.
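
A hedged sketch of generating these values by hand (assuming uuidgen and od are available); writing them into the host's config files is left to the future script:

uuidgen                                               # a fresh UUID, e.g. for machine or disk identifiers
head -c4 /dev/urandom | od -A none -t x4 | tr -d ' '  # an 8-hex-digit value usable as networking.hostId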

4. Get the drivers for the network card

  • readlink -f /sys/class/net/*/device/driver | xargs -I{} basename {} (see the example after this list)
  • add the result to modules/baremetal.nix
  • TODO: this step could be automated
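
In practice the driver query runs on the booted provisioner live system, so it might look like this (assuming the provisioner got the 192.168.40.33 address used in the later steps):

ssh root@192.168.40.33 "readlink -f /sys/class/net/*/device/driver | xargs -I{} basename {} | sort -u"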

5. Create sops permissions with a dummy age key for the host

Add the corresponding sections to .sops.yaml
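
One way to produce such a dummy key (the real host key replaces it in step 10); file name and path are arbitrary:

age-keygen -o /tmp/hyp1-dummy.txt   # prints the matching public key (age1...) to paste into .sops.yaml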

6. Create the passwords.

HOSTNAME=hyp1 nix run .#supermicro-incus-create-passwords

7. Configure IPMI access

PROVISION_IP=192.168.40.33 IPMI_GATEWAY_IP_ADDRESS=10.0.0.254 IPMI_HOSTNAME=hyp1-ipmi.mgmt.zentralwerk.org HOSTNAME=hyp1 nix run .#supermicro-deploy-ipmi

8. Find the SSDs under /dev/disk/by-id/, erase them, partition the disks and install
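
To find the stable SSD names first, something like this works against the live system (same IP as in the commands below):

ssh root@192.168.40.33 "ls -l /dev/disk/by-id/ | grep -v part"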

HOSTNAME=hyp1 nix run .#supermicro-deploy-disko

ssh root@192.168.40.33 reboot

9. Unlock disks

nix run .#supermicro-unlock-hyp1

10. Rotate age keys and enroll passwords with userborn

HOSTNAME=hyp1 nix run .#supermicro-enroll-sops

Incus Cluster Setup

Start incus on one bootstrapping server (hyp1). Create a join token with incus cluster add HOSTNAME for each of the other cluster members. Add this value to the config option c3d2.supermicro-incus-cluster.clusterToken.
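
For example, on the bootstrapping server (member name is illustrative):

incus cluster add hyp2   # prints a join token; paste it into c3d2.supermicro-incus-cluster.clusterToken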

Deploy the new members: nix run .#hyp2-nixos-rebuild switch and nix run .#hyp3-nixos-rebuild switch

Stop the incus daemon on hyp1. Remove the bootstrap server on one of the other members: incus cluster remove --force hyp1. Remove the /var/lib/incus/ files on hyp1. Add the server back again with incus cluster add HOSTNAME to get a new join token.
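
Spelled out as commands, assuming SSH root access by hostname and that the daemon runs as the incus systemd unit:

ssh root@hyp1 "systemctl stop incus"
ssh root@hyp2 "incus cluster remove --force hyp1"
ssh root@hyp1 "rm -rf /var/lib/incus/"
ssh root@hyp2 "incus cluster add hyp1"   # yields the fresh join token for re-joining hyp1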

Run nix run .#supermicro-incus-set-core-address to set up the split network (the REST API and the internal cluster API are split) and restart incus on all nodes.
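
A possible restart loop over all members (same SSH assumptions as above):

for h in hyp1 hyp2 hyp3; do ssh root@$h "systemctl restart incus"; done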

The ACME certs get renewed daily or when config parameters change. This does not happen when we use the preseed with ACME settings. If the ACME cert has not been enrolled yet, change the ACME configuration to enroll it:

incus config set acme.ca_url https://acme-staging-v02.api.letsencrypt.org/directory

followed by

incus config set acme.ca_url https://acme-v02.api.letsencrypt.org/directory

Run nix run .#supermicro-incus-create-local-storage-pool to create the required local storage pools on all members.
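
The result can be verified from any member, e.g.:

incus storage list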

Creating Cluster Tokens

TODO

Incus Cluster Software Defined Storage Integration

TODO

Bootstrapping services that are outside the cluster

  • dns
  • vaultwarden
  • minio (S3 Bucket for Incus and Helm)
  • woodpecker agents
  • ingress proxy
  • gitea
  • auth

TODOs

These services should either be synced between the cluster and the non-cluster servers or run on both to ensure a seamless cluster recovery process

  • S3 syncing from rook and minio for recovery-critical services
  • the dns cache now has an include for local routes from the secondaries
  • Automatically populate passwords from SOPS into vaultwarden. The admin will have the passwords saved in a local browser extension.
  • Replicated VMs for woodpecker agents on all cluster members
  • Replicated VMs for round robin ingress proxy on all cluster members
  • gitea ??
  • auth ??