Update k3s Certificates

This happens every year around May 6th: the connection to the k3s service goes down and kubectl commands stop working. Here is how to fix it (do this within 90 days of certificate expiry):

NOTE: Make sure the OpenStack authentication environment variables are set (for example by sourcing your RC file); otherwise, "openstack" commands will not work. This is only needed if you want to remove dangling pods at the end of the procedure.
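For example, source your RC file and confirm that authentication works (the file path below is a placeholder; use your site's RC file):

source /etc/kolla/admin-openrc.sh    # placeholder path, site-specific
openstack token issue                # prints a token only if authentication works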

First, check for CA expiration (controller). If no CA renewal is needed, use Option A; if CA renewal is needed, use Option B.

Does any CA expire within 365 days? The following check can be copied and pasted into the command line.

sudo bash <<'EOF'
warn_days=365
rotate_ca_needed=0
echo "  K3s CA-expiry check (warning window: $warn_days days)"

# Compare each CA certificate's notAfter date against today
for crt in /var/lib/rancher/k3s/server/tls/*-ca.crt; do
  end=$(openssl x509 -enddate -noout -in "$crt" | cut -d= -f2)
  left=$(( ( $(date -d "$end" +%s) - $(date +%s) ) / 86400 ))
  printf "  %-26s  expires %-25s  (%s days left)\n" "$(basename "$crt")" "$end" "$left"
  [[ $left -lt $warn_days ]] && rotate_ca_needed=1
done

if [[ $rotate_ca_needed -eq 0 ]]; then
  echo -e "\n  Every CA is valid for at least $warn_days more days: use Option A (leaf-only restart)"
else
  echo -e "\n  A CA expires within $warn_days days: use Option B (full CA rotation)"
fi
EOF

Option A: Leaf-only renewal (most years)

Check whether the certificates are about one year old, and check the k3s service status for log entries like "x509: certificate has expired or is not yet valid". Verify that k3s is working correctly:

systemctl status k3s
k3s certificate check --output table
kubectl get nodes
kubectl top nodes
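
To search the service logs directly for the expiry message, a grep over the systemd journal works (the time window is just an example):

sudo journalctl -u k3s --since "24 hours ago" | grep -i "x509"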
  1. Restart k3s (controller). K3s sees the certificates are < 90 days from expiry and issues new leaf certificates at startup.

sudo systemctl restart k3s
watch -n3 kubectl get nodes         # wait for Ready
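
To confirm new leaf certificates were actually issued, compare a leaf certificate's expiry before and after the restart, e.g. the API server serving certificate (the file name may vary by k3s version):

sudo openssl x509 -enddate -noout -in /var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt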
  2. Restart the k3s agent (all worker nodes)

sudo systemctl restart k3s-agent
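
With several workers, the restart can be scripted over SSH; the hostnames below are placeholders for your worker nodes:

for h in worker-1 worker-2 worker-3; do   # placeholder hostnames
  ssh "$h" sudo systemctl restart k3s-agent
done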
  3. Verify (controller)

k3s certificate check --output table
kubectl get nodes
kubectl top nodes
  4. Copy fresh kubeconfig (controller). Check whether k3s.yaml differs from the deployed kubeconfig.yml:

sudo cmp /etc/rancher/k3s/k3s.yaml /etc/kolla/zun-compute-k8s/kubeconfig.yml

If the files differ, copy k3s.yaml to each destination and restart the affected services (cmp is silent and exits 0 when the files are identical):

sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config 
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/zun-compute-k8s/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/blazar-manager/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/blazar-api/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/doni-worker/kubeconfig.yml
docker restart zun_compute_k8s blazar_manager blazar_api doni_worker
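
The same copy-and-restart can be written as a loop that only copies when a destination actually differs; a sketch using the paths above:

src=/etc/rancher/k3s/k3s.yaml
sudo cp "$src" ~/.kube/config
for dst in /etc/kolla/zun-compute-k8s/kubeconfig.yml \
           /etc/kolla/blazar-manager/kubeconfig.yml \
           /etc/kolla/blazar-api/kubeconfig.yml \
           /etc/kolla/doni-worker/kubeconfig.yml; do
  sudo cmp -s "$src" "$dst" || sudo cp "$src" "$dst"   # copy only on difference
done
docker restart zun_compute_k8s blazar_manager blazar_api doni_worker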
  5. If there are dangling Zun containers left on the workers, delete them (controller):

kubectl get pods --all-namespaces -o wide | grep "zun"

Copy and paste this into the command line:

sudo -E bash -s <<'CLEAN'    # -E keeps the OpenStack auth variables for the openstack CLI
# -------- Delete ONLY orphan Zun deployments/pods --------
echo "Scanning for Zun pods stuck in Pending …"

kubectl get pods -A --field-selector=status.phase=Pending \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"|"}{.metadata.name}{"\n"}{end}' \
  | grep '|zun-' \
  | while IFS='|' read -r NS POD; do
        UUID=$(echo "$POD" | sed -E 's/^zun-([0-9a-f-]{36}).*/\1/')
        if openstack container show "$UUID" >/dev/null 2>&1; then
            echo "Keeping $UUID – still tracked by Zun"
        else
            echo "Deleting orphan $UUID  ($NS/$POD)"
            openstack container delete "$UUID" >/dev/null 2>&1 || true
            kubectl -n "$NS" delete deployment "zun-$UUID" --ignore-not-found
        fi
  done

# Final check
kubectl get pods -A --field-selector=status.phase=Pending | grep zun \
  || echo "No Pending Zun pods remain"
CLEAN

Option B: Full CA rotation (rare)

Check whether the certificates are about one year old, and check the k3s service status for log entries like "x509: certificate has expired or is not yet valid". Verify that k3s is working correctly:

systemctl status k3s
k3s certificate check --output table
kubectl get nodes
kubectl top nodes
  1. Prepare a new CA (controller)

sudo mkdir -p /opt/new-ca
sudo k3s certificate rotate-ca --generate --path /opt/new-ca
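
Before restarting, you can sanity-check the generated CA material; the file names under /opt/new-ca are assumed here to follow the k3s tls layout (e.g. server-ca.crt), adjust if yours differ:

for crt in /opt/new-ca/*-ca.crt; do
  echo "$crt: $(sudo openssl x509 -enddate -noout -in "$crt")"
done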
  2. Delete the k3s serving secret and dynamic cert (controller)

kubectl delete secret -n kube-system k3s-serving --ignore-not-found
sudo rm -f /var/lib/rancher/k3s/server/tls/dynamic-cert.json
  3. Restart k3s (controller)

sudo systemctl restart k3s
watch -n3 kubectl get nodes         # wait for Ready
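
Once the node is Ready, you can confirm the serving secret was recreated:

kubectl -n kube-system get secret k3s-serving   # should exist again, with a fresh creation timestamp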
  4. Restart the k3s agent (all worker nodes)

sudo systemctl restart k3s-agent
  5. Verify (controller)

k3s certificate check --output table
kubectl get nodes
kubectl top nodes
  6. Copy fresh kubeconfig (controller)

sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config 
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/zun-compute-k8s/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/blazar-manager/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/blazar-api/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/doni-worker/kubeconfig.yml
docker restart zun_compute_k8s blazar_manager blazar_api doni_worker
  7. If there are dangling Zun containers left on the workers, delete them (controller):

kubectl get pods --all-namespaces -o wide | grep "zun"

Copy and paste this into the command line:

sudo -E bash -s <<'CLEAN'    # -E keeps the OpenStack auth variables for the openstack CLI
# -------- Delete ONLY orphan Zun deployments/pods --------
echo "Scanning for Zun pods stuck in Pending …"

kubectl get pods -A --field-selector=status.phase=Pending \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"|"}{.metadata.name}{"\n"}{end}' \
  | grep '|zun-' \
  | while IFS='|' read -r NS POD; do
        UUID=$(echo "$POD" | sed -E 's/^zun-([0-9a-f-]{36}).*/\1/')
        if openstack container show "$UUID" >/dev/null 2>&1; then
            echo "Keeping $UUID – still tracked by Zun"
        else
            echo "Deleting orphan $UUID  ($NS/$POD)"
            openstack container delete "$UUID" >/dev/null 2>&1 || true
            kubectl -n "$NS" delete deployment "zun-$UUID" --ignore-not-found
        fi
  done

# Final check
kubectl get pods -A --field-selector=status.phase=Pending | grep zun \
  || echo "No Pending Zun pods remain"
CLEAN
