Update k3s Certificates
This happens every year around May 6th: the connection to the k3s service goes down and kubectl commands stop working. Here is how to fix it (do this within 90 days of certificate expiry):
NOTE: Make sure the OpenStack authentication environment is loaded, otherwise the "openstack" commands will not work. It is only needed for removing dangling pods at the end of the procedure.
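For example, credentials are typically loaded by sourcing an openrc file; the path below is a placeholder for your site's admin openrc:
# Placeholder path - substitute your site's openrc file
source ~/admin-openrc.sh
# Sanity check: this fails if credentials are not loaded
openstack token issue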
Check for CA expiration (controller)
If no CA renewal is needed, use Option A; if CA renewal is needed, use Option B.
Does any CA expire within the next 365 days? The following check can be copied and pasted onto the command line.
sudo bash <<'EOF'
echo "K3s CA-expiry check (warning window: 365 days)"
warn_days=365
rotate_ca_needed=0
for crt in /var/lib/rancher/k3s/server/tls/*-ca.crt; do
    end=$(openssl x509 -enddate -noout -in "$crt" | cut -d= -f2)
    left=$(( ( $(date -d "$end" +%s) - $(date +%s) ) / 86400 ))
    printf "  %-26s expires %-25s (%s days left)\n" "$(basename "$crt")" "$end" "$left"
    [[ $left -lt $warn_days ]] && rotate_ca_needed=1
done
if [[ $rotate_ca_needed -eq 0 ]]; then
    echo -e "\nAll CAs valid for > $warn_days days: use Option A (leaf-only restart)"
else
    echo -e "\nA CA expires within $warn_days days: use Option B (full CA rotation)"
fi
EOF
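To spot-check a single CA manually, openssl can print its expiry date directly, e.g. for the cluster's server CA:
sudo openssl x509 -enddate -noout -in /var/lib/rancher/k3s/server/tls/server-ca.crt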
Option A: Leaf-only renewal (most years)
Check whether the certificates are about a year old, and check the k3s service status for log lines such as "x509: certificate has expired or is not yet valid". Then verify that k3s works correctly:
systemctl status k3s
k3s certificate check --output table
kubectl get nodes
kubectl top nodes
k3s restart (controller)
When K3s sees that the certs are < 90 days from expiry, it issues new leaf certs at startup.
sudo systemctl restart k3s
watch -n3 kubectl get nodes # wait for Ready
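Optionally confirm in the service journal that k3s re-issued certificates on startup (the exact log wording varies between k3s versions):
sudo journalctl -u k3s --since "10 minutes ago" | grep -i certificate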
k3s agent restart (all worker nodes)
sudo systemctl restart k3s-agent
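With several workers, a small loop over their hostnames saves typing; the hostnames below are placeholders for your actual nodes:
# Placeholder hostnames - replace with the real worker nodes
for host in worker-01 worker-02; do
    ssh "$host" 'sudo systemctl restart k3s-agent'
done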
Verify (controller)
k3s certificate check --output table
kubectl get nodes
kubectl top nodes
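You can also inspect the certificate the API server is actually serving, assuming it listens on the default port 6443 on the controller:
echo | openssl s_client -connect 127.0.0.1:6443 2>/dev/null | openssl x509 -noout -dates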
Copy fresh kubeconfig (controller)
Check whether k3s.yaml differs from kubeconfig.yml:
sudo cmp /etc/rancher/k3s/k3s.yaml /etc/kolla/zun-compute-k8s/kubeconfig.yml
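cmp exits non-zero when the files differ, so the same check can be scripted:
sudo cmp -s /etc/rancher/k3s/k3s.yaml /etc/kolla/zun-compute-k8s/kubeconfig.yml \
    && echo "kubeconfig up to date" \
    || echo "kubeconfig differs - run the copy steps below"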
If they differ, copy k3s.yaml to each location and restart the dependent services:
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/zun-compute-k8s/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/blazar-manager/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/blazar-api/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/doni-worker/kubeconfig.yml
docker restart zun_compute_k8s blazar_manager blazar_api doni_worker
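Copying as root leaves ~/.kube/config owned by root; if you run kubectl as a regular user, fix the ownership, then confirm the restarted containers come back up:
sudo chown "$(id -u):$(id -g)" ~/.kube/config
docker ps --format '{{.Names}}\t{{.Status}}' | grep -E 'zun_compute_k8s|blazar_manager|blazar_api|doni_worker'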
If there are dangling Zun containers left on the workers, delete them (controller):
kubectl get pods --all-namespaces -o wide | grep "zun"
Copy and paste this on the command line.
sudo bash -s <<'CLEAN'
# -------- Delete ONLY orphan Zun deployments/pods --------
echo "Scanning for Zun pods stuck in Pending …"
kubectl get pods -A --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"|"}{.metadata.name}{"\n"}{end}' \
    | grep '|zun-' \
    | while IFS='|' read -r NS POD; do
        UUID=$(echo "$POD" | sed -E 's/^zun-([0-9a-f-]{36}).*/\1/')
        if openstack container show "$UUID" >/dev/null 2>&1; then
            echo "Keeping $UUID – still tracked by Zun"
        else
            echo "Deleting orphan $UUID ($NS/$POD)"
            openstack container delete "$UUID" >/dev/null 2>&1 || true
            kubectl -n "$NS" delete deployment "zun-$UUID" --ignore-not-found
        fi
    done
# Final check
kubectl get pods -A --field-selector=status.phase=Pending | grep zun \
    || echo "No Pending Zun pods remain"
CLEAN
Option B: Full CA rotation (rare)
Check whether the certificates are about a year old, and check the k3s service status for log lines such as "x509: certificate has expired or is not yet valid". Then verify that k3s works correctly:
systemctl status k3s
k3s certificate check --output table
kubectl get nodes
kubectl top nodes
Prepare new CA (controller)
sudo mkdir -p /opt/new-ca
sudo k3s certificate rotate-ca --generate --path /opt/new-ca
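Before continuing, it is worth checking what was generated and how long the new CA certificates are valid; the directory layout under /opt/new-ca may vary by k3s version, so the find below makes no assumption about it:
# List the generated files
sudo ls -lR /opt/new-ca
# Print the expiry of every generated CA certificate
sudo find /opt/new-ca -name '*ca.crt' -exec openssl x509 -enddate -noout -in {} \;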
k3s secret deletion (controller)
Remove the dynamic serving-cert state so k3s regenerates it under the new CA:
kubectl delete secret -n kube-system k3s-serving --ignore-not-found
sudo rm -f /var/lib/rancher/k3s/server/tls/dynamic-cert.json
k3s restart (controller)
sudo systemctl restart k3s
watch -n3 kubectl get nodes # wait for Ready
k3s agent restart (all worker nodes)
sudo systemctl restart k3s-agent
Verify (controller)
k3s certificate check --output table
kubectl get nodes
kubectl top nodes
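To confirm the rotation actually took effect, compare the server CA's fingerprint against the value recorded before the rotation (note it down beforehand if you want a strict comparison):
sudo openssl x509 -noout -fingerprint -sha256 -in /var/lib/rancher/k3s/server/tls/server-ca.crt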
Copy fresh kubeconfig (controller)
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/zun-compute-k8s/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/blazar-manager/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/blazar-api/kubeconfig.yml
sudo cp /etc/rancher/k3s/k3s.yaml /etc/kolla/doni-worker/kubeconfig.yml
docker restart zun_compute_k8s blazar_manager blazar_api doni_worker
If there are dangling Zun containers left on the workers, delete them (controller):
kubectl get pods --all-namespaces -o wide | grep "zun"
Copy and paste this on the command line.
sudo bash -s <<'CLEAN'
# -------- Delete ONLY orphan Zun deployments/pods --------
echo "Scanning for Zun pods stuck in Pending …"
kubectl get pods -A --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"|"}{.metadata.name}{"\n"}{end}' \
    | grep '|zun-' \
    | while IFS='|' read -r NS POD; do
        UUID=$(echo "$POD" | sed -E 's/^zun-([0-9a-f-]{36}).*/\1/')
        if openstack container show "$UUID" >/dev/null 2>&1; then
            echo "Keeping $UUID – still tracked by Zun"
        else
            echo "Deleting orphan $UUID ($NS/$POD)"
            openstack container delete "$UUID" >/dev/null 2>&1 || true
            kubectl -n "$NS" delete deployment "zun-$UUID" --ignore-not-found
        fi
    done
# Final check
kubectl get pods -A --field-selector=status.phase=Pending | grep zun \
    || echo "No Pending Zun pods remain"
CLEAN