1. Confirm that all DNS records have been created.
Galileo does not create DNS records for your cluster, so you need to create them yourself according to your company’s DNS setup. Each record should have a TTL of 60 seconds or less. If you are letting Galileo provision Let’s Encrypt certificates automatically with cert-manager, make sure that all of cert-manager’s HTTP solvers have asked Let’s Encrypt to provision a certificate covering every domain specified for the cluster (i.e. api|console|data|grafana.my-cluster.my-domain.com).
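As a rough sketch of this check, the helper below looks up each of the four hostnames and reports missing records or TTLs above 60 seconds. The function name check_dns_records is hypothetical, and it assumes dig is installed. A caching resolver reports the remaining TTL rather than the configured one, so a low value here is only indicative.

```shell
# Hypothetical helper: check the api/console/data/grafana records under a base domain.
check_dns_records() {
  local base_domain="$1"
  local status=0
  local sub host answer ttl
  for sub in api console data grafana; do
    host="${sub}.${base_domain}"
    # "+noall +answer" prints only answer lines: NAME TTL CLASS TYPE DATA
    answer="$(dig +noall +answer "$host" | head -n 1)"
    if [ -z "$answer" ]; then
      echo "missing ${host}"
      status=1
      continue
    fi
    ttl="$(echo "$answer" | awk '{ print $2 }')"
    if [ "$ttl" -gt 60 ]; then
      echo "ttl-too-high ${host} ttl=${ttl}"
      status=1
    else
      echo "ok ${host} ttl=${ttl}"
    fi
  done
  # Non-zero when any record is missing or has a TTL above 60 seconds.
  return "$status"
}
```

For example, check_dns_records my-cluster.my-domain.com prints one line per hostname and returns non-zero if any of them needs attention.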
2. Check the API’s health-check.
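One way to script this check is sketched below. The health-check path is not specified here, so the caller passes the full URL; the function name api_is_healthy is hypothetical, and treating only HTTP 200 as healthy is an assumption.

```shell
# Hypothetical helper: succeed only if the given URL answers with HTTP 200.
api_is_healthy() {
  local url="$1"
  local code
  # -s hides progress output, -o discards the body, -w prints only the status code.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")" || code="000"
  echo "HTTP ${code} from ${url}"
  [ "$code" = "200" ]
}
```

Call it with the API hostname of your cluster followed by the health-check path used by your deployment.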
3. Check for unready pods.
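A minimal sketch of this check: list pods in every namespace and keep those whose READY column is not full, skipping pods that have completed. The function name list_unready_pods is hypothetical.

```shell
# Hypothetical helper: print "namespace/name READY STATUS" for each unready pod.
list_unready_pods() {
  # Columns with --no-headers: NAMESPACE NAME READY STATUS RESTARTS AGE
  kubectl get pods --all-namespaces --no-headers |
    awk '{
      split($3, ready, "/")
      # Completed pods (for example finished jobs) are expected to show 0/1.
      if ($4 != "Completed" && ready[1] != ready[2]) print $1 "/" $2, $3, $4
    }'
}
```

An empty result means every pod is either fully ready or completed.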
4. Check for pending persistent volume claims.
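This check can be sketched the same way, filtering the claim list on its STATUS column. The function name list_pending_pvcs is hypothetical.

```shell
# Hypothetical helper: print "namespace/name" for each claim stuck in Pending.
list_pending_pvcs() {
  # Columns with --no-headers start with: NAMESPACE NAME STATUS
  kubectl get pvc --all-namespaces --no-headers |
    awk '$3 == "Pending" { print $1 "/" $2 }'
}
```

For any claim it prints, kubectl describe pvc NAME -n NAMESPACE shows the events explaining why the claim has not been bound.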
5. ClickHouse Keeper fails to start.
If the statefulset clickhouse-keeper has zero ready replicas, the Kubernetes version is incompatible. Take the following steps:
- Upgrade Kubernetes (control plane and node groups) to at least version 1.30
- Delete the broken CRD with
kubectl delete crd clickhousekeeperinstallations.clickhouse-keeper.altinity.com
- Delete the clickhouse operator with
kubectl delete deploy clickhouse-operator
- Re-apply the manifest
- Wait 2 minutes, then confirm that the 3 ClickHouse Keeper statefulsets named chk-clickhouse-keeper-cluster are up with
kubectl get sts --all-namespaces | grep -i clickhouse-keeper
- If you still see an unhealthy statefulset clickhouse-keeper alongside those 3, clean up the statefulset and its PVC with
kubectl delete sts clickhouse-keeper && kubectl delete pvc data-volume-claim-clickhouse-keeper-0
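The verification step above can be sketched as a check that counts fully ready keeper statefulsets. The function name keeper_statefulsets_ready is hypothetical, and matching on the name prefix chk-clickhouse-keeper-cluster is an assumption about how the three statefulsets are named.

```shell
# Hypothetical helper: succeed once exactly 3 keeper statefulsets are fully ready.
keeper_statefulsets_ready() {
  local ready_count
  # Columns with --no-headers: NAMESPACE NAME READY AGE
  ready_count="$(kubectl get sts --all-namespaces --no-headers |
    awk '$2 ~ /^chk-clickhouse-keeper-cluster/ {
           split($3, r, "/")
           if (r[1] == r[2] && r[2] > 0) n++
         }
         END { print n + 0 }')"
  echo "ready keeper statefulsets: ${ready_count}/3"
  [ "$ready_count" -eq 3 ]
}
```

The old clickhouse-keeper statefulset does not match the prefix, so it is ignored by the count and can still be cleaned up as described in the last step.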