etcd-manager logs say that the cilium etcd node has joined the rest of the cluster correctly, but that is not true: etcd is down and there is no data in the volume attached to the EC2 instance #16872
Labels
kind/bug
lifecycle/stale
/kind bug
1. What kops version are you running? The command kops version, will display this information.
1.28.4
I also tried upgrading the cluster to the latest stable version, 1.29.2, but it made no difference.
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
kubectl client: v1.31.0
Kubernetes server: 1.27.16
etcd: 3.5.9
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
kops --name=mycluster --state s3://my-cluster-sample rolling-update cluster --instance-group=master-1b --yes
5. What happened after the commands executed?
The cluster state appears healthy: all etcd-manager instances, including the new one (etcd-manager-cilium-i-026c2be03509de051), are reported as healthy.
The logs of the new etcd-manager node (etcd-manager-cilium-i-026c2be03509de051) say that etcd has joined the rest of the cluster, but that is not true. The etcd server is not listening on any of its ports (4003, 2382 and 8083 are all down); only etcd-manager itself is listening, on port 3991. The volume that is attached to the machine and shared with the pod is empty as well.
The rest of the cluster, at least, correctly reports that it cannot connect to the new etcd member because it is not up and running.
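For anyone trying to confirm the same symptoms on the affected node, here is a minimal sketch of the two checks described above: whether anything accepts TCP connections on the ports in question, and whether the data directory is empty. The port numbers are the ones from this report; the helper names and the default host `127.0.0.1` are assumptions made for illustration, and the mount path of the volume is not given here, so it has to be passed in by the caller. A successful TCP connect only shows that something is listening, not that etcd is healthy.

```python
import os
import socket

# Ports named in this report. 3991 is the one etcd-manager itself listens on.
ETCD_PORTS = [4003, 2382, 8083]
ETCD_MANAGER_PORT = 3991


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def dir_is_empty(path: str) -> bool:
    """Return True if path is a directory that contains no entries."""
    with os.scandir(path) as entries:
        return next(entries, None) is None


def report(host: str = "127.0.0.1") -> dict:
    """Map each port of interest to whether something is listening on it."""
    return {p: port_is_open(host, p) for p in ETCD_PORTS + [ETCD_MANAGER_PORT]}


if __name__ == "__main__":
    for port, is_open in report().items():
        print(f"port {port}: {'listening' if is_open else 'down'}")
```

Run on the new master, the situation described here would show 3991 as listening and the other three ports as down.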
6. What did you expect to happen?
I expected to be able to fix the issue by forcing this node to join the rest of the cluster, by setting ETCD_INITIAL_CLUSTER_STATE=existing as an environment variable in our kops cluster configuration for this etcd node. After re-creating the node I expected to have a healthy etcd cluster, but that does not work: etcd-manager reports that everything is OK when it is not.