
Multiple failures in HA cluster can lead to loss of cluster nodes

Article ID: KB000408

NOTE: This issue is fixed for UCP 1.1 and higher.

When you join a non-replica node to a UCP HA cluster, the node is configured to talk to one specific controller for cluster membership advertisement: the controller it was joined through. Through that controller, the node auto-detects the other controllers in the cluster.

If that specific controller fails, the non-replica node continues to advertise to the other controllers in the cluster, and everything keeps working. However, if the non-replica node restarts while the controller it was joined through is still down, it cannot reach that controller after rebooting and so cannot rediscover the other controllers. The node then drops out of the cluster. Once that controller comes back, the node detects it and re-advertises.
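
The sequence above can be illustrated with a small simulation. This is only a toy model of the behavior described in this article, not UCP code: the class NonReplicaNode, its methods, and the controller names c1, c2, and c3 are all hypothetical. It assumes that the node learns the controller list only at boot and only through the controller it was joined through.

```python
class NonReplicaNode:
    """Toy model of a non-replica node (hypothetical, not UCP code)."""

    def __init__(self, join_controller):
        # The one controller this node was configured with at join time.
        self.join_controller = join_controller
        # Controllers discovered since the last boot.
        self.known = set()

    def boot(self, all_controllers, up):
        # Assumption: discovery state is lost on restart and can only be
        # rebuilt by contacting the join controller.
        self.known = set()
        if self.join_controller in up:
            self.known = set(all_controllers)

    def in_cluster(self, up):
        # The node stays in the cluster while it can advertise to at
        # least one known controller that is currently up.
        return bool(self.known & up)


def simulate():
    controllers = {"c1", "c2", "c3"}
    node = NonReplicaNode("c1")
    states = []

    up = set(controllers)            # all controllers healthy
    node.boot(controllers, up)
    states.append(node.in_cluster(up))

    up = {"c2", "c3"}                # c1 fails, node keeps running
    states.append(node.in_cluster(up))

    node.boot(controllers, up)       # node reboots while c1 is down
    states.append(node.in_cluster(up))

    up = set(controllers)            # c1 returns, node retries discovery
    node.boot(controllers, up)
    states.append(node.in_cluster(up))

    return states


if __name__ == "__main__":
    print(simulate())
```

In this model, the node only loses membership in the third step, when two failures overlap: the join controller is down and the node restarts.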

This issue was tracked at https://github.com/docker/orca/issues/670.