When a previously working Windows worker node is restarted, it can lose its connection to the UCP managers and error out instead of reconnecting.
You should notice one or more of the following symptoms:
- On the UCP UI, the node section shows "One or more required ports is unavailable on this node".
- The Windows Event log contains:
  fatal: Error starting daemon: Error initializing network controller: Error creating default network: HNS failed with error : Unspecified error
- The UCP agent generates:
  Error while checking for available ports: "Error while checking for available ports: The following required ports are already in use on your host - 12376.
- An existing hybrid UCP cluster installation
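To check collected log text against the symptoms listed above, a minimal sketch like the following could be used. The function name `find_symptoms` and the symptom labels are hypothetical. Only the matched message fragments come from this article, and the sketch assumes you have already exported the relevant log text to a string.

```python
import re

# Message fragments copied from the symptoms in this article.
# The dictionary keys are illustrative labels, not official names.
SYMPTOM_PATTERNS = {
    "ucp_ui_ports_unavailable": re.compile(
        r"One or more required ports is unavailable on this node"
    ),
    "hns_default_network_failure": re.compile(
        r"Error creating default network: HNS failed with error"
    ),
    "ucp_agent_port_in_use": re.compile(
        r"required ports are already in use on your host - 12376"
    ),
}


def find_symptoms(log_text):
    """Return the labels of every known symptom whose fragment appears in log_text."""
    return [
        label
        for label, pattern in SYMPTOM_PATTERNS.items()
        if pattern.search(log_text)
    ]
```

An empty result only means none of these exact messages were found in the text supplied. It does not rule out the issue.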
When a Windows host server restarts, some HNS (Host Networking Service) assets are not cleaned up properly, and they block the new instance from being created. A server crash can leave the host in a similar state.
Microsoft and Docker development teams are currently working on a fix.
In the meantime, use the following steps to clean up the environment:
Log in as a UCP administrator.
Generate a worker node join token (alternatively, you can generate the same token from the UCP UI):
docker swarm join-token worker
Run the following commands from PowerShell as an Administrator:
docker swarm leave
Stop-Service docker
Get-ContainerNetwork | Remove-ContainerNetwork
Start-Service docker
Run the join command printed by docker swarm join-token worker to rejoin the node to the cluster.
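If the cleanup has to be repeated often, the sequence above could be wrapped in a small helper. The following is a rough sketch, not a supported tool. The function name `cleanup_and_rejoin` and its `dry_run` parameter are hypothetical. It assumes Python is installed on the Windows worker, that `powershell` is on the PATH, and that the script runs from an elevated (Administrator) session. The join command is not built here: pass in the exact command you obtained when generating the worker token.

```python
import subprocess

# Cleanup commands taken verbatim from the steps above, in order.
CLEANUP_COMMANDS = [
    "docker swarm leave",
    "Stop-Service docker",
    "Get-ContainerNetwork | Remove-ContainerNetwork",
    "Start-Service docker",
]


def cleanup_and_rejoin(join_command, dry_run=True):
    """Build, and optionally run, the cleanup sequence followed by the join command.

    join_command is the full join command generated for worker nodes.
    With dry_run=True (the default) nothing is executed; the planned
    argument lists are only returned so they can be reviewed first.
    """
    plan = [
        ["powershell", "-NoProfile", "-Command", command]
        for command in CLEANUP_COMMANDS + [join_command]
    ]
    if not dry_run:
        for argv in plan:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(argv, check=True)
    return plan
```

Calling the helper with the default `dry_run=True` returns the planned commands without touching the node, which is a reasonable first check. Because the sketch stops at the first failing command, a node that has already left the swarm would need the remaining steps run by hand.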
After a later restart, the Windows host might find itself in the same situation again. If that happens, repeat the steps above to clean up.
If your cluster is affected by this issue, please contact Microsoft Support.