
Windows worker node becomes unavailable after restart

Article ID: KB000652

Issue

When a working Windows worker node is restarted, it can lose its connection to the UCP managers and fail with an error instead of reconnecting.

You should notice one or more of the following symptoms:

  • In the UCP UI, the node section shows: One or more required ports is unavailable on this node
  • The Windows Event Log contains: fatal: Error starting daemon: Error initializing network controller: Error creating default network: HNS failed with error : Unspecified error
  • The UCP agent logs: Error while checking for available ports: "Error while checking for available ports: The following required ports are already in use on your host - 12376.
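
The last two symptoms can be checked directly on the affected node. The following is a minimal sketch, not an official diagnostic. It assumes Windows PowerShell 5.1 running as Administrator, and it assumes the Docker daemon writes its events to the Application log under the source name Docker; adjust the source name and time window if your host differs.

```powershell
# Sketch only; run from an elevated PowerShell prompt on the affected worker node.

# Symptom: port 12376 is reported as already in use.
# List any listener on that port together with the owning process.
Get-NetTCPConnection -LocalPort 12376 -State Listen -ErrorAction SilentlyContinue |
    ForEach-Object {
        $proc = Get-Process -Id $_.OwningProcess -ErrorAction SilentlyContinue
        [pscustomobject]@{
            LocalAddress = $_.LocalAddress
            LocalPort    = $_.LocalPort
            ProcessId    = $_.OwningProcess
            ProcessName  = $proc.ProcessName
        }
    }

# Symptom: the daemon fails to start with an HNS error.
# Assumption: Docker events are in the Application log under the source "Docker".
Get-EventLog -LogName Application -Source Docker -After (Get-Date).AddHours(-1) -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -like '*HNS failed with error*' } |
    Sort-Object TimeGenerated |
    Select-Object TimeGenerated, EntryType, Message
```

If neither command returns anything, the node may be affected by a different problem than the one described in this article.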

Prerequisites

  • An existing hybrid UCP cluster installation

Root Cause

When a Windows host restarts, some Host Networking Service (HNS) resources are not cleaned up properly, and the leftover resources block the new instance from being created. A server crash can leave the host in a similar state.

Resolution

The Microsoft and Docker development teams are currently working on a fix.

In the meantime, use the following steps to clean up the environment:

  1. Log in as a UCP administrator.

  2. Generate a worker node join token (alternatively, you can generate the same token from the UCP UI):

    docker swarm join-token worker
    
  3. Run the following commands from PowerShell as an Administrator:

    docker swarm leave
    Stop-Service docker
    Get-ContainerNetwork | Remove-ContainerNetwork
    Start-Service docker
    
  4. Run the join command from step 2 to rejoin the node to the cluster.

After a later restart, the Windows host might end up in the same state again. If that happens, repeat the steps above to clean up.
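
Because the cleanup may need to be repeated, steps 3 and 4 can be kept in a single script. The following is a minimal sketch using only the commands listed above plus `docker swarm join`. The parameter names `$JoinToken` and `$ManagerAddress` are hypothetical; take both values from the join command printed in step 2.

```powershell
# Sketch only: wraps steps 3 and 4. Run from an elevated PowerShell prompt.
# $JoinToken and $ManagerAddress are hypothetical parameter names; copy both
# values from the output of `docker swarm join-token worker` (step 2).
param(
    [Parameter(Mandatory = $true)][string]$JoinToken,
    [Parameter(Mandatory = $true)][string]$ManagerAddress   # host:port from the join command
)

# Stop on cmdlet errors so a failed service stop or start is not silently ignored.
$ErrorActionPreference = 'Stop'

# Step 3: leave the swarm, stop the daemon, remove the stale container
# networks, then start the daemon again.
docker swarm leave
Stop-Service docker
Get-ContainerNetwork | Remove-ContainerNetwork
Start-Service docker

# Step 4: rejoin the cluster as a worker.
docker swarm join --token $JoinToken $ManagerAddress
if ($LASTEXITCODE -ne 0) {
    throw "docker swarm join failed with exit code $LASTEXITCODE"
}
```

Saved under a name of your choice, for example `Repair-WorkerNode.ps1` (a hypothetical file name), the script would be run as `.\Repair-WorkerNode.ps1 -JoinToken <token> -ManagerAddress <manager-host>:<port>`.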

What's Next

If your cluster is affected by this issue, please contact Microsoft Support.