Initializing a swarm or joining an existing swarm causes certain EC2 instance types on AWS to lose network connectivity. Only machines running Docker EE on Windows Server are affected.
In the AWS console, an error message like the following is shown a few minutes after running docker swarm join or docker swarm init:
Instance reachability check failed at <Date>
This issue affects servers that meet the following criteria:
- Windows Server 2016 is installed
- The Amazon EC2 instance type is C3, C4, D2, I2, M4 (excluding m4.16xlarge), M5, or R3
- For on-premises installations, the virtual machine uses Intel(R) 82599 Virtual Function as its primary NIC driver
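The instance-type criterion above can be checked with a small helper. This is only a sketch: the function name is hypothetical, and it assumes the family is the part of the instance type before the first dot (for example, "m4" in "m4.large"). Variants such as m5d are not listed in this article and are therefore not matched. The check covers the EC2 instance type only; the Windows Server 2016 criterion still has to be verified separately.

```python
# Families listed in this article as affected. Assumption: the family is the
# prefix before the first dot of the instance type, e.g. "m4" in "m4.large".
AFFECTED_FAMILIES = {"c3", "c4", "d2", "i2", "m4", "m5", "r3"}

# The article explicitly excludes this size from the M4 family.
EXCLUDED_TYPES = {"m4.16xlarge"}


def is_affected_instance_type(instance_type: str) -> bool:
    """Return True if the EC2 instance type is in a family listed above."""
    normalized = instance_type.strip().lower()
    if normalized in EXCLUDED_TYPES:
        return False
    family = normalized.split(".", 1)[0]
    return family in AFFECTED_FAMILIES
```

For example, is_affected_instance_type("m4.large") returns True, while "m4.16xlarge" and "i3.large" both return False.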
The affected drivers may conflict with HNS (Host Networking Service) on the Windows host. This issue is still under investigation as of January 2019.
The currently available workarounds for this issue are:
- Use a different AWS instance type, such as i3 or r4
- For on-premises installations, consider an alternative virtual NIC
If your server has already lost its connection and you need to recover data, use the following steps on AWS:
1. Note the Instance ID of the affected server.
2. Create a brand new network interface with a public IP.
3. Select the network interface just created.
4. Click Attach and specify the Instance ID from step 1.
5. Connect to the instance using the new public IP from step 2.
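For readers who prefer to script the recovery instead of using the console, the steps above might look roughly like the following. This is a minimal sketch, not a tested procedure: the function name is hypothetical, ec2 is assumed to behave like a boto3 EC2 client, the public IP is assumed to be provided by an Elastic IP, and device index 1 is assumed to be free on the instance. The subnet must be in the same Availability Zone as the affected instance.

```python
def attach_recovery_interface(ec2, instance_id, subnet_id, security_group_ids):
    """Create a network interface with a public IP and attach it to an instance.

    `ec2` is assumed to behave like a boto3 EC2 client.
    Returns the public IP address to connect to.
    """
    # Create a brand new network interface in the instance's subnet.
    eni = ec2.create_network_interface(
        SubnetId=subnet_id,
        Groups=security_group_ids,
        Description="Recovery interface for " + instance_id,
    )
    eni_id = eni["NetworkInterface"]["NetworkInterfaceId"]

    # Give the interface a public IP. Assumption: an Elastic IP is used.
    address = ec2.allocate_address(Domain="vpc")
    ec2.associate_address(
        AllocationId=address["AllocationId"],
        NetworkInterfaceId=eni_id,
    )

    # Attach the interface to the affected instance.
    # Assumption: only the primary NIC is attached, so device index 1 is free.
    ec2.attach_network_interface(
        NetworkInterfaceId=eni_id,
        InstanceId=instance_id,
        DeviceIndex=1,
    )

    # Connect to the instance using this address.
    return address["PublicIp"]
```

With boto3 installed and credentials configured, the client would typically be created with boto3.client("ec2") and passed in as the first argument. The Elastic IP and the network interface remain allocated afterwards, so release them once the data has been recovered.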
The Docker and AWS support teams are currently working on this issue. This article will be updated as progress is made. If you think your cluster is affected, please file a support case with both Docker Support and AWS Support.