This article provides direction on how to begin troubleshooting a UCP 2.2.x cluster, as well as the information to gather along the way to allow support to further assist.
After completing these steps, you should have a good understanding of how to:
- Begin troubleshooting a UCP 2.2.x cluster
- Capture important information along the way
Collect Support Dump
Begin by collecting a support dump. Record the exact time the issue occurred and whether it is persistent or intermittent.
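As a minimal sketch of this step, the helper below builds a timestamped file name and the command line for a CLI support dump. It assumes the `support` subcommand of the `docker/ucp` image as described in the UCP CLI reference. The image tag `docker/ucp:2.2.x` is a placeholder for your exact version, and the function names and file-name pattern are hypothetical. Verify the command against the documentation for your release before relying on it.

```python
import subprocess
from datetime import datetime, timezone
from typing import List


def support_dump_command(ucp_image: str) -> List[str]:
    """Build the docker command that writes a support dump to stdout.

    Assumption: the UCP image exposes a `support` subcommand and needs
    the Docker socket mounted, as in the UCP CLI reference.
    """
    return [
        "docker", "container", "run", "--rm",
        "--name", "ucp",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        ucp_image, "support",
    ]


def dump_filename(hostname: str, when: datetime) -> str:
    """Name the dump after the node and the UTC time it was taken
    (hypothetical naming pattern)."""
    return f"docker-support-{hostname}-{when.strftime('%Y%m%dT%H%M%SZ')}.tgz"


def collect_support_dump(ucp_image: str, out_path: str) -> None:
    """Run the command and stream its stdout into out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(support_dump_command(ucp_image), stdout=out, check=True)


# Example (not executed here; requires a UCP node, tag is a placeholder):
# collect_support_dump("docker/ucp:2.2.x",
#                      dump_filename("node1", datetime.now(timezone.utc)))
```

Putting the UTC timestamp in the file name keeps each dump tied to the moment it was captured, which makes it easier to line up with the time the issue was observed.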
Create or verify a backup of all components where possible (UCP, DTR, Swarm, NFS).
Ideas for Investigation
- Container/service status and health
- Container logs and inspect output
- Overlay network reachability and DNS resolution
- Resource exhaustion (disk, CPU, memory, I/O, network, inodes)
- Node availability
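The first item in the list above, container status and health, can be sketched as a small filter over `docker ps` output. This is an illustration under assumptions, not an official tool: the function name is hypothetical, the sample container names and statuses are made up, and the status strings are assumed to follow the usual `docker ps` wording ("Up ... (healthy)", "Exited (1) ...", and so on).

```python
from typing import List, Tuple

# Command whose output the parser expects: one "name<TAB>status" line per
# container, including stopped ones (-a).
PS_COMMAND = ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"]


def flag_containers(ps_output: str) -> List[Tuple[str, str]]:
    """Return (name, status) for containers that are not up and healthy.

    A container is flagged when its status does not start with "Up", or
    when its health check reports "unhealthy" or "health: starting".
    """
    flagged = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, status = line.partition("\t")
        lowered = status.lower()
        ok = (
            lowered.startswith("up")
            and "unhealthy" not in lowered
            and "health: starting" not in lowered
        )
        if not ok:
            flagged.append((name, status))
    return flagged


# Made-up sample output, for illustration only.
sample = (
    "ucp-controller\tUp 2 hours (healthy)\n"
    "ucp-auth-api\tUp 5 minutes (unhealthy)\n"
    "ucp-kv\tExited (1) 3 minutes ago\n"
)
for name, status in flag_containers(sample):
    print(f"{name}: {status}")
```

On a real node, the input would come from something like `subprocess.run(PS_COMMAND, capture_output=True, text=True).stdout`. Flagged containers are the natural candidates for a closer look at their logs and inspect output.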
Determine Where the Issue May Reside
Strongly related: =
This matrix relates Docker EE UCP 2.2.x features and symptoms to the components most likely involved. It is not exhaustive of all services, but rather a starting point for analysis and investigation.
|Controller||Swarm Manager||auth-api||auth-store||auth-worker||etcd||metrics||CAs||agent/reconciler||engine|
|Docker API: networks, volumes, containers||=||=||≈||≈||=|
|Docker API: services, secrets, configs||=||≈||≈||=||=|
|Node list timeout/errors||=||=||≈||=|
|Node in Pending||≈||=|
|“rpc error …”||=|
|“context deadline exceeded”||=||≈||=|
|Charts not loading data||=||=||≈||=|
Review Logs for Errors and Resolution
Using the matrix above, review the UCP logs of the related components for errors and hints toward a resolution.
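One way to make this review repeatable is to count how often the error signatures named in the matrix ("rpc error" and "context deadline exceeded") appear in a component's log. The sketch below is a hedged example: the function name and the sample log lines are hypothetical, and the log text is assumed to have been captured beforehand, for instance with `docker logs` on the relevant container.

```python
from typing import Dict, Iterable

# Error signatures taken from the matrix rows above.
SIGNATURES = ("rpc error", "context deadline exceeded")


def count_signatures(lines: Iterable[str]) -> Dict[str, int]:
    """Count the log lines that contain each signature (case-insensitive)."""
    counts = {sig: 0 for sig in SIGNATURES}
    for line in lines:
        lowered = line.lower()
        for sig in SIGNATURES:
            if sig in lowered:
                counts[sig] += 1
    return counts


# Made-up log lines, for illustration only.
sample_log = [
    'level=error msg="rpc error: code = Unavailable"',
    'level=warning msg="context deadline exceeded"',
    'level=info msg="request completed"',
    'level=error msg="Context deadline exceeded while listing nodes"',
]
print(count_signatures(sample_log))
```

Comparing the counts across components, and before and after a change, gives a rough indication of which column of the matrix deserves attention first.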
Capture a support dump after every action taken to fix the problem. Avoid taking any action unless you fully understand what the problem is. If the first set of actions does not fix the problem, avoid taking further actions in the same direction: those actions may have changed the issue, and the environment may now be surfacing a different problem entirely. Resolve the issues uncovered in the logs. If unsure, look for assistance on docs.docker.com or kbase, collect the data captured during the process, and open a case with Docker Support.