Troubleshooting a UCP 2.2.x Cluster

Article ID: KB000137

This article describes how to begin troubleshooting a UCP 2.2.x cluster and which information to gather along the way so that Docker Support can assist further.

Goal

After completing these steps, you should understand how to:

  1. Begin troubleshooting a UCP 2.2.x cluster
  2. Capture important information along the way

Collect Support Dump

Begin by collecting a support dump. Record the exact time the issue occurred and whether it is persistent or intermittent.
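
The following is a minimal sketch of capturing a dump from the command line, assuming it runs on a UCP manager node with access to the Docker socket. The `2.2.4` image tag, the `UCP_VERSION` variable, and the function and file names are illustrative assumptions; substitute the UCP version actually installed.

```shell
# Build a support-dump filename that records the UTC capture time.
# The naming scheme is an illustrative choice, not a UCP requirement.
dump_name() {
  # $1: short label for the point in the investigation (e.g. "initial")
  printf 'docker-support-%s-%s.tgz\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Collect a support dump on a UCP manager node.
# UCP_VERSION is an assumption; set it to the version that is installed.
collect_support_dump() {
  out="$(dump_name "${1:-initial}")"
  docker container run --rm --name ucp \
    -v /var/run/docker.sock:/var/run/docker.sock \
    "docker/ucp:${UCP_VERSION:-2.2.4}" support > "$out" \
    && echo "wrote $out"
}
```

For example, `collect_support_dump initial` before any change keeps the capture time in the filename, which makes it easier to line the dump up with the time the issue occurred.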

Verify Backups

Create or verify a backup of every component where possible (UCP, DTR, Swarm, NFS).
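
A sketch for the UCP part of this step, assuming an unencrypted backup taken on a manager node. The image tag, the output path, and the function names are assumptions; DTR, Swarm, and NFS backups need their own procedures and are not covered here.

```shell
# Create a UCP backup archive on a manager node.
# UCP_VERSION and the default output path are assumptions.
create_ucp_backup() {
  docker container run --log-driver none --rm -i --name ucp \
    -v /var/run/docker.sock:/var/run/docker.sock \
    "docker/ucp:${UCP_VERSION:-2.2.4}" backup --interactive \
    > "${1:-/tmp/ucp-backup.tar}"
}

# Check that an unencrypted backup archive exists, is non-empty,
# and can be listed by tar. This does not prove it restores cleanly.
verify_backup_tar() {
  # $1: path to the backup archive
  [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
  if listing="$(tar -tf "$1" 2>/dev/null)" && [ -n "$listing" ]; then
    echo "ok: $(printf '%s\n' "$listing" | wc -l | tr -d ' ') entries in $1"
  else
    echo "unreadable archive: $1" >&2
    return 1
  fi
}
```

Listing the archive only shows that the file is a readable tar; a test restore in a separate environment is the stronger check.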

Ideas for Investigation

  • Container/service status and health
  • Container logs and inspect
  • Overlay network reachability and DNS resolution
  • Resource exhaustion (disk, cpu, memory, io, network, inode)
  • Node availability
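
Several of these checks can be run from a manager node with standard Docker and POSIX tools. The sketch below is a starting point under those assumptions; the 90 percent threshold and the function names are illustrative, and `ucp-` is assumed to be the prefix of the UCP container names.

```shell
# Print mount points at or above a usage threshold (percent).
# Reads `df -P` or `df -Pi` output on stdin; column 5 is the usage
# percentage and column 6 the mount point in both layouts.
over_threshold() {
  awk -v limit="${1:-90}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit) print $6, $5
    }'
}

# Run a first pass over node, container, and disk state.
quick_checks() {
  docker node ls                       # node availability and manager status
  docker ps -a --filter name=ucp- \
    --format '{{.Names}}\t{{.Status}}' # UCP container status and health
  df -P  | over_threshold 90           # disk space exhaustion
  df -Pi | over_threshold 90           # inode exhaustion
}
```

Any container that shows as unhealthy or restarting is a candidate for `docker inspect` and `docker logs`.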

Determine Where the Issue May Reside

Strongly related: =

Related: ≈

This matrix describes the relationship between features of Docker EE UCP 2.2.x and the components involved. It is not exhaustive of all services but rather a starting point for analysis and investigation.

Controller | Swarm Manager | auth-api | auth-store | auth-worker | etcd | metrics | CAs | agent/reconciler | engine
Docker API: networks, volumes, containers = = =
Docker API: services, secrets, configs = = =
Authentication Failures = = =
Authorization Failures = = =
Node list timeout/errors = = =
Unhealthy nodes = = = =
Node in Pending =
“rpc error …” =
“context deadline exceeded” = =
Charts not loading data = = =

More Troubleshooting

Review Logs for Errors and Resolution

Review UCP logs based upon the above matrix for errors and hints to resolution.
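
One way to narrow the logs down is to filter for the error strings named in the matrix. This is a sketch only: the function names are hypothetical, the one-hour window is arbitrary, and `ucp-controller` is used as an example container name.

```shell
# Keep only log lines that contain error signatures from the matrix.
filter_ucp_errors() {
  grep -i -E 'rpc error|context deadline exceeded'
}

# Scan recent logs of one container for those signatures.
scan_container_logs() {
  # $1: container name, e.g. ucp-controller
  # $2: time window passed to --since (default: 1h)
  docker logs --since "${2:-1h}" "$1" 2>&1 | filter_ucp_errors
}
```

For example, `scan_container_logs ucp-controller 30m` shows matching lines from the last 30 minutes. No output means neither signature appeared in that window, not that the component is healthy.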

Resolve

Avoid taking any action until you fully understand the problem, and capture a support dump after every action taken to fix it. If the first set of actions does not fix the problem, avoid taking further actions in the same direction: the actions may have changed the issue, and the environment may now be surfacing a different problem entirely. Resolve the issues uncovered in the logs. If unsure, look for guidance on docs.docker.com or in the knowledge base, collect the data captured during the process, and open a case with Docker Support.