
Troubleshooting a DTR 2.3.x Cluster

Article ID: KB000136

This article provides direction on how to begin troubleshooting a DTR 2.3.x cluster, as well as the information to gather along the way to allow Docker Support to further assist.

Prerequisites

You should understand:

  • DTR architecture
  • How to view logs
  • How to access Engine logs in debug mode

Goal

After completing these steps, you should have a good understanding of how to:

  • Begin troubleshooting a DTR 2.3.x cluster
  • Capture important information along the way

Collect Support Dump

Begin by collecting a support dump. Note the exact time the issue occurred and whether it is persistent or intermittent.
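One way to keep the timing details with the support dump is to write them to a small note file. The following is a minimal sketch, not part of any Docker tooling: the `IncidentNote` record, its field names, and the `make_note` helper are all hypothetical and only illustrate what is worth recording.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class IncidentNote:
    """Hypothetical record stored next to a support dump."""
    observed_at: str  # ISO 8601 timestamp, normalized to UTC
    pattern: str      # "persistent" or "intermittent"
    summary: str      # short free-text description of the symptom


def make_note(observed_at: datetime, pattern: str, summary: str) -> IncidentNote:
    """Validate the inputs and normalize the timestamp to UTC."""
    if pattern not in ("persistent", "intermittent"):
        raise ValueError("pattern must be 'persistent' or 'intermittent'")
    if observed_at.tzinfo is None:
        raise ValueError("observed_at must be timezone-aware")
    utc_time = observed_at.astimezone(timezone.utc)
    return IncidentNote(utc_time.isoformat(), pattern, summary)


def to_json(note: IncidentNote) -> str:
    """Serialize the note so it can be attached to a support case."""
    return json.dumps(asdict(note), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative values only.
    note = make_note(
        datetime(2018, 1, 5, 14, 30, tzinfo=timezone.utc),
        "intermittent",
        "docker push fails with HTTP 500",
    )
    print(to_json(note))
```

Normalizing to UTC makes it easier to line the note up with log timestamps from nodes in different time zones.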

Verify Backups

Create or verify a backup of every component where possible (DTR, UCP, Swarm, NFS).

Ideas for Investigation

  • Container/service status and health
  • Container logs and inspect
  • Overlay network reachability and DNS resolution
  • Resource exhaustion (disk, CPU, memory, I/O, network, inodes)
  • Node availability
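The first item in the list above, container status and health, can be checked quickly by summarizing `docker ps` output. The sketch below assumes the output was produced with `docker ps -a --format '{{.Names}}\t{{.Status}}'`. The helper names and the sample container names (including the replica ID suffix) are illustrative, not taken from a real cluster.

```python
import subprocess


def parse_ps_output(text):
    """Split tab-separated `docker ps` output into (name, status) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        rows.append((name.strip(), status.strip()))
    return rows


def needs_attention(status):
    """Flag containers that are not running or that report as unhealthy."""
    lowered = status.lower()
    if not lowered.startswith("up"):
        return True  # e.g. Exited, Restarting, Created, Dead
    return "(unhealthy)" in lowered


def collect_ps_output():
    """Run docker ps on the local node; requires access to the Docker daemon."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--filter", "name=dtr-",
         "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Illustrative sample; replace with collect_ps_output() on a DTR node.
    sample = (
        "dtr-nginx-0123456789ab\tUp 3 hours (healthy)\n"
        "dtr-registry-0123456789ab\tUp 2 minutes (unhealthy)\n"
        "dtr-rethinkdb-0123456789ab\tExited (1) 5 minutes ago\n"
    )
    for name, status in parse_ps_output(sample):
        if needs_attention(status):
            print(f"{name}: {status}")
```

Containers flagged this way are reasonable first candidates for a closer look with `docker logs` and `docker inspect`.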

Determine Where the Issue May Reside

Strongly related: = 

Related: ≈

This matrix describes the relationships between features and components of Docker Enterprise DTR 2.3.x. It does not cover every service; treat it as a starting point for analysis and investigation.

api registry rethink garant nginx jobrunner postgres notary-server notary-signer storage ucp
Docker API =
Docker Login = = =
Web Login = = =
Orgs/Teams/Users =
DTR Availability = = =
General DTR API =
Permissions = =
Push/Pull = =
Image Promotion = = =
Image Scanning = = =
Image Signing = =
Image GC = =

More Troubleshooting

Review Logs for Errors

Using the matrix above, review the logs of the DTR components related to the failing feature for errors and hints toward a resolution.
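When a component produces a lot of log output, a small filter can narrow it down to the lines worth reading first. The sketch below is a generic example and makes assumptions about the log format: the pattern (`level=error`, `level=fatal`, or the bare words `error`, `fatal`, `panic`) and the sample lines are illustrative, and real DTR components may log differently.

```python
import re

# Assumed markers of a problem line; adjust to the format actually observed.
ERROR_PATTERN = re.compile(
    r"level=(error|fatal)|\b(error|erro|fatal|fata|panic)\b",
    re.IGNORECASE,
)


def find_error_lines(lines):
    """Return (line_number, text) for each line matching ERROR_PATTERN."""
    matches = []
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            matches.append((number, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    # Illustrative log lines, e.g. saved from `docker logs <container>`.
    sample = [
        'time="2018-01-05T14:30:00Z" level=info msg="request completed"',
        'time="2018-01-05T14:30:02Z" level=error msg="connection refused"',
        "panic: runtime error: invalid memory address",
        "INFO listening on :443",
    ]
    for number, text in find_error_lines(sample):
        print(f"{number}: {text}")
```

The line numbers make it easy to quote the surrounding context when opening a support case.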

Resolution

Capture a support dump after every action taken to fix the problem. Avoid taking any action unless you fully understand what the problem is. If the first set of actions does not fix the problem, avoid taking further actions in the same direction: the earlier actions may have changed the issue, and the environment may now be surfacing a different problem entirely. Resolve the issues uncovered in the logs. If you are unsure how to proceed, consult docs.docker.com or the Docker knowledge base, gather the data captured during the process, and open a case with Docker Support.

Pro Tip: If containers have been accidentally removed from one of the DTR replicas, running the docker/dtr reconfigure command can resolve the problem: https://docs.docker.com/datacenter/dtr/2.3/reference/cli/reconfigure/#description.