This article provides direction on how to begin troubleshooting a DTR 2.3.x cluster, as well as the information to gather along the way to allow Docker Support to further assist.
Prerequisites
You should understand:
- DTR architecture
- How to view logs
- How to access Engine logs in debug mode
Goal
After completing these steps, you should have a good understanding of how to:
- Begin troubleshooting a DTR 2.3.x cluster
- Capture important information along the way
Collect Support Dump
Begin by collecting a support dump. Record the exact time the issue occurred and whether it is persistent or intermittent.
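One way to keep the time and pattern of the issue alongside the support dump is a small structured note. The sketch below is only an illustration: the `IncidentNote` class, its field names, and the JSON layout are assumptions made for this example and are not part of DTR or any Docker tool.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass
class IncidentNote:
    """Hypothetical record kept next to a support dump (not a Docker format)."""
    observed_at: str  # ISO 8601 timestamp, normalized to UTC
    pattern: str      # "persistent" or "intermittent"
    summary: str      # short free-text description of the symptom


def make_incident_note(summary, pattern, observed_at=None):
    """Build a note; defaults to the current UTC time if none is given."""
    if pattern not in ("persistent", "intermittent"):
        raise ValueError("pattern must be 'persistent' or 'intermittent'")
    when = observed_at or datetime.now(timezone.utc)
    if when.tzinfo is None:
        # Refuse naive timestamps so the time zone is never ambiguous.
        raise ValueError("observed_at must be timezone-aware")
    stamp = when.astimezone(timezone.utc).isoformat(timespec="seconds")
    return IncidentNote(observed_at=stamp, pattern=pattern, summary=summary)


def to_json(note):
    """Serialize the note so it can be attached to a support case."""
    return json.dumps(asdict(note), indent=2, sort_keys=True)
```

Calling `to_json(make_incident_note("push fails", "intermittent"))` produces a short JSON document that can be saved with the dump it belongs to.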
Verify Backups
Create/verify a backup for all components if possible (DTR, UCP, Swarm, NFS):
- https://docs.docker.com/datacenter/dtr/2.3/guides/admin/backups-and-disaster-recovery/
- https://docs.docker.com/datacenter/ucp/2.2/reference/cli/backup/
- https://docs.docker.com/engine/swarm/admin_guide/#back-up-the-swarm
Ideas for Investigation
- Container/service status and health
- Container logs and inspect
- Overlay network reachability and DNS resolution
- Resource exhaustion (disk, cpu, memory, io, network, inode)
- Node availability
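As a starting point for the container status check in the list above, the following sketch parses the output of `docker ps -a --format '{{.Names}}\t{{.Status}}'` and flags containers that are not running or report as unhealthy. It assumes DTR container names begin with `dtr-`; the function names are hypothetical, and the logic only inspects the status text that Docker prints.

```python
import subprocess


def list_containers():
    """Return raw 'name<TAB>status' lines for all containers on this node."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ps_lines(text):
    """Split each non-empty line into a (name, status) tuple."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        rows.append((name.strip(), status.strip()))
    return rows


def needs_attention(status):
    """True if the status is not 'Up ...' or mentions 'unhealthy'."""
    lowered = status.lower()
    return not lowered.startswith("up") or "unhealthy" in lowered


def flag_containers(text, prefix="dtr-"):
    """Return (name, status) pairs for matching containers worth a closer look."""
    return [
        (name, status)
        for name, status in parse_ps_lines(text)
        if name.startswith(prefix) and needs_attention(status)
    ]
```

On a node running a DTR replica, `flag_containers(list_containers())` returns the containers to look at first. An empty list does not prove the replica is healthy; it only means nothing stood out in the status column.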
Determine Where the Issue May Reside
Strongly related: =
Related: ≈
This matrix describes how features of Docker Enterprise DTR 2.3.x relate to the components they depend on. It is not exhaustive of all services, but rather a starting point for analysis and investigation.
api | registry | rethink | garant | nginx | jobrunner | postgres | notary-server | notary-signer | storage | ucp | |
---|---|---|---|---|---|---|---|---|---|---|---|
Docker API | = | ||||||||||
Docker Login | ≈ | = | = | = | |||||||
Web Login | = | ≈ | = | = | |||||||
Orgs/Teams/Users | = | ||||||||||
DTR Availability | ≈ | ≈ | = | = | = | ||||||
General DTR API | = | ≈ | ≈ | ≈ | |||||||
Permissions | = | = | ≈ | ≈ | |||||||
Push/Pull | = | ≈ | ≈ | ≈ | = | ||||||
Image Promotion | = | = | ≈ | ≈ | = | ||||||
Image Scanning | = | ≈ | ≈ | = | = | ||||||
Image Signing | ≈ | ≈ | ≈ | = | = | ||||||
Image GC | ≈ | = | = |
More Troubleshooting
- https://docs.docker.com/datacenter/dtr/2.3/guides/admin/monitor-and-troubleshoot/troubleshoot-with-logs
- https://docs.docker.com/datacenter/dtr/2.3/guides/admin/monitor-and-troubleshoot/
Review Logs for Errors
Using the matrix above, review the logs of the DTR components related to the affected feature for errors and hints toward a resolution.
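A first pass over a component's logs can be scripted. The sketch below fetches recent output with `docker logs --since` and lists lines containing common failure keywords. The keyword list and the function names are assumptions chosen for illustration; they are not an official list of DTR error markers, and a match is only a hint about where to read more closely.

```python
import re
import subprocess

# Illustrative keywords only; adjust to the component under investigation.
ERROR_PATTERN = re.compile(r"\b(error|fatal|panic|timeout|refused)\b", re.IGNORECASE)


def fetch_logs(container, since="1h"):
    """Return combined stdout and stderr of `docker logs --since` for a container."""
    result = subprocess.run(
        ["docker", "logs", "--since", since, container],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        text=True, check=True,
    )
    return result.stdout


def find_suspect_lines(log_text):
    """Return (line_number, line) pairs for lines matching a failure keyword."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_PATTERN.search(line)
    ]
```

For example, `find_suspect_lines(fetch_logs(name))` can be run for each container the matrix points to, where `name` is a container name taken from `docker ps` on that node.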
Resolution
Capture a support dump after every action taken to fix the problem. Avoid taking any action unless you fully understand what the problem is. If the first set of actions does not fix the problem, avoid taking further actions in the same direction: the earlier actions may have changed the issue, and the environment may now be surfacing a different problem entirely. Resolve the issues uncovered in the logs. If you are unsure how to move forward, look for assistance on docs.docker.com or in the knowledge base, collect the data captured during the process, and open a case with Docker Support.
Pro Tip: If containers on one of the DTR replicas have been accidentally removed, running the docker/dtr reconfigure command can resolve the problem: https://docs.docker.com/datacenter/dtr/2.3/reference/cli/reconfigure/#description.