I'm about to create a Docker Enterprise Edition (EE) cluster, and I'm planning how many nodes it will have and what their roles will be. I've been told I should never run workloads on the Universal Control Plane (UCP) manager nodes or the Docker Trusted Registry (DTR) replica nodes. Why should I never run ordinary application workloads on these types of nodes?
For very small, prototype deployments, pilot programs, and proofs of concept, running application workloads on UCP managers and DTR replicas is probably fine, so long as performance and security are not significant concerns.
However, for clusters intended for development, test, or production use, it's a best practice to dedicate specific nodes to UCP and DTR functions. That is, a typical production cluster would include six nodes to perform management and registry functions: three for UCP and three for DTR. There are several reasons for this recommendation, including:
Contention. If the application workload is greater than what the node running the UCP or DTR function can handle, then UCP and DTR may be unable to maintain consensus of their underlying state. If this happens on several nodes at the same time, the UCP or DTR function (or both) may lose quorum, with significant impact to performance and the ability to recover from this non-quorate state.
Security. In the event of a bad actor, major bug, or exploit, malicious code may break out of a container and begin executing at the node level. If this were to occur on a UCP node, the entire cluster would be suspect and exposed to the exploit code. If this were to occur on a DTR node, all images stored in the registry would be suspect, as would vulnerability scanning and Docker Content Trust.
Maintenance. When performing maintenance on a node, or in the event of a node failure, one typically uses the "drain" function to seamlessly shift the application workload away from the node. If the UCP or DTR nodes are treated like ordinary workers, the application workload may shift onto them. Should this overload one or more UCP or DTR nodes, the result may be the contention problem described above.
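To make the quorum concern in the contention point more concrete, here is a minimal Python sketch of the arithmetic. It assumes a simple majority rule, as used by Raft-style consensus among manager nodes; the function names are hypothetical and chosen only for illustration, not taken from any Docker tool.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of a group of `members` voting nodes."""
    if members < 1:
        raise ValueError("members must be at least 1")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be unavailable while a majority remains."""
    return members - quorum_size(members)


def has_quorum(members: int, healthy: int) -> bool:
    """True if the healthy members still form a majority."""
    return healthy >= quorum_size(members)


if __name__ == "__main__":
    # Compare common odd-sized manager or replica groups.
    for n in (1, 3, 5, 7):
        print(f"{n} members: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this assumption, a group of three UCP managers or three DTR replicas can lose only one member before quorum is gone. If application workloads overload two of the three at the same time, the group is no longer quorate, which is why keeping those nodes free of ordinary workloads matters.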