Network has active endpoints error during Interlock backend stack removal

Article ID: KB000859

Issue

docker stack rm fails when removing overlay networks with Interlock backend services on them:

$ docker stack remove myapp
Removing service myapp_web
Removing network myapp_default
Failed to remove network vbxm26mirq7no6kn09pofknd8: Error response from daemon: Error response from daemon: network myapp_default id vbxm26mirq7no6kn09pofknd8 has active endpoints
Failed to remove some resources from stack: myapp

This happens the first time docker stack rm is issued. If the command is issued again about a minute later, docker stack rm myapp removes the network.

Prerequisites

  • Universal Control Plane version 3.0.x starting with version 3.0.0
  • Interlock Layer 7 Routing

Root Cause

Summary

Interlock's ability to dynamically reconfigure itself to use new networks can violate the ownership model of Docker stacks, under which a stack expects to exclusively manage the lifecycle of its own resources. This behavior was not present in the predecessor to Interlock, Host Routing Mesh (HRM), because HRM operated on long-lived, stack-external networks.

Background

  • docker stack is implemented with individual client-side service and network commands (create, update, rm).
  • The ucp-interlock-extension service watches the swarm API and asynchronously adds or removes the ucp-interlock-proxy service to or from networks with Interlock backend services on them.

Sequence of Events

This sequence details the events that occur when the following example stack is deployed and then removed.

docker stack deploy myapp -c - <<EOSTACK
version: "3.2"
services:
  web:
    image: nginx
    deploy:
      labels:
        com.docker.lb.hosts: app.example.org
        com.docker.lb.network: myappnet
        com.docker.lb.port: 80
EOSTACK

  1. docker stack deploy creates a default network, myapp_default, and deploys the service myapp_web.
  2. (within seconds) The ucp-interlock service notices the new service, polls ucp-interlock-extension for an nginx.conf, and then updates the ucp-interlock-proxy service, adding it to the myapp_default network.
  3. (some time later) The user issues docker stack rm, which removes the service and then attempts to remove the network. Network removal fails because ucp-interlock-proxy is still on the network.
  4. (within seconds) ucp-interlock notices that there are no Interlock-labelled services left on the network and polls ucp-interlock-extension for a new nginx.conf, which it propagates to ucp-interlock-proxy, removing the proxy from the network.
  5. (after the ucp-interlock-proxy service update completes) docker stack rm, when issued again, succeeds in removing the network myapp_default.
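
The attachment in step 2 and the detachment in step 4 can be observed from a client bundle shell. The following is a minimal sketch, not a supported diagnostic: the function names proxy_network_ids and proxy_on_network are hypothetical, and it assumes the proxy's network attachments are visible under Spec.TaskTemplate.Networks in the service spec, which may differ between versions.

```shell
# Hypothetical helpers; assumes the proxy's attachments appear in
# .Spec.TaskTemplate.Networks of the ucp-interlock-proxy service spec.

# Print the IDs of the networks listed in the proxy service spec, one per line.
proxy_network_ids() {
    docker service inspect ucp-interlock-proxy \
        --format '{{range .Spec.TaskTemplate.Networks}}{{.Target}}{{"\n"}}{{end}}'
}

# Return 0 if the proxy service spec lists the named network, 1 if it does not,
# and 2 if the network itself cannot be inspected (for example, already removed).
proxy_on_network() {
    net_id="$(docker network inspect --format '{{.Id}}' "$1")" || return 2
    proxy_network_ids | grep -qxF "$net_id"
}

# Example (not run here):
#   proxy_on_network myapp_default && echo "proxy still attached"
```

Under these assumptions, proxy_on_network myapp_default would be expected to succeed between steps 2 and 4 and to fail once step 4 has completed.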

Resolution

This issue is tracked by Docker-internal engineering ticket ID escalation/731 and currently has no known resolution. As of 7 Sept 2018 Docker engineering was evaluating behavioral changes in swarm mode to address this issue.

Work Arounds

There are two operational accommodations for this issue. One involves changes to stack files and the other involves changes to the affected CI/CD pipeline.

External Network Work Around

Create swarm networks prior to deploying stacks and reference them in stacks as external networks. Under this configuration stack removal will not try to remove the networks, thereby avoiding the issue.

  1. Source a Universal Control Plane client certificate bundle.

  2. Create an overlay network myappnet:

    docker network create -d overlay myappnet
    
  3. Create a docker-compose.yml that declares the network myappnet as an external resource:

    version: "3.2"
    services:
      web:
        image: nginx
        deploy:
          labels:
            com.docker.lb.hosts: app.example.org
            com.docker.lb.network: myappnet
            com.docker.lb.port: 80
        networks:
          myappnet:
    
    networks:
      myappnet:
        external: true
    
  4. Deploy the stack:

    docker stack deploy myapp_externalnetwork -c docker-compose.yml
    
  5. Wait a minute for Interlock to reconfigure itself to use the network, then remove the stack:

    docker stack remove myapp_externalnetwork
    

    Note: stack removal succeeds because the network is declared as external, so the problematic network removal is skipped.
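
Because the external network now outlives the stack, a pipeline that runs the network creation step repeatedly will find that the network already exists on later runs. One possible way to make that step safe to repeat is sketched below; the function name ensure_overlay_network is hypothetical, and the sketch assumes a client bundle has already been sourced.

```shell
# Hypothetical helper: create the overlay network only if it does not exist yet.
ensure_overlay_network() {
    net="$1"
    # docker network inspect exits non-zero when the network is not found.
    if docker network inspect "$net" >/dev/null 2>&1; then
        echo "network $net already exists"
    else
        docker network create -d overlay "$net"
    fi
}

# Example (not run here):
#   ensure_overlay_network myappnet
#   docker stack deploy myapp_externalnetwork -c docker-compose.yml
```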

CI/CD Retry Work Around

Implement a retry of docker stack rm in the affected CI/CD pipeline. Eventually (typically within about 30 seconds) Interlock removes itself from the affected network, and docker stack rm then succeeds.
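
A retry step along these lines might look like the sketch below. It is an illustration only: the function name remove_stack_with_retry and the MAX_ATTEMPTS and RETRY_DELAY variables (with their defaults) are hypothetical, and the sketch assumes docker stack rm exits non-zero when it reports "Failed to remove some resources from stack".

```shell
# Hypothetical helper: retry `docker stack rm` until it succeeds.
# MAX_ATTEMPTS and RETRY_DELAY are illustrative knobs, not Docker settings.
remove_stack_with_retry() {
    stack="$1"
    max_attempts="${MAX_ATTEMPTS:-6}"
    delay="${RETRY_DELAY:-15}"   # seconds between attempts
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Assumption: a failed network removal yields a non-zero exit status.
        if docker stack rm "$stack"; then
            echo "stack $stack removed on attempt $attempt"
            return 0
        fi
        # Wait before the next attempt, but not after the final one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            echo "attempt $attempt failed; retrying in ${delay}s" >&2
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    echo "giving up on stack $stack after $max_attempts attempts" >&2
    return 1
}

# Example (not run here):
#   remove_stack_with_retry myapp
```

With the illustrative defaults the loop waits up to 75 seconds in total, which covers both the 30-second and one-minute delays mentioned in this article; tune the values to the pipeline's own timeout budget.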

What's Next