
UCP Support Dump fails due to Calico log collection timeout

Article ID: KB001008

Issue

Sometimes when collecting a UCP Support Dump to submit to Docker Support, the contents of the support dump are not collected due to a timeout in the support dump container.

After unzipping the support dump, you may notice that one or more nodes have failed to collect, each indicated by a hostname.error file with the following contents:

Error copying support log from hostname.domain.com: context deadline exceeded

Further examination of the affected host's support dump data may show only the Calico CNI logs have been included:

$ ls -lha hostname.domain.com/dsinfo/
total 16K
drwxr-xr-x  3 ada ada 4.0K Jun 19 13:01 .
drwxr-xr-x 19 ada ada 4.0K Jun 19 13:01 ..
-rw-r--r--  1 ada ada 2.0K Jun 19 11:04 certdump.txt
drwxr-xr-x  3 ada ada 4.0K Jun 19 13:01 cni
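
On a dump with many nodes, a quick way to see every failed node at once is to scan for the .error files. A minimal sketch, assuming the dump has been extracted and the per-node .error files sit at the top level of the extraction directory:

```shell
# Sketch: list each failed node and its recorded error, assuming *.error
# files sit at the top level of the extracted support dump directory.
list_failed_nodes() {
  dump_dir="$1"
  for err in "$dump_dir"/*.error; do
    [ -e "$err" ] || continue            # no .error files: nothing to report
    printf '%s: %s\n' "$(basename "$err" .error)" "$(cat "$err")"
  done
}

list_failed_nodes .   # run from the extraction directory
```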

Root Cause

The UCP Support Dump tool normally limits the number of lines collected from each container, but we have noticed that sometimes the calico-node Pod logs grow large enough that log collection times out, causing the support dump to fail to complete.

This will be resolved in an upcoming patch release.
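
To confirm this root cause on a suspect node before clearing anything, it can help to check the on-disk size of the calico-node container logs. A sketch, assuming the default json-file logging driver layout, where each container's log lives under Docker's root directory as <container-id>-json.log; on a real node the arguments would come from docker info --format '{{.DockerRootDir}}' and docker ps -aqf name=k8s_.*_calico-node --no-trunc:

```shell
# Sketch: report the on-disk size of one container's JSON log file.
# Assumes the default json-file logging driver layout:
#   <docker-root-dir>/containers/<id>/<id>-json.log
calico_log_size() {
  root_dir="$1"
  container="$2"
  du -h "${root_dir}/containers/${container}/${container}-json.log"
}
```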


Resolution

Clearing the calico-node Pod logs enables the UCP Support Dump to succeed, and can be accomplished in several ways. Choose one of the following:

  • Truncate the calico-node container logs. This reduces each log file's size to 0 bytes without deleting the file's inode. This would need to be performed on each host, individually, as root:
for container in $(docker ps -aqf name=k8s_.*_calico-node --no-trunc); do
  truncate -s0 "$(docker info --format '{{.DockerRootDir}}')/containers/${container}/"*log
done
  • Remove the calico-node containers and let the Pod reschedule them. This may cause a temporary interruption in programming new Pod IPs until the calico-node containers are recreated. This could be performed through a UCP Admin client bundle:
for container in $(docker ps -aqf name=calico-node); do
  docker rm -f "${container}"
done
  • Remove the calico-node Pods and let the Calico DaemonSet reschedule them. This may cause a temporary interruption in programming new Pod IPs until the calico-node containers are recreated. This could be performed through a UCP Admin client bundle:
for pod in $(kubectl -n kube-system get pod -l k8s-app=calico-node -o custom-columns=:.metadata.name); do
  kubectl -n kube-system delete pod "$pod"
done
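
Whichever option you choose, a quick sanity check before re-running the support dump is to confirm the log files are back near 0 bytes. A minimal sketch of such a check; the path argument is whichever log file was cleared (for example <docker-root-dir>/containers/<id>/<id>-json.log):

```shell
# Sketch: warn if a log file is still non-empty after clearing.
check_log_cleared() {
  logfile="$1"
  size=$(wc -c < "$logfile" | tr -d ' ')   # strip padding some wc builds emit
  if [ "$size" -eq 0 ]; then
    echo "OK: $logfile is empty"
  else
    echo "WARN: $logfile is still $size bytes"
  fi
}
```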

What's Next