Calico-managed container communication across hosts fails

Dear Calico community,

Based on the following environment:

  • K8s: 1.19.3
  • Calico: 3.16.4

deployed via:

  • Kubespray: 2.11

we use the following sample K8s YAML, which provides:

  • 2x K8s PODs
  • each comprising a single container based on the “network-multitool” image (which offers several network tools like “nc”, “ping”, “traceroute”, etc.)
  • running on DIFFERENT physical nodes
  • with a service attached to “mt-server” so that “mt-client” can reach it:
apiVersion: v1
kind: Namespace
metadata:
  name: clusterdbg

---
# Pod 1 - Role: server
apiVersion: v1
kind: Pod
metadata:
  name: mt-server
  namespace: clusterdbg
  labels:
    app: mt-server
spec:
  # Pod is scheduled on nodes that are labelled with dbgnode=n1
  nodeSelector:
    dbgnode: n1
  tolerations:
  - key: "node.kubernetes.io/unschedulable"
    operator: "Equal"
    effect: "NoSchedule"
  containers:
  - command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;"]
    image: registry-prod.xxxxxx.corpintra.xxx/clusterdbg/praqma/network-multitool:0daefe6
    imagePullPolicy: IfNotPresent
    name: mt-server
    ports:
    # TCP-Port 9090
    - containerPort: 9090
    # UDP-Port 9191
    - containerPort: 9191
---
# Pod 2 - Role: client
apiVersion: v1
kind: Pod
metadata:
  name: mt-client
  namespace: clusterdbg
spec:
  # Pod is scheduled on nodes that are labelled with dbgnode=n2
  nodeSelector:
    dbgnode: n2
  tolerations:
  - key: "node.kubernetes.io/unschedulable"
    operator: "Equal"
    effect: "NoSchedule"
  containers:
  - command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;"]
    image: registry-prod.xxxxxx.corpintra.xxx/clusterdbg/praqma/network-multitool:0daefe6
    imagePullPolicy: IfNotPresent
    name: mt-client
---

# Cluster IP service that is exposing the mt-server
apiVersion: v1
kind: Service
metadata:
  name: mt-server-clusterip
  namespace: clusterdbg
spec:
  selector:
    app: mt-server
  ports:
    - protocol: TCP
      port: 9090
      targetPort: 9090
      name: tcp-service
    - protocol: UDP
      port: 9191
      targetPort: 9191
      name: udp-service
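
To reproduce this setup, the two worker nodes need to carry the “dbgnode” labels referenced by the nodeSelectors above. A minimal sketch of the deployment steps (the node names and the manifest file name are placeholders, not taken from our setup):

# label the two physical worker nodes
kubectl label node <node-1> dbgnode=n1
kubectl label node <node-2> dbgnode=n2

# apply the manifest and verify that the pods land on different nodes
kubectl apply -f clusterdbg.yaml
kubectl -n clusterdbg get pods -o wide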

Pinging from “mt-client” to “mt-server” works. But as soon as we try to reach the server via TCP, e.g. via “netcat”, it fails (no “Hello” reaches the netcat process running on “mt-server” and listening on port 9090):

# starting the NetCat TCP listener on "mt-server" on port 9090
nc -l -p 9090

# writing to the NetCat listener from "mt-client" on port 9090
echo "Hello" | nc <target_ip> 9090
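
Besides the direct pod IP, the ClusterIP service created above can also be used as a target; a sketch of the corresponding checks (the DNS name follows the standard <service>.<namespace>.svc scheme with the default cluster.local domain assumed, and the UDP test assumes a matching UDP listener started with “nc -u -l -p 9191” on “mt-server”):

# TCP test against the ClusterIP service instead of the pod IP
echo "Hello" | nc mt-server-clusterip.clusterdbg.svc.cluster.local 9090

# UDP test against the service port 9191
echo "Hello" | nc -u mt-server-clusterip.clusterdbg.svc.cluster.local 9191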

After analyzing TCP SYN flags on the different network devices (container device, tunnel device, host device), we get the picture shown in the attached image.
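
The SYN flags can be traced with tcpdump-style captures on each of those devices; a rough sketch, assuming Calico’s default IPIP mode and placeholder interface names (the pod’s caliXXXX veth, tunl0, and the host NIC eth0):

# pod veth and IPIP tunnel device on the target node: watch for SYNs to port 9090
tcpdump -ni cali<hash> 'tcp port 9090 and tcp[tcpflags] & tcp-syn != 0'
tcpdump -ni tunl0 'tcp port 9090 and tcp[tcpflags] & tcp-syn != 0'

# host NIC: with IPIP encapsulation the inner TCP header is not matched directly,
# so capture the IPIP (IP protocol 4) traffic between the two hosts instead
tcpdump -ni eth0 'ip proto 4'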

Hints:

  • Inter-host communication used to work; it suddenly started failing and we do not know why
  • host-local communication works (e.g. when the PODs are running on the same physical node)
  • we followed the Calico troubleshooting guide under “Troubleshoot and diagnostics” without detecting any obvious errors (the basic checks are sketched after this list)
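
For reference, the basic health checks from that guide roughly boil down to the following commands (assuming calicoctl is available on the node and Calico runs in its default IPIP mode):

# BGP peering status on each node (all peers should show "Established")
calicoctl node status

# routes to the remote node's pod CIDR should go via tunl0 in IPIP mode
ip route | grep tunl0

# all calico-node pods should be Running and Ready
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide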

Does the Calico community have any ideas on how we can get rid of this issue? :blush:

Important hints:

  • the issue happens when updating K8s worker nodes from CentOS 7.6 to CentOS 7.8, and also on completely freshly installed (not updated) CentOS 7.8 worker nodes
  • the issue also appears when booting an updated CentOS 7.8 machine with the kernel version shipped with CentOS 7.6
  • when deploying the above K8s PODs on a CentOS 7.6 worker node again, it works
  • Docker Engine Community Edition 19.03.13 with API version 1.40 is in use (the relevant versions can be confirmed as sketched after this list)
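
For completeness, the OS release, running kernel and Docker versions on a node can be confirmed with standard commands; a minimal sketch:

# OS release and running kernel on the worker node
cat /etc/centos-release
uname -r

# Docker Engine and API versions
docker version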

Finally, we strongly suspect an incompatibility between:

  • CentOS 7.8
  • Calico
  • and probably the Docker Community container runtime - see image above.

Do you know of any such incompatibilities? If yes, where are they documented?

Some thoughts:

  • Maybe the OS update brought in a component that conflicts with pod networking, such as firewalld or NetworkManager. I tend to use

    watch "iptables-save -c | grep DROP | grep -v '\[0:0\]'"

    to see if iptables is dropping traffic (the pipeline needs to be quoted so the greps run inside watch rather than on watch's own output).

  • Maybe the domain name we’re detecting has changed and Calico doesn’t think the pod belongs on this host. Calico’s node name needs to agree with Kubernetes’ node name. I think we use the Kubernetes downward API to get the node name these days, so that shouldn’t happen (but if you’re building your own manifests you may get caught out).

  • Worth checking for errors/warnings in the calico-node log on the host with the target pod.

  • Do you have any network policy in play? Perhaps the source address of the packet is being incorrectly SNATted and then it doesn’t match the policy (see the sketch after this list).
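
A sketch of how the node-name, log and policy checks could look in practice (label selectors are the usual defaults and the calico-node pod name is a placeholder):

# compare Calico's node names with the Kubernetes node names
calicoctl get nodes
kubectl get nodes

# find the calico-node pod on the host with the target pod, then check its log
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
kubectl -n kube-system logs <calico-node-pod-on-that-host> | grep -iE 'error|warn'

# any Kubernetes or Calico network policies that might be in play
kubectl get networkpolicy --all-namespaces
calicoctl get networkpolicy --all-namespaces
calicoctl get globalnetworkpolicy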

Since the problem is gone with CentOS 7.9, we did not look at the “iptables” rules in detail to check whether the root cause is located there (e.g. a “drop” rule). A comparison between a working (CentOS 7.6/7.9) and a broken (CentOS 7.8) cluster environment might show the differences.
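
For anyone who wants to make that comparison, a minimal sketch (the host names in the file names are placeholders; the sed expression strips the packet/byte counters so they do not show up as noise in the diff):

# on one working node (CentOS 7.6/7.9) and one broken node (CentOS 7.8):
iptables-save | sed 's/\[[0-9]*:[0-9]*\]//' > /tmp/iptables-$(hostname).rules

# copy both dumps to one machine and compare them
diff /tmp/iptables-<working-node>.rules /tmp/iptables-<broken-node>.rules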

What is clear is that Kubespray uses the following directive:

  • calico_iptables_backend = “Legacy”

for Felix (see: https://docs.projectcalico.org/reference/felix/configuration) when deploying Calico on CentOS 7.x. This means the legacy “iptables” backend, not NFT (nftables), is still used on CentOS 7.x.
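
To double-check which backend Felix actually runs with, the rendered setting can be inspected; a sketch (where it ends up depends on how Kubespray renders the manifests, e.g. as an environment variable on the calico-node DaemonSet or in the FelixConfiguration resource):

# setting as an environment variable on the calico-node DaemonSet (if rendered that way)
kubectl -n kube-system get ds calico-node -o yaml | grep -i -A1 iptablesbackend

# or as part of the Felix configuration resource
calicoctl get felixconfiguration default -o yaml | grep -i iptablesbackend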

What was indeed strange is that, although the configuration file for NetworkManager below:

  • /etc/NetworkManager/conf.d/calico.conf

had been created by Kubespray with the expected content (according to: https://docs.projectcalico.org/archive/v3.16/maintenance/troubleshoot/troubleshooting), the problem still occurred on CentOS 7.8. So we assume it is not NetworkManager that causes the issue.
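
For reference, that drop-in file tells NetworkManager to leave Calico's interfaces unmanaged; a quick way to verify it (the expected content shown in the comments is roughly what the linked troubleshooting guide describes):

# verify the NetworkManager drop-in that keeps Calico's interfaces unmanaged
cat /etc/NetworkManager/conf.d/calico.conf

# expected content (roughly):
# [keyfile]
# unmanaged-devices=interface-name:cali*;interface-name:tunl0;interface-name:vxlan.calico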

With that, we mark this issue as solved.