Getting the dial tcp 10.96.0.1:443: i/o timeout issues

Hello All,

I have created a Kubernetes cluster on CentOS 7 with one master node and two worker nodes.
[causer@master deployments]$ kubectl get nodes
NAME     STATUS   ROLES    AGE    VERSION
master   Ready    master   9d     v1.18.0
node1    Ready    <none>   7d8h   v1.18.0
node2    Ready    <none>   7d8h   v1.18.0

I am using the Calico CNI for networking. All the pods are running fine except the ingress pods on the worker nodes, i.e.:
[causer@master deployments]$ kubectl get pods --all-namespaces -o wide
NAMESPACE       NAME                                       READY   STATUS             RESTARTS   AGE    IP                NODE     NOMINATED NODE   READINESS GATES
kube-system     calico-kube-controllers-5fc5dbfc47-ktfzc   1/1     Running            2          9d     192.168.219.72    master   <none>           <none>
kube-system     calico-node-bwhff                          1/1     Running            2          9d     192.168.29.85     master   <none>           <none>
kube-system     calico-node-mg7rp                          1/1     Running            1          7d7h   192.168.29.161    node2    <none>           <none>
kube-system     calico-node-rn9zg                          1/1     Running            1          7d7h   192.168.29.16     node1    <none>           <none>
kube-system     coredns-66bff467f8-vkz47                   1/1     Running            2          9d     192.168.219.71    master   <none>           <none>
kube-system     coredns-66bff467f8-w4b45                   1/1     Running            2          9d     192.168.219.73    master   <none>           <none>
kube-system     etcd-master                                1/1     Running            2          9d     192.168.29.85     master   <none>           <none>
kube-system     kube-apiserver-master                      1/1     Running            2          9d     192.168.29.85     master   <none>           <none>
kube-system     kube-controller-manager-master             1/1     Running            2          9d     192.168.29.85     master   <none>           <none>
kube-system     kube-proxy-7b9jd                           1/1     Running            1          7d7h   192.168.29.161    node2    <none>           <none>
kube-system     kube-proxy-8bbjc                           1/1     Running            1          7d7h   192.168.29.16     node1    <none>           <none>
kube-system     kube-proxy-b96b5                           1/1     Running            2          9d     192.168.29.85     master   <none>           <none>
kube-system     kube-scheduler-master                      1/1     Running            3          9d     192.168.29.85     master   <none>           <none>
kube-system     tiller-deploy-754f98dbfc-f7ztt             1/1     Running            1          7d7h   192.168.104.2     node2    <none>           <none>
nginx-ingress   nginx-ingress-f2lh6                        0/1     CrashLoopBackOff   7          16m    192.168.104.3     node2    <none>           <none>
nginx-ingress   nginx-ingress-htclm                        0/1     CrashLoopBackOff   7          16m    192.168.166.129   node1    <none>           <none>
nginx-ingress   nginx-ingress-lssvl                        1/1     Running            0          16m    192.168.219.74    master   <none>           <none>

After looking into the logs of the failing ingress pods, I am getting the following:

[causer@master deployments]$ kubectl logs nginx-ingress-htclm -n nginx-ingress
I0407 14:34:18.403254 1 main.go:169] Starting NGINX Ingress controller Version=1.6.3 GitCommit=b9378d56
F0407 14:34:48.409730 1 main.go:275] Error trying to get the default server TLS secret nginx-ingress/default-server-secret: could not get nginx-ingress/default-server-secret: Get https://10.96.0.1:443/api/v1/namespaces/nginx-ingress/secrets/default-server-secret: dial tcp 10.96.0.1:443: i/o timeout

Also, in the Tiller pod I am getting the following error:

[causer@master deployments]$ kubectl logs tiller-deploy-754f98dbfc-f7ztt -n kube-system
[main] 2020/04/07 13:19:03 Starting Tiller v2.16.5 (tls=false)
[main] 2020/04/07 13:19:03 GRPC listening on :44134
[main] 2020/04/07 13:19:03 Probes listening on :44135
[main] 2020/04/07 13:19:03 Storage driver is ConfigMap
[main] 2020/04/07 13:19:03 Max history per release is 0
[storage] 2020/04/07 14:41:53 listing all releases with filter
[storage/driver] 2020/04/07 14:42:23 list: failed to list: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%3DTILLER: dial tcp 10.96.0.1:443: i/o timeout

Has anyone else run into this issue? The ingress pod running on the master is able to connect to the kube API server, but the ingress pods running on the worker nodes are not.
Any hints or suggestions would be very helpful. I was told in one of the forums that this is an issue with the Calico configuration (nginxinc/kubernetes-ingress#911 (comment)). Can somebody help me fix this issue?

-Regards
Mohit

We are using Calico version 3.9.

It sounds like pods that are attempting to access the API server are having trouble doing so. Specifically, pods that are not co-located on the same node as the API server.

This could be indicative of a problem with the kube-proxy on those nodes, or it could be a routing issue.
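
A quick way to start ruling kube-proxy in or out (a sketch; it assumes a kubeadm-style install where kube-proxy runs as a DaemonSet labelled k8s-app=kube-proxy and keeps its configuration in a kube-proxy ConfigMap):

# Are the kube-proxy pods on the worker nodes healthy and not restarting?
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide

# Which proxy mode is configured (iptables or ipvs)?
kubectl -n kube-system get configmap kube-proxy -o yaml | grep "mode:"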

One thing I noticed in your output above is that it looks like both your nodes and pods are using the 192.168.0.0/16 IP range. This could cause problems with networking between pods and nodes, similar to the ones you’ve described. The first thing I would recommend is to make sure your pod, node, and service CIDRs do not overlap at all.
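
If it helps, here is one way to see the three ranges side by side (a sketch: the static pod names are taken from your kubectl output above, and the Calico pool name assumes the standard manifest install, so adjust if yours differ):

# Node (VM) addresses -- in your output they fall in 192.168.29.0/24
kubectl get nodes -o wide

# Service CIDR configured on the API server
kubectl -n kube-system get pod kube-apiserver-master -o yaml | grep service-cluster-ip-range

# Pod CIDR known to the controller manager (present if kubeadm was given --pod-network-cidr)
kubectl -n kube-system get pod kube-controller-manager-master -o yaml | grep cluster-cidr

# Pod CIDR that Calico is actually allocating from
kubectl get ippools.crd.projectcalico.org default-ipv4-ippool -o yaml | grep cidr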

If that doesn’t work, here are a few other things to try (example commands for each follow the list):

  • Are you able to access https://10.96.0.1:443 from one of those nodes directly? e.g., using SSH and curl? That would show if it’s a problem with the node itself, or if it’s only pods that are having this trouble.
  • If you are able to access it from the node, but not pods, then it is probably an issue with pod routing. Does other pod->pod communication work?
  • If you are not able to access it from the node, then you probably want to check the iptables rules that kube-proxy has programmed to make sure they look correct, and check the kube-proxy logs for more information.
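
For example (a rough sketch of those checks; the pod names, IPs, and ports are taken from your output above, and the curl test image is just one option):

# 1. From node1/node2 over SSH: any HTTP response (even 401/403) means the route to the
#    service IP works; a hang ending in "i/o timeout" reproduces the problem.
curl -k -m 10 https://10.96.0.1:443/version

# 2. The same checks from a throwaway pod (verify with -o wide that it lands on a worker):
kubectl run nettest --rm -it --restart=Never --image=curlimages/curl:7.69.1 --command -- sh
#    inside the pod:
#    curl -k -m 10 https://10.96.0.1:443/version       # pod -> API server via the service IP
#    curl -m 5 http://192.168.104.2:44135/liveness     # pod -> pod (Tiller probe port from its log)

# 3. On the affected node: the rules kube-proxy programmed for the kubernetes service,
#    plus the kube-proxy logs for that node.
sudo iptables-save | grep -E "KUBE-SERVICES|10.96.0.1"
kubectl -n kube-system logs kube-proxy-8bbjc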

You were right, the CIDR of my virtual machines was conflicting with the pod CIDR. I am able to run the cluster after using a different CIDR for the pods.
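
For anyone who hits the same thing, the change is essentially the one sketched below. This is a generic outline rather than my exact commands; it assumes a kubeadm install with the standard calico.yaml manifest, and 10.244.0.0/16 is only an example of a range that does not overlap the 192.168.29.x node network.

# Initialise the control plane with a pod CIDR that does not overlap the VM network
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Make the Calico IP pool match in calico.yaml before applying it:
#   - name: CALICO_IPV4POOL_CIDR
#     value: "10.244.0.0/16"
kubectl apply -f calico.yaml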

-Many Thanks

Mohit
