IPSec VPN to a K8S cluster: traffic policies to allow requests from an external network

Hello,

Would anyone be so kind as to help me understand which policies I should apply to allow traffic from an external subnet?

I have a bunch of K8S nodes and a separate server, connected to the same VLAN, that works as a VPN gateway. The nodes have the following IP addresses: 10.13.17.1/22, 10.13.17.2/22, 10.13.17.3/22 and so on. The VPN gateway has 10.13.16.253/22.

The Cluster IP CIDR is 10.233.0.0/18, the pod IP CIDR is 10.233.64.0/18.

The VPN server maintains an IPSec site-to-site connection with a remote network, 10.103.103.0/24. It also has BGP sessions with all K8S nodes, so its routing table is full of prefixes announced by the Calico nodes (including 10.233.0.0/18, of course).
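To make the address plan above concrete, here is a small Python sketch using the standard ipaddress module. The 10.13.16.0/22 network is derived from the /22 masks in the post, and the classify helper is hypothetical, purely for illustration. It shows that the workers and the VPN gateway share one /22 even though their third octets differ, and which range each address seen in the captures belongs to.

```python
import ipaddress

# Ranges from the post; 10.13.16.0/22 is derived from the /22 masks.
NODE_SUBNET = ipaddress.ip_network("10.13.16.0/22")   # workers + VPN gateway
SERVICE_CIDR = ipaddress.ip_network("10.233.0.0/18")  # Cluster IP range
POD_CIDR = ipaddress.ip_network("10.233.64.0/18")     # pod IP range


def classify(addr: str) -> str:
    """Hypothetical helper: label an address by the range it falls into."""
    ip = ipaddress.ip_address(addr)
    if ip in NODE_SUBNET:
        return "node VLAN"
    if ip in SERVICE_CIDR:
        return "service"
    if ip in POD_CIDR:
        return "pod"
    return "external"


# All hosts on the VLAN share one /22, even though the third octet differs.
for iface in ("10.13.17.1/22", "10.13.17.2/22", "10.13.16.253/22"):
    assert ipaddress.ip_interface(iface).network == NODE_SUBNET

for a in ("10.13.16.253", "10.233.10.101", "10.233.103.49", "10.103.103.1"):
    print(a, "->", classify(a))
```

Run as-is, it labels the four addresses as node VLAN, service, pod and external respectively.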

When I connect to a service inside the cluster from the VPN server itself, everything works. The client (10.13.16.253) sends a SYN packet to the service (10.233.10.101:1337); the worker receives this packet, changes its destination IP address to the IP address of the pod (10.233.103.49:1337), and changes its source IP address to an address (10.233.110.0) that lets the worker receive the reply and hand it back to the connection initiator. Here’s what happens on the worker that receives this SYN packet:

22:04:25.866546 IP 10.13.16.253.56297 > 10.233.10.101.1337: Flags [S], seq 3575679444, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 1385938010 ecr 0], length 0
22:04:25.866656 IP 10.233.110.0.54430 > 10.233.103.49.1337: Flags [S], seq 3575679444, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 1385938010 ecr 0], length 0
22:04:25.867313 IP 10.233.103.49.1337 > 10.233.110.0.54430: Flags [S.], seq 2017844946, ack 3575679445, win 28960, options [mss 1460,sackOK,TS val 1201488363 ecr 1385938010,nop,wscale 7], length 0
22:04:25.867533 IP 10.233.10.101.1337 > 10.13.16.253.56297: Flags [S.], seq 2017844946, ack 3575679445, win 28960, options [mss 1460,sackOK,TS val 1201488363 ecr 1385938010,nop,wscale 7], length 0

So, the connection is established and everyone is happy.
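The round trip above can be modeled with a toy connection-tracking table. This is a hypothetical, heavily simplified Python sketch (the kernel's conntrack is far more involved), and the names FourTuple and NatTable are made up for illustration. It records the SNAT and DNAT applied to the SYN, then reverses both on the SYN-ACK, reproducing the addresses in the capture.

```python
from typing import Dict, NamedTuple, Tuple


class FourTuple(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int


class NatTable:
    """Toy NAT table: remembers how each connection was rewritten."""

    def __init__(self) -> None:
        # Expected (already translated) reply tuple -> original client tuple
        self._replies: Dict[FourTuple, FourTuple] = {}

    def translate_first(self, pkt: FourTuple, new_src: Tuple[str, int],
                        new_dst: Tuple[str, int]) -> FourTuple:
        """Apply SNAT + DNAT to a connection's first packet and record it."""
        out = FourTuple(new_src[0], new_src[1], new_dst[0], new_dst[1])
        # The pod's reply will travel with source and destination swapped.
        expected_reply = FourTuple(out.dst, out.dport, out.src, out.sport)
        self._replies[expected_reply] = pkt
        return out

    def translate_reply(self, pkt: FourTuple) -> FourTuple:
        """Reverse both rewrites so the client sees a reply from the service IP."""
        orig = self._replies[pkt]
        return FourTuple(orig.dst, orig.dport, orig.src, orig.sport)


nat = NatTable()
syn = FourTuple("10.13.16.253", 56297, "10.233.10.101", 1337)
to_pod = nat.translate_first(syn, new_src=("10.233.110.0", 54430),
                             new_dst=("10.233.103.49", 1337))
syn_ack = FourTuple("10.233.103.49", 1337, "10.233.110.0", 54430)
back = nat.translate_reply(syn_ack)
print(to_pod)
print(back)
```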

But when I connect to the same service from the external network (10.103.103.0/24), the worker that receives the SYN packet does NOT change the source IP address. It changes only the destination IP address, so the packet keeps its original source address:

21:56:05.794171 IP 10.103.103.1.52132 > 10.233.10.101.1337: Flags [S], seq 3759345254, win 29200, options [mss 1460,sackOK,TS val 195801472 ecr 0,nop,wscale 7], length 0
21:56:05.794242 IP 10.103.103.1.52132 > 10.233.103.49.1337: Flags [S], seq 3759345254, win 29200, options [mss 1460,sackOK,TS val 195801472 ecr 0,nop,wscale 7], length 0
21:56:21.826153 IP 10.103.103.1.52132 > 10.233.10.101.1337: Flags [S], seq 3759345254, win 29200, options [mss 1460,sackOK,TS val 195817504 ecr 0,nop,wscale 7], length 0
21:56:21.826199 IP 10.103.103.1.52132 > 10.233.103.49.1337: Flags [S], seq 3759345254, win 29200, options [mss 1460,sackOK,TS val 195817504 ecr 0,nop,wscale 7], length 0
21:56:53.924191 IP 10.103.103.1.52132 > 10.233.10.101.1337: Flags [S], seq 3759345254, win 29200, options [mss 1460,sackOK,TS val 195849600 ecr 0,nop,wscale 7], length 0
21:56:53.924254 IP 10.103.103.1.52132 > 10.233.103.49.1337: Flags [S], seq 3759345254, win 29200, options [mss 1460,sackOK,TS val 195849600 ecr 0,nop,wscale 7], length 0

The destination IP address is changed, so I can see these packets on the worker where the pod is running, but there are no replies to them:

21:56:05.794602 IP 10.103.103.1.52132 > 10.233.103.49.1337: Flags [S], seq 3759345254, win 29200, options [mss 1460,sackOK,TS val 195801472 ecr 0,nop,wscale 7], length 0
21:56:21.826553 IP 10.103.103.1.52132 > 10.233.103.49.1337: Flags [S], seq 3759345254, win 29200, options [mss 1460,sackOK,TS val 195817504 ecr 0,nop,wscale 7], length 0
21:56:53.924556 IP 10.103.103.1.52132 > 10.233.103.49.1337: Flags [S], seq 3759345254, win 29200, options [mss 1460,sackOK,TS val 195849600 ecr 0,nop,wscale 7], length 0

The external network (10.103.103.0/24) is advertised by the VPN server via BGP, so all the workers know that this network is reachable via 10.13.16.253. When I ping the IP address of the service (10.233.10.101) from a host in the external network (10.103.103.1), the test passes, so the VPN works and the routing tables seem to be correct.
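The routing claim above boils down to a longest-prefix-match lookup on the worker. Below is a minimal Python sketch; only the 10.103.103.0/24 via 10.13.16.253 entry comes from the post, the directly connected 10.13.16.0/22 entry is inferred from the node addresses, and the lookup function is hypothetical. With just these two entries there is no overlap, but the function follows the general rule that the most specific matching prefix wins.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical excerpt of a worker's routing table as (prefix, next hop).
# The 10.103.103.0/24 entry is the BGP route described in the post;
# the connected 10.13.16.0/22 entry is inferred from the node addresses.
ROUTES: List[Tuple[str, str]] = [
    ("10.13.16.0/22", "direct"),
    ("10.103.103.0/24", "10.13.16.253"),
]


def lookup(dst: str) -> Optional[str]:
    """Longest-prefix match: the most specific matching prefix wins."""
    ip = ipaddress.ip_address(dst)
    best_len, best_hop = -1, None
    for prefix, hop in ROUTES:
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    return best_hop


# Where a pod's reply to the external client would be sent
print(lookup("10.103.103.1"))
```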

So why does the network “trust” 10.13.16.253 but not 10.103.103.1? And why does the worker perform both SNAT and DNAT for packets from 10.13.16.253 but skip SNAT for packets from 10.103.103.1? Do I need to add some policies to allow this traffic?

Thanks in advance for any clues!

What are your Calico IP pools (calicoctl get ippool) and your kube-proxy cluster-cidr setting (it may be in its config map or a command-line argument)?

Calico does SNAT when it sees traffic from an IP pool with natOutgoing=true to an address outside any IP pool.

kube-proxy does SNAT when it sees traffic from outside its cluster CIDR to a service.
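The two rules above can be written down as simple predicates. This is a simplified Python sketch, not Calico's or kube-proxy's actual code: the function names are hypothetical, and the CIDRs are filled in with the values the original poster reports later in the thread. The Calico rule concerns new connections leaving a pool; replies on existing connections are handled by connection tracking.

```python
import ipaddress

# Values reported later in the thread.
IP_POOLS = [ipaddress.ip_network("10.233.64.0/18")]      # Calico IP pools
CLUSTER_CIDR = ipaddress.ip_network("10.233.64.0/18")    # kube-proxy clusterCIDR
SERVICE_CIDR = ipaddress.ip_network("10.233.0.0/18")     # service cluster IP range


def in_any_pool(addr) -> bool:
    return any(addr in pool for pool in IP_POOLS)


def calico_snat(src: str, dst: str, nat_outgoing: bool = True) -> bool:
    """Simplified Calico rule: from a natOutgoing pool to outside every pool."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return nat_outgoing and in_any_pool(s) and not in_any_pool(d)


def kube_proxy_masq(src: str, dst: str) -> bool:
    """Simplified kube-proxy rule: to a service IP from outside the cluster CIDR."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return d in SERVICE_CIDR and s not in CLUSTER_CIDR


for client in ("10.13.16.253", "10.103.103.1", "10.233.103.49"):
    print(client, kube_proxy_masq(client, "10.233.10.101"))
```

Under these simplified rules, connections to the service from 10.13.16.253 and from 10.103.103.1 would both be masqueraded by kube-proxy, since neither source is inside 10.233.64.0/18; only the pod address is exempt.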

The cluster CIDR should contain your Calico IP pools and neither should overlap with your service CIDR or the IPs of your workers or external hosts.
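A quick way to sanity-check that rule of thumb is sketched below with Python's standard ipaddress module (Python 3.7+ for subnet_of). The check_cidrs helper is hypothetical; it is fed the values reported later in the thread, plus the node VLAN (10.13.16.0/22, derived from the /22 masks) and the remote network 10.103.103.0/24.

```python
import ipaddress
from typing import List


def check_cidrs(cluster_cidr: str, pools: List[str], service_cidr: str,
                host_nets: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the layout looks sane."""
    cluster = ipaddress.ip_network(cluster_cidr)
    pool_nets = [ipaddress.ip_network(p) for p in pools]
    # Ranges that neither the cluster CIDR nor any pool may overlap.
    forbidden = [ipaddress.ip_network(service_cidr)]
    forbidden += [ipaddress.ip_network(h) for h in host_nets]

    problems = []
    for pool in pool_nets:
        if not pool.subnet_of(cluster):
            problems.append(f"IP pool {pool} is not inside cluster CIDR {cluster}")
    for net in [cluster] + pool_nets:
        for other in forbidden:
            if net.overlaps(other):
                problems.append(f"{net} overlaps {other}")
    return problems


print(check_cidrs(
    cluster_cidr="10.233.64.0/18",                   # kube-proxy clusterCIDR
    pools=["10.233.64.0/18"],                        # Calico IP pool
    service_cidr="10.233.0.0/18",                    # --service-cluster-ip-range
    host_nets=["10.13.16.0/22", "10.103.103.0/24"],  # node VLAN, remote network
))
```

With the thread's values it returns an empty list; widening the cluster CIDR to 10.233.0.0/17, for example, would make it overlap the service range and be flagged.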

Thank you for your attention!

The Calico IP pool is 10.233.64.0/18, and the clusterCIDR parameter of kube-proxy is 10.233.64.0/18 too.

Everything seems to be correct, right?

Yes, those look correct. What about your service cluster CIDR (as used by the kube API server/kube controller manager)?

What do you get from calicoctl get ippools -o wide?

Hello!

The value of --service-cluster-ip-range is 10.233.0.0/18.

And calicoctl get ippools -o wide gives the following result:

NAME           CIDR             NAT    IPIPMODE   VXLANMODE   DISABLED   SELECTOR
default-pool   10.233.64.0/18   true   Always     Never       false      all()

Is that correct?

Ta-damn!

pfSense was breaking the SYN packet’s checksum:

13:53:32.286601 IP (tos 0x0, ttl 62, id 33830, offset 0, flags [DF], proto TCP (6), length 60)
    10.103.103.1.47390 > 10.233.10.101.1337: Flags [S], cksum 0x86e4 (incorrect -> 0x99db), seq 4230752647, win 29200, options [mss 1460,sackOK,TS val 598846881 ecr 0,nop,wscale 7], length 0
        0x0000:  4500 003c 8426 4000 3e06 31e0 0a67 6701  E..<.&@.>.1..gg.
        0x0010:  0ae9 0a65 b91e 0539 fc2c 2987 0000 0000  ...e...9.,).....
        0x0020:  a002 7210 86e4 0000 0204 05b4 0402 080a  ..r.............
        0x0030:  23b1 ada1 0000 0000 0103 0307            #...........
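The checksum claim can be double-checked by recomputing the TCP checksum from the hex dump. The Python sketch below implements the RFC 1071 one's-complement sum over the TCP pseudo-header and segment; the helper names are made up for illustration. It agrees with tcpdump's verdict (stored 0x86e4, correct 0x99db) and shows that the IPv4 header checksum in the same packet is valid. Interestingly, the stored value equals the sum of the pseudo-header alone, which is what a checksum left for an offload engine to finish would look like; that is an inference from the numbers, consistent with the checksum-offload fix but not confirmed in the thread.

```python
import struct

# Hex dump copied from the tcpdump output above (IPv4 header + TCP SYN).
DUMP = """
4500 003c 8426 4000 3e06 31e0 0a67 6701
0ae9 0a65 b91e 0539 fc2c 2987 0000 0000
a002 7210 86e4 0000 0204 05b4 0402 080a
23b1 ada1 0000 0000 0103 0307
"""


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum (RFC 1071) with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def split_packet(packet: bytes):
    """Return (pseudo-header, TCP segment) for an IPv4/TCP packet."""
    ihl = (packet[0] & 0x0F) * 4                     # IPv4 header length
    total_len = struct.unpack("!H", packet[2:4])[0]  # IPv4 total length
    segment = packet[ihl:total_len]
    # Pseudo-header: src IP, dst IP, zero byte, protocol (6 = TCP), TCP length
    pseudo = packet[12:20] + struct.pack("!BBH", 0, 6, len(segment))
    return pseudo, segment


def tcp_checksum(packet: bytes) -> int:
    """Compute the correct TCP checksum, ignoring the value stored in the packet."""
    pseudo, segment = split_packet(packet)
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]  # blank the checksum field
    return ~ones_complement_sum(pseudo + zeroed) & 0xFFFF


pkt = bytes.fromhex("".join(DUMP.split()))
pseudo, segment = split_packet(pkt)
stored = struct.unpack("!H", segment[16:18])[0]
print(f"stored TCP checksum:    0x{stored:04x}")
print(f"correct TCP checksum:   0x{tcp_checksum(pkt):04x}")
print(f"pseudo-header sum only: 0x{ones_complement_sum(pseudo):04x}")
print("IPv4 header checksum valid:", ones_complement_sum(pkt[:20]) == 0xFFFF)
```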

I’ve disabled the hardware checksum offload feature and now everything works smoothly.

Lots of thanks to y’all for your time and attention!