Comparison of Docker and K8s network architecture

Time:2020-3-1

Docker network architecture

Design defects:

  • The Docker bridge is virtualized by the host and cannot be addressed from outside. To be reachable externally, a container port must be mapped to a host port. Port mapping is implemented by adding rules to the NAT table of iptables, which is why this is also called the NAT mode (see the sketch after this list). The NAT mode requires mapping many ports, which limits host capacity and increases the complexity of container orchestration.
  • Ports are a scarce resource, so port conflicts and dynamic port allocation have to be dealt with. This complicates both scheduling and application configuration, showing up as port conflicts, port reuse, and port exhaustion.
  • NAT splits the address space and introduces extra complexity. For example, the IP an application sees inside the container is not the IP exposed to the outside world. Because of the network isolation, the application can only discover the container's IP, yet what it needs to advertise is the host's IP. This information asymmetry breaks mechanisms such as self-registration.
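A minimal sketch of the NAT mode, assuming a standard docker0 bridge (the container IP 172.17.0.2 and the published ports are illustrative):

# publish container port 80 on host port 8080
docker run -d -p 8080:80 nginx

# docker implements the mapping with a DNAT rule in the DOCKER chain of the nat table, roughly:
iptables -t nat -L DOCKER -n
#   DNAT  tcp  --  0.0.0.0/0  0.0.0.0/0  tcp dpt:8080 to:172.17.0.2:80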

K8s network model

Kubernetes departs from Docker's network model (the NAT model) and defines its own network model.
Each pod gets a flat "pod IP", and all containers in a pod share that network namespace.
A pod communicates with other pods and with physical machines across the network directly through its pod IP.
This IP-per-pod model yields a clean, backward-compatible model.
In this model, a pod can be treated like a virtual machine or a physical machine in terms of port allocation, networking, domain name resolution, service discovery, load balancing, application configuration, and migration, and an application can migrate smoothly from a non-container environment to a container environment.

To implement this model, several problems need to be solved:

Container to container communication

  • A pod is a collection of containers
  • The containers in a pod run on the same host and share the same network namespace, so they can communicate with each other directly
  • The containers running in a pod include the service containers and a network container called "pause"
  • The pause container, as the name suggests, simply pauses so that it never exits
  • The network container only holds the pod's network; the service containers share it by joining the network container's network namespace

The pod's container manager does roughly the equivalent of the following commands:

# the pause container owns the pod's network namespace and publishes the pod's ports
docker run -p 80:80 -p 8080:8080 --name network-container -d gcr.io/google_containers/pause:3.1

# the service container joins the pause container's network namespace
docker run --net container:network-container -d jonlangemak/docker:web_container_8080
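A hedged check that the two containers really share one network namespace (container names follow the commands above; the web container's ID comes from docker ps):

docker inspect -f '{{.NetworkSettings.IPAddress}}' network-container   # e.g. 172.17.0.2
docker inspect -f '{{.NetworkSettings.IPAddress}}' <web-container-id>  # empty: it has no IP of its own
curl http://172.17.0.2:8080                                            # the web server answers on the pause container's IP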

Container network topology in pod

Pod to pod communication

  • The k8s network model is a flat network plane
  • A pod, as a network unit, sits at the same level as the k8s node network

Minimal k8s network topology

  • Communication between pods: pod1 and pod2 (same host), pod1 and pod3 (cross-host communication)
  • Communication between nodes and pods: node1 and pod1/pod2 (same host), node1 and pod3 (cross-host communication)


Questions:

  1. How do we ensure that each pod's pod IP is globally unique?
  2. Pods/containers on the same k8s node can communicate natively, but how do pods on different nodes communicate?
  • Since a pod's IP is allocated by the docker bridge, the docker bridges on different k8s nodes can be configured with non-overlapping IP segments (see the sketch after this list).
  • Alternatively, enhance docker by creating an overlay network that connects every node in the container cluster, e.g. with flannel.
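A minimal sketch of the first approach; the subnets below are only examples, and routes between the segments still have to be set up by hand or by an overlay:

# on node1: give docker0 its own segment
dockerd --bip=10.0.62.1/24

# on node2: a different, non-overlapping segment
dockerd --bip=10.0.10.1/24

With unique bridge segments, pod IPs no longer collide across nodes; flannel (below) automates exactly this allocation and also handles the cross-host routing.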

Flannel network model

  • Flannel is an overlay network tool designed by the CoreOS team
  • Flannel assigns a subnet to each host and encapsulates container-to-container traffic with a tunnel protocol, so containers can communicate across hosts
  • Before flannel can run, its network configuration must be written to etcd:
etcdctl set /coreos.com/network/config '{"Network": "10.0.0.0/16"}'
  • After configuration, run flannel on each node
  • On startup, flannel reads the network configuration from etcd
  • It allocates an available IP segment (subnet) to the current node
  • It records the allocation in etcd, forming the cluster-wide routing table:
etcdctl ls /coreos.com/network/subnets
etcdctl get /coreos.com/network/subnets/10.0.62.0-24
etcdctl get /coreos.com/network/subnets/10.0.10.0-24
  • After the network segment is allocated, flannel creates a virtual network interface:
ip addr show flannel.1
  • The docker bridge (docker0) is also reconfigured onto the node's subnet, which is done by modifying the docker startup parameter --bip (see the sketch after this list):
ip addr show docker0
  • In addition, flannel modifies the routing table so that its virtual interface takes over the containers' cross-host traffic:
route -n 
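A hedged sketch of how flannel's subnet allocation is handed to docker: flanneld writes its lease into /run/flannel/subnet.env, and docker is started with those values (the subnet and MTU shown are illustrative):

# /run/flannel/subnet.env, written by flanneld, looks roughly like:
#   FLANNEL_NETWORK=10.0.0.0/16
#   FLANNEL_SUBNET=10.0.62.1/24
#   FLANNEL_MTU=1450

source /run/flannel/subnet.env
dockerd --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}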

Network topology

Service to pod communication

A Service acts as a "service agent" in front of pods; externally it presents a single access point and forwards requests to the pods behind it.

$ kubectl describe service myservice
Name: myservice
Namespace: default
Labels: <none>
Selector: name=mypod
Type: ClusterIP
IP: 10.254.206.220
Port: http 80/TCP
Endpoints: 10.0.62.87:80,10.0.62.88:80,10.0.62.89:80
Session Affinity: None
No events.

This output reads as follows:

  • The Service's virtual IP is 10.254.206.220
  • Port 80/TCP corresponds to three backends: 10.0.62.87:80, 10.0.62.88:80, 10.0.62.89:80
  • A request to 10.254.206.220:80 is forwarded to one of these backends
  • The virtual IP is created by k8s; the virtual network segment is configured by the API server startup parameter --service-cluster-ip-range=10.254.0.0/16 (a sketch of a matching Service definition follows this list)
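A hedged sketch of a Service definition that would produce output like the above (the name myservice and the selector name=mypod come from the describe output; everything else is illustrative):

kubectl create -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  selector:
    name: mypod
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
EOF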

The kube-proxy component is responsible for routing and forwarding traffic sent to virtual IPs. It implements this virtual forwarding network on top of the container overlay network. Its functions are as follows:

  • Forward requests addressed to a Service's virtual IP to its endpoints
  • Watch Services and Endpoints for changes and refresh the forwarding rules
  • Provide load balancing

The kube-proxy implementation mechanism is selected by the startup parameter --proxy-mode:

  • Userspace
  • In this mode kube-proxy itself does the forwarding, currently only simple TCP/UDP-over-IP stream forwarding, and the default policy is round robin
  • For each Service, kube-proxy creates iptables rules in two custom chains:

    • KUBE-PORTALS-CONTAINER matches packets sent by containers and is bound to the PREROUTING chain of the nat table
    • KUBE-PORTALS-HOST matches packets sent by the host and is bound to the OUTPUT chain of the nat table
  • The purpose of these rules is to redirect packets with destination IP 10.254.206.220 and destination port 80 to a local port, here 35841
  • That port is the one kube-proxy listens on (see the sketch after this list)
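A hedged sketch of the userspace-mode rules kube-proxy installs (chain names as above; the proxy port 35841 is the example from this section, and <node-ip> is a placeholder):

iptables -t nat -A KUBE-PORTALS-CONTAINER -d 10.254.206.220/32 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 35841
iptables -t nat -A KUBE-PORTALS-HOST -d 10.254.206.220/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination <node-ip>:35841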


  • Iptables

By creating iptables rules, requests to the Service's virtual IP are redirected straight to the endpoints. When the endpoints change, kube-proxy refreshes the relevant iptables rules. In this mode kube-proxy is only responsible for watching Services and Endpoints and updating the rules; packet forwarding is done by the Linux kernel. The default load-balancing policy is random.

Query the iptables rules related to the service myservice:

iptables-save | grep myservice

kube-proxy creates a series of iptables rules for each Service, organized into custom chains (see the sketch after this list):

  • KUBE-SERVICES: bound to the PREROUTING and OUTPUT chains of the nat table
  • KUBE-SVC-*: one chain per Service, jumped to from KUBE-SERVICES
  • KUBE-SEP-*: one chain per endpoint backend, jumped to from the corresponding KUBE-SVC-* chain
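A hedged sketch of the resulting chain structure for myservice (the chain hashes are illustrative; iptables mode spreads load with the statistic match, which is why the default policy is random):

-A KUBE-SERVICES -d 10.254.206.220/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-XXXXXXXXXXXXXXXX
-A KUBE-SVC-XXXXXXXXXXXXXXXX -m statistic --mode random --probability 0.33333 -j KUBE-SEP-AAAAAAAAAAAAAAAA
-A KUBE-SVC-XXXXXXXXXXXXXXXX -m statistic --mode random --probability 0.50000 -j KUBE-SEP-BBBBBBBBBBBBBBBB
-A KUBE-SVC-XXXXXXXXXXXXXXXX -j KUBE-SEP-CCCCCCCCCCCCCCCC
-A KUBE-SEP-AAAAAAAAAAAAAAAA -p tcp -m tcp -j DNAT --to-destination 10.0.62.87:80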

Query forwarding rules

iptables -t nat -L -n