Preface
When building a Kubernetes cluster, the pods we deploy are normally placed on nodes by the cluster's automatic scheduling strategy: by default the scheduler only ensures that a node has sufficient resources and tries to spread the load evenly. Sometimes we need finer-grained control over pod placement. For example, we may want internal and external-facing workloads to run on different nodes, or two interdependent pods to run on the same node. For this, Kubernetes gives us the concepts of affinity and anti-affinity, and of taints and tolerations.
- Node affinity
- It comes in two forms: hard affinity and soft affinity
- Hard affinity means the conditions must be met
- Soft affinity means the conditions are met as far as possible
Node affinity:
- Hard affinity means the condition must be met
requiredDuringSchedulingIgnoredDuringExecution
The pod must be scheduled onto a node that satisfies the conditions; if no node satisfies them, the scheduler keeps retrying and the pod stays Pending. The IgnoredDuringExecution part means that once the pod is running, it keeps running even if the node later stops satisfying the conditions.
- Soft affinity means the conditions are satisfied as far as possible
preferredDuringSchedulingIgnoredDuringExecution
The pod is preferentially scheduled onto nodes that satisfy the conditions; if no node satisfies them, the conditions are ignored and the pod is scheduled by the normal logic.
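To make the difference concrete, here is a minimal sketch of a pod spec that carries both kinds of rules (the label keys gpu and region are just the illustrative keys used later in this article):
apiVersion: v1
kind: Pod
metadata:
  name: affinity-sketch
spec:
  containers:
  - name: demoapp
    image: ikubernetes/demoapp:v1.0
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard: no matching node means the pod stays Pending
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu
            operator: Exists
      preferredDuringSchedulingIgnoredDuringExecution:   # soft: only biases the scheduler's scoring
      - weight: 50
        preference:
          matchExpressions:
          - key: region
            operator: In
            values: ["foo"]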
View the detailed description of nodeAffinity:
[root@k8s-master scheduler]# kubectl explain pods.spec.affinity.nodeAffinity
KIND: Pod
...
FIELDS:
preferredDuringSchedulingIgnoredDuringExecution <[]Object> # soft affinity
The scheduler will prefer to schedule pods to nodes that satisfy the
affinity expressions specified by this field, but it may choose a node that
violates one or more of the expressions. The node that is most preferred is
the one with the greatest sum of weights, i.e. for each node that meets all
of the scheduling requirements (resource request, requiredDuringScheduling
affinity expressions, etc.), compute a sum by iterating through the
elements of this field and adding "weight" to the sum if the node matches
the corresponding matchExpressions; the node(s) with the highest sum are
the most preferred.
requiredDuringSchedulingIgnoredDuringExecution <Object> # hard affinity
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to an update), the system may or may not try
to eventually evict the pod from its node.
[root@k8s-master ~]# kubectl explain pods.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution # hard affinity details
...
FIELDS:
nodeSelectorTerms <[]Object> -required- # node selection conditions
Required. A list of node selector terms. The terms are ORed.
[root@k8s-master scheduler]# kubectl explain pods.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution # soft affinity details
...
FIELDS:
preference <Object> -required- # the affinity preference, used together with weight
A node selector term, associated with the corresponding weight.
weight <integer> -required- # weight
Weight associated with matching the corresponding nodeSelectorTerm, in the
range 1-100.
[root@k8s-master scheduler]# kubectl explain pods.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND: Pod
VERSION: v1
RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <Object>
DESCRIPTION:
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to an update), the system may or may not try
to eventually evict the pod from its node.
A node selector represents the union of the results of one or more label
queries over a set of nodes; that is, it represents the OR of the selectors
represented by the node selector terms.
FIELDS:
nodeSelectorTerms <[]Object> -required-
Required. A list of node selector terms. The terms are ORed.
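In other words, multiple nodeSelectorTerms are ORed, while the matchExpressions inside a single term are ANDed, so where a condition is placed changes its meaning. A small sketch of the affinity stanza (gpu and ssd here are illustrative label keys, not taken from the examples below):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:       # term 1: the node must have gpu AND ssd
        - key: gpu
          operator: Exists
        - key: ssd
          operator: Exists
      - matchExpressions:       # term 2: ORed with term 1, so a node with only region=foo also qualifies
        - key: region
          operator: In
          values: ["foo"]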
Example 1: node hard affinity
[root@k8s-master scheduler]# cat pod-with-nodeselector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-nodeselector
spec:
  containers:
  - name: demoapp
    image: ikubernetes/demoapp:v1.0
  nodeSelector:   # hard constraint
    gpu: ''      # the value is empty; the node only needs to carry the gpu key
[root@k8s-master scheduler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
deployment-demo-5fddfb8ffc-lssq8 1/1 Running 0 23m
deployment-demo-5fddfb8ffc-r277n 1/1 Running 0 23m
deployment-demo-5fddfb8ffc-wrpjx 1/1 Running 0 23m
deployment-demo-5fddfb8ffc-zzwck 1/1 Running 0 23m
pod-with-nodeselector 0/1 Pending 0 3m8s # stuck in Pending
[root@k8s-master scheduler]# kubectl describe pod pod-with-nodeselector
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 15s default-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector. # none of the 4 nodes has a matching label
Warning FailedScheduling 15s default-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector.
[root@k8s-master scheduler]# kubectl label node k8s-node3 gpu='' # give node3 a gpu label with an empty value
node/k8s-node3 labeled
[root@k8s-master scheduler]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-demo-5fddfb8ffc-lssq8 1/1 Running 0 24m 192.168.113.14 k8s-node1 <none> <none>
deployment-demo-5fddfb8ffc-r277n 1/1 Running 0 24m 192.168.12.12 k8s-node2 <none> <none>
deployment-demo-5fddfb8ffc-wrpjx 1/1 Running 0 24m 192.168.12.11 k8s-node2 <none> <none>
deployment-demo-5fddfb8ffc-zzwck 1/1 Running 0 24m 192.168.51.19 k8s-node3 <none> <none>
pod-with-nodeselector 1/1 Running 0 3m59s 192.168.51.20 k8s-node3 <none> <none> # now running on node3
[root@k8s-master scheduler]# kubectl label node k8s-node3 gpu- # removing the label does not affect the running pod, because scheduling decisions are only made before creation
node/k8s-node3 unlabeled
[root@k8s-master scheduler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
deployment-demo-5fddfb8ffc-lssq8 1/1 Running 0 28m
deployment-demo-5fddfb8ffc-r277n 1/1 Running 0 28m
deployment-demo-5fddfb8ffc-wrpjx 1/1 Running 0 28m
deployment-demo-5fddfb8ffc-zzwck 1/1 Running 0 28m
pod-with-nodeselector 1/1 Running 0 8m9s
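A quick way to check which nodes still carry the label is to print it as an extra column; note that -L adds a label column for all nodes rather than filtering:
[root@k8s-master scheduler]# kubectl get nodes -L gpu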
[root@k8s-master scheduler]# cat node-affinity-required-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-affinity-required
  namespace: default
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demoapp
      ctlr: node-affinity-required
  template:
    metadata:
      labels:
        app: demoapp
        ctlr: node-affinity-required
    spec:
      containers:
      - name: demoapp
        image: ikubernetes/demoapp:v1.0
        livenessProbe:
          httpGet:
            path: '/livez'
            port: 80
          initialDelaySeconds: 5
        readinessProbe:
          httpGet:
            path: '/readyz'
            port: 80
          initialDelaySeconds: 15
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:                      # conditions within one term are ANDed
              - key: gpu                             # the node must carry the gpu label
                operator: Exists
              - key: node-role.kubernetes.io/master  # and must not be a master node
                operator: DoesNotExist
[root@k8s-master scheduler]# kubectl apply -f node-affinity-required-demo.yaml
[root@k8s-master scheduler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
node-affinity-required-5cb67df4b-d5nk6 0/1 Pending 0 3m51s
node-affinity-required-5cb67df4b-m6zxf 0/1 Pending 0 3m52s
node-affinity-required-5cb67df4b-sq5k9 0/1 Pending 0 3m51s
node-affinity-required-5cb67df4b-tvpwf 0/1 Pending 0 3m51s
node-affinity-required-5cb67df4b-vkx7j 0/1 Pending 0 3m52s
pod-with-nodeselector 0/1 Pending 0 31m # still Pending
[root@k8s-master scheduler]# kubectl label node k8s-node2 gpu='true'
node/k8s-node2 labeled # the label now makes node2 satisfy the affinity rule
[root@k8s-master scheduler]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-affinity-required-5cb67df4b-d5nk6 0/1 ContainerCreating 0 5m14s <none> k8s-node2 <none> <none>
node-affinity-required-5cb67df4b-m6zxf 0/1 ContainerCreating 0 5m15s <none> k8s-node2 <none> <none>
node-affinity-required-5cb67df4b-sq5k9 0/1 ContainerCreating 0 5m14s <none> k8s-node2 <none> <none>
node-affinity-required-5cb67df4b-tvpwf 0/1 ContainerCreating 0 5m14s <none> k8s-node2 <none> <none>
node-affinity-required-5cb67df4b-vkx7j 0/1 ContainerCreating 0 5m15s <none> k8s-node2 <none> <none>
Example 2: nodeAffinity hard affinity
- requiredDuringSchedulingIgnoredDuringExecution
[root@k8s-master scheduler]# cat node-affinity-and-resourcefits.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-affinity-and-resourcefits
  namespace: default
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demoapp
      ctlr: node-affinity-and-resourcefits
  template:
    metadata:
      labels:
        app: demoapp
        ctlr: node-affinity-and-resourcefits
    spec:
      containers:
      - name: demoapp
        image: ikubernetes/demoapp:v1.0
        resources:          # predicate: only nodes that can fit these requests move on to the scoring step
          requests:
            cpu: 1000m
            memory: 200Mi
        livenessProbe:
          httpGet:
            path: '/livez'
            port: 80
          initialDelaySeconds: 5
        readinessProbe:
          httpGet:
            path: '/readyz'
            port: 80
          initialDelaySeconds: 15
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard affinity
            nodeSelectorTerms:
            - matchExpressions:
              - key: gpu
                operator: Exists
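Because the resource requests act as a hard predicate before any affinity scoring, it is worth checking whether a candidate node can actually fit five pods each requesting 1000m of CPU. For example, kubectl describe shows the node's current allocations:
[root@k8s-master scheduler]# kubectl describe node k8s-node2 | grep -A 6 'Allocated resources'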
Example 3: nodeAffinity soft affinity
- preferredDuringSchedulingIgnoredDuringExecution
[root@k8s-master scheduler]# cat node-affinity-preferred-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-affinity-preferred
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demoapp
      ctlr: node-affinity-preferred
  template:
    metadata:
      name: demoapp
      labels:
        app: demoapp
        ctlr: node-affinity-preferred
    spec:
      containers:
      - name: demoapp
        image: ikubernetes/demoapp:v1.0
        resources:          # predicate: only nodes that can fit these requests are scored at all
          requests:
            cpu: 100m
            memory: 100Mi
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:   # soft affinity
          - weight: 60
            preference:
              matchExpressions:       # nodes carrying the gpu label score an extra 60
              - key: gpu
                operator: Exists
          - weight: 30
            preference:
              matchExpressions:       # nodes whose region is foo or bar score an extra 30
              - key: region
                operator: In
                values: ["foo","bar"]
# Label the nodes
[root@k8s-master ~]# kubectl label node k8s-node1.org gpu=2
node/k8s-node1.org labeled
[root@k8s-master ~]# kubectl label node k8s-node3.org region=foo
node/k8s-node3.org labeled
[root@k8s-master ~]# kubectl label node k8s-node2.org region=bar
node/k8s-node2.org labeled
[root@k8s-master ~]# kubectl get node -l gpu # only node1 carries the gpu label
NAME STATUS ROLES AGE VERSION
k8s-node1.org Ready <none> 47d v1.22.2
[root@k8s-master ~]# kubectl get node -l region # node2 and node3 carry the region label
NAME STATUS ROLES AGE VERSION
k8s-node2.org Ready <none> 47d v1.22.2
k8s-node3.org Ready <none> 47d v1.22.2
[root@k8s-master scheduler]# kubectl apply -f node-affinity-preferred-demo.yaml
deployment.apps/node-affinity-preferred created
- Node1 matches the gpu rule (weight 60), which outscores the region match on node2 and node3 (weight 30), so in theory all the pods should run on node1:
[root@k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
details-v1-79f774bdb9-vjmll 2/2 Running 0 27d 10.244.42.13 k8s-node3.org <none> <none>
node-affinity-preferred-5579fd76bc-5hfd2 0/2 Init:0/1 0 15s <none> k8s-node1.org <none> <none>
node-affinity-preferred-5579fd76bc-gzhd6 0/2 Init:0/1 0 15s <none> k8s-node1.org <none> <none>
node-affinity-preferred-5579fd76bc-q8wrc 0/2 Init:0/1 0 15s <none> k8s-node1.org <none> <none>
node-affinity-preferred-5579fd76bc-v42sn 0/2 Init:0/1 0 15s <none> k8s-node1.org <none> <none>
node-affinity-preferred-5579fd76bc-vvc42 0/2 Init:0/1 0 15s <none> k8s-node1.org <none> <none>
productpage-v1-6b746f74dc-q564k 2/2 Running 0 27d 10.244.42.21 k8s-node3.org <none> <none>
ratings-v1-b6994bb9-vh57t 2/2 Running 0 27d 10.244.42.19 k8s-node3.org <none> <none>
reviews-v1-545db77b95-clh87 2/2 Running 0 27d 10.244.42.12 k8s-node3.org <none> <none>
reviews-v2-7bf8c9648f-hdbdl 2/2 Running 0 27d 10.244.42.9 k8s-node3.org <none> <none>
reviews-v3-84779c7bbc-4vzcz 2/2 Running 0 27d 10.244.42.17 k8s-node3.org <none> <none>
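To reset the cluster after the demo, the labels added above can be removed the same way the gpu label was removed earlier (a cleanup sketch for the exact labels used in this example):
[root@k8s-master ~]# kubectl label node k8s-node1.org gpu-
[root@k8s-master ~]# kubectl label node k8s-node3.org region-
[root@k8s-master ~]# kubectl label node k8s-node2.org region-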