Instant noodles are not delicious. I used this K8s scheduler and conquered it

Time: 2021-07-28

1.1 Introduction to the Scheduler


Follow Xiao Liu and show off a little: today, let's learn about the K8s Scheduler.

The Scheduler is the Kubernetes scheduler. Its main task is to assign the defined Pods to nodes in the cluster, and it needs to take the following issues into account:

  • Fairness: ensure that each node can be allocated resources
  • Efficient resource utilization: all resources in the cluster are used to the maximum extent
  • Efficiency: scheduling performs well and can complete scheduling for large batches of Pods as quickly as possible
  • Flexibility: allow users to control the scheduling logic according to their own needs

The Scheduler runs as a separate program. After starting, it keeps connecting to the apiserver to obtain Pods whose PodSpec.NodeName is empty, and for each such Pod it creates a binding that indicates which node the Pod should be placed on.
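
Roughly, the binding object the scheduler creates looks like the sketch below (the Pod name mypod and node name worker1 are placeholders); it is posted to the Pod's binding subresource:

apiVersion: v1
kind: Binding
metadata:
  name: mypod          # the Pod being bound (placeholder name)
target:
  apiVersion: v1
  kind: Node
  name: worker1        # the node chosen by the scheduler (placeholder name)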

1.2 Scheduling process


Scheduling is divided into several parts: first, nodes that do not meet the conditions are filtered out, which is the predicate step; then the nodes that pass are ranked by priority, which is the priorities step; finally, the node with the highest priority is selected.

The predicate step can use a series of algorithms, including:

  • PodFitsResources: whether the remaining resources on the node are greater than the resources requested by the Pod
  • PodFitsHost: if the Pod specifies NodeName, check whether the node name matches NodeName
  • PodFitsHostPorts: whether the ports already in use on the node conflict with the ports requested by the Pod
  • PodSelectorMatches: filter out nodes that do not match the labels specified by the Pod
  • NoDiskConflict: the volumes already mounted do not conflict with the volumes specified by the Pod, unless both are read-only

If no suitable node is found during the predicate process, the Pod stays in the Pending state (waiting) and scheduling is retried continuously until some node meets the conditions.
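
A quick way to see Pods stuck in this state (not from the source, just a handy check) is to filter by phase:

kubectl get pods --field-selector=status.phase=Pending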

After this step, if multiple nodes still meet the conditions, the priorities process continues: the nodes are ranked by priority. A priority consists of a series of key-value pairs, where the key is the name of a priority item and the value is its weight. These priority options include:

  • LeastRequestedPriority: the weight is determined by the CPU and memory usage rate; the lower the usage, the higher the weight. In other words, this priority item favors nodes with a lower resource utilization ratio
  • BalancedResourceAllocation: the closer the CPU and memory usage on a node are to each other, the higher the weight. This should be used together with the one above, not alone
  • ImageLocalityPriority: favors nodes that already have the images to be used; the larger the total size of those images, the higher the weight

All priority items and weights are calculated by the algorithm to get the final result.
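
As a small illustration (the Pod name and request values below are made up; the image is the one used later in this article), PodFitsResources compares these requests against each node's remaining resources, and LeastRequestedPriority then favors the nodes that would end up with the lowest utilization:

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: hub.hc.com/library/myapp:v1
    resources:
      requests:
        cpu: "500m"       # half a CPU core (placeholder value)
        memory: 256Mi     # placeholder value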

1.3 Custom scheduler

Besides the K8s built-in scheduler, a custom scheduler can also be used. The spec.schedulerName parameter specifies the name of the scheduler a Pod should be scheduled by. For example, the following Pod chooses my-scheduler instead of the default default-scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: annotation-second-scheduler
  labels:
    name: multischeduler-example
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-with-second-annotation-container
    image: gcr.io/google_containers/pause:2.0
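
A quick way to verify which scheduler handled the Pod (not shown in the source) is to look at the Events section of kubectl describe pod, whose Scheduled event names the scheduler that bound it; if no scheduler called my-scheduler is actually running, the Pod simply stays Pending:

kubectl describe pod annotation-second-scheduler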

2.1 Node affinity

spec.affinity.nodeAffinity

  • preferredDuringSchedulingIgnoredDuringExecution (preferred): soft strategy
  • requiredDuringSchedulingIgnoredDuringExecution (required): hard strategy

The key-value operators are:

  • In: the value of the label is in a given list
  • NotIn: the value of the label is not in a given list
  • Gt: the value of the label is greater than a given value
  • Lt: the value of the label is less than a given value
  • Exists: a label with the given key exists
  • DoesNotExist: a label with the given key does not exist
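
The examples below only use the In operator. As a small sketch (the label keys disktype and gpu-count are made up), the other operators are written the same way inside matchExpressions; note that Gt and Lt compare integer values given as strings:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype        # placeholder node label; Exists/DoesNotExist take no values
          operator: Exists
        - key: gpu-count       # placeholder node label with an integer value
          operator: Gt
          values:
          - "1"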

Soft strategy:

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.hc.com/library/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker3
kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
affinity                  1/1     Running   0          39s   10.244.2.92   worker2

The Pod prefers worker3, but it still runs on worker2: a soft strategy is only a preference, so scheduling succeeds even when no node satisfies it.

Hard strategy:

apiVersion: v1
kind: Pod
metadata:
  name: affinity2
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.hc.com/library/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker3

kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
affinity2                  0/1     Pending   0          23s                          

kubectl describe pod affinity2
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  49s   default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.
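
The event shows that none of the 3 nodes matched. One quick check (not in the source) is to list the node labels and confirm that no node's kubernetes.io/hostname label is worker3:

kubectl get nodes --show-labels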

2.2 Pod affinity

spec.affinity.podAffinity/podAntiAffinity

  • preferredDuringSchedulingIgnoredDuringExecution (preferred): soft strategy
  • requiredDuringSchedulingIgnoredDuringExecution (required): hard strategy

apiVersion: v1
kind: Pod
metadata:
  name: pod-2
  labels:
    app: pod-2
spec:
  containers:
  - name: pod-2
    image: hub.hc.com/library/myapp:v1
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pod-1
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-2
          topologyKey: kubernetes.io/hostname

kubectl get pod -o wide
pod-2                     0/1     Pending   0          4s

pod-2 is Pending because its hard podAffinity term requires a node that is already running a Pod labeled app=pod-1, and no such Pod exists yet. After pod-1 is created:

kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP            NODE      NOMINATED NODE   READINESS GATES
pod-1                     1/1     Running   0          5s      10.244.2.94   worker2

Once pod-1 is running on worker2, the hard podAffinity requirement can be satisfied and pod-2 can be scheduled onto the same node. The comparison of the affinity / anti-affinity scheduling strategies is as follows:

  • nodeAffinity: matches node (host) labels; operators In, NotIn, Exists, DoesNotExist, Gt, Lt; does not support topology domains; scheduling target is the specified host
  • podAffinity: matches Pod labels; operators In, NotIn, Exists, DoesNotExist; supports topology domains; scheduling target is the same topology domain as the specified Pod
  • podAntiAffinity: matches Pod labels; operators In, NotIn, Exists, DoesNotExist; supports topology domains; scheduling target is a topology domain different from the specified Pod's
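
The topology domain is determined by topologyKey: the examples above use kubernetes.io/hostname, so a domain is a single node. As a hedged sketch, assuming the nodes carry the standard topology.kubernetes.io/zone label, the same anti-affinity idea spreads Pods labeled app=myweb (a placeholder) across zones instead:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - myweb
      topologyKey: topology.kubernetes.io/zone    # at most one such Pod per zone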

2.3 Taints and tolerations

Node affinity is a property of Pods (a preference or a hard requirement) that attracts Pods to a particular class of nodes. A taint is the opposite: it enables a node to repel a particular class of Pods.

Taints and tolerations work together to keep Pods off inappropriate nodes. One or more taints can be applied to each node, which means the node will not accept Pods that cannot tolerate those taints. If tolerations are applied to Pods, those Pods can (but are not required to) be scheduled onto nodes with matching taints.

① Composition of a taint

The kubectl taint command can be used to set a taint on a Node. Once tainted, the Node has a mutually exclusive relationship with Pods, which allows the Node to refuse to schedule Pods onto it and even to evict Pods that already exist on it. Each taint is composed as: key=value:effect

Each taint has a key and a value as its label, where value can be empty; effect describes the action of the taint. Currently, taint effect supports the following three options:

  • NoSchedule: K8s will not schedule the Pod onto a Node with this taint
  • PreferNoSchedule: K8s will try to avoid scheduling the Pod onto a Node with this taint
  • NoExecute: K8s will not schedule the Pod onto a Node with this taint, and will evict Pods that already exist on the Node

② Setting, viewing, and removing taints

View the taints on a node:

kubectl describe node node-name

Set a taint:

kubectl taint nodes node1 key1=value1:effect

Remove a taint (note the trailing "-"):

kubectl taint nodes node1 key1=value1:effect-

A tainted Node, based on the taint's effect, has a mutually exclusive relationship with Pods, so Pods will not be scheduled onto that Node. However, we can set a toleration on a Pod: a Pod with a toleration can tolerate the taint and can be scheduled onto Nodes that carry it.

Configuration of a toleration:

spec:
  tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"
      tolerationSeconds: 3600
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoExecute"
    - key: "key2"
      operator: "Exists"
      effect: "NoSchedule"

Explanation:

  • The key, value, and effect must correspond to the taint set on the Node
  • If operator is Exists, the value field is ignored
  • tolerationSeconds: when the Pod needs to be evicted, how long it can keep running on the node before it is actually evicted

① When key is not specified, all taint keys are tolerated:

tolerations:
- operator: "Exists"

② When effect is not specified, all taint effects for the key are tolerated:

tolerations:
- key: "key"
  operator: "Exists"

③ When there are multiple Master nodes, the following setting can be used to prevent resource waste:

kubectl taint nodes Node-Name node-role.kubernetes.io/master=:PreferNoSchedule
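
To confirm the taints a node currently carries (Node-Name is the same placeholder as above), the Taints line of kubectl describe node can be checked:

kubectl describe node Node-Name | grep Taints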

2.4 Specifying the scheduling node

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 7
  selector:
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeName: worker1
      nodeSelector:
        type: theSelected
      containers:
      - name: myweb
        image: hub.hc.com/library/myapp:v1
        ports:
        - containerPort: 80

Explanation:

  • spec.nodeName: schedules the Pod directly onto the specified Node, skipping the Scheduler entirely; the match is mandatory. Because the scheduler is bypassed, nodeName and nodeSelector are normally used on their own rather than combined as in the example above
  • spec.nodeSelector: selects nodes through the K8s label-selector mechanism; the scheduler matches node labels against it and then schedules the Pod to the target node; the match is a mandatory constraint
  • Label the Node (see the sketch below): kubectl label node worker1 type=theSelected
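
As a quick sketch of the nodeSelector path referenced in the last bullet: label worker1, then list the Deployment's Pods (filtered by the app=myweb label from the template above) to see where the replicas landed:

kubectl label node worker1 type=theSelected
kubectl get pod -o wide -l app=myweb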

Xiao Liu is off now.

WeChat search "Full-Stack Xiao Liu" to get the PDF version of this article.