.. _labels_and_selectors:

======================
Labels and Selectors
======================

Well-known labels, annotations and taints on Nodes
===================================================

Labels can be named arbitrarily, as long as they can be selected correctly. However, `Kubernetes reserves a number of well-known labels, annotations and taints`_ , for example:

- ``kubernetes.io/arch`` : defined in Go as ``runtime.GOARCH`` ; useful in clusters that mix arm and x86 nodes
- ``kubernetes.io/os`` : defined in Go as ``runtime.GOOS`` ; used in clusters that mix operating systems, e.g. Linux and Windows nodes
- ``kubernetes.io/hostname`` : note that the hostname can be overridden by passing ``--hostname-override`` to ``kubelet``
- ``node.kubernetes.io/instance-type`` : the virtual machine instance type defined by the ``cloudprovider`` ; a common sizing notion in cloud computing, e.g. ``g2.2xlarge`` , ``m3.medium``
- ``topology.kubernetes.io/zone`` : used by cloud vendors to mark the zone topology of their data centers, e.g. ``topology.kubernetes.io/zone=us-east-1c`` ; very useful for ``Node`` and ``PersistentVolume`` objects

Creating and removing labels
=============================

- Add a label to a node::

    kubectl label node <node_name> <label>=<value>

- Remove a label from a node::

    kubectl label node <node_name> <label>-

- Remove multiple labels from a node::

    kubectl label node <node_name> <label1>- <label2>-

- Remove a label from all nodes ( **use with caution** )::

    kubectl label node --all <label>-

Label practice
===============

- First prepare a Kubernetes cluster; I use :ref:`arm_k8s` ::

    kubectl get nodes -o wide --show-labels

  The output looks like this::

    NAME         STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME   LABELS
    jetson       Ready    <none>   61d   v1.20.2   192.168.6.10   <none>        Ubuntu 18.04.5 LTS   4.9.140-tegra      docker://19.3.6     beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=jetson,kubernetes.io/os=linux
    pi-master1   Ready    master   68d   v1.20.2   192.168.6.11   <none>        Ubuntu 20.04.2 LTS   5.4.0-1028-raspi   docker://19.3.8     beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=pi-master1,kubernetes.io/os=linux,node-role.kubernetes.io/master=
    pi-worker1   Ready    <none>   65d   v1.20.2   192.168.6.15   <none>        Ubuntu 20.04.2 LTS   5.4.0-1028-raspi   docker://19.3.8     beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=pi-worker1,kubernetes.io/os=linux
    pi-worker2   Ready    <none>   65d   v1.20.2   192.168.6.16   <none>        Ubuntu 20.04.2 LTS   5.4.0-1028-raspi   docker://19.3.8     beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=pi-worker2,kubernetes.io/os=linux

  Note the duplicated labels, e.g. ``beta.kubernetes.io/arch=arm64`` alongside ``kubernetes.io/arch=arm64`` : I initially deployed an older Kubernetes version and upgraded step by step, and the ``beta`` labels are kept for backward compatibility.

- The Raspberry Pi worker nodes have SSD disks, so add labels::

    kubectl label nodes pi-worker1 disktype=ssd
    kubectl label nodes pi-worker2 disktype=ssd

- The Jetson node has a GPU, so add a label::

    kubectl label nodes jetson model=gpu

- Now, when deploying, specify the node selection::

    ...
    spec:
      nodeSelector:
        disktype: ssd

Troubleshooting ``nodeSelector`` scheduling failures
=====================================================

``svc`` requires a configured ``selector``
-------------------------------------------

- Note that setting ``nodeSelector`` alone is not enough to make scheduling work. If the deployment does not configure the pod ``selector`` that the ``svc`` relies on, you get the following error::

    error: error validating "deployment_arm.yaml": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false

  This is because it is not only pods that select nodes by label: the ``apps/v1`` ``Deployment`` must declare ``spec.selector`` to identify the pods it manages, and the service ``svc`` likewise selects its backend pods by label. So make sure ``deployment.yaml`` contains::

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app.kubernetes.io/name: onesre-core
    ...
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: onesre-core

  and that the corresponding ``svc.yaml`` is configured as::

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/name: onesre-core
    ...
    spec:
      type: ClusterIP
      clusterIP: None
      selector:
        app.kubernetes.io/name: onesre-core

A node ``taint`` blocks ``nodeSelector``
-----------------------------------------

In real production deployments you may run into scheduling failures. For example, in :ref:`helm3_prometheus_grafana` I wanted to use ``nodeSelector`` to pin the monitoring services to a dedicated server:

- Label the dedicated monitoring server:

  .. literalinclude:: labels_and_selectors/label_node_prometheus
     :language: bash
     :caption: Label the monitoring server so the monitoring components are scheduled onto the dedicated server

- Edit the ``deployment`` to schedule the component onto nodes with that label, ``kubectl edit deployments stable-grafana`` , with the following configuration:

  .. literalinclude:: labels_and_selectors/config_nodeselector
     :language: bash
     :caption: Configure the ``stable-grafana`` deployment with ``nodeSelector`` targeting nodes labeled ``telemetry: prometheus``
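Since the content of the included ``config_nodeselector`` file is not reproduced on this page, the following is a minimal illustrative sketch of where such an edit goes, not the actual file: only the ``telemetry: prometheus`` label key is taken from the caption above, and the rest of the Deployment is elided.

.. code-block:: yaml

   # Illustrative sketch: the nodeSelector must sit under the pod
   # template's spec (spec.template.spec), not under the top-level
   # Deployment spec.
   spec:
     template:
       spec:
         nodeSelector:
           telemetry: prometheus   # schedule only onto nodes carrying this label

Note that ``nodeSelector`` belongs under the pod template's ``spec`` ( ``spec.template.spec.nodeSelector`` ), a level that is easy to miss when editing a Deployment in place.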
After the edit, verify::

   kubectl get pods -o wide -A | grep grafa

However, the migration did not succeed (the pod is still running on the ``old-mon-serv`` server and was not scheduled onto the intended ``mon-serv`` )::

   # kubectl get pods -o wide -A | grep grafa
   default    stable-grafana-6449bcb69b-rqhwz   3/3   Running   0   22h     10.233.125.1   old-mon-serv
   default    stable-grafana-c4465d9cb-kgwp6    0/3   Pending   0   2m28s   <none>         <none>

- Inspect the pending pod::

   kubectl get pods stable-grafana-c4465d9cb-kgwp6 -o yaml

  The scheduling failure turns out to be caused by a ``taint`` ::

   status:
     conditions:
     - lastProbeTime: null
       lastTransitionTime: "2023-03-30T12:28:14Z"
       message: '0/43 nodes are available: 1 node(s) had taint {node.k8s.xxx.com/initial: },
         that the pod didn''t tolerate, 42 node(s) didn''t match node selector.'
       reason: Unschedulable
       status: "False"
       type: PodScheduled
     phase: Pending
     qosClass: BestEffort

- Check the ``Taints`` on the ``mon-serv`` server::

   # kubectl describe node mon-serv | grep 'Taints'
   Taints:             node.k8s.xxx.com/initial:NoSchedule

The reason is that newly provisioned servers carry a ``NoSchedule`` taint by default, to keep workloads off them until they pass acceptance checks. The taint needs to be removed::

   kubectl taint nodes mon-serv node.k8s.xxx.com/initial:NoSchedule-

References
===========

- `Labels and Selectors`_
- `ValidationError: missing required field “selector” in io.k8s.api.v1.DeploymentSpec`_
- `How to delete a node label by command and api?`_
- `How to Add or Remove Labels to Nodes in Kubernetes`_
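A supplementary note on the ``taint`` case above: removing the taint is not the only option. If the taint should stay on the node (for example while acceptance checks are still pending), the workload can instead be given a matching toleration. The following is an illustrative sketch, assuming the ``node.k8s.xxx.com/initial`` taint key shown above:

.. code-block:: yaml

   # Illustrative sketch: tolerate the NoSchedule taint instead of
   # deleting it from the node; keep the nodeSelector so the pod
   # still only lands on the labeled monitoring server.
   spec:
     template:
       spec:
         nodeSelector:
           telemetry: prometheus
         tolerations:
         - key: node.k8s.xxx.com/initial
           operator: Exists          # match the taint regardless of its value
           effect: NoSchedule

Tolerations only permit scheduling onto the tainted node; combined with a ``nodeSelector`` they effectively reserve the node for the tolerating workload.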