.. _prometheus_monitor_kubelet_controller-manager_scheduler:

================================================================================
Monitoring Kubelet, kube-controller-manager and kube-scheduler with Prometheus
================================================================================

Revising the ``kubelet`` configuration
========================================

While troubleshooting :ref:`prometheus_metrics_connect_refuse`, I noticed that scraping the Kubelet metrics does not fail with ``connection refused`` but with ``server returned HTTP status 403 Forbidden``.

I also found that not every node returns ``403 Forbidden``: the kubelet on the three control-plane servers ``control001`` through ``control003`` is monitored normally (the worker nodes, in fact, run a manually installed, customized ``kubelet`` package).

.. figure:: ../../../_static/kubernetes/monitor/prometheus/prometheus_monitor_kubelet_403_forbidden.png
   :scale: 50

A comparison with the production environment shows that scraping the control-plane servers works without problems:

- Compare the ``kubelet`` runtime parameters of a failing node and a healthy node:

.. literalinclude:: prometheus_monitor_kubelet_controller-manager_scheduler/kubelet_parameter_ok
   :language: bash
   :caption: ``kubelet`` runtime parameters of a node whose :ref:`metrics` **can** be accessed through port ``10250``

The failing nodes run ``kubelet`` with a very complex set of parameters; the one that matters here is::

   --authorization-mode=Webhook

This parameter is set in ``/etc/systemd/system/kubelet.service.d/10-kubeadm.conf`` (if you deployed with ``kubeadm``). The corresponding configuration:

.. literalinclude:: prometheus_monitor_kubelet_controller-manager_scheduler/10-kubeadm.conf
   :language: bash
   :caption: By default, a ``kubeadm`` deployment configures ``kubelet`` with ``--authorization-mode=Webhook``

The fix is to add ``--authentication-token-webhook=true``, which lets ``kubelet`` validate the bearer token presented by Prometheus against the API server:

.. literalinclude:: prometheus_monitor_kubelet_controller-manager_scheduler/10-kubeadm_fix.conf
   :language: bash
   :caption: Revised configuration with ``--authentication-token-webhook=true`` added

In addition, the deployment may contain the following setting that disables ``cadvisor-port``; it has to be removed as well::

   cadvisor-port=0

After making these changes, restart the ``kubelet`` service.

Script to fix the ``kubelet`` configuration
==============================================

The steps above can be combined into the following script:

.. literalinclude:: prometheus_monitor_kubelet_controller-manager_scheduler/fix_prometheus_monitor_configs
   :language: bash
   :caption: Fix the Kubelet, kube-controller-manager and kube-scheduler configuration so that Prometheus can monitor cadvisor

Revising the ``kube-controller-manager`` and ``kube-scheduler`` configuration
================================================================================

By default, ``kube-controller-manager`` and ``kube-scheduler`` cannot be monitored by :ref:`prometheus` because they serve ``metrics`` only on the loopback address ``127.0.0.1``. Since ``kubeadm`` deploys the control-plane services as :ref:`static_pod` (``kubelet`` makes sure each ``pod`` is always running), revise the corresponding manifests in the ``/etc/kubernetes/manifests/`` directory:

- ``/etc/kubernetes/manifests/kube-controller-manager.yaml`` ::

     ...
     - --bind-address=0.0.0.0
     ...
     - --port=10252

- ``/etc/kubernetes/manifests/kube-scheduler.yaml`` ::

     ...
     - --bind-address=0.0.0.0
     ...
     - --port=10251

References
============

- `Prometheus kubelet metrics server returned HTTP status 403 Forbidden `_ (the main reference for this document)
- `How to Monitor the Kubelet `_
- `Cadvisor metrics scraping generates - HTTP server returned HTTP status 403 Forbidden #3941 `_
- `How To Fix "server returned HTTP status 403 Forbidden" in Prometheus `_