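K8s nodes misbehaving, data corrupted? Here is an open-source tool for automatic cluster inspection
As Kubernetes adoption spreads and clusters grow larger, the problems that come with them multiply as well. Unhealthy cluster components, damaged disks, nodes stuck in NotReady, and similar issues are a constant headache for developers and operators. Is there a tool that helps everyone manage clusters better and lowers the operational risk?
KubeEye, the project Gitee is introducing today, is an automatic inspection tool for Kubernetes clusters. If cluster management is giving you trouble, read on.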
Project name: KubeEye
Project author: KubeSphere
Open-source license: Apache-2.0
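Project URL: https://gitee.com/kubesphere/kubeeye
Project introduction
KubeEye is an open-source automatic inspection tool for Kubernetes clusters. It aims to detect problems on Kubernetes automatically, such as application misconfiguration, unhealthy cluster components, and node problems, helping cluster administrators manage their clusters better and reduce risk.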
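Project architecture
[Figure: KubeEye architecture diagram]
Use cases
KubeEye finds problems in your cluster control plane, including kube-apiserver, kube-controller-manager, etcd, and others.
KubeEye helps you detect a range of node problems, including memory, CPU, and disk pressure and unexpected kernel error logs.
KubeEye validates your workload YAML specs against industry best practices, helping you keep your cluster stable.
How to use KubeEye
Install KubeEye on your machine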
Download a prebuilt executable from Releases. Alternatively, build it from source:
git clone https://gitee.com/kubesphere/kubeeye.git
cd kubeeye
make install
[Optional] Install Node-problem-Detector
Note: this command installs NPD on your cluster. It is only needed if you want detailed reports.
ke install npd
Run KubeEye
root@node1:# ke diag
NODENAME  SEVERITY  HEARTBEATTIME              REASON             MESSAGE
node18    Fatal     2020-11-19T10:32:03+08:00  NodeStatusUnknown  Kubelet stopped posting node status.
node19    Fatal     2020-11-19T10:31:37+08:00  NodeStatusUnknown  Kubelet stopped posting node status.
node2     Fatal     2020-11-19T10:31:14+08:00  NodeStatusUnknown  Kubelet stopped posting node status.
node3     Fatal     2020-11-27T17:36:53+08:00  KubeletNotReady    Container runtime not ready: RuntimeReady=false reason:DockerDaemonNotReady message:docker: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

NAME       SEVERITY  TIME                       MESSAGE
scheduler  Fatal     2020-11-27T17:09:59+08:00  Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0     Fatal     2020-11-27T17:56:37+08:00  Get https://192.168.13.8:2379/health: dial tcp 192.168.13.8:2379: connect: connection refused

NAMESPACE       SEVERITY  PODNAME                                 EVENTTIME                  REASON               MESSAGE
default         Warning   node3.164b53d23ea79fc7                  2020-11-27T17:37:34+08:00  ContainerGCFailed    rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
default         Warning   node3.164b553ca5740aae                  2020-11-27T18:03:31+08:00  FreeDiskSpaceFailed  failed to garbage collect required amount of images. Wanted to free 5399374233 bytes, but freed 416077545 bytes
default         Warning   nginx-b8ffcf679-q4n9v.16491643e6b68cd7  2020-11-27T17:09:24+08:00  Failed               Error: ImagePullBackOff
default         Warning   node3.164b5861e041a60e                  2020-11-27T19:01:09+08:00  SystemOOM            System OOM encountered, victim process: stress, pid: 16713
default         Warning   node3.164b58660f8d4590                  2020-11-27T19:01:27+08:00  OOMKilling           Out of memory: Kill process 16711 (stress) score 205 or sacrifice child Killed process 16711 (stress), UID 0, total-vm:826516kB, anon-rss:819296kB, file-rss:0kB, shmem-rss:0kB
insights-agent  Warning   workloads-1606467120.164b519ca8c67416   2020-11-27T16:57:05+08:00  DeadlineExceeded     Job was active longer than specified deadline
kube-system     Warning   calico-node-zvl9t.164b3dc50580845d      2020-11-27T17:09:35+08:00  DNSConfigForming     Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 100.64.11.3 114.114.114.114 119.29.29.29
kube-system     Warning   kube-proxy-4bnn7.164b3dc4f4c4125d       2020-11-27T17:09:09+08:00  DNSConfigForming     Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 100.64.11.3 114.114.114.114 119.29.29.29
kube-system     Warning   nodelocaldns-2zbhh.164b3dc4f42d358b     2020-11-27T17:09:14+08:00  DNSConfigForming     Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 100.64.11.3 114.114.114.114 119.29.29.29

NAMESPACE       SEVERITY  NAME                     KIND        TIME                       MESSAGE
kube-system     Warning   node-problem-detector    DaemonSet   2020-11-27T17:09:59+08:00  [livenessProbeMissing runAsPrivileged]
kube-system     Warning   calico-node              DaemonSet   2020-11-27T17:09:59+08:00  [runAsPrivileged cpuLimitsMissing]
kube-system     Warning   nodelocaldns             DaemonSet   2020-11-27T17:09:59+08:00  [cpuLimitsMissing runAsPrivileged]
default         Warning   nginx                    Deployment  2020-11-27T17:09:59+08:00  [cpuLimitsMissing livenessProbeMissing tagNotSpecified]
insights-agent  Warning   workloads                CronJob     2020-11-27T17:09:59+08:00  [livenessProbeMissing]
insights-agent  Warning   cronjob-executor         Job         2020-11-27T17:09:59+08:00  [livenessProbeMissing]
kube-system     Warning   calico-kube-controllers  Deployment  2020-11-27T17:09:59+08:00  [cpuLimitsMissing livenessProbeMissing]
kube-system     Warning   coredns                  Deployment  2020-11-27T17:09:59+08:00  [cpuLimitsMissing]

You can refer to the project's FAQ to optimize your cluster.
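The workload section of the report above flags findings by name: cpuLimitsMissing, livenessProbeMissing, tagNotSpecified, and runAsPrivileged. To give a feel for what such best-practice checks look at, here is a minimal Python sketch. It is not KubeEye's implementation. The function names `check_container` and `check_pod_spec` are hypothetical, the interpretation of each finding is an assumption based on its name and on the standard Kubernetes pod spec fields, and treating the `latest` tag as "not specified" is also an assumption.

```python
from typing import Any, Dict, List


def check_container(container: Dict[str, Any]) -> List[str]:
    """Return finding names for one container of a pod spec.

    Hypothetical helper: the finding names come from the report above,
    the rules behind them are assumed, not taken from KubeEye's source.
    """
    findings = []

    # cpuLimitsMissing: no resources.limits.cpu on the container.
    limits = (container.get("resources") or {}).get("limits") or {}
    if "cpu" not in limits:
        findings.append("cpuLimitsMissing")

    # livenessProbeMissing: no livenessProbe configured.
    if not container.get("livenessProbe"):
        findings.append("livenessProbeMissing")

    # tagNotSpecified: image has no explicit tag (assumption: "latest"
    # counts as unspecified; a digest pin counts as specified).
    last_segment = container.get("image", "").rsplit("/", 1)[-1]
    if "@" not in last_segment:
        tag = last_segment.partition(":")[2]
        if tag in ("", "latest"):
            findings.append("tagNotSpecified")

    # runAsPrivileged: securityContext.privileged is set to true.
    security = container.get("securityContext") or {}
    if security.get("privileged") is True:
        findings.append("runAsPrivileged")

    return findings


def check_pod_spec(pod_spec: Dict[str, Any]) -> List[str]:
    """Collect the distinct findings over all containers, sorted by name."""
    found = set()
    for container in pod_spec.get("containers", []):
        found.update(check_container(container))
    return sorted(found)


# Illustrative pod spec, not the actual manifest behind the report above.
nginx = {"containers": [{"name": "nginx", "image": "nginx"}]}
print(check_pod_spec(nginx))
```

For a bare container like this illustrative one, with no resource limits, no probe, and an untagged image, the sketch reports the same three findings that the report lists for the nginx Deployment.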
In addition, KubeEye supports adding custom check rules. If you want more details about the project, follow the link below to its home page: https://gitee.com/kubesphere/kubeeye