Multi-Cluster Monitoring and Alerting with VictoriaMetrics

2022-03-24 00:00:00

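The earlier article "Prometheus+Thanos实现多集群监控及告警" (multi-cluster monitoring and alerting with Prometheus + Thanos) described how to monitor multiple clusters with Thanos. This article describes how to do the same with VictoriaMetrics.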

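To start, here is an article comparing VictoriaMetrics with Thanos: https://my.oschina.net/u/4148359/blog/4531605 . It was written by a core VictoriaMetrics developer, so it may lean toward VictoriaMetrics. In my own experience, though, VictoriaMetrics really is lighter, simpler to deploy, and less resource-hungry. For example, when I was running Thanos with only two clusters connected, the gateway component was given 10G of memory and was still frequently OOM-killed and restarted.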

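VictoriaMetrics is itself a high-performance time series database. It natively supports the Prometheus API and comes with a range of functional components. For monitoring small and medium-sized clusters the single-node version is sufficient, and that is the setup used in this article.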


Components to install

Cluster A: Prometheus-operator, Prometheus, alertmanager, grafana, kube-state-metrics, node-exporter, prometheus-adapter, victoriametrics, vmalert
Cluster B: Prometheus-operator, Prometheus, kube-state-metrics, node-exporter, prometheus-adapter

Cluster A deployment

VictoriaMetrics

The simple way is to install with helm; for details see https://github.com/VictoriaMetrics/helm-charts
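For a helm install, the settings used by the single-node StatefulSet in this article (one-month retention, 10s deduplication, 30Gi on the nfs-storage class) would normally go into a values file. The sketch below assumes the key names of the victoria-metrics-single chart around version 0.6.x (`server.retentionPeriod`, `server.extraArgs`, `server.persistentVolume`); later chart versions may name them differently, so check the chart's own values.yaml before relying on it.

```yaml
# values.yaml sketch for the victoria-metrics-single chart.
# Key names are assumed from chart 0.6.x and are not confirmed by this article.
server:
  retentionPeriod: 1              # months, same as --retentionPeriod=1
  extraArgs:
    dedup.minScrapeInterval: 10s  # same as -dedup.minScrapeInterval=10s
    envflag.enable: "true"
    envflag.prefix: VM_
    loggerFormat: json
  persistentVolume:
    enabled: true
    size: 30Gi
    storageClass: nfs-storage     # assumed key name; storage class taken from the article
  resources:
    limits:
      cpu: 1000m
      memory: 5000Mi
    requests:
      cpu: 100m
      memory: 512Mi
```

A file like this is passed to helm with the `-f` flag at install time.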

The resource files for the single-node version are listed here to make the setup easier to follow.

clusterrole.yaml

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: victoria-metrics-single-clusterrole
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
rules:
  - apiGroups:      ['extensions']
    resources:      ['podsecuritypolicies']
    verbs:          ['use']
    resourceNames:  [victoria-metrics-single]

serviceaccount.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
  name: victoria-metrics-single
  namespace: monitoring

clusterrolebinding.yaml

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: victoria-metrics-single-clusterrolebinding
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
subjects:
  - kind: ServiceAccount
    name: victoria-metrics-single
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: victoria-metrics-single-clusterrole
  apiGroup: rbac.authorization.k8s.io

role.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: victoria-metrics-single
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
rules:
- apiGroups:      ['extensions']
  resources:      ['podsecuritypolicies']
  verbs:          ['use']
  resourceNames:  [victoria-metrics-single]

rolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: victoria-metrics-single
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: victoria-metrics-single
subjects:
- kind: ServiceAccount
  name: victoria-metrics-single
  namespace: monitoring

podsecuritypolicy.yaml

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: victoria-metrics-single
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName:  'docker/default'
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: false

server-statefulset.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: monitoring
  labels:
    app: server
    app.kubernetes.io/name: victoria-metrics-single
    app.kubernetes.io/instance: victoria-single
    helm.sh/chart: victoria-metrics-single-0.6.1
  name: victoria-metrics-single-server
spec:
  serviceName: victoria-metrics-single-server
  selector:
    matchLabels:
      app: server
      app.kubernetes.io/name: victoria-metrics-single
      app.kubernetes.io/instance: victoria-single
  replicas: 1
  podManagementPolicy: OrderedReady
  template:
    metadata:
      labels:
        app: server
        app.kubernetes.io/name: victoria-metrics-single
        app.kubernetes.io/instance: victoria-single
        helm.sh/chart: victoria-metrics-single-0.6.1
    spec:
      automountServiceAccountToken: true
      containers:
        - name: victoria-metrics-single-server
          image: "victoriametrics/victoria-metrics:v1.45.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - "--retentionPeriod=1"
            - "--storageDataPath=/storage"
            - -dedup.minScrapeInterval=10s
            - --envflag.enable=true
            - --envflag.prefix=VM_
            - --loggerFormat=json
          ports:
            - name: http
              containerPort: 8428
          livenessProbe:
            initialDelaySeconds: 5
            periodSeconds: 15
            tcpSocket:
              port: http
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 15
            timeoutSeconds: 5
          resources:
            limits:
              cpu: 1000m
              memory: 5000Mi
            requests:
              cpu: 100m
              memory: 512Mi
          volumeMounts:
            - name: server-volume
              mountPath: /storage
              subPath:
      serviceAccountName: victoria-metrics-single
      terminationGracePeriodSeconds: 60
  volumeClaimTemplates:
    - metadata:
        name: server-volume
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "30Gi"
        storageClassName: "nfs-storage"

server-service-headless.yaml

apiVersion: v1
kind: Service
metadata:
  namespace: monitoring
  labels:
    app: server
    app.kubernetes.io/name: victoria-metrics-single
    app.kubernetes.io/instance: victoria-single
    helm.sh/chart: victoria-metrics-single-0.6.1
  name: victoria-metrics-single-server
spec:
  clusterIP: None
  ports:
    - name: http
      port: 8428
      protocol: TCP
      targetPort: http
  selector:
    app: server
    app.kubernetes.io/name: victoria-metrics-single
    app.kubernetes.io/instance: victoria-single


Prometheus-operator

Install with kube-prometheus, following the earlier article "prometheus-operator 安装" (installing prometheus-operator).

File to modify: prometheus-prometheus.yaml

Add the following under spec

externalLabels:
  cluster: cluster-a   # queried data will carry this cluster label, used to tell the clusters apart
remoteWrite:
  - url: http://<victoriametrics-addr>:8428/api/v1/write

Remove the following

alerting:
  alertmanagers:
  - name: alertmanager-main
    namespace: monitoring
    port: web

Prometheus no longer needs to talk to AlertManager directly, because the vmalert component takes over alerting later on.
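Putting the two changes together, the relevant part of the Prometheus resource for cluster A might look like the sketch below. The resource name `k8s` is assumed to be the kube-prometheus default, and the remote-write URL assumes the in-cluster service name from the headless Service shown earlier; all other fields of the kube-prometheus file stay as they are.

```yaml
# Sketch of prometheus-prometheus.yaml after both edits (cluster A).
# Only the fields discussed in this section are shown.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                 # assumed: kube-prometheus default name
  namespace: monitoring
spec:
  externalLabels:
    cluster: cluster-a      # attached to every sample sent to VictoriaMetrics
  remoteWrite:
    # assumed address: the victoria-metrics-single-server Service in the same namespace
    - url: http://victoria-metrics-single-server:8428/api/v1/write
  # the "alerting:" block has been removed; vmalert sends alerts instead
```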


VMAlert

Here it is deployed with the operator; see https://github.com/VictoriaMetrics/operator

1. Download the operator resource files

export VM_VERSION=`basename $(curl -fs -o/dev/null -w %{redirect_url} https://github.com/VictoriaMetrics/operator/releases/latest)`
wget https://github.com/VictoriaMetrics/operator/releases/download/$VM_VERSION/bundle_crd.zip
unzip  bundle_crd.zip

By default the resources are installed into the monitoring-system namespace. The following command changes that to a namespace of your choice.

sed -i "s/namespace: monitoring-system/namespace: YOUR_NAMESPACE/g" release/operator/*

2. Create the CRDs

kubectl apply -f release/crds

3. Create the operator

kubectl apply -f release/operator/

4. Create vmalert.yaml

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: vmalert
  namespace: monitoring
spec:
  replicaCount: 1
  image:
    repository: victoriametrics/vmalert
    tag: v1.40.0
    pullPolicy: IfNotPresent
  datasource:
    url: "http://victoria-metrics-single-server:8428"
  notifier:
    url: "http://alertmanager-main:9093"
  remoteWrite:
    url: "http://victoria-metrics-single-server:8428"
    flushInterval: 1m
  remoteRead:
    url: "http://victoria-metrics-single-server:8428"
    lookback: 1h
  evaluationInterval: "30s"
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules

Create the resources above and vmalert is deployed.
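The ruleSelector in vmalert.yaml picks up rule objects labelled `prometheus: k8s` and `role: alert-rules`. As an illustration, a VMRule that this selector would match could look like the sketch below. The rule name, group name, alert and threshold are hypothetical; only the two labels come from the VMAlert spec above. The VictoriaMetrics operator can also convert existing PrometheusRule objects, so the kube-prometheus rules may already be picked up without writing a VMRule by hand.

```yaml
# Hypothetical example rule; only the two metadata labels are taken from the article.
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: example-target-down        # hypothetical name
  namespace: monitoring
  labels:
    prometheus: k8s                # must match VMAlert spec.ruleSelector
    role: alert-rules
spec:
  groups:
    - name: example.rules          # hypothetical group
      rules:
        - alert: TargetDown
          expr: up == 0            # evaluated against VictoriaMetrics, so it covers every cluster
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.job }} in cluster {{ $labels.cluster }} is down"
```

Because the expression runs against the shared VictoriaMetrics store, the `cluster` external label is available in alerts and shows which cluster fired.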

grafana

Add a data source. The type is still Prometheus, and the address is simply the VictoriaMetrics address: http://victoria-metrics-single-server:8428
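The data source can be added in the Grafana UI. If you prefer to provision it from a file, a minimal sketch in Grafana's data source provisioning format follows; the data source name is an assumption, and how the file is mounted (kube-prometheus keeps data sources in a Secret) depends on your Grafana deployment.

```yaml
# Grafana data source provisioning sketch; the name is hypothetical.
apiVersion: 1
datasources:
  - name: VictoriaMetrics          # hypothetical display name
    type: prometheus               # VictoriaMetrics speaks the Prometheus query API
    access: proxy
    url: http://victoria-metrics-single-server:8428
    isDefault: false
```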

Configure an ingress to expose the VictoriaMetrics service so that other clusters can connect to it.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  labels:
    app: victoria-metircs
  name: victoria-metircs-ingress
  namespace: monitoring
spec:
  rules:
  - host: victoria.abc.com
    http:
      paths:
      - backend:
          serviceName: victoria-metrics-single-server
          servicePort: 8428
        path: /
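The extensions/v1beta1 Ingress API was removed in Kubernetes 1.22. On newer clusters the same ingress would be written against networking.k8s.io/v1, roughly as sketched below. The names and host are unchanged from the manifest above; `pathType: Prefix` is an assumption, as is the absence of an `ingressClassName`, which your ingress controller may require.

```yaml
# networking.k8s.io/v1 equivalent of the ingress above (sketch).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  labels:
    app: victoria-metircs
  name: victoria-metircs-ingress
  namespace: monitoring
spec:
  rules:
    - host: victoria.abc.com
      http:
        paths:
          - path: /
            pathType: Prefix       # assumed; required field in the v1 API
            backend:
              service:
                name: victoria-metrics-single-server
                port:
                  number: 8428
```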

Cluster B deployment

1. Edit prometheus-prometheus.yaml as in the cluster A steps; the only difference is that remoteWrite must point to the ingress address created above.

remoteWrite:
  - url: http://victoria.abc.com/api/v1/write

2. Add a DNS entry for victoria.abc.com to the coredns configuration

kubectl edit cm -n kube-system coredns
hosts {
    172.16.27.115 victoria.abc.com
    fallthrough
}
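To show where the hosts block goes, here is a sketch of the edited ConfigMap. The surrounding Corefile is assumed to resemble a typical kubeadm default and is abbreviated; your cluster's Corefile may differ, so add only the hosts block to what is already there. The IP and hostname are the ones used in this article.

```yaml
# Sketch of the coredns ConfigMap after the edit; everything except
# the hosts block is an assumed, abbreviated default Corefile.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        # added: resolve the VictoriaMetrics ingress host inside cluster B
        hosts {
            172.16.27.115 victoria.abc.com
            fallthrough
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
```

The `fallthrough` line matters: without it, the hosts plugin would answer every query itself and names other than victoria.abc.com would stop resolving.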

3. Follow the earlier article "prometheus-operator 安装" for the installation; only kube-state-metrics, node-exporter, prometheus, prometheus-adapter, and prometheus-operator need to be installed.

That completes the multi-cluster monitoring setup based on VictoriaMetrics and Prometheus-operator. To bring in further clusters, just repeat the cluster B steps.

Source: https://mp.weixin.qq.com/s?__biz=MzAwNjMyMzYyNg==&mid=2247483931&idx=1&sn=65ba5f3519ca5c7ad1c59b6d1e2894c8&chksm=9b0e6edeac79e7c894879270bb7b841f7abba8ab2ff0a6b3103599760055646f81bcc0d98a92&mpshare=1&scene=1&srcid=03244DxSQTg0xNkG4BzayHaC&sharer_sharetime=1648104370587&sharer_shareid=b37f669367a61860107a8412c2f3cc74#rd
