Multi-Cluster Monitoring and Alerting with VictoriaMetrics
A previous article, "Prometheus + Thanos for multi-cluster monitoring and alerting", covered how to monitor multiple clusters with Thanos; this one walks through doing the same with VictoriaMetrics.
First, a comparison of VictoriaMetrics and Thanos: https://my.oschina.net/u/4148359/blog/4531605 . It was written by a core VictoriaMetrics developer, so it may lean toward VictoriaMetrics. That said, in my experience VictoriaMetrics really is lighter: it is simpler to deploy and uses fewer resources. When I ran Thanos with only two clusters connected, the gateway component was given 10 GB of memory and still kept getting OOM-killed and restarted.
VictoriaMetrics itself is a high-performance time-series database. It natively supports the Prometheus API and ships a number of functional components. For small to mid-sized clusters the single-node version is sufficient, and that is what this article uses.
Cluster | Components installed
---|---
Cluster A | Prometheus-operator, Prometheus, Alertmanager, Grafana, kube-state-metrics, node-exporter, prometheus-adapter, VictoriaMetrics, vmalert
Cluster B | Prometheus-operator, Prometheus, kube-state-metrics, node-exporter, prometheus-adapter
Deploying cluster A
VictoriaMetrics
The simplest way to install is with Helm; see https://github.com/VictoriaMetrics/helm-charts for details.
The single-node resource manifests are listed below for reference.
clusterrole.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: victoria-metrics-single-clusterrole
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames: [victoria-metrics-single]
serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
  name: victoria-metrics-single
  namespace: monitoring
clusterrolebinding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: victoria-metrics-single-clusterrolebinding
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
subjects:
  - kind: ServiceAccount
    name: victoria-metrics-single
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: victoria-metrics-single-clusterrole
  apiGroup: rbac.authorization.k8s.io
role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: victoria-metrics-single
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames: [victoria-metrics-single]
rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: victoria-metrics-single
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: victoria-metrics-single
subjects:
  - kind: ServiceAccount
    name: victoria-metrics-single
    namespace: monitoring
podsecuritypolicy.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: victoria-metrics-single
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: false
server-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: monitoring
  labels:
    app: server
    app.kubernetes.io/name: victoria-metrics-single
    app.kubernetes.io/instance: victoria-single
    helm.sh/chart: victoria-metrics-single-0.6.1
  name: victoria-metrics-single-server
spec:
  serviceName: victoria-metrics-single-server
  selector:
    matchLabels:
      app: server
      app.kubernetes.io/name: victoria-metrics-single
      app.kubernetes.io/instance: victoria-single
  replicas: 1
  podManagementPolicy: OrderedReady
  template:
    metadata:
      labels:
        app: server
        app.kubernetes.io/name: victoria-metrics-single
        app.kubernetes.io/instance: victoria-single
        helm.sh/chart: victoria-metrics-single-0.6.1
    spec:
      automountServiceAccountToken: true
      containers:
        - name: victoria-metrics-single-server
          image: "victoriametrics/victoria-metrics:v1.45.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - "--retentionPeriod=1"           # data retention, in months
            - "--storageDataPath=/storage"
            - --dedup.minScrapeInterval=10s   # deduplicate samples closer together than 10s
            - --envflag.enable=true
            - --envflag.prefix=VM_
            - --loggerFormat=json
          ports:
            - name: http
              containerPort: 8428
          livenessProbe:
            initialDelaySeconds: 5
            periodSeconds: 15
            tcpSocket:
              port: http
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 15
            timeoutSeconds: 5
          resources:
            limits:
              cpu: 1000m
              memory: 5000Mi
            requests:
              cpu: 100m
              memory: 512Mi
          volumeMounts:
            - name: server-volume
              mountPath: /storage
              subPath: ""
      serviceAccountName: victoria-metrics-single
      terminationGracePeriodSeconds: 60
  volumeClaimTemplates:
    - metadata:
        name: server-volume
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "30Gi"
        storageClassName: "nfs-storage"
server-service-headless.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: monitoring
  labels:
    app: server
    app.kubernetes.io/name: victoria-metrics-single
    app.kubernetes.io/instance: victoria-single
    helm.sh/chart: victoria-metrics-single-0.6.1
  name: victoria-metrics-single-server
spec:
  clusterIP: None
  ports:
    - name: http
      port: 8428
      protocol: TCP
      targetPort: http
  selector:
    app: server
    app.kubernetes.io/name: victoria-metrics-single
    app.kubernetes.io/instance: victoria-single
Prometheus-operator
Install with kube-prometheus, following the earlier article on installing prometheus-operator.
The file that needs modifying is prometheus-prometheus.yaml
Add the following under spec:
  externalLabels:
    cluster: cluster-a  # queried data will carry a cluster label, used to tell clusters apart
  remoteWrite:
    - url: http://<victoriametrics-addr>:8428/api/v1/write
Remove the following:
  alerting:
    alertmanagers:
      - name: alertmanager-main
        namespace: monitoring
        port: web
Prometheus no longer needs to talk to Alertmanager directly; the vmalert component takes over that role below.
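Because each Prometheus attaches its externalLabels before remote-writing, every series stored in VictoriaMetrics carries a cluster label. A minimal sketch of what that looks like in a Prometheus-API query result (the JSON payload below is fabricated for illustration; a real one comes from GET /api/v1/query on port 8428):

```python
import json

# Illustrative /api/v1/query response; real data would come from
# http://<victoriametrics-addr>:8428/api/v1/query?query=up
payload = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "cluster": "cluster-a", "instance": "10.0.0.1:9100"}, "value": [1648104370, "1"]},
      {"metric": {"__name__": "up", "cluster": "cluster-b", "instance": "10.0.1.1:9100"}, "value": [1648104370, "1"]}
    ]
  }
}
""")

# Group series by the cluster external label each Prometheus added.
by_cluster = {}
for series in payload["data"]["result"]:
    by_cluster.setdefault(series["metric"]["cluster"], []).append(series["metric"]["instance"])

print(by_cluster)
# {'cluster-a': ['10.0.0.1:9100'], 'cluster-b': ['10.0.1.1:9100']}
```

Dashboards and alert rules can then select a single cluster with a matcher such as `up{cluster="cluster-a"}`.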
VMAlert
Here vmalert is deployed via the operator; see https://github.com/VictoriaMetrics/operator
1. Download the operator resource files
export VM_VERSION=`basename $(curl -fs -o/dev/null -w %{redirect_url} https://github.com/VictoriaMetrics/operator/releases/latest)`
wget https://github.com/VictoriaMetrics/operator/releases/download/$VM_VERSION/bundle_crd.zip
unzip bundle_crd.zip
By default the resources install into the monitoring-system namespace; switch to a custom namespace with:
sed -i "s/namespace: monitoring-system/namespace: YOUR_NAMESPACE/g" release/operator/*
2. Create the CRDs
kubectl apply -f release/crds
3. Create the operator
kubectl apply -f release/operator/
4. Create vmalert.yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: vmalert
  namespace: monitoring
spec:
  replicaCount: 1
  image:
    repository: victoriametrics/vmalert
    tag: v1.40.0
    pullPolicy: IfNotPresent
  datasource:
    url: "http://victoria-metrics-single-server:8428"
  notifier:
    url: "http://alertmanager-main:9093"
  remoteWrite:
    url: "http://victoria-metrics-single-server:8428"
    flushInterval: 1m
  remoteRead:
    url: "http://victoria-metrics-single-server:8428"
    lookback: 1h
  evaluationInterval: "30s"
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
Apply the resources above and vmalert is ready.
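The ruleSelector above makes vmalert pick up VMRule objects labeled prometheus: k8s and role: alert-rules. A minimal, hypothetical rule for illustration (the rule name, expression, and annotations are examples, not taken from the original setup):

```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: example-alert-rules   # hypothetical name
  namespace: monitoring
  labels:
    prometheus: k8s           # must match the ruleSelector above
    role: alert-rules
spec:
  groups:
    - name: example.rules
      rules:
        - alert: TargetDown
          expr: up == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.instance }} in {{ $labels.cluster }} is down"
```

Because vmalert evaluates rules against VictoriaMetrics, which holds data from every cluster, a single rule like this covers all clusters, and the cluster label identifies which one fired.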
Grafana
Add a data source: choose Prometheus as the type, and fill in the VictoriaMetrics address: http://victoria-metrics-single-server:8428
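If Grafana is managed declaratively, the same data source can also be set up through Grafana's provisioning mechanism instead of the UI; a sketch (the file path and data source name are arbitrary choices, not from the original setup):

```yaml
# e.g. /etc/grafana/provisioning/datasources/victoriametrics.yaml (hypothetical path)
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus   # VictoriaMetrics speaks the Prometheus query API
    access: proxy
    url: http://victoria-metrics-single-server:8428
    isDefault: false
```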
Configure an Ingress to expose the VictoriaMetrics service so other clusters can reach it:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  labels:
    app: victoria-metrics
  name: victoria-metrics-ingress
  namespace: monitoring
spec:
  rules:
    - host: victoria.abc.com
      http:
        paths:
          - backend:
              serviceName: victoria-metrics-single-server
              servicePort: 8428
            path: /
Deploying cluster B
1. Edit prometheus-prometheus.yaml as in cluster A, except that remoteWrite must point at the Ingress address created above:
  remoteWrite:
    - url: http://victoria.abc.com/api/v1/write
2. Add a resolution for victoria.abc.com to the CoreDNS config
kubectl edit cm -n kube-system coredns
Inside the existing server block of the Corefile, add a hosts plugin (172.16.27.115 is the address victoria.abc.com should resolve to, i.e. cluster A's Ingress entry point):
hosts {
    172.16.27.115 victoria.abc.com
    fallthrough
}
3. Follow the earlier prometheus-operator installation article; only kube-state-metrics, node-exporter, prometheus, prometheus-adapter, and prometheus-operator need to be installed.
That completes multi-cluster monitoring built on VictoriaMetrics and Prometheus-operator. To hook up any additional cluster, simply repeat the cluster B steps.
Source: https://mp.weixin.qq.com/s?__biz=MzAwNjMyMzYyNg==&mid=2247483931&idx=1&sn=65ba5f3519ca5c7ad1c59b6d1e2894c8&chksm=9b0e6edeac79e7c894879270bb7b841f7abba8ab2ff0a6b3103599760055646f81bcc0d98a92&mpshare=1&scene=1&srcid=03244DxSQTg0xNkG4BzayHaC&sharer_sharetime=1648104370587&sharer_shareid=b37f669367a61860107a8412c2f3cc74#rd