Prometheus Monitoring in Practice: Architecture Overview and Alerting Configuration

2025-06-23 01:07:47

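Prometheus is a powerful open-source monitoring tool that covers the end-to-end monitoring workflow: data collection, data processing, visualization, and alert delivery. This article briefly introduces the architecture of Prometheus and walks through the whole pipeline, from collection to visualization to alerting.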

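1. Prometheus Architecture and Principles

Prometheus is an open-source monitoring tool. Its basic principle is to pull (scrape) data from exporters, or indirectly from a Pushgateway; when it is deployed inside Kubernetes, targets can be found through service discovery. By default it stores all scraped data locally, applies rules to aggregate and process that data, and stores the results as new time series. The collected data is used in two ways: alerting and visualization. The architecture of Prometheus is shown below:

[Figure: Prometheus architecture diagram]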

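Prometheus has the following characteristics:

A multi-dimensional data model and a flexible query language: attaching multiple labels to each metric lets monitoring data be combined along any dimension; an HTTP query API is provided; and the data is easy to display with tools such as Grafana.

Local storage on the server node: the built-in time series database can sustain very high ingestion rates, reportedly up to tens of millions of samples per second. For scenarios that keep large amounts of historical data, Prometheus can also integrate with third-party time series databases such as OpenTSDB.

An open metrics format with both pull and push collection: time series are normally collected by pulling over HTTP, and any target that exposes its metrics in the Prometheus format can be scraped; jobs can also push time series to an intermediate gateway, which makes it easier to cover a wider range of monitoring scenarios.

Targets are found through static file configuration or dynamic discovery, and collection then happens automatically. Prometheus already supports many service discovery mechanisms, including Kubernetes and Consul, which reduces manual configuration for operators.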

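1.1 Components

As the architecture diagram shows, Prometheus has four main components: Prometheus Server, Pushgateway, Exporters, and Alertmanager. They and the related pieces are described below:

Prometheus Server: pulls monitoring data from exporters and stores it, evaluates alerting rules and sends the resulting alerts to Alertmanager, and provides the flexible query language PromQL.

Exporters/Jobs: the data collection components. They gather performance data from the monitored objects (hosts, containers, ...) and expose it to Prometheus Server over HTTP. Exporters exist for databases, hardware, message middleware, storage systems, HTTP servers, JMX, and more.

Short-lived jobs: tasks that exit too quickly to be scraped by pull; they push their metrics instead and are used together with the Pushgateway.

Pushgateway: an optional component for push scenarios. Metrics are first pushed to the Pushgateway, and Prometheus Server then scrapes them from it. It is intended for jobs that live so briefly that they may disappear before Prometheus gets a chance to scrape them.

Alertmanager: Prometheus Server evaluates the PromQL-based alerting rules against its data; when a rule's condition is met it generates an alert and sends it to Alertmanager. Alertmanager then processes the alert according to its configuration and sends notifications. Common receivers include email, PagerDuty, OpsGenie, and webhooks.

Service Discovery: Prometheus supports many discovery mechanisms, including files, DNS, Consul, Kubernetes, OpenStack, and EC2. Discovery works by querying an interface offered by the third party to obtain the list of targets to monitor; Prometheus then polls those targets for monitoring data.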
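As an illustration of the file-based discovery mentioned above, here is a minimal Python sketch that writes a target list in the JSON layout read by file-based service discovery (a list of groups, each with "targets" and optional "labels"). The helper name write_file_sd, the file name targets.json, and the "env" label are assumptions made for this example; the target address is the Node Exporter used later in this article.

```python
import json
import tempfile
from pathlib import Path


def write_file_sd(path, groups):
    """Write target groups in the JSON layout used by file-based discovery.

    Each group is a dict with a "targets" list and an optional "labels" map.
    """
    payload = [
        {"targets": list(g["targets"]), "labels": dict(g.get("labels", {}))}
        for g in groups
    ]
    Path(path).write_text(json.dumps(payload, indent=2))
    return payload


# Hypothetical output location and label, chosen only for this sketch.
sd_file = Path(tempfile.gettempdir()) / "targets.json"
groups = [
    {"targets": ["192.168.112.101:9101"], "labels": {"env": "demo"}},
]
written = write_file_sd(sd_file, groups)
```

A scrape job would then point a file_sd_configs entry at this file, and Prometheus would pick up changes to the file without a restart.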

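The Prometheus workflow is:

Prometheus Server periodically pulls metrics from the configured jobs or exporters, from the Pushgateway (which holds metrics pushed by short-lived jobs), or from other Prometheus servers;

Prometheus Server stores the collected metrics locally and runs the defined rules (alert.rules), which either record new time series or push alerts to Alertmanager;

Alertmanager processes the alerts it receives according to its configuration file and sends out notifications;

The collected data is visualized.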

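1.2 Storage

Prometheus stores data on the local disk as time series. Data is organized in two-hour windows: everything produced within a window is stored in one block. Each block is a directory that contains the sample data for that window (chunks), a metadata file (meta.json), and an index file (index).

[Figure: Prometheus local storage layout]

Samples for the current window are kept in memory and written to disk as a block once the two hours are up, which improves query efficiency. To avoid losing data when the process crashes, Prometheus uses a write-ahead log (WAL): incoming samples are also appended to the log, and on restart the log is replayed to restore the in-memory data. If time series are deleted through the API during this period, the deletions are recorded in a separate tombstone file rather than being removed from the chunk files immediately.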
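To make the two-hour windows concrete, the following Python sketch computes which window a sample timestamp falls into. It assumes that windows are aligned to multiples of two hours since the Unix epoch; the function name block_window and the sample timestamp are chosen only for illustration.

```python
BLOCK_RANGE_MS = 2 * 60 * 60 * 1000  # two hours in milliseconds


def block_window(ts_ms, block_range_ms=BLOCK_RANGE_MS):
    """Return the [start, end) window that a sample timestamp falls into."""
    start = ts_ms - ts_ms % block_range_ms
    return start, start + block_range_ms


# 2021-02-06T07:08:40Z expressed in milliseconds since the Unix epoch
sample_ts = 1612595320000
start, end = block_window(sample_ts)
# start corresponds to 06:00:00Z and end to 08:00:00Z on the same day
```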

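The Prometheus data directory follows this layout: one directory per block (holding chunks, index, meta.json, and tombstones) plus a wal directory for the write-ahead log.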

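1.3 Data Collection Methods

Prometheus can collect data in two ways: pull (active scraping) and push (passive receipt).

Pull: the monitored hosts run the appropriate exporters as daemons. An exporter gathers data and answers HTTP requests with metrics. Prometheus visits the exporter on each node with an HTTP GET and receives the data it needs.

Push: the Pushgateway is installed on a client or on the server, and user-written scripts organize monitoring data into metrics format and send it to the Pushgateway. Prometheus then scrapes the Pushgateway. Note that the Pushgateway is only an intermediary that relays data.

Prometheus mainly uses pull, collecting data over HTTP, and pull is generally the better fit. The two approaches compare as follows:

[Figure: comparison of the pull and push approaches]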

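1.4 Metric Types

Prometheus has the following main metric types:

Gauge: a value that can go up and down, such as CPU utilization, memory utilization, or the number of nodes in a cluster. Most monitoring data is of this type.

Counter: a value that only increases, such as the number of HTTP requests served or accumulated CPU seconds. A counter starts again from zero when the exporting process restarts, so it is normally used with rate(), which compensates for such resets and returns the rate of change over a time range.

Histogram: samples observations such as request durations or response sizes and counts them in configurable buckets, together with the total count and sum of the observations.

Summary: similar to a histogram in that it samples observations over time, but it stores quantiles computed on the client side directly instead of deriving them from bucket counts at query time.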
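How counters behave under rate() and how a quantile is derived from histogram buckets can be sketched in Python. This is a simplified illustration, not the PromQL implementation: the names counter_rate and bucket_quantile are made up here, the real rate() additionally extrapolates to the edges of the query window, and histogram_quantile() handles more edge cases.

```python
def counter_rate(samples):
    """Average per-second increase over (timestamp_seconds, value) samples.

    A drop in value is treated as a counter reset (the process restarted),
    so the post-reset value counts as the increase since the reset.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


def bucket_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Uses linear interpolation inside the bucket that contains the target
    rank; the last bucket is expected to have upper bound float("inf").
    """
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf") or count == lower_count:
                return lower_bound  # nothing to interpolate over
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return None


# The counter restarts between t=15 and t=30 (130 drops to 10).
samples = [(0, 100.0), (15, 130.0), (30, 10.0), (45, 40.0)]
rate = counter_rate(samples)  # (30 + 10 + 30) / 45 per second

buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (float("inf"), 100)]
p90 = bucket_quantile(0.9, buckets)
```

The summary type needs no such query-time step, because the client has already computed the quantiles before exposing them.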

2. Deploying Prometheus

2.1 Binary Deployment

Prometheus download page: https://prometheus.io/download/

1) Download and unpack the package

[root@tango-centos01 src]# tar -xzvf prometheus-2.24.1.linux-amd64.tar.gz -C /usr/local/
[root@tango-centos01 local]# mv prometheus-2.24.1.linux-amd64 prometheus-2.24.1

2) Configure self-monitoring

[root@tango-centos01 prometheus-2.24.1]# vi prometheus.yml
global:
  scrape_interval: 15s # Global setting: scrape targets every 15s by default.
  # External labels
  external_labels:
    monitor: 'codelab-monitor'
# Scrape configuration
scrape_configs:
  # Name of the scrape job.
  - job_name: 'prometheus'
    # Override the global setting and scrape every 5s.
    scrape_interval: 5s
    # Target host and port to scrape
    static_configs:
      - targets: ['localhost:9090']

3) Start Prometheus

[root@tango-centos01 prometheus-2.24.1]# ./prometheus &
[root@tango-centos01 prometheus-2.24.1]# netstat -tulnp |grep 9090
tcp6       0      0 :::9090        :::*        LISTEN      1781/./prometheus

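The output shows that port 9090 is listening. Metrics can be fetched from localhost:9090/metrics, and the web UI is available in a browser at localhost:9090.

[Figure: Prometheus web UI]

Open http://192.168.112.101:9090/metrics:

[Figure: output of the /metrics endpoint]

With the target localhost:9090 configured, Prometheus scrapes its own /metrics endpoint, and the collected data can already be seen on the Graph page.

[Figure: Graph page showing collected data]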

2.2 Container Deployment

1) Create the configuration file /usr/local/prometheus/prometheus.yml

global:
  scrape_interval: 15s # Default scrape interval: scrape targets every 15 seconds.
  external_labels:
    monitor: 'codelab-monitor'
# Scrape configuration
scrape_configs:
  # Every time series scraped by this job automatically gets the label job="prometheus".
  - job_name: 'prometheus'
    scrape_interval: 5s # Overrides the global interval: 5 seconds instead of 15
    static_configs:
      - targets: ['localhost:9090']

2) Run with Docker

[root@tango-centos02 prometheus]# docker rm -f prometheus
[root@tango-centos02 prometheus]# docker run --name=prometheus -d \
-p 9090:9090 \
-v /usr/local/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus:v2.24.1 \
--config.file=/etc/prometheus/prometheus.yml \
--web.enable-lifecycle
[root@tango-centos02 prometheus]# docker ps
CONTAINER ID   IMAGE                     COMMAND                  CREATED         STATUS         PORTS                              NAMES
bfd2a1a0957a   prom/prometheus:v2.24.1   "/bin/prometheus --c…"   8 seconds ago   Up 6 seconds   9090/tcp, 0.0.0.0:9090->9090/tcp   prometheus

The --web.enable-lifecycle flag enables reloading the configuration file remotely over HTTP, without restarting Prometheus.
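With the lifecycle API enabled, a reload is triggered by an HTTP POST to the /-/reload endpoint. The Python sketch below only builds that request so it can be inspected without a running server; the helper name build_reload_request is an assumption for this example, and the address is the container host used in this section.

```python
import urllib.request


def build_reload_request(base_url):
    """Build the POST request that asks Prometheus to reload its configuration."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/reload", data=b"", method="POST"
    )


request = build_reload_request("http://192.168.112.102:9090")
# Uncomment to send it to a server started with --web.enable-lifecycle:
# urllib.request.urlopen(request, timeout=5)
```

If the flag is not set, Prometheus rejects the request, so the flag has to be part of the start command as shown above.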

3) Open http://192.168.112.102:9090; the following page is displayed

[Figure: Prometheus web UI served from the container]

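3. Prometheus Monitoring Configuration

3.1 Node exporter

To support monitoring of all kinds of middleware and third-party systems, Prometheus relies on exporters. An exporter can be thought of as a monitoring adapter: it converts data of differing types and formats into metrics that Prometheus understands. Commonly used community exporters are listed below:

[Figure: table of commonly used exporters]

Source: https://yunlzheng.gitbook.io/prometheus-book

For example, Node exporter obtains the state of the operating system mainly by reading system files under /proc and /sys on Linux, the Redis exporter obtains metrics through Redis commands, and the MySQL exporter obtains MySQL performance data by reading the database's monitoring tables. Each of them converts this heterogeneous data into the standard Prometheus format and serves it over an HTTP endpoint.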

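3.1.1 Installing and Configuring Node Exporter

Node Exporter is also written in Go and has no third-party dependencies: download it, unpack it, and run it. The latest Node Exporter binary package is available from https://prometheus.io/download/. Once started, it listens on port 9100 by default.

1) Download and install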

[root@tango-centos01 src]# tar -xzvf node_exporter-1.1.0.linux-amd64.tar.gz -C /usr/local/prometheus/
[root@tango-centos01 prometheus]# mv node_exporter-1.1.0.linux-amd64/ node_exporter-1.1.0

2) Start node_exporter

[root@tango-centos01 ~]# cd /usr/local/prometheus/node_exporter-1.1.0
[root@tango-centos01 node_exporter-1.1.0]# nohup ./node_exporter --web.listen-address=":9101" &
[root@tango-centos01 node_exporter-1.1.0]# netstat -tulnp |grep 9101
tcp6       0      0 :::9101        :::*        LISTEN      2842/./node_exporte

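The basic node_exporter options are:

--web.listen-address=":9100": the address node_exporter listens on. The default port is 9100; use this flag to change it.

--web.telemetry-path="/metrics": the URL path under which metrics are served. The default is /metrics; use this flag to change it.

--log.level="info": the log level.

--log.format="logfmt": the log output format (logfmt or json in node_exporter 1.x; older releases used values such as logger:stderr). This is useful when an automated log collection tool expects a particular format.

3) Open http://192.168.112.101:9101/; the following page is displayed:

[Figure: Node Exporter landing page]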

3.1.2 Node Exporter Metrics

Open http://192.168.112.101:9101/metrics to see all the monitoring data that Node Exporter has collected for the current host:

[Figure: output of the Node Exporter /metrics endpoint]

Each metric is preceded by information of the following form:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 11138.6
node_cpu_seconds_total{cpu="0",mode="iowait"} 341.27
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 0.03
node_cpu_seconds_total{cpu="0",mode="softirq"} 13.5
node_cpu_seconds_total{cpu="0",mode="steal"} 0
node_cpu_seconds_total{cpu="0",mode="system"} 240.35
node_cpu_seconds_total{cpu="0",mode="user"} 333.54

HELP explains what the metric means, and TYPE states its metric type. Depending on the host system, the page may also show the following metrics:

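node_boot_time_seconds: system boot time

node_cpu_seconds_total: CPU time consumed, per CPU and mode

node_disk_*: disk I/O

node_filesystem_*: file system usage

node_load1: system load (1-minute average)

node_memory_*: memory usage

node_network_*: network traffic

node_time_seconds: current system time

go_*: Go runtime metrics of Node Exporter

process_*: metrics about the Node Exporter process itself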
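The text format with its HELP and TYPE lines is simple enough to read with a few lines of code. The following Python sketch parses such output into metadata and samples; it is a simplified reader written for this article (the name parse_exposition is made up), it ignores optional timestamps, and it does not unescape label values.

```python
import re

# metric name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse HELP/TYPE comments and samples from text-format metrics."""
    meta, samples = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP ") or line.startswith("# TYPE "):
            parts = line.split(" ", 3)
            if len(parts) == 4:
                _, kind, name, rest = parts
                meta.setdefault(name, {})[kind.lower()] = rest
        elif not line.startswith("#"):
            m = SAMPLE_RE.match(line)
            if m:
                labels = dict(LABEL_RE.findall(m.group(2) or ""))
                samples.append((m.group(1), labels, float(m.group(3))))
    return meta, samples


text = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 11138.6
node_cpu_seconds_total{cpu="0",mode="iowait"} 341.27
node_load1 0.42
"""
meta, samples = parse_exposition(text)
```

In this sketch, meta maps each metric name to its help text and type, and samples holds one (name, labels, value) tuple per line. The node_load1 value in the sample input is invented for the example.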

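3.1.3 Collecting Data from Node Exporter

1) node_exporter listens on port 9100 by default (9101 in this setup). Add the monitored host as a target on the Prometheus server so that data is collected from its node_exporter:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'node'
    scrape_interval: 5s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['192.168.112.101:9101']

2) The change takes effect after Prometheus is restarted. In the Prometheus web UI, use the query box at http://192.168.112.101:9090/graph to query the load of the monitored node, as shown:

[Figure: load query result on the Graph page]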
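The same query can be issued without the web UI through the HTTP API endpoint /api/v1/query. The Python sketch below builds the URL for an instant query and extracts the values from a JSON response of the instant-vector shape. The helper names and the sample response body are assumptions for illustration; the network call is left commented out.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def extract_values(response_body):
    """Return (labels, value) pairs from an instant-vector JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed")
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


url = instant_query_url("http://192.168.112.101:9090", 'node_load1{job="node"}')
# To run against a live server:
# body = urllib.request.urlopen(url, timeout=5).read()
# print(extract_values(body))

# Hand-written response used only to exercise extract_values.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "192.168.112.101:9101", "job": "node"},
             "value": [1612595320, "0.42"]},
        ],
    },
})
values = extract_values(sample_body)
```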

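3.2 Pushgateway

The Pushgateway offers another way to collect data: it is a Prometheus component that receives monitoring data by push. It can run on any node and does not have to run on the monitored client. User-written scripts send the data to be monitored to the Pushgateway, and Prometheus Server then scrapes it from there.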
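A user-written push script can be very small. The Python sketch below builds the HTTP request that pushes gauge values to a Pushgateway under the path /metrics/job/<job>, optionally extended with an instance label. The helper name build_push_request, the job name backup_job, and the metric name are hypothetical; the gateway address is the one used in this article, and the request is only sent if the last line is uncommented.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway_url, job, metrics, instance=None):
    """Build a request that pushes gauge values to a Pushgateway.

    `metrics` maps metric names to numeric values; the body uses the
    Prometheus text format and ends with a newline.
    """
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    if instance:
        path += "/instance/" + urllib.parse.quote(instance, safe="")
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    body = ("\n".join(lines) + "\n").encode("utf-8")
    return urllib.request.Request(
        gateway_url.rstrip("/") + path, data=body, method="POST"
    )


request = build_push_request(
    "http://192.168.112.101:9091",
    "backup_job",                             # hypothetical job name
    {"backup_last_duration_seconds": 12.5},   # hypothetical metric
    instance="tango-centos01",
)
# urllib.request.urlopen(request, timeout=5)  # send to a running Pushgateway
```

Pushed values stay in the Pushgateway until they are overwritten or deleted, so Prometheus keeps scraping the last value even after the job has exited.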

3.2.1 Installing Pushgateway

Official download page: https://prometheus.io/download/#pushgateway

1) Download and unpack the package

[root@tango-centos01 src]# tar -xzvf pushgateway-1.4.0.linux-amd64.tar.gz -C /usr/local/prometheus/
[root@tango-centos01 src]# mv /usr/local/prometheus/pushgateway-1.4.0.linux-amd64/ /usr/local/prometheus/pushgateway-1.4.0

2) Run Pushgateway

[root@tango-centos01 pushgateway-1.4.0]# nohup ./pushgateway &
level=info ts=2021-02-06T07:08:40.451Z caller=main.go:85 msg="starting pushgateway" version="(version=1.4.0, branch=HEAD, revision=007ba874bead1b9ad2253d89e3adeb16a73fd012)"
level=info ts=2021-02-06T07:08:40.451Z caller=main.go:86 build_context="(go=go1.15.7, user=root@410bc05a48f6, date=20210122-23:54:24)"
level=info ts=2021-02-06T07:08:40.455Z caller=main.go:139 listen_address=:9091
level=info ts=2021-02-06T07:08:40.456Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
[root@tango-centos01 pushgateway-1.4.0]# netstat -tulnp |grep 9091
tcp6       0      0 :::9091        :::*        LISTEN      3012/./pushgateway

3) Open http://192.168.112.101:9091/

[Figure: Pushgateway web UI]

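3.2.2 Configuring Pushgateway

1) In prometheus.yml, define a separate job and point its target at the host name or IP address and port where the Pushgateway runs:

  - job_name: 'pushgateway'
    scrape_interval: 5s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['192.168.112.101:9091']
        labels:
          instance: pushgateway

2) After the configuration is in place, restart Prometheus. Data pushed to the Pushgateway can then be queried in the Prometheus web UI, as shown:

[Figure: query result for metrics scraped from the Pushgateway]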

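3.3 Grafana Configuration

Grafana is a cross-platform, open-source tool for metrics analysis and visualization. It queries the collected data, displays it visually, and can send timely notifications. Its six main features are:

Visualization: fast, flexible client-side charts with a rich set of panel plugins, such as heatmaps, line charts, and graphs;

Data sources: Graphite, InfluxDB, OpenTSDB, Prometheus, Elasticsearch, CloudWatch, KairosDB, and others;

Notifications: define alert rules for different metrics; Grafana evaluates them and sends notifications when they trigger;

Mixed data sources: combine different data sources in the same chart, choosing the data source per query, including custom data sources;

Annotations: annotate charts with rich events from different data sources; hovering over an event shows its full metadata and tags;

Filters: ad-hoc filters create new key/value filters on the fly, which are automatically applied to all queries that use the data source.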

3.3.1 Installing Grafana

1) Download and install Grafana

[root@tango-centos01 src]# tar -xzvf grafana-7.4.0.linux-amd64.tar.gz -C /usr/local/

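2) Start the Grafana process

[root@tango-centos01 grafana-7.4.0]# nohup ./bin/grafana-server &
[root@tango-centos01 grafana-7.4.0]# netstat -an|grep 3000
tcp6       0      0 :::3000        :::*        LISTEN

3) Open the Grafana page

By default Grafana listens on port 3000, here at http://192.168.112.101:3000/. The default user name and password are both admin. The first login prompts for a new password, which can be skipped.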

3.3.2 Configuring Grafana

1) Add Prometheus as a data source

[Figure: Grafana data source settings for Prometheus]
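The data source can also be created through Grafana's HTTP API (POST /api/datasources) instead of the UI. The Python sketch below builds such a request with basic authentication. The helper name, the data source name "Prometheus", and the use of the default admin/admin credentials are assumptions for this example; the request is only sent if the last line is uncommented.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, prometheus_url, user, password):
    """Build a request that registers Prometheus as a Grafana data source."""
    payload = {
        "name": "Prometheus",   # display name, chosen for this sketch
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",      # Grafana's backend performs the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


request = build_datasource_request(
    "http://192.168.112.101:3000", "http://192.168.112.101:9090", "admin", "admin"
)
# urllib.request.urlopen(request, timeout=5)  # send to a running Grafana
```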

2) Import a dashboard template

[Figure: importing a dashboard template in Grafana]

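3.4 Alertmanager Alerting Configuration

Alerting with Prometheus has two independent parts. Alert rules are defined in Prometheus, which evaluates them periodically and sends an alert to Alertmanager whenever a rule's condition is met. Alertmanager manages those alerts, including silencing, inhibition, and aggregation, and sends notifications through channels such as email, Slack, and webhooks.

[Figure: Prometheus alerting flow]

Source: https://yunlzheng.gitbook.io/prometheus-book

The main steps for setting up alerts and notifications are:

Install and configure Alertmanager

Tell Prometheus where Alertmanager is so that the two can connect. In Prometheus 2.x this is done in the alerting section of prometheus.yml; the -alertmanager.url command-line flag applies only to Prometheus 1.x.

Define alerting rules in Prometheus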

3.4.1 Installing Alertmanager

1) Download and unpack

[root@tango-centos01 src]# tar -xzvf alertmanager-0.21.0.linux-amd64.tar.gz -C /usr/local/prometheus/
[root@tango-centos01 src]# cd /usr/local/prometheus/
[root@tango-centos01 prometheus]# mv alertmanager-0.21.0.linux-amd64 alertmanager-0.21.0

2) Run Alertmanager

[root@tango-centos01 alertmanager-0.21.0]# nohup ./alertmanager &
level=info ts=2021-02-06T07:38:40.722Z caller=main.go:216 msg="Starting Alertmanager" version="(version=0.21.0, branch=HEAD, revision=4c6c03ebfe21009c546e4d1e9b92c371d67c021d)"
level=info ts=2021-02-06T07:38:40.722Z caller=main.go:217 build_context="(go=go1.14.4, user=root@dee35927357f, date=20200617-08:54:02)"
level=info ts=2021-02-06T07:38:40.759Z caller=cluster.go:161 component=cluster msg="setting advertise address explicitly" addr=192.168.112.101 port=9094
level=info ts=2021-02-06T07:38:40.793Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2021-02-06T07:38:40.853Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=alertmanager.yml
level=info ts=2021-02-06T07:38:40.854Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=alertmanager.yml
level=info ts=2021-02-06T07:38:40.858Z caller=main.go:485 msg=Listening address=:9093
level=info ts=2021-02-06T07:38:42.794Z caller=cluster.go:648 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000276345s

Open http://192.168.112.101:9093/#/alerts

[Figure: Alertmanager web UI]

The Alerts menu shows the alerts that Alertmanager has received, the Silences menu lets you create silences through the UI, and the Status menu shows the current running state and configuration.

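3.4.2 Alertmanager Configuration

Alertmanager is responsible for handling the alerts produced by Prometheus in one place, so its configuration generally contains these main parts:

Global configuration (global): common parameters such as the global SMTP settings and Slack settings;

Templates (templates): templates used for notifications, such as HTML and email templates;

Routing (route): decides how an alert is handled, based on label matching;

Receivers (receivers): a receiver is an abstraction that can be a mailbox, WeChat, Slack, a webhook, and so on; receivers are normally used together with routes;

Inhibition rules (inhibit_rules): well-chosen inhibition rules reduce the number of junk alerts.

A complete example configuration: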

[root@tango-centos01 alertmanager-0.21.0]# cat alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
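The inhibit_rules entry in this configuration mutes a warning alert while a critical alert with the same alertname, dev, and instance labels is firing. The Python sketch below models that decision with plain dictionaries. It is a simplified illustration with a made-up function name, supports equality matching only, and treats a missing label like an empty one when comparing the labels listed under equal.

```python
def is_inhibited(target, firing_alerts, rule):
    """Return True if `target` is muted by some firing source alert.

    Alerts are dicts of label -> value. `rule` holds "source_match" and
    "target_match" (label equality maps) and "equal" (list of label names).
    """
    def matches(labels, wanted):
        return all(labels.get(k) == v for k, v in wanted.items())

    if not matches(target, rule["target_match"]):
        return False
    for source in firing_alerts:
        if source is target or not matches(source, rule["source_match"]):
            continue
        # Missing labels compare equal to empty ones.
        if all(source.get(name, "") == target.get(name, "") for name in rule["equal"]):
            return True
    return False


rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}
critical = {"alertname": "NodeDown", "instance": "a", "severity": "critical"}
warning = {"alertname": "NodeDown", "instance": "a", "severity": "warning"}
other = {"alertname": "NodeDown", "instance": "b", "severity": "warning"}
firing = [critical, warning, other]
```

Here the warning for instance a is inhibited by the critical alert, while the warning for instance b is still delivered because its instance label differs.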

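1) Alert routing

Every alert enters the routing tree at the top-level route in the configuration file and then traverses the child routes to find the deepest matching route; the alert is sent to the receiver defined on that route. By default continue is false, so matching stops at the first child route that matches. If that route sets continue to true, the alert goes on to be matched against the following sibling routes as well. If the alert matches none of the child routes, it is handled with the receiver configuration of the current route.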
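The traversal described here can be sketched as a small recursive function. This is a simplified Python model rather than Alertmanager's code: routes are plain dictionaries, only equality matching is supported, receivers are not inherited from parent routes, and the receiver names are invented for the example.

```python
def match_routes(route, labels):
    """Return the routes that should handle an alert with the given labels.

    Each route is a dict with optional keys "match" (label equality map),
    "routes" (list of children), "continue" (bool) and "receiver".
    """
    if any(labels.get(k) != v for k, v in route.get("match", {}).items()):
        return []
    matched = []
    for child in route.get("routes", []):
        child_matches = match_routes(child, labels)
        matched.extend(child_matches)
        if child_matches and not child.get("continue", False):
            break  # stop at the first matching child unless it sets continue
    # No child matched: the current node handles the alert itself.
    return matched or [route]


root = {
    "receiver": "default",
    "routes": [
        {"match": {"severity": "critical"}, "receiver": "pager", "continue": True},
        {"match": {"job": "node"}, "receiver": "ops-mail"},
        {"match": {"job": "node"}, "receiver": "never-reached"},
    ],
}

both = match_routes(root, {"job": "node", "severity": "critical"})
fallback = match_routes(root, {"job": "mysql"})
receivers_both = [r["receiver"] for r in both]
receivers_fallback = [r["receiver"] for r in fallback]
```

In this example the critical node alert reaches pager and, because that route sets continue, also ops-mail; the third route is never consulted. An alert that matches no child falls back to the default receiver of the top-level route.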

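2) Alert grouping

In Alertmanager, group_by defines the grouping rule: alerts that share the same values for the labels listed in group_by are combined into a single notification to the receiver.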
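Grouping amounts to bucketing alerts by the values of the group_by labels. A minimal Python sketch, with a made-up function name and sample alerts modeled on the rules defined later in this article:

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts (dicts of label -> value) by the group_by label values."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "InstanceDown", "instance": "192.168.112.101:9101"},
    {"alertname": "InstanceDown", "instance": "192.168.112.101:9091"},
    {"alertname": "NodeCpuUsage", "instance": "192.168.112.101:9101"},
]
groups = group_alerts(alerts, ["alertname"])
```

With group_by set to alertname, the two InstanceDown alerts end up in one group and would be sent as a single notification, while NodeCpuUsage forms its own group.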

3) Receiving alerts (email as an example)

To send email notifications from Alertmanager, define the SMTP settings and put the recipient's address in the receiver. The following example uses a QQ mailbox (note: the password must be an authorization code generated in the mailbox settings, not the login password):

global:
  smtp_smarthost: smtp.qq.com:25
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'xxxxxxxx'
  resolve_timeout: 5m
templates:
- '*.tmpl'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default-receiver'
receivers:
- name: 'default-receiver'
  email_configs:
  - to: '[email protected]'
    send_resolved: true
    html: '{{ template "email.html" . }}' # template
    headers: { Subject: " {{ .CommonLabels.instance }} {{ .CommonAnnotations.summary }}" } # subject
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Configure the email template:

[root@tango-centos01 alertmanager-0.21.0]# cat email.tmpl
{{ define "email.html" }}
{{ range .Alerts }}
Instance: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Time: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
{{ end }}
{{ end }}

3.4.3 Connecting Prometheus and Alertmanager

1) Edit the Prometheus configuration file prometheus.yml and add the following:

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.112.101:9093']
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus/prometheus-2.24.1/alert-rules.yml"

2) Create the alert rules file

[root@tango-centos01 prometheus-2.24.1]# vi alert-rules.yml
groups:
- name: general.rules
  rules:
  # Alert for any instance that is unreachable for more than 1 minute.
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: error
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
- name: mem.rules
  rules:
  # Alert for any instance whose memory usage is above 5%.
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes )) / node_memory_MemTotal_bytes * 100 > 5
    for: 1m
    labels:
      severity: error
    annotations:
      summary: "Instance {{ $labels.instance }} memory usage over 5%"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has memory usage over 5%."
- name: cpu.rules
  rules:
  # Alert for any instance whose CPU usage is above 1%.
  - alert: NodeCpuUsage
    expr: 100-irate(node_cpu_seconds_total{job="node",mode="idle"}[5m])*100 > 1
    for: 1m
    labels:
      severity: error
    annotations:
      summary: "{{ $labels.instance }} cpu usage too high"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has had cpu usage that is too high for more than 1 minute."

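After Prometheus is restarted the rules take effect. The page http://192.168.112.101:9090/rules shows whether the rules have been loaded.

[Figure: Rules page listing the loaded alert rules]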
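The for: 1m clause in these rules means an alert does not fire on the first evaluation where its expression is true: it is first pending and only fires once the condition has held for the whole duration. The Python sketch below models that state machine. The function name and the 15-second evaluation interval are assumptions for illustration.

```python
def alert_states(condition_by_eval, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    `condition_by_eval` lists whether the rule expression returned a result
    at each evaluation; the alert fires once that has held for `for_s` seconds.
    """
    states, active_since = [], None
    for i, holds in enumerate(condition_by_eval):
        now = i * eval_interval_s
        if not holds:
            active_since = None  # condition cleared, the timer starts over
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Condition true at every evaluation: pending for one minute, then firing.
steady = alert_states([True] * 6, eval_interval_s=15, for_s=60)

# A single false evaluation resets the one-minute timer.
flapping = alert_states(
    [True, True, False, True, True, True, True, True], eval_interval_s=15, for_s=60
)
```

Only alerts in the firing state are sent to Alertmanager, which is why short spikes that clear within a minute do not produce notifications with these rules.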

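3.4.4 Triggering Test Alerts

1) CPU and memory alerts

[Figure: CPU and memory alerts in Prometheus]

2) Open the Alertmanager UI, which now shows the alerts that Alertmanager has received.

[Figure: alerts received in the Alertmanager UI]

3) Check the email

[Figure: alert notification email]

This completes the full Prometheus workflow, from data collection through processing and visualization to alert handling.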

References:

https://prometheus.io/docs/introduction/overview/

https://yunlzheng.gitbook.io/prometheus-book/

https://www.cnblogs.com/vovlie/p/7709312.html

https://www.jianshu.com/p/273b1a7d4cab

https://yasongxu.gitbook.io/container-monitor/

https://www.cnblogs.com/chenqionghe/p/10494868.html

https://www.cnblogs.com/imyalost/p/9873641.html

https://blog.51cto.com/lvsir666/2409063
