1. 开启性能模式
[mysqld]
performance_schema=1
optimizer_search_depth = 0
optimizer_switch = index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on
performance_schema
解释:
- 1 或 ON:启用(推荐)
- 0 或 OFF:禁用
optimizer_search_depth
解释:
- 控制 MySQL 查询优化器在探索执行计划时的搜索深度
- 默认值:62(表示“自动”)
- 设置为 0:表示“自动深度”
- 设置为 1-62:限制搜索范围
optimizer_switch
解释:
index_merge=on
启用索引合并优化(总开关)- ‘index_merge_union=on’ 启用 OR 条件下的索引联合(如 WHERE a=1 OR b=2)
- ‘index_merge_sort_union=on’ 启用排序后的索引联合(适用于无索引排序的 OR)
- ‘index_merge_intersection=on’ 启用 AND 条件下的索引交集(如 WHERE a=1 AND b=2,但 a、b 分属不同索引)
2. 部署 mysqld-exporter
# 添加 Helm 仓库
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# 更新 Helm 仓库
helm repo update
# 搜索
helm search repo prometheus-mysql-exporter
# 下载
helm pull prometheus-community/prometheus-mysql-exporter --version 2.11.0
# 下载并解压
helm pull prometheus-community/prometheus-mysql-exporter --version 2.11.0 --untar
2.1. collectors 解读
collectors: {}
# auto_increment.columns: false # 抓取自增列使用情况(接近上限会告警)。
# binlog_size: false # 当前 binlog 文件大小,👉 用于监控 binlog 膨胀,防止磁盘爆掉。
# engine_innodb_status: false # 解析 SHOW ENGINE INNODB STATUS(锁等待、死锁、事务信息)。👉 对调优事务和死锁有用。
# engine_tokudb_status: false #
# global_status: true # 从 SHOW GLOBAL STATUS 抓取 MySQL 的运行状态(QPS、TPS、连接数、缓冲池使用等核心指标)。👉 生产强烈建议开启。
# global_variables: true # 从 SHOW GLOBAL VARIABLES 抓取配置参数。👉 用于监控参数变动。
# info_schema.innodb_metrics: false # 采集 information_schema.innodb_metrics 的数据,更细粒度的 InnoDB 指标(buffer pool、事务、IO)。
# info_schema.innodb_tablespaces: false # 各个表空间大小、空间使用。
# info_schema.innodb_cmp: false # InnoDB 压缩表相关的指标。
# info_schema.innodb_cmpmem: false
# info_schema.processlist: false # 当前连接信息(类似 SHOW PROCESSLIST)。👉 可看阻塞线程、长事务。
# info_schema.processlist.min_time: 0 # 只采集运行超过 N 秒的 SQL,避免噪音。
# info_schema.query_response_time: false
# info_schema.tables: true # 采集表大小、行数、索引大小等。
# info_schema.tables.databases: '*' # 表示采集所有库的表。
# info_schema.tablestats: false # 表级别的统计数据。
# info_schema.schemastats: false # 库级别的统计数据。
# info_schema.userstats: false # 用户维度维度的连接/查询统计(需要 userstat=1 编译 MySQL)。
# info_schema.clientstats: false # 客户端维度的连接/查询统计(需要 userstat=1 编译 MySQL)。
# perf_schema.eventsstatements: false
# perf_schema.eventsstatements.digest_text_limit: 120
# perf_schema.eventsstatements.limit: false
# perf_schema.eventsstatements.timelimit: 86400
# perf_schema.eventswaits: false
# perf_schema.file_events: false
# perf_schema.file_instances: false
# perf_schema.indexiowaits: false
# perf_schema.tableiowaits: false
# perf_schema.tablelocks: false
# perf_schema.replication_group_member_stats: false
# slave_status: true # 采集 SHOW SLAVE STATUS,监控主从复制延迟。👉 必须开。
# slave_hosts: false # 采集 SHOW SLAVE HOSTS,查看从库信息。
# heartbeat: false # 配合 pt-heartbeat 或自建心跳表,计算主从复制延迟(比 Seconds_Behind_Master 更准确)。
# heartbeat.database: heartbeat
# heartbeat.table: heartbeat
必开: global_status, global_variables, slave_status, info_schema.tables (这是 QPS/TPS/连接数/表大小/复制延迟的核心指标)
视情况开:
高并发事务库:engine_innodb_status, info_schema.innodb_metrics
调优/慢 SQL 分析:perf_schema.eventsstatements
有复制心跳表:heartbeat
存储膨胀风险:binlog_size, info_schema.innodb_tablespaces
一般不开: innodb_cmp, userstats, 各种 iowaits (指标太多,Prometheus 压力大,除非专门调优)。
推荐配置:
collectors:
# 基础状态
global_status: true
global_variables: true
# 表大小监控
info_schema.tables: true
info_schema.tables.databases: '*'
# 主从复制
slave_status: true
# 如果有 pt-heartbeat 或自建心跳表,启用下面
# heartbeat: true
# heartbeat.database: heartbeat
# heartbeat.table: heartbeat
# InnoDB 引擎
engine_innodb_status: true
info_schema.innodb_metrics: true
# Binlog 监控(防止磁盘爆掉)
binlog_size: true
# SQL 分析(可选,打开会有较大开销)
perf_schema.eventsstatements: false
# perf_schema.eventsstatements.digest_text_limit: 120
# perf_schema.eventsstatements.limit: 500
# perf_schema.eventsstatements.timelimit: 86400
# 进程/长事务(只监控运行超过30s的SQL)
info_schema.processlist: true
info_schema.processlist.min_time: 30
# 不常用,默认关闭(需要时再开)
auto_increment.columns: false
info_schema.userstats: false
info_schema.clientstats: false
info_schema.innodb_tablespaces: false
info_schema.innodb_cmp: false
info_schema.innodb_cmpmem: false
perf_schema.eventswaits: false
perf_schema.file_events: false
perf_schema.file_instances: false
perf_schema.indexiowaits: false
perf_schema.tableiowaits: false
perf_schema.tablelocks: false
perf_schema.replication_group_member_stats: false
slave_hosts: false
2.2. 单实例
# set values
cat > mysqld-exporter-standalone-values.yaml <<EOF
image:
registry: quay.io
repository: prometheus/mysqld-exporter
tag: "v0.17.2"
mysql:
host: "localhost"
port: 3306
pass: "password"
collectors:
# 基础状态
global_status: true
global_variables: true
# 表大小监控
info_schema.tables: true
info_schema.tables.databases: '*'
# 主从复制
slave_status: false
slave_hosts: false
# InnoDB 引擎
engine_innodb_status: true
info_schema.innodb_metrics: true
# Binlog 监控(防止磁盘爆掉)
binlog_size: true
# 进程/长事务(只监控运行超过30s的SQL)
info_schema.processlist: true
info_schema.processlist.min_time: 30
EOF
# install
helm upgrade --install prometheus-mysql-exporter \
prometheus-community/prometheus-mysql-exporter \
--version 2.11.0 \
-n db \
--set serviceMonitor.enabled=true \
-f mysqld-exporter-standalone-values.yaml
2.3. 主从实例
主服务器 exporter
# set values
cat > mysqld-exporter-standalone-values.yaml <<EOF
image:
registry: quay.io
repository: prometheus/mysqld-exporter
tag: "v0.17.2"
mysql:
host: "localhost"
port: 3306
pass: "password"
collectors:
# 基础状态
global_status: true
global_variables: true
# 表大小监控
info_schema.tables: true
info_schema.tables.databases: '*'
# 主从复制
slave_status: false
slave_hosts: true
# InnoDB 引擎
engine_innodb_status: true
info_schema.innodb_metrics: true
# Binlog 监控(防止磁盘爆掉)
binlog_size: true
# 进程/长事务(只监控运行超过30s的SQL)
info_schema.processlist: true
info_schema.processlist.min_time: 30
EOF
# install
helm upgrade --install prometheus-mysql-exporter \
prometheus-community/prometheus-mysql-exporter \
--version 2.11.0 \
-n db \
--set serviceMonitor.enabled=true \
-f mysqld-exporter-standalone-values.yaml
从服务器 exporter
# set values
cat > mysqld-exporter-standalone-values.yaml <<EOF
image:
registry: quay.io
repository: prometheus/mysqld-exporter
tag: "v0.17.2"
mysql:
host: "localhost"
port: 3306
pass: "password"
collectors:
# 基础状态
global_status: true
global_variables: true
# 表大小监控
info_schema.tables: true
info_schema.tables.databases: '*'
# 主从复制
slave_status: true
slave_hosts: false
# InnoDB 引擎
engine_innodb_status: true
info_schema.innodb_metrics: true
# Binlog 监控(防止磁盘爆掉)
binlog_size: true
# 进程/长事务(只监控运行超过30s的SQL)
info_schema.processlist: true
info_schema.processlist.min_time: 30
EOF
# install
helm upgrade --install prometheus-mysql-exporter \
prometheus-community/prometheus-mysql-exporter \
--version 2.11.0 \
-n db \
--set serviceMonitor.enabled=true \
-f mysqld-exporter-standalone-values.yaml
3. 告警规则
3.1. MySQL 服务可用性告警
- alert:
expr: mysql_up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "MySQL 服务不可用"
description: "{{ $labels.instance }} MySQL 服务已停止运行"
3.2. 主从复制状态告警
# 从库 IO 线程停止
- alert: MySQLSlaveIOThreadStopped
expr: mysql_slave_sql_running == 0
for: 1m
labels:
severity: critical
annotations:
summary: "MySQL 从库 IO 线程停止"
description: "{{ $labels.instance }} 从库 IO 线程已停止,复制中断"
# 从库 SQL 线程停止
- alert: MySQLSlaveSQLThreadStopped
expr: mysql_slave_io_running == 0
for: 1m
labels:
severity: critical
annotations:
summary: "MySQL 从库 SQL 线程停止"
description: "{{ $labels.instance }} 从库 SQL 线程已停止,复制中断"
# 主从延迟过高
- alert: MySQLSlaveReplicationLag
expr: mysql_slave_lag_seconds > 300
for: 2m
labels:
severity: warning
annotations:
summary: "MySQL 主从复制延迟过高"
description: "{{ $labels.instance }} 主从复制延迟 {{ $value }} 秒,超过 5 分钟"
# 主从延迟严重
- alert: MySQLSlaveReplicationLagCritical
expr: mysql_slave_lag_seconds > 3600
for: 1m
labels:
severity: critical
annotations:
summary: "MySQL 主从复制延迟严重"
description: "{{ $labels.instance }} 主从复制延迟 {{ $value }} 秒,超过 1 小时"
3.3. 连接数告警
# 连接数过高
- alert: MySQLTooManyConnections
expr: mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100 > 80
for: 2m
labels:
severity: warning
annotations:
summary: "MySQL 连接数过高"
description: "{{ $labels.instance }} 当前连接数占最大连接数的 {{ $value }}%,超过 80%"
# 连接数达到上限
- alert: MySQLConnectionsNearLimit
expr: mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100 > 95
for: 1m
labels:
severity: critical
annotations:
summary: "MySQL 连接数接近上限"
description: "{{ $labels.instance }} 当前连接数占最大连接数的 {{ $value }}%,超过 95%"
# 异常连接中断
- alert: MySQLAbortedConnections
expr: rate(mysql_global_status_aborted_connects[5m]) > 5
for: 2m
labels:
severity: warning
annotations:
summary: "MySQL 异常连接中断过多"
description: "{{ $labels.instance }} 连接中断速率 {{ $value }}/秒,超过正常水平"
3.4. InnoDB 缓冲池告警
# 缓冲池命中率低
- alert: MySQLInnoDBBufferPoolHitRateLow
expr: (mysql_global_status_innodb_buffer_pool_read_requests - mysql_global_status_innodb_buffer_pool_reads) / mysql_global_status_innodb_buffer_pool_read_requests * 100 < 95
for: 5m
labels:
severity: warning
annotations:
summary: "MySQL InnoDB 缓冲池命中率低"
description: "{{ $labels.instance }} InnoDB 缓冲池命中率 {{ $value }}%,低于 95%"
# 脏页比例过高
- alert: MySQLInnoDBDirtyPagesHigh
expr: mysql_global_status_buffer_pool_dirty_pages / (mysql_global_status_buffer_pool_pages{state="data"} + mysql_global_status_buffer_pool_pages{state="free"} + mysql_global_status_buffer_pool_pages{state="misc"}) * 100 > 75
for: 5m
labels:
severity: warning
annotations:
summary: "MySQL InnoDB 脏页比例过高"
description: "{{ $labels.instance }} InnoDB 脏页占比 {{ $value }}%,超过 75%"
3.5. 查询性能告警
# 慢查询增长过快
- alert: MySQLSlowQueriesHigh
expr: rate(mysql_global_status_slow_queries[5m]) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "MySQL 慢查询过多"
description: "{{ $labels.instance }} 慢查询频率 {{ $value }}/秒,需要优化查询"
# 全表扫描过多
- alert: MySQLSelectScanHigh
expr: rate(mysql_global_status_select_scan[5m]) > 100
for: 5m
labels:
severity: warning
annotations:
summary: "MySQL 全表扫描过多"
description: "{{ $labels.instance }} 全表扫描频率 {{ $value }}/秒,建议优化索引"
3.6. 锁等待告警
# InnoDB 行锁等待时间过长
- alert: MySQLInnoDBRowLockWaitHigh
expr: mysql_global_status_innodb_row_lock_time_avg > 5000
for: 2m
labels:
severity: warning
annotations:
summary: "MySQL InnoDB 行锁等待时间过长"
description: "{{ $labels.instance }} 平均行锁等待时间 {{ $value }} 毫秒,超过 5 秒"
# 表锁等待
- alert: MySQLTableLocksWaitingHigh
expr: mysql_global_status_table_locks_waited > 0
for: 1m
labels:
severity: warning
annotations:
summary: "MySQL 表锁等待"
description: "{{ $labels.instance }} 存在表锁等待情况"
3.7. 二进制日志告警
# 二进制日志空间使用率高(适用于主库)
- alert: MySQLBinlogSpaceHigh
expr: mysql_global_variables_binlog_cache_use / mysql_global_variables_max_binlog_cache_size * 100 > 80
for: 5m
labels:
severity: warning
annotations:
summary: "MySQL 二进制日志缓存使用率高"
description: "{{ $labels.instance }} 二进制日志缓存使用率 {{ $value }}%,超过 80%"
3.8. 复制错误告警
# 检测主从数据不一致(基于 mysqld-exporter 0.11+ 版本)
- alert: MySQLSlaveReplicationError
expr: mysql_global_status_slave_retried_transactions > 0
for: 1m
labels:
severity: critical
annotations:
summary: "MySQL 从库复制出现错误"
description: "{{ $labels.instance }} 从库复制出现错误,需要人工介入"
4. 部署建议
- 阈值调整:根据业务实际情况调整各项告警阈值
- 告警分级:critical 级别需要立即处理,warning 级别需要关注
- 监控频率:建议 mysqld-exporter 抓取间隔设置为 15-30 秒
- 标签管理:为主库和从库实例添加适当的标签区分
5. 注意事项
- 某些指标(如
mysql_slave_*
)仅在从库实例中存在 - 根据实际的 mysqld-exporter 版本,部分指标名称可能需要调整
- 建议结合业务监控一起使用,形成完整的监控体系