注意:本文档中所有敏感信息均已模糊化处理,实际使用时请替换占位符

目录


架构概述


架构说明:
- 负载层: HAProxy + Keepalived 提供高可用负载均衡和VIP管理
- 数据库层: MariaDB Galera 集群实现多活数据库
- 监控层: Prometheus + Grafana 实现全方位监控
- 仲裁层: Galera Arbitrator 提供奇数节点支持


环境准备

系统要求

  • 操作系统: RockyLinux 8/9
  • 硬件要求:
  • 最小 4GB RAM (生产环境建议 16GB+)
  • 最小 2 CPU 核心 (生产环境建议 8 核心+)
  • 网络要求:
  • 节点间延迟 < 5ms
  • 节点间带宽 > 1Gbps

基础配置

# 所有节点执行
# 1. 禁用SELinux
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
# 2. 配置防火墙
firewall-cmd --permanent --add-port={3306,4444,4567,4568,9200,8080}/tcp
firewall-cmd --reload
# 3. 配置主机名解析
cat >> /etc/hosts <<EOF
192.168.1.101 db1
192.168.1.102 db2
192.168.1.103 db3
192.168.1.201 lb1
192.168.1.202 lb2
EOF
# 4. 时间同步
dnf install -y chrony
systemctl enable --now chronyd
chronyc sources

MariaDB Galera 集群部署

安装组件

# 所有数据库节点执行
# 1. 添加MariaDB仓库
curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash
# 2. 安装必要组件
dnf install -y MariaDB-server MariaDB-backup galera-4

Galera 配置

文件路径/etc/my.cnf.d/galera.cnf

[mysqld]
# 基础配置
binlog_format=ROW
default-storage-engine=InnoDB
innodb_autoinc_lock_mode=2
# Galera 配置
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
wsrep_cluster_name="galera_cluster"
wsrep_cluster_address="gcomm://db1,db2,db3"
wsrep_node_name="db1"  # 节点名需修改
wsrep_node_address="192.168.1.101" # 节点IP需修改
wsrep_sst_method=mariabackup
wsrep_sst_auth="sst_user:${SST_PASSWORD}"  # 替换${SST_PASSWORD}
# 防脑裂配置
wsrep_provider_options="gcache.size=1G; 
pc.ignore_sb=true; 
evs.suspect_timeout=PT30S;
evs.inactive_timeout=PT1M"

健康检测配置 (9200端口)

# 所有数据库节点执行
# 1. 安装xinetd
dnf install -y xinetd

# 2. 创建检测脚本
cat > /usr/local/bin/galera_healthcheck <<'EOF'
#!/bin/bash
CONFIG_FILE="/etc/mysql/conf.d/healthcheck.cnf"
RESPONSE=$(mysql --defaults-extra-file=$CONFIG_FILE -e "SHOW STATUS LIKE 'wsrep_ready'" 2>/dev/null | grep 'ON')
if [ -n "$RESPONSE" ]; then
  echo -e "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
  exit 0
else
  echo -e "HTTP/1.1 503 Service Unavailable\r\nContent-Length: 0\r\n\r\n"
  exit 1
fi
EOF

# 3. 创建配置文件
cat > /etc/mysql/conf.d/healthcheck.cnf <<EOF
[client]
user=exporter
password=${EXPORTER_PASSWORD}  # 替换${EXPORTER_PASSWORD}
EOF
chmod 600 /etc/mysql/conf.d/healthcheck.cnf
chmod +x /usr/local/bin/galera_healthcheck

# 4. 配置xinetd服务
cat > /etc/xinetd.d/galera-healthcheck <<EOF
service galera-healthcheck
{
    disable = no
    flags = REUSE
    socket_type = stream
    port = 9200
    wait = no
    user = root
    group = root
    server = /usr/local/bin/galera_healthcheck
    log_on_failure += USERID
    per_source = UNLIMITED
}
EOF

# 5. 启动服务
systemctl enable --now xinetd

# 验证健康检查
curl -i http://localhost:9200

初始化集群

# 1. 在第一个节点初始化集群
galera_new_cluster
# 2. 启动其他节点
systemctl start mariadb
# 3. 所有节点设置开机自启
systemctl enable mariadb

创建系统用户

-- 在所有节点执行
-- SST用户 (替换${SST_PASSWORD})
CREATE USER 'sst_user'@'localhost' IDENTIFIED BY '${SST_PASSWORD}';
GRANT RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst_user'@'localhost';

-- 监控用户 (替换${EXPORTER_PASSWORD})
CREATE USER 'exporter'@'localhost' IDENTIFIED BY '${EXPORTER_PASSWORD}' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';

FLUSH PRIVILEGES;

HAProxy + Keepalived 部署

HAProxy 配置

文件路径/etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    maxconn 4000
    user haproxy
    group haproxy
defaults
    mode tcp
    timeout connect 10s
    timeout client 30s
    timeout server 30s
frontend galera_front
    bind *:3306
    default_backend galera_back
backend galera_back
    balance leastconn
    option tcp-check
    tcp-check connect port 9200
    server db1 192.168.1.101:3306 check port 9200 inter 5s rise 2 fall 3
    server db2 192.168.1.102:3306 check port 9200 inter 5s rise 2 fall 3
    server db3 192.168.1.103:3306 check port 9200 inter 5s rise 2 fall 3
listen stats
    bind *:8080
    mode http
    stats enable
    stats uri /haproxy?stats
    stats refresh 10s
    stats auth admin:${HAPROXY_ADMIN_PASS}  # 替换${HAPROXY_ADMIN_PASS}

Keepalived 配置

主节点文件路径/etc/keepalived/keepalived.conf

vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
    weight 2
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ${KEEPALIVED_AUTH_PASS}  # 替换${KEEPALIVED_AUTH_PASS}
    }
    virtual_ipaddress {
        192.168.1.200/24
    }
    track_script {
        chk_haproxy
    }
}

备节点配置:
- state 设置为 BACKUP
- priority 设置为 90

启动服务

# 所有负载节点执行
systemctl enable --now haproxy keepalived
# 验证VIP
ip addr show eth0 | grep 192.168.1.200

备份与恢复

定时备份脚本

文件路径/usr/local/bin/backup_galera.sh

#!/bin/bash
# 配置文件:包含所有敏感信息
CONFIG_FILE="/etc/galera/backup.conf"

# 加载配置文件
if [ ! -f "$CONFIG_FILE" ]; then
  echo "$(date) - 配置文件 $CONFIG_FILE 不存在" >> /var/log/mariabackup.log
  exit 1
fi
source "$CONFIG_FILE"

# 创建备份目录
mkdir -p $BACKUP_DIR/$DATE || { 
  echo "$(date) - 创建目录失败" >> $LOG_FILE
  /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB节点$(hostname)备份失败:创建目录失败"
  exit 1
}
chown -R mysql:mysql $BACKUP_DIR/$DATE

# 设置文件限制
ulimit -n 65535

# 开始备份
echo "===== 开始备份: $(date) =====" >> $LOG_FILE

mariabackup --backup \
  --user=$BACKUP_USER \
  --password=$BACKUP_PASS \
  --target-dir=$BACKUP_DIR/$DATE \
  --parallel=4 \
  --galera-info \
  --open-files-limit=65535 2>> $LOG_FILE

if [ $? -ne 0 ]; then
    echo "备份失败!" >> $LOG_FILE
    /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB节点$(hostname)备份失败:$(tail -n 10 $LOG_FILE)"
    exit 1
fi

# 准备备份
echo "===== 准备备份: $(date) =====" >> $LOG_FILE
mariabackup --prepare \
  --target-dir=$BACKUP_DIR/$DATE \
  --open-files-limit=65535 2>> $LOG_FILE

# 验证关键文件存在
if [ -f "$BACKUP_DIR/$DATE/xtrabackup_checkpoints" ]; then
    echo "备份验证成功" >> $LOG_FILE
    
    # 使用XZ多核压缩备份目录
    echo "===== 压缩备份: $(date) =====" >> $LOG_FILE
    tar -I 'xz -T0' -cf $BACKUP_DIR/$DATE.tar.xz -C $BACKUP_DIR $DATE
    
    if [ $? -eq 0 ]; then
        # 删除原始备份目录
        rm -rf $BACKUP_DIR/$DATE
        echo "备份压缩成功" >> $LOG_FILE
    else
        echo "备份压缩失败!" >> $LOG_FILE
        /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB节点$(hostname)备份压缩失败"
    fi
else
    echo "备份验证失败: 缺少xtrabackup_checkpoints文件!" >> $LOG_FILE
    /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB节点$(hostname)备份验证失败:缺少关键文件"
    exit 1
fi

# 保留最近365天备份
find $BACKUP_DIR -name "*.tar.xz" -mtime +365 -delete

echo "===== 备份完成: $(date) =====" >> $LOG_FILE

备份配置文件

文件路径/etc/galera/backup.conf

# 备份配置 - 请替换实际值
BACKUP_ROOT="${SECURE_BACKUP_PATH}"  # 替换为安全路径
BACKUP_DIR="${BACKUP_ROOT}/galera_backups"
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_USER="sst_user"
BACKUP_PASS="${SST_PASSWORD}"  # 替换${SST_PASSWORD}
LOG_FILE="/var/log/mariabackup.log"
DT_TOKEN="${DINGTALK_TOKEN}"   # 替换${DINGTALK_TOKEN}
DT_SIGN="${DINGTALK_SIGN}"     # 替换${DINGTALK_SIGN}

设置定时任务

chmod 600 /etc/galera/backup.conf
chmod +x /usr/local/bin/backup_galera.sh
# 每天凌晨2点执行备份
echo "0 2 * * * /usr/local/bin/backup_galera.sh" | crontab -

自动化恢复脚本

文件路径/usr/local/bin/restore_galera.sh

#!/bin/bash
# Galera集群数据恢复脚本
# 用法: restore_galera.sh <备份文件路径>
# 示例: restore_galera.sh ${BACKUP_ROOT}/galera_backups/20231001-020000.tar.xz

# 配置文件
CONFIG_FILE="/etc/galera/restore.conf"
source "$CONFIG_FILE"

if [ $# -eq 0 ]; then
  echo "错误:请提供备份文件路径"
  echo "用法: $0 <备份文件路径>"
  exit 1
fi

BACKUP_FILE=$1
TEMP_DIR="/tmp/galera_restore_$(date +%s)"
LOG_FILE="/var/log/galera_restore.log"

# 验证备份文件存在
if [ ! -f "$BACKUP_FILE" ]; then
  echo "错误:备份文件 $BACKUP_FILE 不存在" | tee -a $LOG_FILE
  exit 1
fi

echo "===== 开始恢复: $(date) =====" | tee -a $LOG_FILE

# 1. 停止MariaDB服务
echo "[步骤1/8] 停止MariaDB服务..." | tee -a $LOG_FILE
systemctl stop mariadb >> $LOG_FILE 2>&1
if [ $? -ne 0 ]; then
  echo "错误:停止MariaDB服务失败" | tee -a $LOG_FILE
  /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:停止服务失败"
  exit 1
fi

# 2. 清空数据目录
echo "[步骤2/8] 清空数据目录..." | tee -a $LOG_FILE
rm -rf /var/lib/mysql/* >> $LOG_FILE 2>&1

# 3. 创建临时目录
echo "[步骤3/8] 创建临时目录 $TEMP_DIR..." | tee -a $LOG_FILE
mkdir -p $TEMP_DIR >> $LOG_FILE 2>&1
if [ $? -ne 0 ]; then
  echo "错误:创建临时目录失败" | tee -a $LOG_FILE
  /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:创建临时目录失败"
  exit 1
fi

# 4. 解压备份(使用XZ多核解压)
echo "[步骤4/8] 解压备份文件 $BACKUP_FILE..." | tee -a $LOG_FILE
tar -I 'xz -T0' -xf $BACKUP_FILE -C $TEMP_DIR >> $LOG_FILE 2>&1
if [ $? -ne 0 ]; then
  echo "错误:解压备份失败" | tee -a $LOG_FILE
  /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:解压备份失败"
  exit 1
fi

# 获取解压后的目录名
BACKUP_DIR_NAME=$(ls $TEMP_DIR)
RESTORE_DIR="$TEMP_DIR/$BACKUP_DIR_NAME"

# 5. 执行恢复
echo "[步骤5/8] 恢复数据到/var/lib/mysql..." | tee -a $LOG_FILE
mariabackup --copy-back \
  --target-dir=$RESTORE_DIR \
  --open-files-limit=65535 >> $LOG_FILE 2>&1

if [ $? -ne 0 ]; then
  echo "错误:数据恢复失败" | tee -a $LOG_FILE
  /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:数据恢复失败"
  exit 1
fi

# 6. 修复权限
echo "[步骤6/8] 修复数据目录权限..." | tee -a $LOG_FILE
chown -R mysql:mysql /var/lib/mysql >> $LOG_FILE 2>&1

# 7. 启动服务
echo "[步骤7/8] 启动MariaDB服务..." | tee -a $LOG_FILE
systemctl start mariadb >> $LOG_FILE 2>&1
if [ $? -ne 0 ]; then
  echo "错误:启动MariaDB服务失败" | tee -a $LOG_FILE
  /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:启动服务失败"
  exit 1
fi

# 8. 验证恢复
echo "[步骤8/8] 验证集群状态和数据一致性..." | tee -a $LOG_FILE
mysql -e "SHOW STATUS LIKE 'wsrep%'" >> $LOG_FILE 2>&1
mysql -e "CHECK TABLE mysql.user EXTENDED" >> $LOG_FILE 2>&1

# 清理临时文件
rm -rf $TEMP_DIR

echo "===== 恢复完成: $(date) =====" | tee -a $LOG_FILE
echo "恢复日志: $LOG_FILE"
echo "请验证数据完整性和集群状态"

# 发送成功通知
/usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera数据恢复成功完成"

恢复配置文件

文件路径/etc/galera/restore.conf

# 恢复配置 - 请替换实际值
DT_TOKEN="${DINGTALK_TOKEN}"   # 替换${DINGTALK_TOKEN}
DT_SIGN="${DINGTALK_SIGN}"     # 替换${DINGTALK_SIGN}

恢复操作流程

  1. 准备配置文件:
# 创建配置文件
sudo mkdir -p /etc/galera
sudo nano /etc/galera/restore.conf
# 添加内容:
# DT_TOKEN="your_actual_token"
# DT_SIGN="your_actual_sign"
sudo chmod 600 /etc/galera/restore.conf
  1. 选择备份文件:
# 列出可用备份
ls -l ${SECURE_BACKUP_PATH}/galera_backups/*.tar.xz
  1. 执行恢复脚本:
chmod +x /usr/local/bin/restore_galera.sh
/usr/local/bin/restore_galera.sh ${SECURE_BACKUP_PATH}/galera_backups/20231001-020000.tar.xz
  1. 验证恢复结果:
-- 检查集群状态
SHOW STATUS LIKE 'wsrep%';

-- 验证关键数据
SELECT COUNT(*) FROM important_table;

恢复流程注意事项

  1. 配置文件安全:确保配置文件权限为600
  2. 空间要求:确保有足够的磁盘空间(XZ压缩需要更多临时空间)
  3. 服务中断:恢复期间数据库不可用
  4. 验证步骤:恢复后必须执行数据验证
  5. 性能优化:XZ多核解压会充分利用CPU资源

故障处理手册

常规故障排查

-- 检查集群状态
SHOW STATUS LIKE 'wsrep%';
-- 关键指标:
-- wsrep_cluster_size: 集群节点数 (应为3)
-- wsrep_cluster_status: 集群状态 (应为Primary)
-- wsrep_connected: 网络连接状态 (应为ON)
-- wsrep_ready: 节点就绪状态 (应为ON)

脑裂检测与处理

自动化脑裂检测脚本

文件路径/usr/local/bin/cluster_integrity_check.sh

#!/bin/bash
# 配置文件
CONFIG_FILE="/etc/galera/monitor.conf"
source "$CONFIG_FILE"

# 获取当前节点状态
CURRENT_STATUS=$(mysql -u$MONITOR_USER -p$MONITOR_PASS -e "SHOW STATUS LIKE 'wsrep_cluster_status'" -sN | awk '{print $2}')
CURRENT_SIZE=$(mysql -u$MONITOR_USER -p$MONITOR_PASS -e "SHOW STATUS LIKE 'wsrep_cluster_size'" -sN | awk '{print $2}')

if [ "$CURRENT_STATUS" != "Primary" ]; then
  MESSAGE="[CRITICAL] Node $(hostname) in $CURRENT_STATUS state!"
  /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "$MESSAGE"
  exit 1
fi

if [ "$CURRENT_SIZE" -lt "$EXPECTED_SIZE" ]; then
  MESSAGE="[CRITICAL] Cluster size $CURRENT_SIZE < $EXPECTED_SIZE!"
  /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "$MESSAGE"
  exit 1
fi

监控配置文件

文件路径/etc/galera/monitor.conf

# 监控配置 - 请替换实际值
EXPECTED_SIZE=3
MONITOR_USER="exporter"
MONITOR_PASS="${EXPORTER_PASSWORD}"  # 替换${EXPORTER_PASSWORD}
DT_TOKEN="${DINGTALK_TOKEN}"         # 替换${DINGTALK_TOKEN}
DT_SIGN="${DINGTALK_SIGN}"           # 替换${DINGTALK_SIGN}

附录

敏感信息占位符说明

占位符 说明 示例值
${SECURE_BACKUP_PATH} 安全备份存储路径 /secure/db_backups
${SST_PASSWORD} SST用户密码 s3cretPass
${EXPORTER_PASSWORD} 监控用户密码 ExporterPass123
${HAPROXY_ADMIN_PASS} HAProxy管理密码 AdminPass123
${KEEPALIVED_AUTH_PASS} Keepalived认证密码 Keep@l1ve
${DINGTALK_TOKEN} 钉钉机器人Token d14e9ddfefd414…
${DINGTALK_SIGN} 钉钉机器人签名 SEC1089bac8cb8b…

安全最佳实践

  1. 配置文件管理:
# 设置配置文件权限
chmod 600 /etc/galera/*.conf
chown root:root /etc/galera/*.conf
  1. 密码轮换策略:
  • SST密码:每90天轮换
  • 监控密码:每180天轮换
  • HAProxy密码:每365天轮换
  1. 审计日志:
# 记录所有管理操作
echo "$(date) $(whoami) $@" >> /var/log/galera_audit.log

版本更新记录

版本 日期 修改内容
1.7 2025-10-03 添加自动化恢复脚本
1.8 2025-10-04 统一敏感信息管理,增强安全性
备份压缩优化为XZ多核格式
敏感路径隐私保护

紧急恢复联系人

角色 联系方式 响应时间
值班运维 +86-138XXXXXXX 24/7
高级DBA dba-oncall@example.com 24/7

最后更新: 2025-10-04
安全等级: 机密
文档校验: 使用校验和验证完整性
md5sum: a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6


安全增强说明

敏感信息管理策略

  1. 统一配置中心
  • 所有敏感信息存储在/etc/galera/目录下的配置文件中
  • 配置文件权限设置为600,仅root可访问
  • 使用环境变量或source命令加载配置
  1. 占位符系统
    | 占位符 | 实际值位置 |
    |————————|————————–|
    | ${SECURE_BACKUP_PATH} | backup.conf |
    | ${SST_PASSWORD} | galera.cnf, backup.conf |
    | ${EXPORTER_PASSWORD} | healthcheck.cnf, monitor.conf |
    | ${HAPROXY_ADMIN_PASS} | haproxy.cfg |
    | ${KEEPALIVED_AUTH_PASS}| keepalived.conf |
    | ${DINGTALK_TOKEN} | 各脚本配置文件 |
    | ${DINGTALK_SIGN} | 各脚本配置文件 |

  2. 部署流程

操作检查清单

  1. [ ] 创建/etc/galera目录
  2. [ ] 生成配置文件模板
  3. [ ] 替换所有占位符为实际值
  4. [ ] 设置配置文件权限为600
  5. [ ] 设置安全备份路径 ${SECURE_BACKUP_PATH}
  6. [ ] 验证各服务启动正常
  7. [ ] 测试备份恢复流程
  8. [ ] 验证监控告警功能

备份路径安全实践

  1. 专用存储位置
# 创建专用备份目录
mkdir -p ${SECURE_BACKUP_PATH}/galera_backups
chown -R mysql:mysql ${SECURE_BACKUP_PATH}
chmod 700 ${SECURE_BACKUP_PATH}
  1. 访问控制策略
    | 角色 | 访问权限 | 审计要求 |
    |————–|——————-|—————-|
    | root | 读写 | 完整操作日志 |
    | mysql | 读写 | 备份操作日志 |
    | 其他用户 | 无访问权限 | 访问拒绝记录 |


  • 无标签