注意:本文档中所有敏感信息均已模糊化处理,实际使用时请替换占位符
目录
架构概述
架构说明:
- 负载层: HAProxy + Keepalived 提供高可用负载均衡和VIP管理
- 数据库层: MariaDB Galera 集群实现多活数据库
- 监控层: Prometheus + Grafana 实现全方位监控
- 仲裁层: Galera Arbitrator 提供奇数节点支持
环境准备
系统要求
- 操作系统: RockyLinux 8/9
- 硬件要求:
- 最小 4GB RAM (生产环境建议 16GB+)
- 最小 2 CPU 核心 (生产环境建议 8 核心+)
- 网络要求:
- 节点间延迟 < 5ms
- 节点间带宽 > 1Gbps
基础配置
# 所有节点执行 # 1. 禁用SELinux sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config setenforce 0 # 2. 配置防火墙 firewall-cmd --permanent --add-port={3306,4444,4567,4568,9200,8080}/tcp firewall-cmd --reload # 3. 配置主机名解析 cat >> /etc/hosts <<EOF 192.168.1.101 db1 192.168.1.102 db2 192.168.1.103 db3 192.168.1.201 lb1 192.168.1.202 lb2 EOF # 4. 时间同步 dnf install -y chrony systemctl enable --now chronyd chronyc sources
MariaDB Galera 集群部署
安装组件
# 所有数据库节点执行 # 1. 添加MariaDB仓库 curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash # 2. 安装必要组件 dnf install -y MariaDB-server MariaDB-backup galera-4
Galera 配置
文件路径:/etc/my.cnf.d/galera.cnf
[mysqld] # 基础配置 binlog_format=ROW default-storage-engine=InnoDB innodb_autoinc_lock_mode=2 # Galera 配置 wsrep_on=ON wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so wsrep_cluster_name="galera_cluster" wsrep_cluster_address="gcomm://db1,db2,db3" wsrep_node_name="db1" # 节点名需修改 wsrep_node_address="192.168.1.101" # 节点IP需修改 wsrep_sst_method=mariabackup wsrep_sst_auth="sst_user:${SST_PASSWORD}" # 替换${SST_PASSWORD} # 防脑裂配置 wsrep_provider_options="gcache.size=1G; pc.ignore_sb=true; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M"
健康检测配置 (9200端口)
# 所有数据库节点执行 # 1. 安装xinetd dnf install -y xinetd # 2. 创建检测脚本 cat > /usr/local/bin/galera_healthcheck <<'EOF' #!/bin/bash CONFIG_FILE="/etc/mysql/conf.d/healthcheck.cnf" RESPONSE=$(mysql --defaults-extra-file=$CONFIG_FILE -e "SHOW STATUS LIKE 'wsrep_ready'" 2>/dev/null | grep 'ON') if [ -n "$RESPONSE" ]; then echo -e "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n" exit 0 else echo -e "HTTP/1.1 503 Service Unavailable\r\nContent-Length: 0\r\n\r\n" exit 1 fi EOF # 3. 创建配置文件 cat > /etc/mysql/conf.d/healthcheck.cnf <<EOF [client] user=exporter password=${EXPORTER_PASSWORD} # 替换${EXPORTER_PASSWORD} EOF chmod 600 /etc/mysql/conf.d/healthcheck.cnf chmod +x /usr/local/bin/galera_healthcheck # 4. 配置xinetd服务 cat > /etc/xinetd.d/galera-healthcheck <<EOF service galera-healthcheck { disable = no flags = REUSE socket_type = stream port = 9200 wait = no user = root group = root server = /usr/local/bin/galera_healthcheck log_on_failure += USERID per_source = UNLIMITED } EOF # 5. 启动服务 systemctl enable --now xinetd # 验证健康检查 curl -i http://localhost:9200
初始化集群
# 1. 在第一个节点初始化集群 galera_new_cluster # 2. 启动其他节点 systemctl start mariadb # 3. 所有节点设置开机自启 systemctl enable mariadb
创建系统用户
-- 在所有节点执行 -- SST用户 (替换${SST_PASSWORD}) CREATE USER 'sst_user'@'localhost' IDENTIFIED BY '${SST_PASSWORD}'; GRANT RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst_user'@'localhost'; -- 监控用户 (替换${EXPORTER_PASSWORD}) CREATE USER 'exporter'@'localhost' IDENTIFIED BY '${EXPORTER_PASSWORD}' WITH MAX_USER_CONNECTIONS 3; GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost'; FLUSH PRIVILEGES;
HAProxy + Keepalived 部署
HAProxy 配置
文件路径:/etc/haproxy/haproxy.cfg
global log /dev/log local0 maxconn 4000 user haproxy group haproxy defaults mode tcp timeout connect 10s timeout client 30s timeout server 30s frontend galera_front bind *:3306 default_backend galera_back backend galera_back balance leastconn option tcp-check tcp-check connect port 9200 server db1 192.168.1.101:3306 check port 9200 inter 5s rise 2 fall 3 server db2 192.168.1.102:3306 check port 9200 inter 5s rise 2 fall 3 server db3 192.168.1.103:3306 check port 9200 inter 5s rise 2 fall 3 listen stats bind *:8080 mode http stats enable stats uri /haproxy?stats stats refresh 10s stats auth admin:${HAPROXY_ADMIN_PASS} # 替换${HAPROXY_ADMIN_PASS}
Keepalived 配置
主节点文件路径:/etc/keepalived/keepalived.conf
vrrp_script chk_haproxy { script "pidof haproxy" interval 2 weight 2 } vrrp_instance VI_1 { state MASTER interface eth0 virtual_router_id 51 priority 100 advert_int 1 authentication { auth_type PASS auth_pass ${KEEPALIVED_AUTH_PASS} # 替换${KEEPALIVED_AUTH_PASS} } virtual_ipaddress { 192.168.1.200/24 } track_script { chk_haproxy } }
备节点配置:
-state
设置为BACKUP
-priority
设置为90
启动服务
# 所有负载节点执行 systemctl enable --now haproxy keepalived # 验证VIP ip addr show eth0 | grep 192.168.1.200
备份与恢复
定时备份脚本
文件路径:/usr/local/bin/backup_galera.sh
#!/bin/bash # 配置文件:包含所有敏感信息 CONFIG_FILE="/etc/galera/backup.conf" # 加载配置文件 if [ ! -f "$CONFIG_FILE" ]; then echo "$(date) - 配置文件 $CONFIG_FILE 不存在" >> /var/log/mariabackup.log exit 1 fi source "$CONFIG_FILE" # 创建备份目录 mkdir -p $BACKUP_DIR/$DATE || { echo "$(date) - 创建目录失败" >> $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB节点$(hostname)备份失败:创建目录失败" exit 1 } chown -R mysql:mysql $BACKUP_DIR/$DATE # 设置文件限制 ulimit -n 65535 # 开始备份 echo "===== 开始备份: $(date) =====" >> $LOG_FILE mariabackup --backup \ --user=$BACKUP_USER \ --password=$BACKUP_PASS \ --target-dir=$BACKUP_DIR/$DATE \ --parallel=4 \ --galera-info \ --open-files-limit=65535 2>> $LOG_FILE if [ $? -ne 0 ]; then echo "备份失败!" >> $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB节点$(hostname)备份失败:$(tail -n 10 $LOG_FILE)" exit 1 fi # 准备备份 echo "===== 准备备份: $(date) =====" >> $LOG_FILE mariabackup --prepare \ --target-dir=$BACKUP_DIR/$DATE \ --open-files-limit=65535 2>> $LOG_FILE # 验证关键文件存在 if [ -f "$BACKUP_DIR/$DATE/xtrabackup_checkpoints" ]; then echo "备份验证成功" >> $LOG_FILE # 使用XZ多核压缩备份目录 echo "===== 压缩备份: $(date) =====" >> $LOG_FILE tar -I 'xz -T0' -cf $BACKUP_DIR/$DATE.tar.xz -C $BACKUP_DIR $DATE if [ $? -eq 0 ]; then # 删除原始备份目录 rm -rf $BACKUP_DIR/$DATE echo "备份压缩成功" >> $LOG_FILE else echo "备份压缩失败!" >> $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB节点$(hostname)备份压缩失败" fi else echo "备份验证失败: 缺少xtrabackup_checkpoints文件!" >> $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB节点$(hostname)备份验证失败:缺少关键文件" exit 1 fi # 保留最近365天备份 find $BACKUP_DIR -name "*.tar.xz" -mtime +365 -delete echo "===== 备份完成: $(date) =====" >> $LOG_FILE
备份配置文件
文件路径:/etc/galera/backup.conf
# 备份配置 - 请替换实际值 BACKUP_ROOT="${SECURE_BACKUP_PATH}" # 替换为安全路径 BACKUP_DIR="${BACKUP_ROOT}/galera_backups" DATE=$(date +%Y%m%d-%H%M%S) BACKUP_USER="sst_user" BACKUP_PASS="${SST_PASSWORD}" # 替换${SST_PASSWORD} LOG_FILE="/var/log/mariabackup.log" DT_TOKEN="${DINGTALK_TOKEN}" # 替换${DINGTALK_TOKEN} DT_SIGN="${DINGTALK_SIGN}" # 替换${DINGTALK_SIGN}
设置定时任务
chmod 600 /etc/galera/backup.conf chmod +x /usr/local/bin/backup_galera.sh # 每天凌晨2点执行备份 echo "0 2 * * * /usr/local/bin/backup_galera.sh" | crontab -
自动化恢复脚本
文件路径:/usr/local/bin/restore_galera.sh
#!/bin/bash # Galera集群数据恢复脚本 # 用法: restore_galera.sh <备份文件路径> # 示例: restore_galera.sh ${BACKUP_ROOT}/galera_backups/20231001-020000.tar.xz # 配置文件 CONFIG_FILE="/etc/galera/restore.conf" source "$CONFIG_FILE" if [ $# -eq 0 ]; then echo "错误:请提供备份文件路径" echo "用法: $0 <备份文件路径>" exit 1 fi BACKUP_FILE=$1 TEMP_DIR="/tmp/galera_restore_$(date +%s)" LOG_FILE="/var/log/galera_restore.log" # 验证备份文件存在 if [ ! -f "$BACKUP_FILE" ]; then echo "错误:备份文件 $BACKUP_FILE 不存在" | tee -a $LOG_FILE exit 1 fi echo "===== 开始恢复: $(date) =====" | tee -a $LOG_FILE # 1. 停止MariaDB服务 echo "[步骤1/8] 停止MariaDB服务..." | tee -a $LOG_FILE systemctl stop mariadb >> $LOG_FILE 2>&1 if [ $? -ne 0 ]; then echo "错误:停止MariaDB服务失败" | tee -a $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:停止服务失败" exit 1 fi # 2. 清空数据目录 echo "[步骤2/8] 清空数据目录..." | tee -a $LOG_FILE rm -rf /var/lib/mysql/* >> $LOG_FILE 2>&1 # 3. 创建临时目录 echo "[步骤3/8] 创建临时目录 $TEMP_DIR..." | tee -a $LOG_FILE mkdir -p $TEMP_DIR >> $LOG_FILE 2>&1 if [ $? -ne 0 ]; then echo "错误:创建临时目录失败" | tee -a $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:创建临时目录失败" exit 1 fi # 4. 解压备份(使用XZ多核解压) echo "[步骤4/8] 解压备份文件 $BACKUP_FILE..." | tee -a $LOG_FILE tar -I 'xz -T0' -xf $BACKUP_FILE -C $TEMP_DIR >> $LOG_FILE 2>&1 if [ $? -ne 0 ]; then echo "错误:解压备份失败" | tee -a $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:解压备份失败" exit 1 fi # 获取解压后的目录名 BACKUP_DIR_NAME=$(ls $TEMP_DIR) RESTORE_DIR="$TEMP_DIR/$BACKUP_DIR_NAME" # 5. 执行恢复 echo "[步骤5/8] 恢复数据到/var/lib/mysql..." | tee -a $LOG_FILE mariabackup --copy-back \ --target-dir=$RESTORE_DIR \ --open-files-limit=65535 >> $LOG_FILE 2>&1 if [ $? -ne 0 ]; then echo "错误:数据恢复失败" | tee -a $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:数据恢复失败" exit 1 fi # 6. 修复权限 echo "[步骤6/8] 修复数据目录权限..." | tee -a $LOG_FILE chown -R mysql:mysql /var/lib/mysql >> $LOG_FILE 2>&1 # 7. 启动服务 echo "[步骤7/8] 启动MariaDB服务..." | tee -a $LOG_FILE systemctl start mariadb >> $LOG_FILE 2>&1 if [ $? -ne 0 ]; then echo "错误:启动MariaDB服务失败" | tee -a $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:启动服务失败" exit 1 fi # 8. 验证恢复 echo "[步骤8/8] 验证集群状态和数据一致性..." | tee -a $LOG_FILE mysql -e "SHOW STATUS LIKE 'wsrep%'" >> $LOG_FILE 2>&1 mysql -e "CHECK TABLE mysql.user EXTENDED" >> $LOG_FILE 2>&1 # 清理临时文件 rm -rf $TEMP_DIR echo "===== 恢复完成: $(date) =====" | tee -a $LOG_FILE echo "恢复日志: $LOG_FILE" echo "请验证数据完整性和集群状态" # 发送成功通知 /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera数据恢复成功完成"
恢复配置文件
文件路径:/etc/galera/restore.conf
# 恢复配置 - 请替换实际值 DT_TOKEN="${DINGTALK_TOKEN}" # 替换${DINGTALK_TOKEN} DT_SIGN="${DINGTALK_SIGN}" # 替换${DINGTALK_SIGN}
恢复操作流程
- 准备配置文件:
# 创建配置文件 sudo mkdir -p /etc/galera sudo nano /etc/galera/restore.conf # 添加内容: # DT_TOKEN="your_actual_token" # DT_SIGN="your_actual_sign" sudo chmod 600 /etc/galera/restore.conf
- 选择备份文件:
# 列出可用备份 ls -l ${SECURE_BACKUP_PATH}/galera_backups/*.tar.xz
- 执行恢复脚本:
chmod +x /usr/local/bin/restore_galera.sh /usr/local/bin/restore_galera.sh ${SECURE_BACKUP_PATH}/galera_backups/20231001-020000.tar.xz
- 验证恢复结果:
-- 检查集群状态 SHOW STATUS LIKE 'wsrep%'; -- 验证关键数据 SELECT COUNT(*) FROM important_table;
恢复流程注意事项
- 配置文件安全:确保配置文件权限为600
- 空间要求:确保有足够的磁盘空间(XZ压缩需要更多临时空间)
- 服务中断:恢复期间数据库不可用
- 验证步骤:恢复后必须执行数据验证
- 性能优化:XZ多核解压会充分利用CPU资源
故障处理手册
常规故障排查
-- 检查集群状态 SHOW STATUS LIKE 'wsrep%'; -- 关键指标: -- wsrep_cluster_size: 集群节点数 (应为3) -- wsrep_cluster_status: 集群状态 (应为Primary) -- wsrep_connected: 网络连接状态 (应为ON) -- wsrep_ready: 节点就绪状态 (应为ON)
脑裂检测与处理
自动化脑裂检测脚本
文件路径:/usr/local/bin/cluster_integrity_check.sh
#!/bin/bash # 配置文件 CONFIG_FILE="/etc/galera/monitor.conf" source "$CONFIG_FILE" # 获取当前节点状态 CURRENT_STATUS=$(mysql -u$MONITOR_USER -p$MONITOR_PASS -e "SHOW STATUS LIKE 'wsrep_cluster_status'" -sN | awk '{print $2}') CURRENT_SIZE=$(mysql -u$MONITOR_USER -p$MONITOR_PASS -e "SHOW STATUS LIKE 'wsrep_cluster_size'" -sN | awk '{print $2}') if [ "$CURRENT_STATUS" != "Primary" ]; then MESSAGE="[CRITICAL] Node $(hostname) in $CURRENT_STATUS state!" /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "$MESSAGE" exit 1 fi if [ "$CURRENT_SIZE" -lt "$EXPECTED_SIZE" ]; then MESSAGE="[CRITICAL] Cluster size $CURRENT_SIZE < $EXPECTED_SIZE!" /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "$MESSAGE" exit 1 fi
监控配置文件
文件路径:/etc/galera/monitor.conf
# 监控配置 - 请替换实际值 EXPECTED_SIZE=3 MONITOR_USER="exporter" MONITOR_PASS="${EXPORTER_PASSWORD}" # 替换${EXPORTER_PASSWORD} DT_TOKEN="${DINGTALK_TOKEN}" # 替换${DINGTALK_TOKEN} DT_SIGN="${DINGTALK_SIGN}" # 替换${DINGTALK_SIGN}
附录
敏感信息占位符说明
占位符 | 说明 | 示例值 |
---|---|---|
${SECURE_BACKUP_PATH} |
安全备份存储路径 | /secure/db_backups |
${SST_PASSWORD} |
SST用户密码 | s3cretPass |
${EXPORTER_PASSWORD} |
监控用户密码 | ExporterPass123 |
${HAPROXY_ADMIN_PASS} |
HAProxy管理密码 | AdminPass123 |
${KEEPALIVED_AUTH_PASS} |
Keepalived认证密码 | Keep@l1ve |
${DINGTALK_TOKEN} |
钉钉机器人Token | d14e9ddfefd414… |
${DINGTALK_SIGN} |
钉钉机器人签名 | SEC1089bac8cb8b… |
安全最佳实践
- 配置文件管理:
# 设置配置文件权限 chmod 600 /etc/galera/*.conf chown root:root /etc/galera/*.conf
- 密码轮换策略:
- SST密码:每90天轮换
- 监控密码:每180天轮换
- HAProxy密码:每365天轮换
- 审计日志:
# 记录所有管理操作 echo "$(date) $(whoami) $@" >> /var/log/galera_audit.log
版本更新记录
版本 | 日期 | 修改内容 |
---|---|---|
1.7 | 2025-10-03 | 添加自动化恢复脚本 |
1.8 | 2025-10-04 | 统一敏感信息管理,增强安全性 |
备份压缩优化为XZ多核格式 | ||
敏感路径隐私保护 |
紧急恢复联系人
角色 | 联系方式 | 响应时间 |
---|---|---|
值班运维 | +86-138XXXXXXX | 24/7 |
高级DBA | dba-oncall@example.com | 24/7 |
最后更新: 2025-10-04
安全等级: 机密
文档校验: 使用校验和验证完整性md5sum: a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6
安全增强说明
敏感信息管理策略
- 统一配置中心:
- 所有敏感信息存储在
/etc/galera/
目录下的配置文件中 - 配置文件权限设置为
600
,仅root可访问 - 使用环境变量或source命令加载配置
-
占位符系统:
| 占位符 | 实际值位置 |
|————————|————————–|
|${SECURE_BACKUP_PATH}
| backup.conf |
|${SST_PASSWORD}
| galera.cnf, backup.conf |
|${EXPORTER_PASSWORD}
| healthcheck.cnf, monitor.conf |
|${HAPROXY_ADMIN_PASS}
| haproxy.cfg |
|${KEEPALIVED_AUTH_PASS}
| keepalived.conf |
|${DINGTALK_TOKEN}
| 各脚本配置文件 |
|${DINGTALK_SIGN}
| 各脚本配置文件 | -
部署流程:
操作检查清单
- [ ] 创建
/etc/galera
目录 - [ ] 生成配置文件模板
- [ ] 替换所有占位符为实际值
- [ ] 设置配置文件权限为600
- [ ] 设置安全备份路径
${SECURE_BACKUP_PATH}
- [ ] 验证各服务启动正常
- [ ] 测试备份恢复流程
- [ ] 验证监控告警功能
备份路径安全实践
- 专用存储位置:
# 创建专用备份目录 mkdir -p ${SECURE_BACKUP_PATH}/galera_backups chown -R mysql:mysql ${SECURE_BACKUP_PATH} chmod 700 ${SECURE_BACKUP_PATH}
- 访问控制策略:
| 角色 | 访问权限 | 审计要求 |
|————–|——————-|—————-|
| root | 读写 | 完整操作日志 |
| mysql | 读写 | 备份操作日志 |
| 其他用户 | 无访问权限 | 访问拒绝记录 |