...
Markdown |
---|
**架构说明**: - **负载层**: HAProxy + Keepalived 提供高可用负载均衡和VIP管理 - **数据库层**: MariaDB Galera 集群实现多活数据库 - **监控层**: Prometheus + Grafana 实现全方位监控 - **仲裁层**: Galera Arbitrator 提供奇数节点支持 --- ## 环境准备 ### 系统要求 - **操作系统**: RockyLinux 8/9 - **硬件要求**: - 最小 4GB RAM (生产环境建议 16GB+) - 最小 2 CPU 核心 (生产环境建议 8 核心+) - **网络要求**: - 节点间延迟 < 5ms - 节点间带宽 > 1Gbps ### 基础配置 ```bash # 所有节点执行 # 1. 禁用SELinux sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config setenforce 0 # 2. 配置防火墙 firewall-cmd --permanent --add-port={3306,4444,4567,4568,9200,8080}/tcp firewall-cmd --reload # 3. 配置主机名解析 cat >> /etc/hosts <<EOF 192.168.1.101 db1 192.168.1.102 db2 192.168.1.103 db3 192.168.1.201 lb1 192.168.1.202 lb2 EOF # 4. 时间同步 dnf install -y chrony systemctl enable --now chronyd chronyc sources ``` --- ## MariaDB Galera 集群部署 ### 安装组件 ```bash # 所有数据库节点执行 # 1. 添加MariaDB仓库 curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash # 2. 安装必要组件 dnf install -y MariaDB-server MariaDB-backup galera-4 ``` ### Galera 配置 **文件路径**:`/etc/my.cnf.d/galera.cnf` ```ini [mysqld] # 基础配置 binlog_format=ROW default-storage-engine=InnoDB innodb_autoinc_lock_mode=2 # Galera 配置 wsrep_on=ON wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so wsrep_cluster_name="galera_cluster" wsrep_cluster_address="gcomm://db1,db2,db3" wsrep_node_name="db1" # 节点名需修改 wsrep_node_address="192.168.1.101" # 节点IP需修改 wsrep_sst_method=mariabackup wsrep_sst_auth="sst_user:${SST_PASSWORD}" # 替换${SST_PASSWORD} # 防脑裂配置 wsrep_provider_options="gcache.size=1G; pc.ignore_sb=true; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M" ``` ### 健康检测配置 (9200端口) ```bash # 所有数据库节点执行 # 1. 安装xinetd dnf install -y xinetd # 2. 创建检测脚本 cat > /usr/local/bin/galera_healthcheck <<'EOF' #!/bin/bash CONFIG_FILE="/etc/mysql/conf.d/healthcheck.cnf" RESPONSE=$(mysql --defaults-extra-file=$CONFIG_FILE -e "SHOW STATUS LIKE 'wsrep_ready'" 2>/dev/null | grep 'ON') if [ -n "$RESPONSE" ]; then echo -e "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n" exit 0 else echo -e "HTTP/1.1 503 Service Unavailable\r\nContent-Length: 0\r\n\r\n" exit 1 fi EOF # 3. 创建配置文件 cat > /etc/mysql/conf.d/healthcheck.cnf <<EOF [client] user=exporter password=${EXPORTER_PASSWORD} # 替换${EXPORTER_PASSWORD} EOF chmod 600 /etc/mysql/conf.d/healthcheck.cnf chmod +x /usr/local/bin/galera_healthcheck # 4. 配置xinetd服务 cat > /etc/xinetd.d/galera-healthcheck <<EOF service galera-healthcheck { disable = no flags = REUSE socket_type = stream port = 9200 wait = no user = root group = root server = /usr/local/bin/galera_healthcheck log_on_failure += USERID per_source = UNLIMITED } EOF # 5. 启动服务 systemctl enable --now xinetd # 验证健康检查 curl -i http://localhost:9200 ``` ### 初始化集群 ```bash # 1. 在第一个节点初始化集群 galera_new_cluster # 2. 启动其他节点 systemctl start mariadb # 3. 所有节点设置开机自启 systemctl enable mariadb ``` ### 创建系统用户 ```sql -- 在所有节点执行 -- SST用户 (替换${SST_PASSWORD}) CREATE USER 'sst_user'@'localhost' IDENTIFIED BY '${SST_PASSWORD}'; GRANT RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst_user'@'localhost'; -- 监控用户 (替换${EXPORTER_PASSWORD}) CREATE USER 'exporter'@'localhost' IDENTIFIED BY '${EXPORTER_PASSWORD}' WITH MAX_USER_CONNECTIONS 3; GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost'; FLUSH PRIVILEGES; ``` --- ## HAProxy + Keepalived 部署 ### HAProxy 配置 **文件路径**:`/etc/haproxy/haproxy.cfg` ```conf global log /dev/log local0 maxconn 4000 user haproxy group haproxy defaults mode tcp timeout connect 10s timeout client 30s timeout server 30s frontend galera_front bind *:3306 default_backend galera_back backend galera_back balance leastconn option tcp-check tcp-check connect port 9200 server db1 192.168.1.101:3306 check port 9200 inter 5s rise 2 fall 3 server db2 192.168.1.102:3306 check port 9200 inter 5s rise 2 fall 3 server db3 192.168.1.103:3306 check port 9200 inter 5s rise 2 fall 3 listen stats bind *:8080 mode http stats enable stats uri /haproxy?stats stats refresh 10s stats auth admin:${HAPROXY_ADMIN_PASS} # 替换${HAPROXY_ADMIN_PASS} ``` ### Keepalived 配置 **主节点文件路径**:`/etc/keepalived/keepalived.conf` ```conf vrrp_script chk_haproxy { script "pidof haproxy" interval 2 weight 2 } vrrp_instance VI_1 { state MASTER interface eth0 virtual_router_id 51 priority 100 advert_int 1 authentication { auth_type PASS auth_pass ${KEEPALIVED_AUTH_PASS} # 替换${KEEPALIVED_AUTH_PASS} } virtual_ipaddress { 192.168.1.200/24 } track_script { chk_haproxy } } ``` > **备节点配置**: > - `state` 设置为 `BACKUP` > - `priority` 设置为 `90` ### 启动服务 ```bash # 所有负载节点执行 systemctl enable --now haproxy keepalived # 验证VIP ip addr show eth0 | grep 192.168.1.200 ``` --- ## 备份与恢复 ### 定时备份脚本 **文件路径**:`/usr/local/bin/backup_galera.sh` ```bash #!/bin/bash # 配置文件:包含所有敏感信息 CONFIG_FILE="/etc/galera/backup.conf" # 加载配置文件 if [ ! -f "$CONFIG_FILE" ]; then echo "$(date) - 配置文件 $CONFIG_FILE 不存在" >> /var/log/mariabackup.log exit 1 fi source "$CONFIG_FILE" # 创建备份目录 mkdir -p $BACKUP_DIR/$DATE || { echo "$(date) - 创建目录失败" >> $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB MariaDB节点$(hostname)备份失败:创建目录失败" exit 1 } chown -R mysql:mysql $BACKUP_DIR/$DATE # 设置文件限制 ulimit -n 65535 # 开始备份 echo "===== 开始备份: $(date) =====" >> $LOG_FILE mariabackup --backup \ --user=$BACKUP_USER \ --password=$BACKUP_PASS \ --target-dir=$BACKUP_DIR/$DATE \ --parallel=4 \ --galera-info \ --open-files-limit=65535 2>> $LOG_FILE if [ $? -ne 0 ]; then echo "备份失败!" >> $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB MariaDB节点$(hostname)备份失败:$(tail -n 10 $LOG_FILE)" exit 1 fi # 准备备份 echo "===== 准备备份: $(date) =====" >> $LOG_FILE mariabackup --prepare \ --target-dir=$BACKUP_DIR/$DATE \ --open-files-limit=65535 2>> $LOG_FILE # 验证关键文件存在 if [ -f "$BACKUP_DIR/$DATE/xtrabackup_checkpoints" ]; then echo "备份验证成功" >> $LOG_FILE # 使用XZ多核压缩备份目录 echo "===== 压缩备份: $(date) =====" >> $LOG_FILE tar -I 'xz -T0' -cf $BACKUP_DIR/$DATE.tar.xz -C $BACKUP_DIR $DATE if [ $? -eq 0 ]; then # 删除原始备份目录 rm -rf $BACKUP_DIR/$DATE echo "备份压缩成功" >> $LOG_FILE else echo "备份压缩失败!" >> $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB MariaDB节点$(hostname)备份压缩失败" fi else echo "备份验证失败: 缺少xtrabackup_checkpoints文件!" >> $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "MariaDB MariaDB节点$(hostname)备份验证失败:缺少关键文件" exit 1 fi # 保留最近365天备份 find $BACKUP_DIR -name "*.tar.xz" -mtime +365 -delete echo "===== 备份完成: $(date) =====" >> $LOG_FILE ``` ### 备份配置文件 **文件路径**:`/etc/galera/backup.conf` ```bash # 备份配置 - 请替换实际值 BACKUP_ROOT="${SECURE_BACKUP_PATH}" # 替换为安全路径 BACKUP_DIR="${BACKUP_ROOT}/galera_backups" DATE=$(date +%Y%m%d-%H%M%S) BACKUP_USER="sst_user" BACKUP_PASS="${SST_PASSWORD}" # 替换${SST_PASSWORD} LOG_FILE="/var/log/mariabackup.log" DT_TOKEN="${DINGTALK_TOKEN}" # 替换${DINGTALK_TOKEN} DT_SIGN="${DINGTALK_SIGN}" # 替换${DINGTALK_SIGN} ``` ### 设置定时任务 ```bash chmod 600 /etc/galera/backup.conf chmod +x /usr/local/bin/backup_galera.sh # 每天凌晨2点执行备份 echo "0 2 * * * /usr/local/bin/backup_galera.sh" | crontab - ``` ### 自动化恢复脚本 **文件路径**:`/usr/local/bin/restore_galera.sh` ```bash #!/bin/bash # Galera集群数据恢复脚本 # 用法: restore_galera.sh <备份文件路径> # 示例: restore_galera.sh ${BACKUP_ROOT}/galera_backups/20231001-020000.tar.xz # 配置文件 CONFIG_FILE="/etc/galera/restore.conf" source "$CONFIG_FILE" if [ $# -eq 0 ]; then echo "错误:请提供备份文件路径" echo "用法: $0 <备份文件路径>" exit 1 fi BACKUP_FILE=$1 TEMP_DIR="/tmp/galera_restore_$(date +%s)" LOG_FILE="/var/log/galera_restore.log" # 验证备份文件存在 if [ ! -f "$BACKUP_FILE" ]; then echo "错误:备份文件 $BACKUP_FILE 不存在" | tee -a $LOG_FILE exit 1 fi echo "===== 开始恢复: $(date) =====" | tee -a $LOG_FILE # 1. 停止MariaDB服务 echo "[步骤1/8] 停止MariaDB服务..." | tee -a $LOG_FILE systemctl stop mariadb >> $LOG_FILE 2>&1 if [ $? -ne 0 ]; then echo "错误:停止MariaDB服务失败" | tee -a $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:停止服务失败" exit 1 fi # 2. 清空数据目录 echo "[步骤2/8] 清空数据目录..." | tee -a $LOG_FILE rm -rf /var/lib/mysql/* >> $LOG_FILE 2>&1 # 3. 创建临时目录 echo "[步骤3/8] 创建临时目录 $TEMP_DIR..." | tee -a $LOG_FILE mkdir -p $TEMP_DIR >> $LOG_FILE 2>&1 if [ $? -ne 0 ]; then echo "错误:创建临时目录失败" | tee -a $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:创建临时目录失败" exit 1 fi # 4. 解压备份(使用XZ多核解压) echo "[步骤4/8] 解压备份文件 $BACKUP_FILE..." | tee -a $LOG_FILE tar -I 'xz -T0' -xf $BACKUP_FILE -C $TEMP_DIR >> $LOG_FILE 2>&1 if [ $? -ne 0 ]; then echo "错误:解压备份失败" | tee -a $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:解压备份失败" exit 1 fi # 获取解压后的目录名 BACKUP_DIR_NAME=$(ls $TEMP_DIR) RESTORE_DIR="$TEMP_DIR/$BACKUP_DIR_NAME" # 5. 执行恢复 echo "[步骤5/8] 恢复数据到/var/lib/mysql..." | tee -a $LOG_FILE mariabackup --copy-back \ --target-dir=$RESTORE_DIR \ --open-files-limit=65535 >> $LOG_FILE 2>&1 if [ $? -ne 0 ]; then echo "错误:数据恢复失败" | tee -a $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:数据恢复失败" exit 1 fi # 6. 修复权限 echo "[步骤6/8] 修复数据目录权限..." | tee -a $LOG_FILE chown -R mysql:mysql /var/lib/mysql >> $LOG_FILE 2>&1 # 7. 启动服务 echo "[步骤7/8] 启动MariaDB服务..." | tee -a $LOG_FILE systemctl start mariadb >> $LOG_FILE 2>&1 if [ $? -ne 0 ]; then echo "错误:启动MariaDB服务失败" | tee -a $LOG_FILE /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera恢复失败:启动服务失败" exit 1 fi # 8. 验证恢复 echo "[步骤8/8] 验证集群状态和数据一致性..." | tee -a $LOG_FILE mysql -e "SHOW STATUS LIKE 'wsrep%'" >> $LOG_FILE 2>&1 mysql -e "CHECK TABLE mysql.user EXTENDED" >> $LOG_FILE 2>&1 # 清理临时文件 rm -rf $TEMP_DIR echo "===== 恢复完成: $(date) =====" | tee -a $LOG_FILE echo "恢复日志: $LOG_FILE" echo "请验证数据完整性和集群状态" # 发送成功通知 /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "Galera数据恢复成功完成" ``` ### 恢复配置文件 **文件路径**:`/etc/galera/restore.conf` ```bash # 恢复配置 - 请替换实际值 DT_TOKEN="${DINGTALK_TOKEN}" # 替换${DINGTALK_TOKEN} DT_SIGN="${DINGTALK_SIGN}" # 替换${DINGTALK_SIGN} ``` ### 恢复操作流程 1. **准备配置文件**: ```bash # 创建配置文件 sudo mkdir -p /etc/galera sudo nano /etc/galera/restore.conf # 添加内容: # DT_TOKEN="your_actual_token" # DT_SIGN="your_actual_sign" sudo chmod 600 /etc/galera/restore.conf ``` 2. **选择备份文件**: ```bash # 列出可用备份 ls -l ${SECURE_BACKUP_PATH}/galera_backups/*.tar.xz ``` 3. **执行恢复脚本**: ```bash chmod +x /usr/local/bin/restore_galera.sh /usr/local/bin/restore_galera.sh ${SECURE_BACKUP_PATH}/galera_backups/20231001-020000.tar.xz ``` 4. **验证恢复结果**: ```sql -- 检查集群状态 SHOW STATUS LIKE 'wsrep%'; -- 验证关键数据 SELECT COUNT(*) FROM important_table; ``` ### 恢复流程注意事项 1. **配置文件安全**:确保配置文件权限为600 2. **空间要求**:确保有足够的磁盘空间(XZ压缩需要更多临时空间) 3. **服务中断**:恢复期间数据库不可用 4. **验证步骤**:恢复后必须执行数据验证 5. **性能优化**:XZ多核解压会充分利用CPU资源 --- ## 故障处理手册 ### 常规故障排查 ```sql -- 检查集群状态 SHOW STATUS LIKE 'wsrep%'; -- 关键指标: -- wsrep_cluster_size: 集群节点数 (应为3) -- wsrep_cluster_status: 集群状态 (应为Primary) -- wsrep_connected: 网络连接状态 (应为ON) -- wsrep_ready: 节点就绪状态 (应为ON) ``` ### 脑裂检测与处理 #### 自动化脑裂检测脚本 **文件路径**:`/usr/local/bin/cluster_integrity_check.sh` ```bash #!/bin/bash # 配置文件 CONFIG_FILE="/etc/galera/monitor.conf" source "$CONFIG_FILE" # 获取当前节点状态 CURRENT_STATUS=$(mysql -u$MONITOR_USER -p$MONITOR_PASS -e "SHOW STATUS LIKE 'wsrep_cluster_status'" -sN | awk '{print $2}') CURRENT_SIZE=$(mysql -u$MONITOR_USER -p$MONITOR_PASS -e "SHOW STATUS LIKE 'wsrep_cluster_size'" -sN | awk '{print $2}') if [ "$CURRENT_STATUS" != "Primary" ]; then MESSAGE="[CRITICAL] Node $(hostname) in $CURRENT_STATUS state!" /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "$MESSAGE" exit 1 fi if [ "$CURRENT_SIZE" -lt "$EXPECTED_SIZE" ]; then MESSAGE="[CRITICAL] Cluster size $CURRENT_SIZE < $EXPECTED_SIZE!" /usr/local/bin/dingtalk-cli-sender -t $DT_TOKEN -s $DT_SIGN "$MESSAGE" exit 1 fi ``` ### 监控配置文件 **文件路径**:`/etc/galera/monitor.conf` ```bash # 监控配置 - 请替换实际值 EXPECTED_SIZE=3 MONITOR_USER="exporter" MONITOR_PASS="${EXPORTER_PASSWORD}" # 替换${EXPORTER_PASSWORD} DT_TOKEN="${DINGTALK_TOKEN}" # 替换${DINGTALK_TOKEN} DT_SIGN="${DINGTALK_SIGN}" # 替换${DINGTALK_SIGN} ``` --- ## 附录 ### 敏感信息占位符说明 | **占位符** | **说明** | **示例值** | |--------------------------|------------------------------|-------------------------| | `${SECURE_BACKUP_PATH}` | 安全备份存储路径 | /secure/db_backups | | `${SST_PASSWORD}` | SST用户密码 | s3cretPass | | `${EXPORTER_PASSWORD}` | 监控用户密码 | ExporterPass123 | | `${HAPROXY_ADMIN_PASS}` | HAProxy管理密码 | AdminPass123 | | `${KEEPALIVED_AUTH_PASS}`| Keepalived认证密码 | Keep@l1ve | | `${DINGTALK_TOKEN}` | 钉钉机器人Token | d14e9ddfefd414... | | `${DINGTALK_SIGN}` | 钉钉机器人签名 | SEC1089bac8cb8b... | ### 安全最佳实践 1. **配置文件管理**: ```bash # 设置配置文件权限 chmod 600 /etc/galera/*.conf chown root:root /etc/galera/*.conf ``` 2. **密码轮换策略**: - SST密码:每90天轮换 - 监控密码:每180天轮换 - HAProxy密码:每365天轮换 3. **审计日志**: ```bash # 记录所有管理操作 echo "$(date) $(whoami) $@" >> /var/log/galera_audit.log ``` ### 版本更新记录 | **版本** | **日期** | **修改内容** | |----------|----------------|----------------------------------| | 1.7 | 2025-10-03 | 添加自动化恢复脚本 | | **1.8** | **2025-10-04** | **统一敏感信息管理,增强安全性** | | | | **备份压缩优化为XZ多核格式** | | | | **敏感路径隐私保护** | ### 紧急恢复联系人 | **角色** | **联系方式** | **响应时间** | |----------------|-------------------------|--------------| | 值班运维 | +86-138XXXXXXX | 24/7 | | 高级DBA | dba-oncall@example.com | 24/7 | > **最后更新**: 2025-10-04 > **安全等级**: 机密 > **文档校验**: 使用校验和验证完整性 > `md5sum: a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6` --- ## 安全增强说明 ### 敏感信息管理策略 1. **统一配置中心**: - 所有敏感信息存储在`/etc/galera/`目录下的配置文件中 - 配置文件权限设置为`600`,仅root可访问 - 使用环境变量或source命令加载配置 2. **占位符系统**: | 占位符 | 实际值位置 | |------------------------|--------------------------| | `${SECURE_BACKUP_PATH}` | backup.conf | | `${SST_PASSWORD}` | galera.cnf, backup.conf | | `${EXPORTER_PASSWORD}` | healthcheck.cnf, monitor.conf | | `${HAPROXY_ADMIN_PASS}` | haproxy.cfg | | `${KEEPALIVED_AUTH_PASS}`| keepalived.conf | | `${DINGTALK_TOKEN}` | 各脚本配置文件 | | `${DINGTALK_SIGN}` | 各脚本配置文件 | 3. **部署流程**: |
...