[manager:154]
shell> masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sun Mar 2 17:45:38 2014 - [debug] ok.
Sun Mar 2 17:45:38 2014 - [info] All SSH connection tests passed successfully.
2.masterha_check_repl 验证mysql复制是否成功
[manager:154]
shell> masterha_check_repl --conf=/etc/masterha/app1.cnf
---------------------------------------------------------
Sun Mar 2 13:16:57 2014 - [info] Slaves settings check done.
Sun Mar 2 13:16:57 2014 - [info]
10.10.54.156 (current master)
+--10.10.54.155
+--10.10.54.157
...
MySQL Replication Health is OK.
---------------------------------------------------------------
八.启动MHA manager,并监控日志文件
[manager:154]
shell> nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1
shell> tail -f /masterha/app1/manager.log
---------------------------------------------------------------
10.10.54.156 (current master)
+--10.10.54.155
+--10.10.54.157
...
Sun Mar 2 13:09:25 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
-----------------------------------------------------------------
监控的manager.log文件表明MHA运行良好,正在 "waiting until MySQL doesn't respond" 九.测试master(156)宕机后,是否会自动切换
1.测试自动切换是否成功
当掉master机子
shell> /etc/init.d/myqld stop
当掉master后,manager上的监控文件/masterha/app1/manager.log显示错误信息,表示不能自动切换:
[error]
-----------------------------------------------------------
Sun Mar 2 13:13:46 2014 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln178] Got ERROR: Use of uninitialized value $msg in scalar chomp at /usr/share/perl5/vendor_perl/MHA/ManagerConst.pm line 90.
-----------------------------------------------------------
解决这个错误是在文件/usr/share/perl5/vendor_perl/MHA/ManagerConst.pm 第90行(chomp $msg)前加入一行:
$msg = "" unless($msg);
好了,错误解决了,下面我们再次重复上面步骤:
master上mysql服务:shell> /etc/init.d/mysqld stop
再次查看manager机子上监控文件内容
shell> tail -f tail -f /masterha/app1/manager.log
日志文件显示:
-----------------------------------------------------------
----- Failover Report -----
app1: MySQL Master failover 10.10.54.156 to 10.10.54.155 succeeded
Master 10.10.54.156 is down!
Check MHA Manager logs at mycentos4:/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 10.10.54.155(10.10.54.155:3306) has all relay logs for recovery.
Selected 10.10.54.155 as a new master.
10.10.54.155: OK: Applying all logs succeeded.
10.10.54.157: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.10.54.157: OK: Applying all logs succeeded. Slave started, replicating from 10.10.54.155.
10.10.54.155: Resetting slave info succeeded.
Master failover to 10.10.54.155(10.10.54.155:3306) completed successfully.
--------------------------------------------------------
2.切换成功后,检查replication状态
[master:156]
shell> /etc/init.d/mysqld start
[manager:154]
shell> masterha_check_repl --conf=/etc/masterha/app1.cnf
--------------------------------------------------------------
Sun Mar 2 13:22:11 2014 - [info] Slaves settings check done.
Sun Mar 2 13:22:11 2014 - [info]
10.10.54.155 (current master)
+--10.10.54.156
+--10.10.54.157
...
MySQL Replication Health is OK.
---------------------------------------------------------------
上面的"10.10.54.155 (current master)" 这句表明master成功切换到155机子上 十.上一步测试之后,新master机为155,宕掉155机子,再次测试故障转移
1.启动管理节点
shell> nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1
2.启动日志检测,然后当掉新master(155),然后查看监控文件变化
shell> tail -f /masterha/app1/manager.log
3.当掉155机子(即新的master)
shell> /etc/init.d/mysqld stop
4.查看manager主机上的监控文件变化
[error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln295] Last failover was done at 2014/03/02 13:02:47. Current time is too early to do failover again. If you want to do failover, manually remove /masterha/app1/app1.failover.complete and run this script again.
错误解决办法
1.日志文件提示切换master过快,需要删除/masterha/app1/app1.failover.complete
1.删除app1.failover.complete
shell> rm /masterha/app1/app1.failover.complete
5.重新测试:
master转移成功,重新转为156机子
--------------------------------------------------------
Master 10.10.54.155 is down!
Check MHA Manager logs at mycentos4:/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 10.10.54.156(10.10.54.156:3306) has all relay logs for recovery.
Selected 10.10.54.156 as a new master.
10.10.54.156: OK: Applying all logs succeeded.
10.10.54.157: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.10.54.157: OK: Applying all logs succeeded. Slave started, replicating from 10.10.54.156.
10.10.54.156: Resetting slave info succeeded.
Master failover to 10.10.54.156(10.10.54.156:3306) completed successfully.
-----------------------------------------------------------