1. MHA scenario:
In the cluster below, the master and its slaves were deliberately made inconsistent by hand. For example, table qwsh on the master has four rows, while on 10.0.0.75 it has only one:
10.0.0.13 (current master)
+--10.0.0.74
+--10.0.0.11
+--10.0.0.75
Server     Role                      Table  Column  Rows
10.0.0.13  Master                    qwsh   Aa int  1,2,3,4
10.0.0.11  Slave                     qwsh   Aa int  1,2,3
10.0.0.74  Slave (candidate master)  qwsh   Aa int  1,2
10.0.0.75  Slave                     qwsh   Aa int  1
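To make the starting state concrete, here is a minimal sketch that computes which rows each slave lacks relative to the master. The row sets are copied from the table above; the function name `missing_rows` is only illustrative and is not part of MHA.

```python
# Rows of table qwsh on each server, copied from the table above.
rows = {
    "10.0.0.13": {1, 2, 3, 4},  # master
    "10.0.0.11": {1, 2, 3},
    "10.0.0.74": {1, 2},        # candidate master
    "10.0.0.75": {1},
}


def missing_rows(rows, master):
    """Return, per slave, the sorted rows that exist on the master but not on it."""
    return {
        host: sorted(rows[master] - have)
        for host, have in rows.items()
        if host != master
    }


gaps = missing_rows(rows, "10.0.0.13")
for host, gap in gaps.items():
    print(host, "is missing", gap)
```

The failover has to close exactly these gaps: 10.0.0.75 lacks rows 2, 3 and 4, while 10.0.0.11 lacks only row 4.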
2. MHA failover process
The following walks through the process in detail, using a manual failover:
2.1 Phase 1: Configuration Check Phase..
This phase mainly checks the state of each node:
First, whether it is dead or alive;
Second, which node is the primary candidate for the new master, and so on.
2.2 Phase 2: Dead Master Shutdown Phase..
First, check whether the dead master can still be reached over ssh.
Second, take action on the dead master, such as disabling the VIP or shutting down the host.
2.3 Phase 3: Master Recovery Phase..
2.3.1 Phase 3.1: Getting Latest Slaves Phase..
Based on how far each slave has replicated, determine the latest slaves (mysql-bin.000034:250773) and the oldest slaves (mysql-bin.000034:250405).
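The comparison behind this step can be sketched as follows. This is a simplified illustration, not MHA's code: it assumes each slave's received position is available as a `file:pos` string (the values below are the ones reported in this walkthrough) and that the numeric suffix of the binlog file name orders the files. The helper names are hypothetical.

```python
def binlog_key(coord):
    """Turn 'mysql-bin.000034:250773' into a sortable (file_number, position) pair."""
    name, pos = coord.rsplit(":", 1)
    return int(name.rsplit(".", 1)[1]), int(pos)


# Master binlog coordinates each slave has received, as reported in this walkthrough.
received = {
    "10.0.0.11": "mysql-bin.000034:250773",
    "10.0.0.74": "mysql-bin.000034:250589",
    "10.0.0.75": "mysql-bin.000034:250405",
}


def latest_and_oldest(received):
    """Return (hosts at the highest coordinate, hosts at the lowest coordinate)."""
    keys = {host: binlog_key(coord) for host, coord in received.items()}
    hi, lo = max(keys.values()), min(keys.values())
    latest = sorted(h for h, k in keys.items() if k == hi)
    oldest = sorted(h for h, k in keys.items() if k == lo)
    return latest, oldest


latest, oldest = latest_and_oldest(received)
print("latest:", latest, "oldest:", oldest)
```

Comparing the file number before the position matters once a failover spans a binlog rotation; within a single file, as here, only the position decides.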
2.3.2 Phase 3.2: Saving Dead Master's Binlog Phase..
If the dead master is still reachable over ssh, fetch the binlog events that the latest slave has not yet received from the master (starting at mysql-bin.000034:250773):
save_binary_logs --command=save --start_file=mysql-bin.000034 \
  --start_pos=250773 --binlog_dir=/data/mysql/arch \
  --output_file=/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
  --handle_raw_binlog=1 --disable_log_bin=0 \
  --manager_version=0.55
The corresponding binlog content is as follows:
[root@db-13~]# mysqlbinlog /var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET
@OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 10:40:31 server id 1 end_log_pos 107 Start: binlog v 4, server
v 5.5.27-log created 130325 10:40:31 at startup
ROLLBACK/*!*/;
BINLOG '
H7lPUQ8BAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAfuU9REzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#130325 14:18:47 server id 1 end_log_pos 250841 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1,
@@session.auto_increment_offset=1/*!*/;
/*!C utf8 *//*!*/;
SET
@@session.character_set_client=33,@@session.collation_connection=33,@
@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 175
#130325 14:18:47 server id 1 end_log_pos 250930 Query
thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)
/*!*/;
# at 264
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
COMMIT/*!*/;
# at 291
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
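When reading dumps like the one above, it helps to pull out just the event headers. The following is a rough sketch that extracts the server id, `end_log_pos` and event type from `mysqlbinlog` text output with a regular expression; the sample lines are taken from the dump above with their wrapped halves rejoined, and the regex only covers the header shape seen in this walkthrough.

```python
import re

# Matches header lines such as:
#   #130325 14:18:47 server id 1 end_log_pos 250841 Query ...
EVENT_RE = re.compile(
    r"^#\d{6}\s+\d{1,2}:\d{2}:\d{2}\s+server id\s+(\d+)\s+end_log_pos\s+(\d+)\s+(\w+)"
)

sample = """\
#130325 14:18:47 server id 1 end_log_pos 250841 Query thread_id=21 exec_time=0 error_code=0
#130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
"""


def parse_events(text):
    """Return a list of (server_id, end_log_pos, event_type) tuples."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line)
        if match:
            events.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return events


events = parse_events(sample)
for server_id, end_pos, kind in events:
    print(server_id, end_pos, kind)
```

Every `end_log_pos` in the saved file lies beyond 250773, the position the latest slave had reached, which is what one would expect from a file saved with `--start_pos=250773`.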
2.3.3 Phase 3.3: Determining New Master Phase..
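Check whether the latest slave has all the relay logs needed to recover the other slaves (oldest position: mysql-bin.000034:250405). Then select the new master according to the candidate rules (settings such as candidate_master=1 and no_master=1 are checked):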
apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000034 \
  --latest_rmlp=250773 --target_mlf=mysql-bin.000034 \
  --target_rmlp=250405 --server_id=3 --workdir=/var/tmp \
  --timestamp=20130325143805 --manager_version=0.55 \
  --relay_log_info=/data/mysql/data/relay-log.info \
  --relay_dir=/data/mysql/data/
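The candidate rules can be approximated with a short sketch. This is a deliberately simplified model and not MHA's actual selection code: it only honours the `candidate_master` and `no_master` settings mentioned above, breaks ties by configuration order (an assumption), and ignores any further checks MHA performs. The function name `pick_new_master` is hypothetical.

```python
def pick_new_master(slaves):
    """Pick a new master from `slaves`, a list of dicts in configuration order.

    Each dict has a "host" key and optionally "candidate_master" / "no_master".
    Simplified rule: drop no_master hosts, prefer candidate_master hosts,
    otherwise fall back to the first remaining host.
    """
    eligible = [s for s in slaves if not s.get("no_master")]
    if not eligible:
        raise RuntimeError("no slave is allowed to become the new master")
    preferred = [s for s in eligible if s.get("candidate_master")]
    return (preferred or eligible)[0]["host"]


# The three slaves of this walkthrough; 10.0.0.74 is the candidate master.
slaves = [
    {"host": "10.0.0.11"},
    {"host": "10.0.0.74", "candidate_master": 1},
    {"host": "10.0.0.75"},
]

print(pick_new_master(slaves))
```

Note that the chosen host, 10.0.0.74, is not the latest slave, which is why the next phase has to generate a diff log for it.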
2.3.4 Phase 3.4: New Master Diff Log Generation Phase..
Compare the candidate master with the latest slave to decide whether a diff log must be generated (10.0.0.74 received relay logs up to: mysql-bin.000034:250589; the latest slave (10.0.0.11) up to: mysql-bin.000034:250773):
apply_diff_relay_logs --command=generate_and_send --scp_user=root \
  --scp_host=10.0.0.74 --latest_mlf=mysql-bin.000034 \
  --latest_rmlp=250773 --target_mlf=mysql-bin.000034 \
  --target_rmlp=250589 --server_id=3 \
  --diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_10.0.0.74_3306_20130325143805.binlog \
  --workdir=/var/tmp \
  --timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0 \
  --manager_version=0.55 \
  --relay_log_info=/data/mysql/data/relay-log.info \
  --relay_dir=/data/mysql/data/
The corresponding binlog content is as follows:
[root@db-11~]# mysqlbinlog /var/tmp/relay_from_read_to_latest_10.0.0.74_3306_20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET
@OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server
v 5.5.27-log created 130325 11:03:52
BINLOG '
mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#700101 8:00:00 server id 1 end_log_pos 0 Rotate to
mysql-bin.000034 pos: 107
# at 150
#130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31
BINLOG '
H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 253
#130325 14:12:19 server id 1 end_log_pos 250657 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0,
@@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1,
@@session.auto_increment_offset=1/*!*/;
/*!C utf8 *//*!*/;
SET
@@session.character_set_client=33,@@session.collation_connection=33,@
@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 321
#130325 14:12:19 server id 1 end_log_pos 250746 Query
thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364191939/*!*/;
insert into qwsh values(3)
/*!*/;
# at 410
#130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
COMMIT/*!*/;
# at 437
#130325 14:12:36 server id 3 end_log_pos 250938 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
2.3.5 Phase 3.5: Master Log Apply Phase..
First, wait until all relay logs are applied.
Second, merge the logs from the latest slave and the dead master. Some events in these logs may be incomplete, so they are checked during the merge: All apply target binary logs are concatinated at /var/tmp/total_binlog_for_10.0.0.74_3306.20130325143805.binlog .
The corresponding log content is as follows:
[mysql@db-74 ~]$ mysqlbinlog
/var/tmp/total_binlog_for_10.0.0.74_3306.20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET
@OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server
v 5.5.27-log created 130325 11:03:52
BINLOG '
mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#700101 8:00:00 server id 1 end_log_pos 0 Rotate to
mysql-bin.000034 pos: 107
# at 150
#130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server
v 5.5.27-log created 130325 10:40:31
BINLOG '
H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 253
#130325 14:12:19 server id 1 end_log_pos 250657 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0,
@@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!C utf8 *//*!*/;
SET
@@session.character_set_client=33,@@session.collation_connection=33,@
@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 321
#130325 14:12:19 server id 1 end_log_pos 250746 Query
thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364191939/*!*/;
insert into qwsh values(3)
/*!*/;
# at 410
#130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
COMMIT/*!*/;
# at 437
#130325 14:12:36 server id 3 end_log_pos 250938 Stop
# at 456
#130325 14:18:47 server id 1 end_log_pos 250841 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
BEGIN
/*!*/;
# at 524
#130325 14:18:47 server id 1 end_log_pos 250930 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)
/*!*/;
# at 613
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
COMMIT/*!*/;
# at 640
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
Third, record the new master's log file and position:
All other slaves should start replication from here. Statement should be:
CHANGE MASTER TO MASTER_HOST='10.0.0.74', MASTER_PORT=3306,
MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=475,
MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fourth, execute the master IP activate script.
Fifth, set read_only=0 on the new master.
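The CHANGE MASTER statement recorded in the third step is plain string assembly from the new master's coordinates. A minimal sketch, assuming the values are trusted (it does no quoting or escaping) and using a hypothetical helper name:

```python
def change_master_sql(host, port, log_file, log_pos, user, password):
    """Build the CHANGE MASTER TO statement that the remaining slaves should run.

    Illustration only: values are interpolated as-is, without SQL escaping.
    """
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}, "
        f"MASTER_USER='{user}', MASTER_PASSWORD='{password}';"
    )


# Values taken from the statement recorded in this walkthrough.
stmt = change_master_sql("10.0.0.74", 3306, "mysql-bin.000003", 475, "repl", "xxx")
print(stmt)
```

The file and position here (mysql-bin.000003:475) are the new master's own binlog coordinates, not the dead master's mysql-bin.000034 positions used during recovery.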
2.4 Phase 4: Slaves Recovery Phase..
2.4.1 Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Determine whether each slave's relay log differs from that of the latest slave. The following command is run on the latest slave to generate the diff relay log file, which is then copied to the corresponding slave via scp:
(Server 10.0.0.75 received relay logs up to: mysql-bin.000034:250405. Need to get diffs from the latest slave(10.0.0.11) up to: mysql-bin.000034:250773)
apply_diff_relay_logs --command=generate_and_send --scp_user=root \
  --scp_host=10.0.0.75 --latest_mlf=mysql-bin.000034 \
  --latest_rmlp=250773 --target_mlf=mysql-bin.000034 \
  --target_rmlp=250405 --server_id=3 \
  --diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_10.0.0.75_3306_20130325143805.binlog \
  --workdir=/var/tmp \
  --timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0 \
  --manager_version=0.55 \
  --relay_log_info=/data/mysql/data/relay-log.info \
  --relay_dir=/data/mysql/data/
2.4.2 Phase 4.2: Starting Parallel Slave Log Apply Phase..
First, wait until all relay logs are applied.
Second, check whether each slave has the latest relay log, then merge the required logs and apply them.
10.0.0.11 has the latest relay log:
apply_diff_relay_logs --command=apply --slave_user='root' \
  --slave_host=10.0.0.11 --slave_ip=10.0.0.11 --slave_port=3306 \
  --apply_files=/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
  --workdir=/var/tmp --target_version=5.5.27-log \
  --timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0 \
  --manager_version=0.55 --slave_pass=xxx
10.0.0.75 does not have the latest relay log, so the relay log diff and the dead master's binlog must be merged:
apply_diff_relay_logs --command=apply --slave_user='root' \
  --slave_host=10.0.0.75 --slave_ip=10.0.0.75 --slave_port=3306 \
  --apply_files=/var/tmp/relay_from_read_to_latest_10.0.0.75_3306_20130325143805.binlog,/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
  --workdir=/var/tmp --target_version=5.5.27-log \
  --timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0 \
  --manager_version=0.55 --slave_pass=xxx
The corresponding log content is as follows:
[mysql@db-75 data]$ mysqlbinlog
/var/tmp/total_binlog_for_10.0.0.75_3306.20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET
@OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server
v 5.5.27-log created 130325 11:03:52
BINLOG '
mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#700101 8:00:00 server id 1 end_log_pos 0 Rotate to
mysql-bin.000034 pos: 107
# at 150
#130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server
v 5.5.27-log created 130325 10:40:31
BINLOG '
H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 253
#130325 14:09:57 server id 1 end_log_pos 250473 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191797/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0,
@@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1,
@@session.auto_increment_offset=1/*!*/;
/*!C utf8 *//*!*/;
SET
@@session.character_set_client=33,@@session.collation_connection=33,@
@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 321
#130325 14:09:57 server id 1 end_log_pos 250562 Query
thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364191797/*!*/;
insert into qwsh values(2)
/*!*/;
# at 410
#130325 14:09:57 server id 1 end_log_pos 250589 Xid = 2423
COMMIT/*!*/;
# at 437
#130325 14:12:19 server id 1 end_log_pos 250657 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
BEGIN
/*!*/;
# at 505
#130325 14:12:19 server id 1 end_log_pos 250746 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
insert into qwsh values(3)
/*!*/;
# at 594
#130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
COMMIT/*!*/;
# at 621
#130325 14:12:36 server id 3 end_log_pos 250938 Stop
# at 640
#130325 14:18:47 server id 1 end_log_pos 250841 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
BEGIN
/*!*/;
# at 708
#130325 14:18:47 server id 1 end_log_pos 250930 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)
/*!*/;
# at 797
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
COMMIT/*!*/;
# at 824
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
Third, execute CHANGE MASTER.
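How the `--apply_files` list differs between the two slaves in Phase 4.2 can be sketched as below. This is an illustration under stated assumptions rather than MHA's implementation: the file name patterns are copied from the commands in this walkthrough, positions are compared as `file:pos` strings, and the helper names are hypothetical.

```python
def binlog_key(coord):
    """Turn 'mysql-bin.000034:250773' into a sortable (file_number, position) pair."""
    name, pos = coord.rsplit(":", 1)
    return int(name.rsplit(".", 1)[1]), int(pos)


def apply_files(slave, received, latest, dead_master, port, timestamp, workdir="/var/tmp"):
    """Return the ordered list of files a slave must apply to catch up.

    A slave behind the latest slave first applies the relay log diff,
    then the binlog saved from the dead master; an up-to-date slave
    only needs the saved master binlog.
    """
    saved = f"{workdir}/saved_master_binlog_from_{dead_master}_{port}_{timestamp}.binlog"
    if binlog_key(received) < binlog_key(latest):
        diff = f"{workdir}/relay_from_read_to_latest_{slave}_{port}_{timestamp}.binlog"
        return [diff, saved]
    return [saved]


latest = "mysql-bin.000034:250773"
common = dict(latest=latest, dead_master="10.0.0.13", port=3306, timestamp="20130325143805")

files_11 = apply_files("10.0.0.11", "mysql-bin.000034:250773", **common)
files_75 = apply_files("10.0.0.75", "mysql-bin.000034:250405", **common)
print(",".join(files_11))
print(",".join(files_75))
```

The order matters: the relay diff brings the slave up to the latest slave's position, and only then can the events saved from the dead master be applied on top.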
2.5 Phase 5: New master cleanup phase..
Resetting slave info on the new master