Update 2017-04-17: since the publication of this post, many things happened:
- the procedure for fixing a crashed slave has been automated (Bug#77496)
- Bug#80103 as been closed at the same time as Bug#77496
- but I still think there are unfixed things, see Bug#81840
I will be talking about parallel replication at FOSDEM in Brussel on January 30th and at Percona Live Santa Clara in April (link to the talk descriptions here and here). Come to one (or both) of those talks to learn more about this subject.
MySQL 5.6 introduced replication crash safety for single-threaded replication. I already blogged about this, more information can be found in Better Crash-safe replication for MySQL. To benefit from single-threaded crash safe replication, you need to:
- configure relay-log-info-repository = TABLE,
- configure relay_log_recovery = 1,
- use an atomic storage engines (InnoDB).
- the binary logs of the master contain transactions A, B, C, D and E in this order,
- all those transactions are in different schema, so a MTS slave can run them in parallel,
- A, C and E are small transactions, so they execute quickly on the slave,
- transactions B and D are bigger, so they take more time to execute on the slave,
- one of the commit orders on the slave could be ACEBD (but there are many others: ACEDB, EACBD, ...).
Note that out-of-order commit also happens with MySQL 5.7 when slave-parallel-type=DATABASE, and with slave_parallel_type=LOGICAL_CLOCK if slave_preserve_commit_order=0.
With out-of-order commit, a single position in the relay logs of the slave, and the corresponding position in the binary logs of the master, are not enough to track the replication state. If you are using GTIDs, gtid_executed will hold the replication state (and there might be some temporary holes in the gtid set because of out-of-order commit). If you are not using GTIDs (like me), the replication state will be kept in the mysql.slave_worker_info table.
Now the problem: after the crash of a MTS slave that is not using GTIDs, restarting MySQL 5.6.28 (or 5.7.10) with relay-log-recovery=1 gives the following errors:
At this point, you probably understood: a MTS MySQL 5.6.28 slave (and MySQL 5.7.10) is not as replication crash safe as we would expect.2016-01-18 22:41:22 41180 [ERROR] --relay-log-recovery cannot be executed when the slave was stopped with an error or killed in MTS mode; consider using RESET SLAVE or restart the server with --relay-log-recovery = 0 followed by START SLAVE UNTIL SQL_AFTER_MTS_GAPS 2016-01-18 22:41:22 41180 [ERROR] Failed to initialize the master info structure
One thing you should not do, and it could be a misunderstanding of the above error message: never start a crashed slave with relay-log-recovery=0 alone. This could corrupt your relay logs, and in the best case, the slave will break on a duplicate or not found key error, but in the worst case, you will have silent data corruption. When you start MySQL with relay-log-recovery=0, either be sure that MySQL was stopped cleanly, or that you also use skip-slave-start to avoid starting the IO_Thread. More details about relay log corruption after a crash in my previous post. I logged Bug#80102 (Message in log after MTS crash misleading) for having this message clarified.
In the above error message, I do not see how RESET SLAVE would help resume replication. The way I know to resume replication is the following
- restart MySQL with relay-log-recovery=0 and skip-slave-start,
- run the following command: START SLAVE UNTIL SQL_AFTER_MTS_GAPS,
- wait for the SQL_Thread to stop,
- restart MySQL with relay-log-recovery=1 (and without skip-slave-start if you want replication to resume automatically).
The procedure above is needed to resume replication for:
- MySQL 5.6.28 running MTS,
- MySQL 5.7.10 running MTS with slave-parallel-type=DATABASE,
- MySQL 5.7.10 running MTS with slave-parallel-type=LOGICAL_CLOCK and slave_preserve_commit_order=0,
- and MySQL 5.7.10 running MTS with slave-parallel-type=LOGICAL_CLOCK and slave_preserve_commit_order=1.
One last thing: the manual process that I described above to restart a crashed MTS slave does not work in all cases. It works in most situations where MySQL crashes, but it might fail in the situation were the operating system crashes. Here is how:
- we have transactions A, B, C, D, E in the relay logs, all from different independent schema (they can be run in parallel)
- everything up to and including transaction A is committed (Relay_Master_Log_File and Exec_Master_Log_Pos point to transaction B)
- transaction C and E are committed, but B and D are not (we have gaps)
- transactions up to and including C are synced in the relay logs, but D and E are not (sync_relay_log=10000 by default, so many events can be in the relay logs without being synced to disk)
- the OS crashes (D and E disappear from the relay logs)
2016-01-24 19:35:21 2976 [Note] Slave: MTS group recovery relay log info based on Worker-Id 9, group_relay_log_name ../log2/relaylog.000002, group_relay_log_pos 10809 group_master_log_name binlog.000001, group_master_log_pos 10649 2016-01-24 19:35:21 2976 [ERROR] Error looking for file after ../log2/relaylog.000003. 2016-01-24 19:35:21 2976 [ERROR] Failed to initialize the master info structure 2016-01-24 19:35:21 2976 [Note] Check error log for additional messages. You will not be able to start replication until the issue is resolved and the server restarted.
Once restarted with relay-log-recovery=0 and skip-slave-start=1, START SLAVE UNTIL SQL_AFTER_MTS_GAPS fails with this message:
ERROR 1201 (HY000): Could not initialize master info structure; more error messages can be found in the MySQL error log
and the following can be found in the logs:
2016-01-24 19:39:15 5190 [Note] Slave: MTS group recovery relay log info based on Worker-Id 9, group_relay_log_name ../log2/relaylog.000002, group_relay_log_pos 10809 group_master_log_name binlog.000001, group_master_log_pos 10649 2016-01-24 19:39:15 5190 [Note] Slave: MTS group recovery relay log info group_master_log_name binlog.000001, event_master_log_pos 143. 2016-01-24 19:39:15 5190 [ERROR] Error looking for file after ../log2/relaylog.000004.
This makes sense as group_relay_log_pos 10809 is lost following the OS crash, below the size on disk of the corresponding relay log file:
# ls -l relaylog.000002 -rw-rw---- 1 mysql mysql 5984 Jan 24 19:34 relaylog.000002
- if you are using GTIDs, MTS replication crash safety might be a reality,
- in both MySQL 5.6.28 and MySQL 5.7.10, after a MTS crash where the slave is not using GTIDs, if you are lucky, you might be able to manually recover replication,
- in some operating system crashes, you will have a very hard time recovering from a MTS crash, so in this case, MTS replication crash safety is an illusion.