Sunday, January 24, 2016

Replication crash safety with MTS in MySQL 5.6 and 5.7: reality or illusion?

Reminder: MTS = Multi-Threaded Slave.

Update 2017-04-17: since the publication of this post, many things happened:
  • the procedure for fixing a crashed slave has been automated (Bug#77496)
  • Bug#80103 has been closed at the same time as Bug#77496
  • but I still think there are unfixed things, see Bug#81840
Update 2018-08-20: it was brought to my attention that MySQL 5.6 with GTID enabled and with sync_binlog != 1 is not replication crash safe, even with single-threaded replication.  This is reported in Bug#70659 - Make crash safe slave work with gtid + less durable settings.  Thanks Valeriy for mentioning, in your last blog post, the bug from Sveta Smirnova (Bug#90997), which pointed me to Yoshinori Matsunobu's bug.  To my knowledge, MySQL 5.7 and 8.0 are also affected by this bug, even if this should not be the case.  I will soon open a new bug report about this and will put the bug number below.
    End of updates.

    I will be talking about parallel replication at FOSDEM in Brussels on January 30th and at Percona Live Santa Clara in April (link to the talk descriptions here and here).  Come to one (or both) of those talks to learn more about this subject.

    MySQL 5.6 introduced replication crash safety for single-threaded replication.  I already blogged about this; more information can be found in Better Crash-safe replication for MySQL.  To benefit from single-threaded crash-safe replication, you need to store the replication state in a table (instead of in a file) and to enable relay log recovery, as shown in the sketch below.
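
    For reference, here is a minimal my.cnf sketch for a single-threaded crash-safe slave (the variable names are the documented MySQL 5.6/5.7 ones; the values and the rest of the configuration are up to you):

    # my.cnf sketch: single-threaded crash-safe replication (MySQL 5.6/5.7)
    [mysqld]
    relay_log_info_repository = TABLE  # keep the SQL_Thread position in mysql.slave_relay_log_info (InnoDB)
    relay_log_recovery        = ON     # on restart, discard the relay logs and resume from the SQL_Thread position
    master_info_repository    = TABLE  # not required for crash safety, but convenient to keep both repositories in tables
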
    When using parallel replication in MySQL 5.6, the transactions on the slave can be committed in a different order than the one they appear in the binary logs of the master.  Let's take an example:
    • the binary logs of the master contain transactions A, B, C, D and E in this order,
    • all those transactions are in different schemas, so a MTS slave can run them in parallel,
    • A, C and E are small transactions, so they execute quickly on the slave,
    • transactions B and D are bigger, so they take more time to execute on the slave,
    • one of the commit orders on the slave could be ACEBD (but there are many others: ACEDB, EACBD, ...).
    This out-of-order commit will cause problems with replication crash safety.

    Note that out-of-order commit also happens with MySQL 5.7 when slave-parallel-type=DATABASE, and with slave_parallel_type=LOGICAL_CLOCK if slave_preserve_commit_order=0.
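
    For clarity, the MTS behaviours discussed in this post are controlled by the following variables (a MySQL 5.7 my.cnf sketch; the worker count is only an example):

    # MTS configuration sketch (MySQL 5.7)
    slave_parallel_workers      = 8               # a non-zero value enables MTS (example value)
    slave_parallel_type         = LOGICAL_CLOCK   # or DATABASE, the only parallelism type available in 5.6
    slave_preserve_commit_order = 0               # 0 allows out-of-order commit on the slave, 1 forces in-order commit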

    With out-of-order commit, a single position in the relay logs of the slave, and the corresponding position in the binary logs of the master, are not enough to track the replication state.  If you are using GTIDs, gtid_executed will hold the replication state (and there might be some temporary holes in the gtid set because of out-of-order commit).  If you are not using GTIDs (like me), the replication state will be kept in the mysql.slave_worker_info table.
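
    If you want to look at the replication state yourself, the following statements show it (the two tables below only hold the state when the TABLE repositories are used):

    -- With GTIDs: the replication state is the executed gtid set (out-of-order commit can leave temporary holes here).
    SELECT @@GLOBAL.gtid_executed;
    -- Without GTIDs and with MTS: one row per worker thread, tracking what each worker has executed.
    SELECT * FROM mysql.slave_worker_info\G
    -- The single position used by single-threaded (gap-free) replication.
    SELECT * FROM mysql.slave_relay_log_info\G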

    Now the problem: after the crash of a MTS slave that is not using GTIDs, restarting MySQL 5.6.28 (or 5.7.10) with relay-log-recovery=1 gives the following errors:
    2016-01-18 22:41:22 41180 [ERROR] --relay-log-recovery cannot be executed when the slave was stopped with an error or killed in MTS mode; consider using RESET SLAVE or restart the server with --relay-log-recovery = 0 followed by START SLAVE UNTIL SQL_AFTER_MTS_GAPS
    2016-01-18 22:41:22 41180 [ERROR] Failed to initialize the master info structure
    
    At this point, you have probably understood: a MTS MySQL 5.6.28 slave (and MySQL 5.7.10) is not as replication crash safe as we would expect.

    One thing you should not do (and the above error message could be misread as suggesting it): never start a crashed slave with relay-log-recovery=0 alone.  This could corrupt your relay logs: in the best case, the slave will break on a duplicate-key or key-not-found error, but in the worst case, you will get silent data corruption.  When you start MySQL with relay-log-recovery=0, either be sure that MySQL was stopped cleanly, or also use skip-slave-start to avoid starting the IO_Thread.  More details about relay log corruption after a crash can be found in my previous post.  I logged Bug#80102 (Message in log after MTS crash misleading) to have this message clarified.
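
    To make this concrete, if you have to start a crashed slave with relay log recovery disabled, also prevent replication from starting automatically.  A sketch of the startup options (how you pass them depends on how you start mysqld):

    # Safe after a crash: relay log recovery disabled, but replication does not start automatically
    mysqld ... --relay-log-recovery=0 --skip-slave-start

    # Risky after a crash: the IO_Thread restarts from a possibly stale position in possibly corrupted relay logs
    mysqld ... --relay-log-recovery=0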

    In the above error message, I do not see how RESET SLAVE would help resume replication.  The way I know to resume replication is the following:
    1. restart MySQL with relay-log-recovery=0 and skip-slave-start,
    2. run the following command: START SLAVE UNTIL SQL_AFTER_MTS_GAPS,
    3. wait for the SQL_Thread to stop,
    4. restart MySQL with relay-log-recovery=1 (and without skip-slave-start if you want replication to resume automatically).
    Without restarting with relay-log-recovery disabled, MySQL does not allow running START SLAVE UNTIL SQL_AFTER_MTS_GAPS.  After restarting and running that command, once the SQL_Thread stops, the replication state is simplified to a single position in the relay logs of the slave and the corresponding position in the binary logs of the master (there are no more gaps in the execution order of the transactions on the slave).  However, the IO_Thread position cannot yet be trusted (again, more details in my previous post), so the IO_Thread should not be restarted yet.  In this state (no gaps), relay log recovery will work, so restarting MySQL with relay-log-recovery=1 will clean up the relay logs and allow us to safely restart both the IO and SQL threads.
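
    Putting the procedure together, here is a sketch of the commands involved (the option and command names are the real ones, but how you restart mysqld and the rest of the configuration are up to you):

    # Step 1: restart mysqld without relay log recovery and without starting replication
    mysqld ... --relay-log-recovery=0 --skip-slave-start

    # Steps 2 and 3: fill the gaps left by MTS, then wait for the SQL_Thread to stop
    # (poll SHOW SLAVE STATUS until Slave_SQL_Running is No and Last_SQL_Error is empty)
    mysql> START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
    mysql> SHOW SLAVE STATUS\G

    # Step 4: restart mysqld with relay log recovery enabled; both the IO and SQL threads can now be started safely
    mysqld ... --relay-log-recovery=1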

    The procedure above is needed to resume replication for:
    • MySQL 5.6.28 running MTS,
    • MySQL 5.7.10 running MTS with slave-parallel-type=DATABASE,
    • MySQL 5.7.10 running MTS with slave-parallel-type=LOGICAL_CLOCK and slave_preserve_commit_order=0,
    • and MySQL 5.7.10 running MTS with slave-parallel-type=LOGICAL_CLOCK and slave_preserve_commit_order=1.
    Concerning slave-parallel-type=LOGICAL_CLOCK with slave_preserve_commit_order=1, this configuration does not commit transactions out-of-order on the slave, so it does not generate gaps.  So a single position in the relay logs of the slave, and the corresponding position in the binary logs of the master, is enough to keep track of the replication state.  I am surprised that in-order commit cannot take advantage of standard relay log recovery and I logged Bug#80103 (MTS LOGICAL_CLOCK with slave_preserve_commit_order=1 not replication crash safe) for that.
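
    For reference, the in-order commit configuration mentioned above looks like this (a MySQL 5.7 sketch; note that in 5.7, slave_preserve_commit_order=1 also requires the binary log and log_slave_updates to be enabled on the slave):

    # In-order commit with LOGICAL_CLOCK (MySQL 5.7 sketch)
    slave_parallel_workers      = 8    # example value
    slave_parallel_type         = LOGICAL_CLOCK
    slave_preserve_commit_order = 1    # commit on the slave in the binary log order of the master
    log_bin                     = binlog
    log_slave_updates           = ON   # required by slave_preserve_commit_order in 5.7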

    One last thing: the manual process that I described above to restart a crashed MTS slave does not work in all cases.  It works in most situations where MySQL crashes, but it might fail when the operating system crashes.  Here is how:
    • we have transactions A, B, C, D and E in the relay logs, all in different schemas (so they can be run in parallel)
    • everything up to and including transaction A is committed (Relay_Master_Log_File and Exec_Master_Log_Pos point to transaction B)
    • transaction C and E are committed, but B and D are not (we have gaps)
    • transactions up to and including C are synced in the relay logs, but D and E are not (sync_relay_log=10000 by default, so many events can be in the relay logs without being synced to disk)
    • the OS crashes (D and E disappear from the relay logs)
    In the above situation, START SLAVE UNTIL SQL_AFTER_MTS_GAPS will fail as D is lost from the relay logs.  Such a situation gives the following errors in the logs of MySQL 5.6.28 when restarting with relay-log-recovery=1:
    2016-01-24 19:35:21 2976 [Note] Slave: MTS group recovery relay log info based on Worker-Id 9, group_relay_log_name ../log2/relaylog.000002, group_relay_log_pos 10809 group_master_log_name binlog.000001, group_master_log_pos 10649
    2016-01-24 19:35:21 2976 [ERROR] Error looking for file after ../log2/relaylog.000003.
    2016-01-24 19:35:21 2976 [ERROR] Failed to initialize the master info structure
    2016-01-24 19:35:21 2976 [Note] Check error log for additional messages. You will not be able to start replication until the issue is resolved and the server restarted.
    
    Once restarted with relay-log-recovery=0 and skip-slave-start=1, START SLAVE UNTIL SQL_AFTER_MTS_GAPS fails with this message:
    ERROR 1201 (HY000): Could not initialize master info structure; more error messages can be found in the MySQL error log
    and the following can be found in the logs:
    2016-01-24 19:39:15 5190 [Note] Slave: MTS group recovery relay log info based on Worker-Id 9, group_relay_log_name ../log2/relaylog.000002, group_relay_log_pos 10809 group_master_log_name binlog.000001, group_master_log_pos 10649
    2016-01-24 19:39:15 5190 [Note] Slave: MTS group recovery relay log info group_master_log_name binlog.000001, event_master_log_pos 143.
    2016-01-24 19:39:15 5190 [ERROR] Error looking for file after ../log2/relaylog.000004.
    
    This makes sense: group_relay_log_pos 10809 was lost in the OS crash, as it is beyond the size on disk of the corresponding relay log file (shown below):
    # ls -l relaylog.000002
    -rw-rw---- 1 mysql mysql 5984 Jan 24 19:34 relaylog.000002
    
    In summary:
    1. if you are using GTIDs, MTS replication crash safety might be a reality,
    2. in both MySQL 5.6.28 and MySQL 5.7.10, after a MTS crash where the slave is not using GTIDs, if you are lucky, you might be able to manually recover replication,
    3. after some operating system crashes, you will have a very hard time recovering replication, so in those cases, MTS replication crash safety is an illusion.

    3 comments:

    1. I am Using GTIDs with MTS as also with relay_log_recovery as ON.
      I am facing the problem that slave is not able to come up and fails with
      Slave failed to initialize relay log info structure from the repository

      1. Hi Prasad, sorry to read you are having trouble with MySQL replication. I saw your comments in Bug#80103 and Bug#81840, and I also had the error you describe in the past, but I cannot remember in which situation. What is strange to me is that you write that the server was brought down for maintenance, which should not corrupt any state if the server was gracefully shut down; in that case, it is not a bug with crash safety. Out of curiosity, can you run the following three commands? They should return "ON", "TABLE" and "TABLE".

        show global variables like 'slave_preserve_commit_order';
        show global variables like 'master_info_repository';
        show global variables like 'relay_log_info_repository';

        You could also run the following to have a better idea of what the problem is, but do not post the results as they might include a password.

        select * from mysql.slave_master_info\G
        select * from mysql.slave_relay_log_info\G
        select * from mysql.slave_worker_info\G
        select * from mysql.gtid_executed\G

        By running "show master status", you should be able to see the GTID state of your slave. With that information and the data from the SELECTs above, you might be able to re-initialize replication on your mysql instance, but make sure to fully know what you are doing as you might loose data if you do something wrong.

        Good luck.

    2. Hi Jean - Thanks for your response. Unfortunately, I have lost this box. I will monitor this again and update this thread when I hit the problem again.
