Monday, January 27, 2020

A Legacy Behavior of MySQL Corrupting Restored Backups (replicate-same-server-id = OFF)

In my previous post (Puzzled by MySQL Replication), I describe a weird, but completely documented, behavior of replication that had me scratching my head for hours because it was causing data corruption.  I did not give too many details then as I also wanted allowing you to scratch your head if you wished.  In this post, I describe this behavior in more details.

But first I need to apologize to my readers: I was planning to release this post a few days after the previous one and it took me more time to finalize it.  Sorry for keeping you waiting.

One thing to know about our subject is that it is a legacy behavior.  It is inherited from a time when people deployed MySQL as a ring replication architecture, also named circular replication, or miss-called master-master (I do not like this last name as it introduces confusion with Galera or Group Replication).  The ring replication deployment is dangerous in so many ways that if you are running MySQL this way, you should probably review your architecture (maybe you will keep it, but then know that you are in the danger zone).  Stéphane Combaudon wrote Why [ring replication] it is a bad option some time ago on the Percona blog and his argumentation is still valid.  I also talked about it at DataOps Barcelona 2019 (slide) and at GrowIT 2018 (slide).  As you can check these links, I am not describing again why it is a risky architecture.

The ring replication architecture is still supported in recent version of MySQL.  When deployed in a modern way with GTIDs, a replication infinite loop is prevented becauswe a transaction whose GTID is already in the gtid_executed of the node is not executed again.  But before GTIDs, something else was preventing infinite loop of transactions, and this is what is of interest to us in this post.

In a ring replication deployment predating GTIDs, the way MySQL avoids having transactions cycling the ring indefinitely is by not replicating them when they have the same server-id as the current node.  As a master tags transactions with its server-id in the binary logs, when these transactions reach its initiating node in a ring deployment, not replicating them prevents infinite loop.  And there is a risk here: guess what happens when you remove a node from the ring while one of its transactions is still in transit...  Some interesting war stories were presented in the past, and you can protect against this with the IGNORE_SERVER_IDS option of the CHANGE MASTER TO command.

Not replicating transactions having the same server-id as the current node is a configurable behavior with the replicate-same-server-id parameter.  The default value for this parameter is OFF, which is the root cause of the data corruption described in the previous post.  This problem predates GTIDs, and making things safe is not as simple as changing the default.
The default of replicate-same-server-id (OFF) is a problem
In the deployment below, it is completely ok to take a backup of S1.

+---+      +---+
| M | ---> | S1|
+---+      +---+

After taking a backup, making S1 the new master is something we might wish to do.  The replication topology becomes the following:

+---+      +---+
| S1| ---> | M |
+---+      +---+

In this situation, S1 tags transactions with its own server-id, and these transactions will reach the binary logs of M; nothing unusual here...  And then, S1 might fail, and we would promote M as the new master to restore the service; there is again nothing unusual here...  Things go wrong when we restore the backup on S1.  And this is exactly the scenario I described in the previous post.

After being a slave of S1, M has transactions in its binary log tagged with the server-id of S1.  After the crash of S1, M is the new master and we restore the backup on S1.  S1 starts replicating from M, and will eventually get the transactions tagged with its server-id.  And because replicate-same-server-id is OFF (the default), those transactions will be skipped.  This will either lead to silent data corruption or to a replication breakage.

It should be noted that this data corruption happens independently of GTID being enabled or not.  In the previous post, I gave a test case with GTID enabled and at the end of this post I give a test case with GTID disabled.  So this problem is not specific to GTIDs, it has been in MySQL for a very long time.

The solution suggested by LeFred in his post about MySQL GTID: restore a master from a replica’s backup is to change the server-id when restoring a backup.  I have to say it is not the solution I like to ear.  I have to admit it is the only solution that currently works with MySQL 5.7 or with MySQL 8.0 without GTID, but IMHO it should not be this way.

For me, the right solution would be not having to mind the value of server-id.  IMHO, ring replication is not the normal case, so I believe replicate-same-server-id should not be OFF by default.  However it is not possible to set it to ON with MySQL 5.7 when log-slave-updates is enabled, and it is the same with 8.0 with GTID disabled.  Those restrictions are described in the manual (link to 5.7 manual and link to 8.0 manual).  In the manual of 8.0, we can also read that when running with GTID, it is possible to set replicate-same-server-id to ON, but it is not the default setting.  So if you are running MySQL 8.0 with GTID, you probably want to set replicate-same-server-id to ON.  But there is not much you can do to get a sane behavior if you are not using GTID or if you are running MySQL 5.7: in these cases you need to mind the value of server-id when restoring a backup, which is a major hurdle and a risk of getting things wrong.
If running MySQL 8.0 with GTID, you probably want
to set replicate-same-server-id to ON
If you are not using GTID or running MySQL 5.7,
mind server-id when restoring a backup
In both 5.7 and 8.0 manuals, we can read "Normally, this [setting replicate-same-server-id to ON] is useful only in rare configurations".  I do not agree with this.  It is obvious to me that this would be useful when restoring a backup as exposed in this post.  So I opened Bug#98408: Make replicate-same-server-id behavior flexible and safe by default.  The full behavior I would like from MySQL is described in this bug/feature request.  In short, I would like that:
  • In MySQL 8.0 when GTID is enabled, replicate-same-server-id should be automatically ON.
  • Add a new option for replicate-same-server-id in which replication would stop with an error if a SQL Thread sees its own server-id (suggested name “Break”).
  • Make this “Break” behavior the default.
  • Allow replicate-same-server-id = ON and = Break even with log-slave-updates or without GTID.
  • Make replicate-same-server-id dynamic.
Feel free to comment on the bug or on this post if you think I missed something.

And one last thing, this is the list of people who gave the right answer to the challenge of the last post:
  • Gillian Gunson
  • 2x anonymous/unknown in comments to my original post
  • Art van Scheppingen
  • Hugo Dubois
  • Frédéric Descamps
  • Ivan Groenewold

Test case for reproducing the problem without GTID and with dbdeployer:

# Create a sandbox (we need replication crash safe to extract user/password from table):
dbdeployer deploy replication mysql_5.7.28 --repl-crash-safe -c log-slave-updates

# We do not need s2/node2:

# Initialize the test environment:
./m <<< "
  CREATE TABLE test_jfg.t(
    v INTEGER)"

# Load data in m (v = 1):
yes "INSERT INTO test_jfg.t(v) value(1);" | head -n 100 | ./m

# Take a backup of s1/node1:
./node1/stop && tar -zcf node1.tgz node1/ && ./node1/start

# Load more data in m (v = 1):
yes "INSERT INTO test_jfg.t(v) value(1);" | head -n 100 | ./m

# Failover to s1/node1:
s1_port=$(./s1 -N <<< "SELECT @@port") &&
  s1_user=$(./s1 -N <<< "SELECT User_name from mysql.slave_master_info") &&
  s1_password=$(./s1 -N <<< "SELECT User_password from mysql.slave_master_info") && 
  sms="$(./s1 <<< "SHOW MASTER STATUS\G")" &&
  mlf=$(awk '$1 == "File:"{print $2}' <<< "$sms") &&
  mlp=$(awk '$1 == "Position:"{print $2}' <<< "$sms") &&
  ./m <<< "
      MASTER_USER = '$s1_user', MASTER_PASSWORD = '$s1_password',
      MASTER_LOG_FILE='$mlf', MASTER_LOG_POS=$mlp;

# Load data in s1/node1 (v = 2):
yes "INSERT INTO test_jfg.t(v) value(2);" | head -n 100 | ./s1

# s1/node1 crash, m is the new master:

# Load even more data in m (v = 1):
yes "INSERT INTO test_jfg.t(v) value(1);" | head -n 100 | ./m

# Stop and restore s1/node1:
./node1/stop && rm -rf ./node1 && tar -zxf node1.tgz && ./node1/start

# Show table content of m and s1/node1:
# (should be the same on both nodes but nothing in s1 from when it was the master !):
./m -N <<< "SELECT 'm ', v, COUNT(v) FROM test_jfg.t GROUP BY v" &&
  ./s1 -N <<< "SELECT 's1 ', v, COUNT(v) FROM test_jfg.t GROUP BY v"
m       1       300
m       2       100
s1      1       300

No comments:

Post a Comment