Showing posts with label GTID. Show all posts

Saturday, December 5, 2020

At my FOSDEM talk earlier this year, I gave a trick for fixing a crashed GTID replica. I never blogged about it, so now is a good time. What is pushing me to write about it today is my talk at MinervaDB Athena 2020 this Friday, where I will present more details about MySQL replication crash safety. So you know what to do if you want to learn more about this subject. Now let's talk about voodoo.

Monday, January 27, 2020
A Legacy Behavior of MySQL Corrupting Restored Backups (replicate-same-server-id = OFF)
In my previous post (Puzzled by MySQL Replication), I described a weird, but completely documented, behavior of replication that had me scratching my head for hours because it was causing data corruption. I did not give many details then, as I also wanted to let you scratch your head if you wished. In this post, I describe this behavior in more detail.
Thursday, January 9, 2020
Puzzled by MySQL Replication (War Story)
Recently, I was puzzled by MySQL replication! Some weird, but completely documented, behavior of replication had me scratching my head for hours. I am sharing this war story so you can avoid losing time as I did (and also maybe avoid corrupting your data when restoring a backup). The exact explanation will come in a follow-up post, so you can also scratch your head trying to understand what I faced. So let's dive in.
Tuesday, February 26, 2019
MySQL Master High Availability and Failover: more thoughts
Some months ago, Shlomi Noach published a series about Service Discovery. In his posts, Shlomi describes many ways for an application to find the master. He also gives details on how these solutions cope with failing over to a slave, including their integration with Orchestrator.
This is a great series, and I recommend reading it to everybody implementing master failover, with or without Orchestrator, even if you are not fully automating the process yet. Taking a step back, I realized that service discovery is only one of the five parts of a full MySQL Master Failover Strategy; this post is about these five parts. In some follow-up posts, I might analyze some deployments using the framework presented in this post.
Tuesday, February 12, 2019
MySQL Master Replication Crash Safety Part #3: GTID
This is a follow-up post in the MySQL Master Replication Crash Safety series. In the two previous posts, we explored the consequences of reducing durability on masters (including setting sync_binlog to a value different from 1) when slaves use legacy file+position replication. In this post, we cover GTID replication. This introduces a new inconsistency scenario, with a potential replication breakage that depends on transaction execution on the master and on timing on the slave. Before discussing this violation of ACID, we start with some reminders about the previous posts and some explanations about GTIDs.
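As a reminder, the fully durable master settings that this series contrasts with the reduced-durability ones can be sketched in a minimal my.cnf fragment (these are the two MySQL variables the series revolves around):

```ini
# Fully durable master settings:
# fsync the binary log after every transaction (group).
sync_binlog = 1
# Flush and sync the InnoDB redo log at each transaction commit.
innodb_flush_log_at_trx_commit = 1
```

Reducing durability means relaxing one or both of these, which is where the crash-safety scenarios of the series come from.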
Tuesday, September 11, 2018
Unforeseen use case of my GTID work: replicating from AWS Aurora to Google CloudSQL
A colleague brought an article to my attention. I did not see it on Planet MySQL where I get most of the MySQL news (or it did not catch my eye there). As it is interesting replication stuff, I think it is important to bring it to the attention of the MySQL Community, so I am writing this short post.
The surprising part for me is that it uses my 4-year-old work for online migration to GTID with MySQL 5.6. This is a completely unforeseen use case of my work, as I never thought that my hack would be useful after Oracle included an online migration path to GTID in MySQL 5.7 (Percona did something similar for MySQL 5.6).
Thursday, October 15, 2015
Do not run those commands with MariaDB GTIDs - part # 2
Update 2016-01-30: restarting the IO_THREAD might be considered useful in some situations (avoiding MDEV-9138). Look for "in contrast, if the IO thread was also stopped first" in MDEV-6589 for more information.
In a previous post, I listed some sequences of commands that you should not run on a MariaDB slave that is lagging and using the GTID protocol. Those are the following (do not run them, it's a trap):
- "STOP SLAVE; START SLAVE UNTIL ...;",
- or "STOP SLAVE; START SLAVE;" (to remove an UNTIL condition as an example),
- or "STOP SLAVE; SET GLOBAL slave_parallel_threads=...; START SLAVE;",
- and maybe others.
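Written out as they would be typed on the slave, the sequences above look like this (the UNTIL position and thread count are hypothetical example values, not recommendations):

```sql
-- Do NOT run these on a lagging MariaDB slave using the GTID protocol.

-- Sequence 1: stop, then restart with an UNTIL condition.
STOP SLAVE;
START SLAVE UNTIL master_gtid_pos = '0-1-100';

-- Sequence 2: stop, change the parallel applier setting, restart.
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 4;
START SLAVE;
```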
Monday, October 12, 2015
Do not run those commands with MariaDB GTIDs - part # 1
In the spirit of sharing war stories and helping others avoid the same mistakes I made, here are some sequences of commands that you should avoid running on a MariaDB slave that is lagging and using the GTID protocol. Remember, do not run those because...
Thursday, April 23, 2015
Self-Critic and Slides of my PLMCE Talks
The link to the slides of my talks can be found at the end of this post but first, let me share some thoughts about PLMCE.
Talking with people, I was surprised to be criticized for presenting only the good sides of my solution without giving credit to the good sides of the alternative solutions. More than surprised, I was also a little shocked, as I want to be perceived as being as objective as possible. Let me try to fix that:
Wednesday, April 8, 2015
Even Easier Master Promotion (and High Availability) for MySQL (no need to touch any slave)
Dealing with the failure of a MySQL master is not simple. The most common solution is to promote a slave as the new master, but in an environment where you have many slaves, the asynchronous implementation of replication gets in your way. The problem is that each slave might be in a different state:
- some could be very close to the dead master,
- some could be missing the latest transactions,
- and some could be far behind (lagging, delayed slaves, or slaves in maintenance).
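To see how far each slave is, its GTID sets can be compared against the others; a minimal sketch with MySQL GTIDs (the exact fields available depend on your MySQL version):

```sql
-- On each slave, compare GTID sets to find the most up-to-date candidate.
SHOW SLAVE STATUS\G
-- Relevant fields: Retrieved_Gtid_Set (transactions fetched from the master)
-- and Executed_Gtid_Set (transactions already applied locally).

-- The executed set is also available as a global variable:
SELECT @@GLOBAL.gtid_executed;
```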
Wednesday, March 25, 2015
Follow up on MySQL 5.6 GTIDs: Evaluation and Online Migration
One year ago, I blogged about Evaluation and Online Migration of MySQL 5.6 GTIDs. At that time, we set up the following test environment where:
- A is a production master with GTIDs disabled,
- D to Z are standard slaves with GTIDs disabled,
- B is an intermediate master running my recompiled version of MySQL implementing the ANONYMOUS_IN-GTID_OUT mode (see the details in my previous post),
- C is a slave with GTID enabled.