Tuesday, February 26, 2019

MySQL Master High Availability and Failover: more thoughts

Some months ago, Shlomi Noach published a series about Service Discovery.  In his posts, Shlomi describes many ways for an application to find the master.  He also gives detail on how these solutions cope with failing-over to a slave, including their integration with Orchestrator.

This is a great series, and I recommend its reading for everybody implementing master failover, with or without Orchestrator, even if you are not fully automating the process yet.  Taking a step back, I realized that service discovery is only one of the five parts of a full MySQL Master Failover Strategy; this post is about these five parts.  In some follow-up posts, I might analyze some deployments using the framework presented in this post.

Tuesday, February 12, 2019

MySQL Master Replication Crash Safety Part #3: GTID

This is a follow-up post in the MySQL Master Replication Crash Safety series.  In the two previous posts, we explored the consequence of reducing durability on masters (including setting sync_binlog to a value different than 1) when slaves are using legacy file+position replication.  In this post, we cover GTID replication.  This introduces a new inconsistency scenario with a potential replication breakage that depends on transaction execution on the master and timing on the slave.  Before discussing this violation of ACID, we start with some reminders about the last posts and with some explanations about GTIDs.