Wednesday, August 16, 2017

The danger of no Primary Key when replicating in RBR (and a partial protection with MariaDB 10.1)

TL;DR: unless you know what you are doing, you should always have a primary key on your tables when replicating in RBR (and maybe even all the time).

TL;DR2: MariaDB 10.1 has an interesting way to protect against missing a primary key (innodb_force_primary_key) but it could be improved.

A few weeks ago, I was called off-hours because the replication delay on all the slaves of a replication chain was high and growing.  It was not the first time this happened on that chain, so I suspected right away an UPDATE or DELETE of many rows on a table without a primary key.  Let's see why this is a problem, and to understand that, we have to talk about binary log formats.

There are two main binary log formats: Statement-Based [Replication] (SBR) and Row-Based [Replication] (RBR).  There is also the Mixed format, but I will not talk about it here.  There are also RBR subtypes (full, minimal and noblob) and proposed RBR subtypes (Lossless, and another by Facebook for which I cannot find a reference, but which I seem to remember was an interesting improvement on minimal RBR).

SBR binary logs contain the statement that was run on the master, plus a few other things needed to make sure that executing this statement produces the same result on the slaves as on the master.  RBR binary logs contain the actual fields of the rows that are inserted, updated or deleted.  For a longer description, you will have to read the section Replication Formats of the manual.
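
As a quick illustration of the difference, here is what gets logged for the same statement in both formats (a sketch, assuming a test table t and the privilege needed to change binlog_format at the session level):

-- The statement text itself is written to the binary log
-- and re-executed as-is on the slaves.
SET SESSION binlog_format = 'STATEMENT';
UPDATE t SET value = value + 1 WHERE value = 5;

-- The before and after images of every modified row are written
-- to the binary log (as Update_rows events) and the slaves
-- re-apply the changes row by row.
SET SESSION binlog_format = 'ROW';
UPDATE t SET value = value + 1 WHERE value = 5;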

When applying an Update_rows or a Delete_rows event on a slave (this is what those events are called in RBR binary logs), the SQL thread must find (fetch) the updated or deleted row before executing the corresponding operation.  For a single statement that updates or deletes Y rows on the master, the slave SQL thread has to execute Y such operations.  If the table has a primary key (or another index, like a unique key, that makes the fetch efficient), the SQL thread can run those Y operations quickly.  However, if there is no such key, each of those Y operations needs to scan the table.  With a table of size N, this is an O(Y * N) process.  On a big table (when N is large), and even if Y is relatively small, this can take a very long time.
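
Using the numbers of the test below (Y = 4,970 rows deleted from a table of N = 50,000 rows), a back-of-envelope computation gives an idea of the magnitude:

SELECT 4970 * 50000     AS worst_case_row_reads,  -- 248,500,000
       4970 * 50000 / 2 AS average_row_reads;     -- 124,250,000 if each scan stops halfway on average

The 105 million fetches measured further down are in that ballpark (a little lower, as the table shrinks while rows are deleted).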

To illustrate that, I deployed a simple replication chain using MySQL 5.7 with one master and one slave, and I used full RBR as the binary log format on the master.  I created two tables, one without and one with a primary key, and I filled both tables with 50,000 rows.
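
The relevant master settings were along these lines (a minimal excerpt; server_id and the log file base name are illustrative):

# my.cnf on the master (excerpt)
[mysqld]
server_id        = 1
log_bin          = mysql-bin
binlog_format    = ROW
binlog_row_image = full

The tables were created and filled with the following script: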
$ ( echo "CREATE TABLE wo_pk(id INT UNSIGNED NOT NULL, value INT);"
    echo "CREATE TABLE with_pk LIKE wo_pk;"
    echo "ALTER TABLE with_pk ADD PRIMARY KEY (id);"
    echo "BEGIN;"
    seq 1 50000 | while read i; do
      echo "INSERT INTO with_pk VALUES ($i, $(($RANDOM % 10)));"
    done
    echo "COMMIT;"
    echo "INSERT INTO wo_pk SELECT * FROM with_pk;"
  ) | mysql test_jfg
When running the DELETE statements below on the master, both execute quickly.  The one on the table with a primary key did not generate any noticeable replication delay on the slave.  However, the same statement on the table without a primary key generated noticeable replication delay on the slave.
> DELETE FROM test_jfg.with_pk WHERE value = 5;
Query OK, 4970 rows affected (0.05 sec)

> DELETE FROM test_jfg.wo_pk WHERE value = 5;
Query OK, 4970 rows affected (0.05 sec)
Below is the graph showing the replication delay on the slave, obtained by observing the Seconds_Behind_Master field of the SHOW SLAVE STATUS output.  Even though the corresponding statement took less than a second to run on the master, executing the 4,970 Delete_rows events took about 90 seconds on the slave.  Not having a primary key greatly penalized replication speed in this case.

[Graph: replication delay (Seconds_Behind_Master) on the slave, rising to about 90 seconds while the 4,970 Delete_rows events on test_jfg.wo_pk are applied]
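
For the record, this delay can be watched with a trivial loop on the slave (assuming connection credentials in ~/.my.cnf):
$ while sleep 1; do
    mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master
  done
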
To look in more detail at what is happening on the slave, I used the excellent ps-top utility developed by my colleague Simon Mudd.  This tool is a top-like visualization of the Performance Schema.  If we look at the table_io_latency view of ps-top while the slave is running the DELETE without a primary key, we have the following:
+------------------------------------------------------------------+
¦ ps-top 0.7.5 - 22:09:24 hostname / 5.7.19-log, up 23d 9h 25m 16s ¦
¦ Table  (table_io_waits_summary_by_table) 1 rows                  ¦
¦    Latency      %| Fetch Insert Update Delete|Table Name         ¦
¦   00:01:21 100.0%|100.0%                 0.0%|test_jfg.wo_pk     ¦
¦                  |                           |                   ¦
¦                  |                           |                   ¦
¦                  |                           |                   ¦
¦   00:01:21 100.0%|100.0%                 0.0%|Totals             ¦
+------------------------------------------------------------------+
So Performance Schema reports that mostly Fetch operations are being executed on the test_jfg.wo_pk table.  If we look at the table_io_ops view, we see the following:
+------------------------------------------------------------------+
¦ ps-top 0.7.5 - 22:09:38 hostname / 5.7.19-log, up 23d 9h 25m 30s ¦
¦ Table  (table_io_waits_summary_by_table) 1 rows                  ¦
¦        Ops      %| Fetch Insert Update Delete|Table Name         ¦
¦   105.11 M 100.0%|100.0%                     |test_jfg.wo_pk     ¦
¦                  |                           |                   ¦
¦                  |                           |                   ¦
¦                  |                           |                   ¦
¦   105.11 M 100.0%|100.0%                     |Totals             ¦
+------------------------------------------------------------------+
So about 105 million operations, almost all fetches, were executed on the test_jfg.wo_pk table.  That is roughly 21,000 fetches for each of the 4,970 Delete_rows events, which is consistent with scanning, on average, about half of the table to find each row.  For comparison, below are the two ps-top outputs for the DELETE with a primary key:
+------------------------------------------------------------------+
¦ ps-top 0.7.5 - 22:06:35 hostname / 5.7.19-log, up 23d 9h 22m 27s ¦
¦ Table  (table_io_waits_summary_by_table) 1 rows                  ¦
¦    Latency      %| Fetch Insert Update Delete|Table Name         ¦
¦   26.01 ms 100.0%| 43.6%                56.4%|test_jfg.with_pk   ¦
¦                  |                           |                   ¦
¦                  |                           |                   ¦
¦                  |                           |                   ¦
¦   26.01 ms 100.0%| 43.6%                56.4%|Totals             ¦
+------------------------------------------------------------------+
+------------------------------------------------------------------+
¦ ps-top 0.7.5 - 22:06:59 hostname / 5.7.19-log, up 23d 9h 22m 51s ¦
¦ Table  (table_io_waits_summary_by_table) 1 rows                  ¦
¦        Ops      %| Fetch Insert Update Delete|Table Name         ¦
¦     9.71 k 100.0%| 50.0%                50.0%|test_jfg.with_pk   ¦
¦                  |                           |                   ¦
¦                  |                           |                   ¦
¦                  |                           |                   ¦
¦     9.71 k 100.0%| 50.0%                50.0%|Totals             ¦
+------------------------------------------------------------------+
So, if you are planning to run large UPDATE or DELETE statements on a master that is using RBR, not having a primary key on your tables will cause serious replication problems.  I guess that from now on, you will try to always have a primary key on your tables.  But you might forget, or the next person joining your team might not know that it is important; the query below is one way to audit a schema for such tables.  This brings us to the protection implemented in MariaDB 10.1.
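
A sketch of that audit query (it reports tables with no primary key at all; you might want to extend it to also accept a non-null unique key as a substitute):

SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON  c.table_schema    = t.table_schema
       AND c.table_name      = t.table_name
       AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
  AND t.table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys');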

With MariaDB 10.1, you can enable innodb_force_primary_key, which makes CREATE TABLE fail for an InnoDB table without a primary key (or a non-null unique key), reducing the probability of running into this kind of problem.  I write that the problem is reduced, not completely solved, because some cases are not covered by this solution (I will write another post about this).
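
Here is what enabling it looks like (a sketch on MariaDB 10.1; the exact error message might vary between versions):

> SET GLOBAL innodb_force_primary_key = ON;

> CREATE TABLE no_pk (id INT UNSIGNED NOT NULL, value INT) ENGINE=InnoDB;
ERROR 1173 (42000): This table type requires a primary key

> CREATE TABLE has_pk (id INT UNSIGNED NOT NULL PRIMARY KEY, value INT) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)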

Before finishing this post, let's mention a few things:

Regarding the two feature requests above, maybe it is not too late to do something in MySQL 8.0...
