Tuesday, July 9, 2019

MySQL Master Replication Crash Safety part #4: benchmarks (under the hood)

This post is a sister post to MySQL Master Replication Crash Safety Part #4: benchmarks of high and low durability.  There is no introduction or conclusion to this post, only landing sections: reading this post without its context is not recommended.  You should start with the main post and come back here for more details.


My benchmark environment is composed of three VMs in Google Cloud Platform (GCP).  One VM is running MySQL in dbdeployer and the two others are sysbench clients.  The MySQL VM (let's call it mysql) is a n1-standard-4 instance (4 vCPUs and 15 GB memory), and the sysbench client VMs are custom 4 vCPUs and 4 GB memory instances.  One of the sysbench clients (let's call it sysbench1) is co-located in the same zone as the MySQL VM (europe-west4-a) and the other (let's call it sysbench2) is in another zone (europe-west4-b).  You can learn more about GCP regions and zones in the reference documentation.  From this, we can expect a higher ping latency from sysbench2 to mysql than from sysbench1 to mysql, and this is exactly what we observe:
sysbench1$ ping mysql -c 2000 -i 0
--- mysql ping statistics ---
2000 packets transmitted, 2000 received, 0% packet loss, time 287ms
rtt min/avg/max/mdev = 0.047/0.109/1.445/0.068 ms, ipg/ewma 0.143/0.108 ms

sysbench2$ ping mysql -c 2000 -i 0
--- mysql ping statistics ---
200 packets transmitted, 200 received, 0% packet loss, time 40ms
rtt min/avg/max/mdev = 0.063/0.168/0.674/0.072 ms, ipg/ewma 0.202/0.128 ms
The MySQL VM has an SSD persistent disk of 400 GB for hosting sandboxes.  The write latency to this disk is the following (-L for sequential writes, as this is how the binary logs and the InnoDB Redo logs are written):
$ ioping -WWW -c 20000 -i 0 -L -s 4k /dev/sdb
--- /dev/sdb (block device 400 GiB) ioping statistics ---
20.0 k requests completed in 11.8 s, 78.1 MiB written, 1.69 k iops, 6.61 MiB/s
generated 20 k requests in 12.4 s, 78.1 MiB, 1.61 k iops, 6.30 MiB/s
min/avg/max/mdev = 370.3 us / 590.5 us / 24.9 ms / 274.1 us
And the advertised performance characteristics are the following (Google does not provide any latency indicators, which is a little disappointing as many applications, including MySQL, rely heavily on this):

[table of advertised IOPS and throughput for SSD persistent disks not reproduced here]

Note that the advertised IOPS above are random IOPS.  In the case of appending to the InnoDB Redo Log and to the binary logs, I only need sequential IOPS.

(I am voluntarily omitting filesystem details, this will be the subject of another post.)


I am using dbdeployer version 1.31.0 and the glibc2.12 tarball distribution of MySQL 5.7.26:
$ dbdeployer --version
dbdeployer version 1.31.0
$ dbdeployer unpack --prefix=mysql_ mysql-5.7.26-linux-glibc2.12-x86_64.tar.gz
I am using these commands to create the sandbox:
$ dbdeployer deploy replication mysql_5.7.26 \
    --gtid --repl-crash-safe \
    --bind-address "*" \
    -c log-slave-updates \
    -c skip-slave-start \
    -c "innodb_buffer_pool_size = $((4*1024*1024*1024))" \
    -c "innodb_doublewrite = OFF"
OK, this needs a little more explanation...  I use GTIDs and replication crash safety as this is the safe way to run MySQL (Binlog Servers are coming, which could change things for GTIDs, but this is the subject of a future post; if you are already curious, look at the Percona blog: MySQL Ripple: The First Impression of a MySQL Binlog Server; or at LeFred's blog: Ripple Binlog Server for MySQL).  By default, a sandbox only listens on the loopback interface, and as I want to do remote tests, I need to listen on all interfaces.  I need binary log generation on the slaves to do fair replication tests (without this, the impact of the sync_binlog parameter is invisible on slaves).  I do not want to start replication automatically as I am using tricks for the replication tests (more on this below).  I need to grow the InnoDB Buffer Pool (I am using 4 GB) as the default (128 MB) is too small and risks making InnoDB flushing behaviour impact the results of the tests.  I also disable the InnoDB Doublewrite Buffer as I want to avoid it impacting the tests (I am not sure it actually makes a big difference though: flushing is asynchronous to transaction commit, and writing to the doublewrite buffer is sequential while writing to the tables is random, so a single big sequential write in front of many random writes is a relatively small cost).
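To double-check that the deployed sandbox actually runs with these settings, a quick query against the master can be used.  This is a sketch, not part of my test scripts (./m is the sandbox's master client, used elsewhere in this post; -N suppresses the column headers):

```shell
# Sketch: verify the settings passed to dbdeployer on the running master.
check_sandbox_settings() {
  ./m -N <<< "SELECT @@gtid_mode,
                     @@log_slave_updates,
                     @@innodb_buffer_pool_size,
                     @@innodb_doublewrite"
}
```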

After the sandbox creation, I stop slave1 and slave2 as my first tests do not involve replication (below is run in the sandbox directory):
$ ./node1/stop
$ ./node2/stop
Now I am left with only the master running, and I need to prepare the database for the benchmark, which includes creating the schema and user for sysbench:
$ ./m -u root performance_schema <<< "
    UPDATE setup_instruments SET ENABLED = 'YES' WHERE NAME = 'transaction';
    CREATE DATABASE sbtest;
    CREATE USER 'sysbench'@'%' IDENTIFIED BY 'sysbench';
    GRANT ALL PRIVILEGES ON *.* TO 'sysbench'@'%'"
And this also deserves a little more explanation...  I enable transaction instrumentation in Performance Schema as I am checking TPS during the benchmark execution.  Database creation is straightforward.  The host part of this user is a little wide (@'%'), and the same is true for its permissions.  Normally, I would restrict it to a subnet (I am not because I am lazy) and I would only grant ON sbtest.*, but as I need to change the MySQL configuration during my tests, I am re-using the same user, which needs wider permissions.  Do not create users at @'%' or GRANT [...] ON *.* to an application user on a production system; I am doing this here only because it is a test environment!
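About checking TPS: with the transaction instrument enabled, Performance Schema counts committed transactions, and TPS can be derived by sampling that counter twice.  Below is a hypothetical sketch of how this could be done (the function name and sampling window are mine, not from my actual test scripts; ./m is the sandbox's master client):

```shell
# Sketch: approximate TPS by sampling the Performance Schema transaction
# counter twice, "window" seconds apart, and dividing the delta.
tps() {
  local window=${1:-10}
  local q="SELECT COUNT_STAR
           FROM performance_schema.events_transactions_summary_global_by_event_name
           WHERE EVENT_NAME = 'transaction'"
  local before after
  before=$(./m -N <<< "$q")
  sleep "$window"
  after=$(./m -N <<< "$q")
  echo $(( (after - before) / window ))
}
```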


I am running sysbench 1.0.17 installed from the Percona Repositories.  I am using a configuration file to ease its execution ($c below), and I am running the oltp_insert benchmark ($b below) without a secondary index:
$ sysbench --version
sysbench 1.0.17
$ c=<path to my conf file>/sysbench.cnf
$ b=/usr/share/sysbench/oltp_insert.lua
$ sysbench --config-file=$c $b --create_secondary=off prepare
$ sysbench --config-file=$c $b --time=$time run
I am not using secondary indexes as they would slow down the benchmark, and my goal is to test the impact of the durability settings on transaction throughput, not other parts of MySQL (tests in low durability gave ~5730 TPS with the index and ~6270 TPS without, so the index has a non-negligible impact).  I am using the oltp_insert benchmark for the same reason: it is a benchmark that should maximise the visibility of changing durability while minimising stress on InnoDB flushing (it inserts in primary key order, which should have a very high insert-to-page-flushing ratio while always staying in memory; if you do not understand this sentence, you need to dig into InnoDB internals, which I cannot cover here).
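The configuration file itself is not shown in this post.  As a rough sketch, such a file holds sysbench command-line options without their leading dashes; the values below (host, credentials, table size, thread count) are assumptions for illustration, not the values used for these tests:

```
# sysbench.cnf -- hypothetical sketch, not the actual file used here
mysql-host=mysql
mysql-user=sysbench
mysql-password=sysbench
mysql-db=sbtest
tables=1
table-size=1000000
threads=64
report-interval=10
```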


To do replication tests, I start by injecting a workload on the master with both slaves stopped and with reduced durability (my goal here is to quickly generate binary logs with replication workload, not to test high and low durability):
$ ./m <<< "SET GLOBAL sync_binlog = 0; SET GLOBAL innodb_flush_log_at_trx_commit = 2"
$ sysbench --config-file=$c $b --create_secondary=off prepare
$ sysbench --config-file=$c $b --time=$time run
During this injection, both slaves of the sandbox are stopped.  I then start slave2 and let it download all the binary logs from the master.  The data directory of slave2, including the relay logs, will be the starting point of my tests.  Once the download is complete, I stop both slave2 and the master:
$ ./node2/start
$ # Wait until all binary logs are downloaded...
$ ./node2/stop
$ ./master/stop
Here, it would be useful to have a function allowing to wait for the IO Thread to reach a specific position.  The functions MASTER_POS_WAIT, WAIT_FOR_EXECUTED_GTID_SET and WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS allow waiting for the SQL Thread to reach a certain position, but there are no corresponding functions for the IO Thread.  By the same logic, START SLAVE UNTIL allows starting replication until the SQL Thread reaches a specific position, but there is no corresponding command for the IO Thread.  I opened Bug#96134 (Please provide control functions for the IO Thread) to request those features.
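In the meantime, the IO Thread position can be polled from SHOW SLAVE STATUS.  Below is a hedged sketch of such a workaround (the function name is mine; ./s2 is the sandbox's slave2 client, and the comparison assumes the usual zero-padded binary log file names, where a lexicographic compare matches the numeric order):

```shell
# Sketch: poll SHOW SLAVE STATUS until the IO Thread has read up to
# (or past) a target binary log file, e.g. wait_for_io_thread mysql-bin.000042
wait_for_io_thread() {
  local target_file=$1
  local current
  while true; do
    current=$(./s2 -E <<< "SHOW SLAVE STATUS" \
              | awk '$1 == "Master_Log_File:" {print $2}')
    # Break once the IO Thread reads from target_file or a later file.
    if [ -n "$current" ] && [ ! "$current" \< "$target_file" ]; then
      break
    fi
    sleep 1
  done
}
```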

To run a test, I copy the content of slave2 to slave1 and then make slave1 ready for the test with these commands:
$ ./node1/stop
$ rm -rf ./node1/data
$ ( cd node2; tar -c data; ) | ( cd node1; tar -x)
$ ./node1/start
$ ./s1 performance_schema <<< "
    UPDATE setup_instruments SET ENABLED = 'YES' WHERE NAME = 'transaction'"
And now, time for a quick rant: the above alone does not work! :-(  The reason is that on startup, because I enabled replication crash safety, relay-log-recovery wipes the relay logs. :-( :-(  To avoid that, I need to edit the configuration file of slave1 and disable relay-log-recovery (which also somewhat disables replication crash safety, but I do not need this part of replication crash safety during my tests):
$ sed -i -e '/relay-log-recovery/s/on/off/' ./node1/my.sandbox.cnf
I reported that a long time ago as Bug#74321 against MySQL 5.6; it is disappointing that it is still not fixed after 4 years, with 5.7 and 8.0 released since.

EOA: End Of Annexe.
