Monday, May 26, 2025

Interesting Binary Logging Optimization in MariaDB

As I wrote in a LinkedIn post, I am working on a blog post related to the binary logging of big transactions.  I thought I would split it in two : here is the first part, where I cover the recent binary logging optimization in MariaDB and its unspoken advantage.

(And this post is a way for me to see if Planet MySQL is still blocking posts mentioning MariaDB.  This censorship / filtering is one of the reasons I created Planet for the MySQL Community.  To be perfectly clear, even if this post is not filtered, I will keep maintaining the Community Aggregator.  You can learn more about the history of the Community Aggregator in my post Planet for the MySQL Community Graduating from Beta.)

Update 2025-06-04 : the post did not show up on Planet MySQL.  It might be understandable : in addition to explaining how MySQL works and could be optimized, I also write about a "competitor database", which is forbidden.  But in my next post, I mostly write about MySQL, so I was extra careful to avoid mentioning MariaDB.

Update 2025-06-04 (bis) : I published the follow-up post : Interesting Troubleshooting of a MySQL Crash : filling then freeing the disk.

TL;DR : the unspoken advantage of the MariaDB optimization is being leaner on disk space utilization during the binary logging of big transactions.

When a transaction is being executed, the generated binary log events are accumulated in a memory buffer.  When this buffer is full (its size is binlog_cache_size), the binary logging of the transaction spills to disk in a temporary file (this file might never physically hit the disk, as it can stay in the filesystem cache, but this detail is not relevant here).  Monitoring how often this spilling occurs can be done by looking at the global statuses Binlog_cache_use and Binlog_cache_disk_use.  On commit, if the transaction spilled to disk, the temporary file is copied to the binary log.
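
Below is a quick way of checking this (using the ./use wrapper of the dbdeployer sandbox from annex #1, but any MySQL client works) :

# Show the size of the in-memory buffer, how many transactions used the
#   binlog cache, and how many of those spilled to a temporary file on disk.
./use <<< "
   SHOW GLOBAL VARIABLES LIKE 'binlog_cache_size';
   SHOW GLOBAL STATUS LIKE 'Binlog_cache%'"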

Copying might take time

Because copying the temporary file to the binary log might take time, especially for big transactions, it can block other concurrent transactions trying to commit.  For this reason, MariaDB optimized the committing of big transactions (this optimization is a contribution by Libing Song of Alibaba Cloud).  The idea is simple : instead of copying the temporary file, this file becomes the next binary log file.  This optimization and its limitations are described on the MariaDB blog in the post Binlog Commit Optimization For Large Transaction.
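
If I read the MariaDB post correctly, this optimization only kicks in above a size threshold.  A quick way of checking it on a MariaDB instance is below (assumption : the variable name binlog_large_commit_threshold is taken from the MariaDB post and MDEV-32014, I did not verify it on a live instance) :

# On MariaDB, show the size above which the temporary file of a spilled
#   transaction is renamed to become the next binary log file.
SHOW GLOBAL VARIABLES LIKE 'binlog_large_commit_threshold';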

Avoiding copy is leaner on disk space

It is not explicitly mentioned in the post that this optimization is also leaner on disk space.  For me, this is a very interesting side effect.  As an example, when modifying an INTEGER column on all rows of a table (the details are in annex #1) :

  • with a row size of about 4 KiB and 1 Mi rows (table size of 6.2 GiB),
  • the size of the full RBR transaction in the binary log is 7.6 GiB,
  • so such a transaction needs a total of 2 x 7.6 = 15.2 GiB of free disk space (the temporary file and its copy in the binary log exist at the same time).

With the MariaDB optimization, the above transaction only needs 7.6 GiB of free disk space to complete (not 15.2 GiB).  It also commits quicker.
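
This double disk usage can be observed while the transaction of annex #1 runs ; below is a minimal sketch (it assumes the dbdeployer sandbox of annex #1, with tmpdir and the binary logs on the same filesystem ; also, the temporary file seems to be unlinked at creation, so it shows in df but not in ls) :

# While the UPDATE of annex #1 is running, watch free disk space shrink twice :
#   once for the spilled binlog cache in tmpdir, and a second time, during
#   commit, for the copy of the temporary file to the binary log.
watch -n 10 df -h data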

For both of these reasons - the quicker commit and the saving of disk space - I think it would be great to have this optimization in MySQL.  So I opened Bug #118332 : Please consider optimizing the binary log committing of big transactions (latency and disk usage).

Update 2025-06-04 (ter) : in the comments of the bug above, I referenced this post and attributed the idea to MariaDB, but these comments were made private.  It is not my intention to imply that the idea of using the temporary file as the next binlog file is mine.

A quick closing comment : it is usually a MySQL best practice to avoid big transactions.  Suggesting to optimize the binary logging of big transactions does not mean I endorse their usage.  I still think big transactions should be avoided (transactions should be smaller than 10 MiB, and they should last less than one second, maybe even less).  Independently of that best practice, MySQL should make big transactions quicker and leaner to commit.
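
To give an idea, below is a minimal sketch of chunking the UPDATE of annex #1 to follow that best practice (it assumes id is a dense AUTO_INCREMENT starting at 1, which is how the table of annex #1 is filled, and it relies on autocommit, so each chunk is its own transaction ; with the binary logging overhead of annex #1 at about 7.6 KiB per row, chunks of 1024 rows stay below 10 MiB) :

# Update all rows of t in chunks of 1024 rows, one transaction per chunk.
chunk_size=1024
nb_chunks=$((1024*1024 / chunk_size))
for i in $(seq 0 $((nb_chunks - 1))); do
  ./use test_jfg <<< "
     UPDATE t SET a = a+1
     WHERE id BETWEEN $((i*chunk_size + 1)) AND $(((i+1) * chunk_size))"
done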


Annex #1 : Example of Big Transaction in the Binary Logs

The timings below are from an m6id.large AWS VM (2 vCPU, 8 GiB RAM, and a 118 GB local SSD).

# Create a sandbox for our tests.
dbdeployer deploy single mysql_8.4.5

# Creating a table, inserting 1 Mi rows, and growing each row to about 4 KiB.
# (the pv commands are a trick to time command execution)
{
  nb_rows=$((1024*1024))

  ./use <<< "
     CREATE DATABASE test_jfg;
     CREATE TABLE test_jfg.t (id INT AUTO_INCREMENT PRIMARY KEY, a INT DEFAULT 0)"

  # Generate INSERT statements of 101 "(null)" tuples each
  #   (about 10k INSERTs in total for the 1 Mi rows).
  seq 1 $nb_rows |
    awk '{print "(null)"}' |
    tr " " "," | paste -s -d "$(printf ',%.0s' {1..100})\n" |
    sed -e 's/.*/INSERT INTO t(id) values &;/' |
    ./use test_jfg | pv -tN insert

  y200="$(yes | head -n 200 | paste -s -d "")"
  y240="$(yes | head -n 240 | paste -s -d "")"
  { echo "ALTER TABLE t ADD COLUMN c0 CHAR(200) DEFAULT '$y200'"
           seq -f " ADD COLUMN c%.0f CHAR(240) DEFAULT '$y240'" 1 15
  } | paste -s -d "," | ./use test_jfg

  ./use test_jfg <<< "ALTER TABLE t FORCE"      | pv -tN alter
  ./use test_jfg <<< "FLUSH TABLE t FOR EXPORT" | pv -tN flush

  ls -lh data/test_jfg/t.ibd
}
   insert: 0:00:08
    alter: 0:04:00
    flush: 0:00:00
-rw-r----- 1 jgagne jgagne 6.2G Jun  2 17:45 data/test_jfg/t.ibd
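
# Optionally, double-check the row count and the average row size
#   (a simple verification ; avg_row_length is an InnoDB estimate,
#   so expect an approximate value).
./use test_jfg <<< "
   SELECT format(COUNT(*), 0) AS nb_rows FROM t;
   SELECT format(avg_row_length, 0) AS avg_row_bytes
   FROM information_schema.tables
   WHERE table_schema = 'test_jfg' AND table_name = 't'"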

# Rotate the binary logs and update all rows of the table,
#   showing a big transaction in the binary logs (7.6 GiB),
#   and the commit delay (6:11 in total, minus 4:19 between the two
#   now() below, gives 1:52 of commit).
{
  sql="
    FLUSH BINARY LOGS;
    BEGIN;
    SELECT now(); UPDATE t SET a = a+1;
    SELECT now(); COMMIT;"
  ./use -N test_jfg <<< "$sql" | pv -tN update
  ls -lh data/binlog.0*
}
2025-06-02 17:45:53
2025-06-02 17:50:12
   update: 0:06:11
-rw-r----- 1 jgagne jgagne  12M Jun  2 17:45 data/binlog.000001
-rw-r----- 1 jgagne jgagne 7.6G Jun  2 17:52 data/binlog.000002
-rw-r----- 1 jgagne jgagne  158 Jun  2 17:52 data/binlog.000003
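
# Optionally, confirm that the 7.6 GiB binlog.000002 holds a single transaction
#   (assumption : mysqlbinlog from the same 8.4.5 tarball is in the PATH,
#   and decoding 7.6 GiB takes a while) : expect a count of 1.
mysqlbinlog data/binlog.000002 | grep -c "^COMMIT"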
