Monday, January 5, 2026

Undo Log Truncation Bug in 8.0 leads to Data Corruption

I am upset about this one : I have a hard time not seeing this as negligence, and it starts to become a pattern...  So please forgive me if this post is not my most diplomatic, because I really think someone deserves a kick in the butt !  But what is all this about...

There is a MySQL bug, which can lead to data corruption, opened for 8.0 in September 2023, fixed in MySQL 8.4.0 (released in February 2024), AND STILL NOT FIXED IN MySQL 8.0.44 !  Luckily for some, it is fixed in Percona Server 8.0.39 (released in October 2024).  As usual, I start this post with some context before diving in the nitty-gritty details.

TL&DR : on a super_read_only MySQL 8.0, set innodb_undo_log_truncate to OFF.

InnoDB Undo Logs

(if you know all there is to know about Undo Logs, feel free to jump to the next section about the InnoDB Undo Logs Truncation Bug, but check below TL&DR before skipping this section)

TL&DR about the InnoDB Undo Logs :

  • Deleted rows are not copied to the Undo Logs;
  • Undo Logs are used for rollback and MVCC (and crash recovery and hot backup preparation, both rolling back uncommitted transactions --> this will be important);
  • The Undo Logs is the InnoDB Transaction Log (the Redo Logs is the Physical Log, no transaction data in the Redo Logs);
  • Cleaning up the Undo Logs is done by a background process called the InnoDB Purge (commit is fast, leaving all cleanup for the asynchronous Purge; rollback is slow needing to undo all the transaction synchronously);
  • Undo Log Truncation is a relatively recent feature of InnoDB, introduced in MySQL 5.7 (before that, runaway Undo Logs could cause them to grow bigger than the size of the actual data, and it was very tedious to reclaim this disk space).

Information about the InnoDB Undo Logs can be found in the MySQL Documentation (Undo Logs, and Undo Tablespaces).  But the most important thing I learned about the Undo Logs is from talking with an InnoDB Expert (thanks Sunny) :

The Undo Logs is the InnoDB Transaction Log

From a transaction point of view, InnoDB Redo and Undo Logs are often confused.  The Redo Log is the physical log of InnoDB : there are no transaction information there, only InnoDB Page Mutations mapping to Log Sequence Numbers.  When a transaction is rolled back / undone, this generates new data in InnoDB (this new data happens to be identical to the old one).  A rollback causes InnoDB data to move forward, with the transaction being undone generating new Redo Logs.

Rollback is not done from the Redo Logs,
it is done from the Undo Logs,
and rollback generates new Redo Logs

When a transaction is started, an entry for this transaction is created in the Undo Logs / Transaction Log.  All the rows added, removed and modified by this transaction are tracked in this log.  An added / inserted row is physically added in the table, with the log containing a pointer to this row "remembering" that it was added by the transaction.  If a transaction removes a row, this row is marked as deleted in the table, and a pointer to this row is stored in the log (a deleted row is not copied in the Undo Logs, only a pointer to it is stored).  For an update, the previous values are stored in an Undo Log Record, the row is updated with the new values and with a rollback pointer to this Undo Log Record, and the Transaction Log / Undo Log stores a pointer to this modified row (at this point, the table contains uncommitted data).

On rollback, the Undo Logs allows going through all the rows impacted by this transaction to synchronously Undo the changes.  On commit, the transaction is "only" marked as committed in the log, all the cleanup is done asynchronously by the InnoDB Purge.  And it is important to not yet garbage-collect the delete-marked rows because they might still be needed by consistent views opened before the row was deleted.  The same is true for Undo Log Record of committed modified rows : these might be needed by previously opened consistent views to reconstruct the content of the row.

Undo Logs reside in Tablespaces, organized in Pages, cached in the InnoDB Buffer Pool.  So when a modification is done in the Undo Log, it will be flushed to disk asynchronously by the InnoDB Page Cleaner, and this modification generates a Redo Log entry.

In MySQL 5.6 and before, the Undo Logs were residing in the System Tablespace.  In some conditions, the Undo Logs could grow a lot, sometimes even bigger than the actual data (this is called runaway Undo Logs, and it can happen if the Purge is not keeping-up, or if a transaction is left opened which prevents the Purge to advance).  Runaway Undo Logs were leaving a very large ibdata1 file on disk (the file storing the System Tablespace), and it was very complex to reclaim this disk space.

MySQL 5.7 allowed storing the Undo Logs in their dedicated tablespaces, and it is the default in 8.0.  With dedicated tablespaces, it is now possible to reclaim disk space used by runaway Undo Logs.  There is a Manual way and an Automated way of Truncating Undo Logs, and the Automated Truncation is the source of the bug discussed in this post.

I cannot close this section without mentioning a post by Jeremy Cole where you can learn more about Undo Logging : The basics of the InnoDB undo logging and history system.

InnoDB Undo Logs Truncation Bug

(the actual name of the procedure is Undo Tablespace Truncation, but in this post, I use both Undo Log Truncation and Undo Tablespace Truncation, and so does the MySQL Documentation)

So since MySQL 5.7, it is possible to truncate Undo Tablespace, which allows reclaiming disk space in case of runaway Undo Logs (either if InnoDB the Purge was lagging, or if it was blocked by a long-running transaction).  I am not describing the details of 5.7, I am going straight to 8.0.

MySQL 8.0 also introduced the new Transactional Data Dictionary (DD).  This dictionary resides in InnoDB, and updating it needs write access to the database.  But there is one thing not playing nice with needing write access to the database and the Data Dictionary : super_read_only.  Many bugs have been opened for that, some still not fixed (like this one which also affects 5.7 : Unable to analyze table whilst super_read_only is enabled).  And guess what needs to write to the data dictionary...

Undo Log Truncation needs writing to the DD

If a MySQL instance is super_read_only, the manual procedure for truncating Undo Tablespace fails as shown below.  This is not such a big problem because a smart dba will figure things out.  However, we cannot say the same thing for MySQL Automated Undo Tablespace Truncation, but I am getting ahead of myself... 

> CREATE UNDO TABLESPACE undo_003 ADD DATAFILE 'undo_003.ibu';
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

> ALTER UNDO TABLESPACE innodb_undo_001 SET INACTIVE;
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

When someone sees above, the solution is straightforward : disable super_read_only and retries.  But...

Automated Undo Log Truncation fail miserably
with super_read_only

When Automated Undo Log Truncation runs on a super_read_only instance, file shrinkage succeeds, but it leaves a trunc.log file on disk.  Also, the content of the INFORMATION_SCHEMA table INNODB_TABLESPACES is corrupted (which I call Data Dictionary Corruption for the rest of this post, not to be confused by actual Data Corruption in user tables).  I will cover the impacts of this corruption and the trunc.log file later in this post, but let's recap all this with details.

In annex #1, I show how to trigger the Automated Undo Log Truncation Bug.  For this, I create a dbdeployer sandbox with five (5) replicas (the replicas have different settings, some triggering the bug, some not triggering the bug, with and without verbose logging).  On the primary, I run a big-enough transaction to grow the Undo Log to a size which triggers Automated Truncation.  On both replica #1 and #2, the bug is triggered because they are super_read_only.  In the logs of replica #1, there is a missing line about "Completed truncate" (it is present in replica #3 where the bug was not triggered).  Beside the missing line, nothing in the logs indicates that truncation failed, which is very bad error handling !  What allows us to see something went wrong is the trunc.log file left next to the Undo Log files, and the size of the Undo Log being reported as zero (0) in the table INNODB_TABLESPACES.  At this point, MySQL is running correctly, but the trunc.log file will cause problems on a MySQL restart.

In annex #2, we can see that on MySQL startup, the presence of a trunc.log file causes an Undo Tablespace Reconstruction (see the relevant log lines below).  These lines are only present with verbose logging, which is also bad considering the seriousness of this operation !

[InnoDB] Undo tablespace number 2 was being truncated when mysqld quit.
[InnoDB] Will create 1 new undo tablespaces.
[InnoDB] Reconstructing undo tablespace number 2.
[InnoDB] Creating UNDO Tablespace ./undo_002

So far, we have not seen any data corruption, only the reconstruction of the Undo Logs caused by a trunc.log file.  But because they contain important information, discarding / reconstructing Undo Logs is not something anyone familiar with them should be comfortable with.  One such important information is how to roll back uncommitted transaction on a crash, so let's explore this...

In annex #3, after generating two trunc.log files, I kill MySQL with an uncommitted UPDATE transaction.  Normally, we would expect this transaction to be rolled back on startup, but this is not what we observe in the presence of the trunc.log files.  After crash recovery, the data in InnoDB is the one of the uncommitted transaction because the required Undo Log to roll back the transaction has been deleted.

Undo Log Reconstruction because of trunc.log files
leads to trx not being rolled back on crash-recovery

I think there is another potential corruption because Undo Log Reconstruction, but I have not explored it in detail.  If MySQL is stopped while the Undo Logs contain pointers to delete-marked rows, and if a full purged is not done on shutdown (innodb_fast_shutdown != 0), these pointers will be lost on startup, and the delete-marked rows will not be removed from the table.

Undo Log Reconstruction because of trunc.log files
might leave garbage in tables (delete-marked rows)

Also another potential corruption I have not fully explore is the preparation of a hot backup in the presence of trunc.log files.  Similarly to rolling back uncommitted transaction on crash recovery, preparing a hot backup needs rolling back uncommitted transactions.  A trunc.log file might be messing up with this process.

Undo Log Reconstruction because of trunc.log files
might mess up with the preparation of a hot backup

So, we have seen that this is quite bad !  In summary and with super_read_only, Automated Undo Tablespace Truncation fails, leaving a trunc.log file on disk.  On restart, trunc.log files cause an Undo Tablespace Reconstruction.  This wipes the Undo Log, and can lead to data corruption after crash-recovery.  What can be done about this ?

Undo Logs Truncation Bug Workaround

The Undo Log Truncation Bug is caused by super_read_only (sro), so a workaround is to not use sro.  But because using sro is a replication best practice, we should look for another solution.

A way to avoid the problems caused by the Undo Log Truncation Bug is to disable MySQL Automated Undo Tablespace Truncation.  For this, innodb_undo_log_truncate should be set to OFF.  Obviously, with this disabled, we might end up with a lot of disk space used by the Undo Logs, and to fix this, one can run Manual Undo Tablespace Truncation (which needs disabling sro).

To avoid data corruption
set 
innodb_undo_log_truncate to OFF

In addition to avoiding the bug, we should deal with the consequence of the bug having been previously triggered.  If you already had a MySQL restart with a trunc.log file, I am afraid I do not have a good solution (except maybe detecting this if you had verbose logging by looking for the log line "Reconstructing undo tablespace").

If you had the bug triggered and did not yet restart, I know of two solutions.  This situation can be detected by looking for trunc.log files, or by looking for a zero-sized Undo Log in the table INNODB_TABLESPACES.

The first solution is to do a slow shutdown with a full InnoDB Purge (innodb_fast_shutdown = 0).  This way, the reconstruction of the Undo Log is not problematic, and this removes the Data Dictionary Corruption.

The second solution is closer to Voodoo incantations.  If doing a slow shutdown is not something you can do right now, you can rename / move the trunc.log files to avoid Undo Tablespace Reconstruction on the next unplanned / fast shutdown.  This avoids Undo Log Reconstruction, but keeps the Data Dictionary Corruption.  This corruption prevents doing Manual Undo Tablespace Truncation (as shown below), but at least, InnoDB Data Corruption caused by Undo Log Reconstruction is avoided.  To remove the DD Corruption, you need to place back the trunc.log files and do a slow shutdown to trigger a safe Undo Log Reconstruction (if you deleted the trunc.log files, you can have MySQL re-create them with Automated Undo Log Truncation).  This trunc.log files Voodoo is described in annex #4.

> SET GLOBAL super_read_only = off;
Query OK, 0 rows affected (0.00 sec)

> CREATE UNDO TABLESPACE undo_003 ADD DATAFILE 'undo_003.ibu';
Query OK, 0 rows affected (0.03 sec)

> SELECT NAME, STATE, FILE_SIZE FROM INFORMATION_SCHEMA.INNODB_TABLESPACES WHERE NAME LIKE '%undo%';
+-----------------+--------+-----------+
| NAME            | STATE  | FILE_SIZE |
+-----------------+--------+-----------+
| innodb_undo_001 | active |         0 |
| innodb_undo_002 | active |  16777216 |
| undo_003        | active |  16777216 |
+-----------------+--------+-----------+
2 rows in set (0.01 sec)

> ALTER UNDO TABLESPACE innodb_undo_001 SET INACTIVE;
ERROR 1533 (HY000): Failed to alter: UNDO TABLESPACE innodb_undo_001
> ALTER UNDO TABLESPACE innodb_undo_002 SET INACTIVE;
Query OK, 0 rows affected (0.00 sec)

> ALTER UNDO TABLESPACE innodb_undo_002 SET ACTIVE;
Query OK, 0 rows affected (0.00 sec)

> ALTER UNDO TABLESPACE undo_003 SET INACTIVE;
Query OK, 0 rows affected (0.00 sec)

> ALTER UNDO TABLESPACE undo_003 SET ACTIVE;
Query OK, 0 rows affected (0.00 sec)

History of this Bug and Rant

The bug about Undo Log Truncation Failure was reported for 8.0 in September 2023 (Bug #112262: tablespace is missing for table innodb_undo_002).  This bug was fixed in MySQL 8.4.0 (released in February 2024), BUT IT WAS NOT FIXED IN 8.0 (Oracle/MySQL 8.0.44 still affected).  And for that, I opened Bug #119628: Undo Truncation Bug, not fixed in 8.0, leads to Data Corruption.

Yes, MySQL 8.0 will be EOL in 2026, but this bug was reported in 2023, and fixed in 8.4 in 2024.  I fail to see why it was not fixed in 8.0.  And it is not the first time I see bugs fixed in 8.x and not in 8.0.  I blogged about this in the past (before the release of 8.4) : Thoughts on the October 2023 MySQL Releases.

Luckily, we have Percona
back-porting fixes in their fork of MySQL

The Undo Log Truncation Failure Bug is fixed in Percona Server 8.0.39 (released in October 2024).  I think the delay for the fix by Percona is that this bug was only brought to their attention in July 2024 with PS-9322 (With super_read_only=1, undo truncation cannot update DD and leaves orphan truncate log files).  Thanks Percona for back-porting bug fixes !

As I wrote in the introduction, I have a hard time not seeing this bug still affecting 8.0 as negligence, and I am very upset about this one (in addition to all the others I previously mentioned - blog posttweet #1 and tweet #2 - and the ones I am yet to talk about).  I think Oracle should clarify their policy on bug fixes, and in the meantime, initiatives to Rebuild the MySQL Community are desperately needed !

Annex #1 : Triggering the Undo Log Truncation Bug

# Create a sandbox for our tests.
# (we use 6 nodes: 1 primary and 5 replicas, different configurations are applied below)
# (we need replication-crash-safety as we will kill mysqld in annex #3)
# (innodb_purge_rseg_truncate_frequency=1 for quick automated truncation)
# (the pv command is a trick to time command execution)
{
  v=8.0.44; d=${v//./_}
  dbda="-n 6 --repl-crash-safe -c innodb_purge_rseg_truncate_frequency=1"
  dbdeployer deploy replication $v $dbda | pv -tN dbdepl. > /dev/null
  cd ~/sandboxes/rsandbox_$d
}
  dbdepl.: 0:00:27

# Create a table for our tests, and configure replicas.
# (replicas 1 and 2 are super_read_only: automated truncation will fail there)
# (replicas 1 and 3 have verbose logging, replicas 2 and 4 have normal logging)
# (replica 5 will not auto-truncate)
{
  ./m <<< "
    CREATE DATABASE test_jfg;
    CREATE TABLE test_jfg.t1(id INT PRIMARY KEY, v INT)"

  for i in {1,2};  do ./s$i <<< "SET PERSIST super_read_only = 1"; done
  for i in {1,3};  do ./s$i <<< "SET PERSIST log_error_verbosity = 3"; done
  for i in {1..4}; do ./s$i <<< "SET PERSIST innodb_max_undo_log_size = 12582912"; done

  sql1="@@innodb_max_undo_log_size, @@super_read_only, @@log_error_verbosity"
  for i in {1..5}; do ./s$i -N <<< "SELECT 's$i', $sql1"; done
}
s1	12582912	1	3
s2	12582912	1	2
s3	12582912	0	3
s4	12582912	0	2
s5	1073741824	0	2

# Some functions to make tests more compact.
function flush_error_logs() {
  for i in {1..5}; do
    rm node$i/data/msandbox.err
    ./s$i <<< "FLUSH LOCAL ERROR LOGS"
  done
}

function inserts() {
  ./m test_jfg <<< "TRUNCATE TABLE t1"
  seq 1 1000000 |
    while read i; do echo "($i,$(($i % 10)))"; done |
    paste -s -d "$(printf ',%.0s' {1..1000})\n"     |
    sed -e 's/.*/INSERT INTO t1 values &;/'         |
    sed -e '1s/^/BEGIN; /;$s/$/ COMMIT;/'           |
    ./m test_jfg | pv -tN insert
}

function replication_wait() {
  local sms="$(./m <<< "SHOW MASTER STATUS\G")"
  local file="$(awk '$1 == "File:"{print $2}' <<< "$sms")"
  local pos="$(awk '$1 == "Position:"{print $2}' <<< "$sms")"

  for n in s{1..5}; do
    ./$n <<< "SELECT MASTER_POS_WAIT('$file', $pos)" > /dev/null
  done | pv -tN rplwait
}

# Check state after Automated Undo Log Truncation.
# (the sleep is to make sure truncation has happened)
# (ignoring MY-010559 because we do not care about MTR logs)
# 
# In node2 logs (or their absence), we see truncation silently failed.
# In node1 logs, we gather truncation did not complete, but without a descriptive error.
# We have trunc.log files in node1 and node2 (where truncation failed because of sro).
# And we see corrupted data dictionary (FILE_SIZE = 0 in I_S.INNODB_TABLESPACES).
{
  flush_error_logs; inserts; replication_wait; sleep 2
  grep -i -e . node*/data/msandbox.err | grep -v -e MY-010559
  ls -l node*/data/undo*
  sql2="NAME, FILE_SIZE FROM INNODB_TABLESPACES WHERE NAME LIKE '%undo%'"
  for n in s{1..5}; do ./$n -N INFORMATION_SCHEMA <<< "SELECT '$n', $sql2"; done
}
   insert: 0:00:06
  rplwait: 0:00:03
node1/data/msandbox.err:2026-01-05T21:29:05.266859Z 0 [Note] [MY-012994] [InnoDB] Truncating UNDO tablespace innodb_undo_002
node3/data/msandbox.err:2026-01-05T21:29:05.282640Z 0 [Note] [MY-012994] [InnoDB] Truncating UNDO tablespace innodb_undo_002
node3/data/msandbox.err:2026-01-05T21:29:05.366678Z 0 [Note] [MY-013000] [InnoDB] Completed truncate of undo tablespace innodb_undo_002.
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node1/data/undo_001
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node1/data/undo_002
-rw-r----- 1 jfgagne staff    16384 Jan  5 16:29 node1/data/undo_2_trunc.log
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node2/data/undo_001
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node2/data/undo_002
-rw-r----- 1 jfgagne staff    16384 Jan  5 16:29 node2/data/undo_2_trunc.log
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node3/data/undo_001
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node3/data/undo_002
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node4/data/undo_001
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node4/data/undo_002
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node5/data/undo_001
-rw-r----- 1 jfgagne staff 33554432 Jan  5 16:29 node5/data/undo_002
s1	innodb_undo_001	16777216
s1	innodb_undo_002	0
s2	innodb_undo_001	16777216
s2	innodb_undo_002	0
s3	innodb_undo_001	16777216
s3	innodb_undo_002	16777216
s4	innodb_undo_001	16777216
s4	innodb_undo_002	16777216
s5	innodb_undo_001	16777216
s5	innodb_undo_002	33554432

Annex #2 : MySQL Startup with trunc.log Files (reconstructing undo logs)

Below needs to be run after the commands of Annex #1.

# More function used below to make tests more compact.
function restart_with_empty_logs() {
  ( # In a sub-shell to avoid job logs.
    for i in {1..5}; do
      rm node$i/data/msandbox.err
      ./node$i/restart > /dev/null &
    done
    wait; ) | pv -tN restart
}

# Restart MySQL...
# In the logs of node1, we see the trunc.log file causing "Reconstructing undo tablespace".
# Nothing in the logs of node2 even if the file inode and tablespace SPACE changed (no change in node3).
# The reconstruction fixed the table I_S.INNODB_TABLESPACES.
{
  ls -il node{1..3}/data/undo*
  sql3="NAME, SPACE, FILE_SIZE FROM INNODB_TABLESPACES WHERE NAME LIKE '%undo%'"
  for n in s{1..3}; do ./$n -N INFORMATION_SCHEMA <<< "SELECT '$n', $sql3"; done
  restart_with_empty_logs
  grep -i undo node{1..3}/data/msandbox.err
  ls -il node{1..3}/data/undo*
  for n in s{1..3}; do ./$n -N INFORMATION_SCHEMA <<< "SELECT '$n', $sql3"; done
}
3604066 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node1/data/undo_001
3606122 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node1/data/undo_002
3606121 -rw-r----- 1 jfgagne staff    16384 Jan  5 16:29 node1/data/undo_2_trunc.log
3604397 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node2/data/undo_001
3606128 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node2/data/undo_002
3606125 -rw-r----- 1 jfgagne staff    16384 Jan  5 16:29 node2/data/undo_2_trunc.log
3604725 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node3/data/undo_001
3606124 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node3/data/undo_002
s1	innodb_undo_001	4294967279	16777216
s1	innodb_undo_002	4294967278	0
s2	innodb_undo_001	4294967279	16777216
s2	innodb_undo_002	4294967278	0
s3	innodb_undo_001	4294967279	16777216
s3	innodb_undo_002	4294967151	16777216
  restart: 0:00:07
node1/data/msandbox.err:2026-01-05T21:29:47.043019Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_001'.
node1/data/msandbox.err:2026-01-05T21:29:47.044382Z 1 [Note] [MY-012902] [InnoDB] Undo tablespace number 2 was being truncated when mysqld quit.
node1/data/msandbox.err:2026-01-05T21:29:47.044746Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_002'.
node1/data/msandbox.err:2026-01-05T21:29:47.044985Z 1 [Note] [MY-013040] [InnoDB] Will create 1 new undo tablespaces.
node1/data/msandbox.err:2026-01-05T21:29:47.044990Z 1 [Note] [MY-012910] [InnoDB] Opened 1 existing undo tablespaces.
node1/data/msandbox.err:2026-01-05T21:29:47.045027Z 1 [Note] [MY-012915] [InnoDB] Created 1 undo tablespaces.
node1/data/msandbox.err:2026-01-05T21:29:47.074159Z 1 [Note] [MY-012904] [InnoDB] Reconstructing undo tablespace number 2.
node1/data/msandbox.err:2026-01-05T21:29:47.074550Z 1 [Note] [MY-012896] [InnoDB] Creating UNDO Tablespace ./undo_002
node1/data/msandbox.err:2026-01-05T21:29:47.074566Z 1 [Note] [MY-012897] [InnoDB] Setting file ./undo_002 size to 16 MB
node1/data/msandbox.err:2026-01-05T21:29:47.077524Z 1 [Note] [MY-013018] [InnoDB] Created 128 and tracked 128 new rollback segment(s) in undo tablespace number 2. 128 are now active.
node3/data/msandbox.err:2026-01-05T21:29:47.823016Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_001'.
node3/data/msandbox.err:2026-01-05T21:29:47.823209Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_002'.
node3/data/msandbox.err:2026-01-05T21:29:47.823584Z 1 [Note] [MY-012910] [InnoDB] Opened 2 existing undo tablespaces.
3604066 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node1/data/undo_001
3606370 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node1/data/undo_002
3604397 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node2/data/undo_001
3606581 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node2/data/undo_002
3604725 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node3/data/undo_001
3606124 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:29 node3/data/undo_002
s1	innodb_undo_001	4294967279	16777216
s1	innodb_undo_002	4294967151	16777216
s2	innodb_undo_001	4294967279	16777216
s2	innodb_undo_002	4294967151	16777216
s3	innodb_undo_001	4294967279	16777216
s3	innodb_undo_002	4294967151	16777216

Annex #3 : Triggering Data Corruption (undo log reconstruction after a crash)

Below needs to be run after the commands of Annex #1.

# Generate two trunc.log files on either node1 or node2.
# (unclear why, but below sometimes only generate 1 trunc.log file)
# (with a single trunc.log file, there will not be data corruption)
# (rerun the commands until there are two trunc.log files for node1 or node2)
{
  for i in 1 2; do inserts; replication_wait; sleep 2; done
  ls -l node{1,2,3}/data/undo*
}
   insert: 0:00:06
  rplwait: 0:00:03
   insert: 0:00:06
  rplwait: 0:00:04
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:32 node1/data/undo_001
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:32 node1/data/undo_002
-rw-r----- 1 jfgagne staff    16384 Jan  5 16:32 node1/data/undo_1_trunc.log
-rw-r----- 1 jfgagne staff    16384 Jan  5 16:32 node1/data/undo_2_trunc.log
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:32 node2/data/undo_001
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:32 node2/data/undo_002
-rw-r----- 1 jfgagne staff    16384 Jan  5 16:32 node2/data/undo_1_trunc.log
-rw-r----- 1 jfgagne staff    16384 Jan  5 16:32 node2/data/undo_2_trunc.log
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:32 node3/data/undo_001
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:32 node3/data/undo_002

# With two trunc.log files, crash MySQL with an uncommitted transaction.
# (cd in a sub-shell for the commands to also work for node2 and node3)
#
# In below, after killing mysqld, the transaction is not rolled back on startup.
# (with a rollback, we would have v = 1 for id = 1, but we have v = 2 )
# (running the same commands in node3 leads to a good rollback on startup)

( cd node1/; ./use test_jfg; )  # use node2 if the two trunc.log files are there.

slave1 [localhost:23446] {msandbox} (test_jfg) > SET GLOBAL read_only = off; SELECT * FROM t1 ORDER BY id LIMIT 1; BEGIN; UPDATE t1 SET v = v+1; SELECT * FROM t1 ORDER BY id LIMIT 1;
Query OK, 0 rows affected (0.00 sec)

+----+------+
| id | v    |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 1000000 rows affected (1.88 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0

+----+------+
| id | v    |
+----+------+
|  1 |    2 |
+----+------+
1 row in set (0.00 sec)

slave1 [localhost:23446] {msandbox} (test_jfg) > system ls -l data/undo*
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:32 data/undo_001
-rw-r----- 1 jfgagne staff 50331648 Jan  5 16:33 data/undo_002
-rw-r----- 1 jfgagne staff    16384 Jan  5 16:32 data/undo_1_trunc.log
-rw-r----- 1 jfgagne staff    16384 Jan  5 16:32 data/undo_2_trunc.log
slave1 [localhost:23446] {msandbox} (test_jfg) > system kill -9 $(cat data/mysql_sandbox*.pid)
slave1 [localhost:23446] {msandbox} (test_jfg) > SELECT * FROM t1 ORDER BY id LIMIT 1;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    14
Current database: test_jfg

+----+------+
| id | v    |
+----+------+
|  1 |    2 |
+----+------+
1 row in set (0.02 sec)

Annex #4 : Voodoo Workaround (without MySQL restart)

Below needs to be run after the commands of Annex #1 and the function of Annex #2.

# Clean up everything, restart, and get a trunc.log file.
# (replica drift does not cause a replication breakage because of TRUNCATE TABLE)
{
  restart_with_empty_logs; inserts; replication_wait; sleep 2
  sql2="NAME, FILE_SIZE FROM INNODB_TABLESPACES WHERE NAME LIKE '%undo%'"
  ./s1 -N INFORMATION_SCHEMA <<< "SELECT $sql2"
  ls -l node1/data/undo*
}
  restart: 0:00:04
   insert: 0:00:06
  rplwait: 0:00:04
innodb_undo_001	16777216
innodb_undo_002	0
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:34 node1/data/undo_001
-rw-r----- 1 jfgagne staff 16777216 Jan  5 16:34 node1/data/undo_002
-rw-r----- 1 jfgagne staff    16384 Jan  5 16:34 node1/data/undo_2_trunc.log

# Move the trunc.log file and restart.
# There is no Undo Log Reconstruction, but the DD Corruption is still there.
{
  sql3="NAME, SPACE, FILE_SIZE FROM INNODB_TABLESPACES WHERE NAME LIKE '%undo%'"
  ./s1 -N INFORMATION_SCHEMA <<< "SELECT $sql3"
  ls -il node1/data/undo*
  mv node1/data/undo_*_trunc.log /tmp
  restart_with_empty_logs
  grep -i undo node1/data/msandbox.err
  ./s1 -N INFORMATION_SCHEMA <<< "SELECT $sql3"
  ls -il node1/data/undo*
}
innodb_undo_001	4294967152	16777216
innodb_undo_002	4294967024	0
3608192 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:34 node1/data/undo_001
3609063 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:34 node1/data/undo_002
3609062 -rw-r----- 1 jfgagne staff    16384 Jan  5 16:34 node1/data/undo_2_trunc.log
  restart: 0:00:03
2026-01-05T21:35:26.829010Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_001'.
2026-01-05T21:35:26.829879Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_002'.
2026-01-05T21:35:26.830941Z 1 [Note] [MY-012910] [InnoDB] Opened 2 existing undo tablespaces.
innodb_undo_001	4294967152	16777216
innodb_undo_002	4294967024	0
3608192 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:35 node1/data/undo_001
3609063 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:35 node1/data/undo_002

# Move back the trunc.log file and restart.
# There is Undo Log Reconstruction, and DD corruption is fixed.
{
  ./s1 -N INFORMATION_SCHEMA <<< "SELECT $sql3"
  ls -li node1/data/undo*
  mv /tmp/undo_*_trunc.log node1/data
  restart_with_empty_logs
  ./s1 -N INFORMATION_SCHEMA <<< "SELECT $sql3"
  ls -li node1/data/undo*
}
innodb_undo_001	4294967152	16777216
innodb_undo_002	4294967024	0
3608192 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:35 node1/data/undo_001
3609063 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:35 node1/data/undo_002
  restart: 0:00:03
innodb_undo_001	4294967152	16777216
innodb_undo_002	4294966897	16777216
3608192 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:36 node1/data/undo_001
3609813 -rw-r----- 1 jfgagne staff 16777216 Jan  5 16:36 node1/data/undo_002

2 comments:

  1. Thank you for your thorough and well-informed explanation, Jean-François. As far as I understand, the root cause of this corruption is a bad design choice of implementing recovery on an unnecessarily high level, introducing coupling between subsystems that should have been kept independent. The simplest design would have been not to introduce yet another persistent data structure to keep track of truncation events, but to implement some special handling of updates of the FSP_SIZE page header field, without even having to change the format of the existing write-ahead log (WAL, redo log). Only the buffer pool flushing and recovery would have had to be tweaked in such a way that the undo tablespace will be entirely reinitialized in a single atomic mini-transaction.

    The over-complicated design in MySQL 5.7 involves executing a full checkpoint around tablespace truncation, that is, writing the entire buffer pool back to the modified data files, possibly including a massive amount of garbage undo log pages that would be discarded by truncating the file soon thereafter.

    In MariaDB Server, undo tablespace truncation was redesigned in MDEV-13564 back in 2018. The main motivation of this was to guarantee consistent backups. Since 2020, thanks to MDEV-15528, MariaDB avoids writes to data pages that have been marked as freed in the file. If the workload allows frequent enough undo tablespace truncation, MariaDB would durably write a mini-transaction for reinitializing the tablespace, and then discard the affected old undo pages from the buffer pool, without writing much to the undo tablespace files.

    When it comes to the InnoDB WAL in MariaDB Server, the upcoming 12.3 release should include an option to persist the binlog via InnoDB WAL (MDEV-34705), which solves many old problems, including the infamous MDEV-16589 that you commented on back in 2019. I am currently working on a new innodb_log_archive=ON format option (MDEV-37949). One day, it could be possible to run MariaDB Server in a configuration where the entire history is available in a single set of log files and you can perform arbitrary point-in-time recovery. This would involve some further work, such as an option to do away with binlog files, and a tool for extracting binlog events from the innodb_log_archive files.

    ReplyDelete
    Replies
    1. Thanks for joining the discussion Marko. I see you did not add to my Undo Logs description, it makes me think I am describing it well, if I missed something important, feel free to mention it.

      > the root cause of this corruption is a bad design choice of implementing recovery on an unnecessarily high level

      I am not sure I would describe it this way. IMHO, it is poor error handling for Automated Undo Log Truncation, which results in a partially executed truncation, with leftovers that will cause a bad side effect on restart. I would have preferred InnoDB to crash here than to silently move forward.

      > simplest design [...]

      I have to confess I am not at this level of details. So I cannot have an informed opinion on what you write here. I agree simpler is better, but I do not know if the simpler design you suggest would work.

      > over-complicated design in MySQL 5.7 involves executing a full checkpoint around tablespace truncation

      For me, because truncation is not something that is run often, I do not mind it being wasteful. A full checkpoint is not a heavy price to pay for saving disk space, and because truncation does not need to be quick, I do not mind it needing a full checkpoint (as long as it is non-blocking and not-impacting real-time traffic).

      > The main motivation of this was to guarantee consistent backups.

      Interesting. Does this mean that MySQL current truncation algorithm breaks backup ? IMHO, this does not necessarily need a redesign: we already have the backup lock, and truncation could be done in such a way that it is delayed if a backup is running (or would delay grabbing the backup lock).

      > a tool for extracting binlog events from the innodb_log_archive files

      Isn`t this mysqlbinlog ? It is of my opinion that binlogs should never be directly read from disk, but always read via connecting to the server, which is exactly what mysqlbinlog is doing.

      Thanks for the other MariaDB details Marko. I will have a look. Cheers, J-F

      Delete