InnoDB Basics - Compaction: when and when not

This is old news for MySQL/MariaDB expert but people that are starting using InnoDB do not always know that disk space is not automatically released when deleting data from a table. To explain and demonstrate that, I will take two real-world examples: table1 and table2.

Recently, more than 90% and about 20% of rows were deleted from table1 and table2 (those tables contain real data, I only changed the table names). Those two tables are quite large (121GiB and 846GiB respectively):

# ls -lh table1.ibd table2.ibd
-rw-rw---- 1 user group 121G Jul 12 05:43 table1.ibd
-rw-rw---- 1 user group 846G Jul 12 11:11 table2.ibd

To understand why those table did not shrink when deleting rows, we have to dive into some InnoDB internals. InnoDB stores data in pages (default size of 16 KiB). When a row is deleted from a page, free space is generated but the page still occupies its full size on disk and in RAM. When consecutive pages have enough free space, they are merged. A page-merge generates a free page that can be reused for future row insertion but free pages still occupy space on disk.

An approximation of the amount of space consumed by free pages can be observed by looking at the field Data_free from a SHOW TABLE STATUS output. Below, we can see the disk space consumed by free pages in table1 (converted to GiB with awk).

# sts="show table status like 'table1'\G"
# mysql my_schema <<< "$sts" | awk '$1=="Data_free:"{$2 /= 1024*1024*1024; print}'
Data_free: 117.709

This real-world example is a replicated environment, so running OPTIMIZE TABLE on the master would generate a lot of replication delay (and would lock the table). To reclaim this free space, I used the online schema change utility from the Percona Toolkit with an alter command of ENGINE=InnoDB. The table compaction took a little less than an hour and I have this when it completed:

# ls -lh table1.ibd
-rw-rw---- 1 user group 492M Jul 14 14:42 table1.ibd

So in this case, compaction allowed to save about 120 GiB of disk space. This is more than the 117.7 GiB advertised by the Data_free of the SHOW TABLE STATUS output, so I am concluding that the used pages had significant free space in them.

We are now done with table1, let's have a look at table2.

After deleting about 20% of rows in table2, the amount of Data_free was not significant. This means that we probably have a lot of free space inside InnoDB pages. At this point, it is unclear what to do next:

compacting with an online schema change will be long as the table is big, it will generate a lot of binary logs (RBR) and we do not know how much space we will save,
on the other hand, we could end up freeing significant space, so it might be worth it,
but as this table is big, and if the percentage of reclaimed space is small, we might be doing work and taking risks for a very marginal gain (operations on live system always come with risks).

To decide on the way forward, I ran an OPTIMIZE TABLE on a spare slave (I have a few). So I executed below:

# date; ls -lh table2.ibd; \
  mysql my_schema <<< "STOP SLAVE; OPTIMIZE TABLE LOCAL table2; START SLAVE"; \
  ls -lh table2.ibd; date
Wed Jul 12 11:17:29 CEST 2017
-rw-rw---- 1 user mysql 835G Jul 12 11:17 table2.ibd
Table             Op        Msg_type  Msg_text
my_schema.table2  optimize  note      Table does not support optimize, doing recreate + analyze instead
my_schema.table2  optimize  status    OK
-rw-rw---- 1 user group 640G Jul 13 10:30 table2.ibd
Thu Jul 13 10:30:45 CEST 2017

So it took a little more than 23 hours to compact table2 and it saved a little less than 200 GiB: not bad ! But at this point, the InnoDB pages are mostly full and the next row insertions in the table will generate page splits. Such splits happen when InnoDB tries to add rows in pages without enough free space. In this case, InnoDB splits the page generating two pages that are half-full. The new row is then put in the generated free space. This allows the row to be inserted but it produces about 16 KiB of free space in used pages.

Note that free space is not generated if rows are inserted in primary key order as the next row is always inserted at the end of the table. But this does not apply to indexes because INSERTs there are mostly done in random orders.

So even if we saved 200 GiB after running OPTIMIZE TABLE on table2, page-split could end up consuming that reclaimed space very fast. To confirm that this saving is not virtual, we have to observe the evolution of the size of the table on disk, and it is sadly growing:

# date; ls -lh table2.ibd
Thu Jul 13 10:46:51 CEST 2017
-rw-rw---- 1 user group 660G Jul 13 10:46 table2.ibd
# date; ls -lh table2.ibd
Thu Jul 13 12:56:08 CEST 2017
-rw-rw---- 1 user group 715G Jul 13 12:56 table2.ibd
# date; ls -lh table2.ibd
Fri Jul 14 10:06:55 CEST 2017
-rw-rw---- 1 user group 736G Jul 14 10:06 table2.ibd

So I decided that compacting table2 on all servers of this replicated system was not worth it.

5 comments:

LaurynasJuly 21, 2017 at 3:45:00 AM EDT
I wonder if Jeremy Cole's innodb_ruby could be used on a live instance directly to get the idea of potential OPTIMIZE savings: https://blog.jcole.us/2013/01/03/a-quick-introduction-to-innodb-ruby/ :

$ innodb_space -f test/t.ibd space-index-pages-summary | head -n 10
page index level data free records
3 15 2 26 16226 2
4 15 0 9812 6286 446
5 15 0 15158 860 689
6 15 0 10912 5170 496
7 15 0 10670 5412 485
8 15 0 12980 3066 590
9 15 0 11264 4808 512
10 15 0 4488 11690 204
11 15 0 9680 6418 440
Alexey KopytovJuly 21, 2017 at 4:56:00 AM EDT
As of MySQL 5.7, it is possible to control the page fill factor used by InnoDB when rebuilding indexes with OPTIMIZE TABLE. Which may help reduce page splits after rebuilding the table. However, that variable only applies to table rebuilds. I don't think it will make any difference for tools like pt-osc or gh-ost.

By default, innodb_fill_factor=100 is used, which results in the fill factor of 15/16 for the clustered index (to keep the pre-5.7 behavior) and the 100% fill factor for secondary index pages.
Alexey KopytovJuly 21, 2017 at 7:54:00 AM EDT
Cannot agree more, innodb_fill_factor is quite limited in its current implementation. Being able to specify it per-session or per-table would be useful. What would be even more useful is being able to control fill factor on all inserts, not just on bulk inserts aka index rebuilds. I hope to get around to implementing those features soon.

J-F Gagné's MySQL Blog

Wednesday, July 19, 2017

InnoDB Basics - Compaction: when and when not

5 comments: