Saturday, August 29, 2015

Unexpected Memory Consumption for Bulk Index Creation in InnoDB (MySQL)

In my last Hackathon, I worked on MyISAM vs InnoDB for data loading (LOAD DATA INFILE) and bulk index creation.  My motivation was the following: knowing that some people still use MyISAM for this particular use case, I wanted to verify/understand whether, and why, InnoDB is slower than MyISAM.  I do not yet have complete results on this specific subject, but I found some interesting things that are worth sharing.

The first result is that creating many indexes on an empty InnoDB table with a large value of innodb_sort_buffer_size takes more time than expected (more than 6 seconds for 20 indexes).  I opened 2 bugs for MySQL 5.6.26 and MariaDB 10.0.21 for this particular issue.  Of course, I do not often create indexes on empty tables, so this bug is very minor, but it exposed an unexpected behavior of the InnoDB Sort Buffer.  I wanted to know more...

There is not a lot of online information about the inner workings of the InnoDB Sort Buffer.  Besides the documentation (MySQL and MariaDB), I found a single blog post on this subject.  All of those describe a memory consumption of 3 times the size of the InnoDB Sort Buffer plus some pointers, but my experiments show as much as a 10 GB expansion of the mysqld RAM Resident Set Size (RSS) with innodb_sort_buffer_size=64M.  Obviously, something strange is happening.  I again opened 2 bugs for MySQL 5.6.26 and MariaDB 10.0.21 on this more concerning issue.

My tests start with the following table definition:
CREATE TABLE `test_wo_keys` (
  `f01` int AUTO_INCREMENT,
  `f02` bigint, `f03` bigint, `f04` enum('a','b'),
  `f05` date, `f06` int, `f07` int, `f08` double, `f09` int,
  `f10` bigint, `f11` double, `f12` enum('a','b','c','d','e'),
  `f13` int, `f14` int, `f15` varchar(255), `f16` int,
  `f17` int, `f18` int, `f19` double, `f20` double,
  `f21` double, `f22` double, `f23` double, `f24` tinyint,
  `f25` double, `f26` double, `f27` double, `f28` double,
  `f29` int unsigned, `f30` int unsigned, `f31` bigint,
  `f32` int unsigned, `f33` bigint, `f34` int unsigned,
  `f35` int unsigned,
  PRIMARY KEY `f01` (`f01`)
) ENGINE=InnoDB;
This table contains 5 million rows.  I then create indexes on that table using the following commands:
ALTER TABLE test_wo_keys ADD KEY f02 (f02);

ALTER TABLE test_wo_keys
  ADD KEY f02 (f02), ADD KEY f03 (f03),
  ADD KEY f04 (f04), ADD KEY f05 (f05),
  ADD KEY f06 (f06), ADD KEY f07 (f07),
  ADD KEY f08 (f08), ADD KEY f09 (f09);

ALTER TABLE test_wo_keys
  ADD KEY f02 (f02), ADD KEY f03 (f03),
  ADD KEY f04 (f04), ADD KEY f05 (f05),
  ADD KEY f06 (f06), ADD KEY f07 (f07),
  ADD KEY f08 (f08), ADD KEY f09 (f09), 
  ADD KEY f10 (f10), ADD KEY f11 (f11),
  ADD KEY f12 (f12), ADD KEY f13 (f13),
  ADD KEY f14 (f14), ADD KEY f16 (f16),
  ADD KEY f17 (f17), ADD KEY f18 (f18),
  ADD KEY f19 (f19), ADD KEY f20 (f20),
  ADD KEY f21 (f21), ADD KEY f22 (f22),
  ADD KEY f23 (f23), ADD KEY f24 (f24),
  ADD KEY f25 (f25), ADD KEY f26 (f26),
  ADD KEY f27 (f27), ADD KEY f28 (f28),
  ADD KEY f29 (f29), ADD KEY f30 (f30),
  ADD KEY f31 (f31), ADD KEY f32 (f32),
  ADD KEY f33 (f33), ADD KEY f34 (f34);
The graphs below show that creating 1, 8 and 32 indexes consumes respectively ~400 MB, ~2.7 GB and ~11 GB of memory with innodb_sort_buffer_size=64M.  This is far from what is expected.

Note: the InnoDB Buffer Pool is set to 4 GB on this server and I made sure that it was fully loaded before starting my tests.
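For reference, the relevant settings for these tests can be expressed as a my.cnf fragment (innodb_sort_buffer_size is not a dynamic variable, so it must be set before server startup; other settings are left at their defaults):

```ini
[mysqld]
# Sort buffer used by bulk index creation (the subject of this post).
innodb_sort_buffer_size = 64M
# Buffer pool, fully warmed before the tests.
innodb_buffer_pool_size = 4G
```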

Note 2: those graphs were taken on MariaDB, but MySQL shows a similar behavior (graph below for 32 indexes).

The claim in the documentation (3x the InnoDB Sort Buffer size) is obviously wrong.  If you look closely at the last 3 graphs, you will see a small decrease in RSS on the right side of each graph: this is the 3x InnoDB Sort Buffer size.  The high RAM consumption on the left of the graphs happens while the table is scanned to generate temporary files.  Once the files are generated, RAM consumption goes back to acceptable levels and stabilizes while the files are sorted.
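As a rough sanity check on those numbers (my own back-of-the-envelope guess, not a documented formula): if each of the 32 index builds were to keep 3 buffers of innodb_sort_buffer_size during the scan phase, that alone would account for 6 GB, which is the right order of magnitude for the ~11 GB observed:

```sql
-- Hypothetical per-index accounting:
-- 32 indexes * 3 buffers * 64 MB, expressed in GB.
SELECT 32 * 3 * 64 / 1024 AS sort_buffers_gb;  -- 6 GB
```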

If you want more details, you can either dive into the code in storage/innobase/row/ of the MySQL/MariaDB sources, or wait for my next post on the subject.

Before closing, let me take this opportunity to promote my next talks:
Looking forward to seeing you in Amsterdam and/or in San Francisco.


  1. Hi, Jean-François,

    I am the primary developer of this InnoDB code in MySQL 5.6. It improves on the ‘fast index creation’ feature that I developed for the InnoDB Plugin for MySQL 5.1.
    Some of the issues that you mention were already fixed in MySQL 5.7 quite some time ago, as part of an internally filed bug
    that was initially fixed about a year ago by my colleague Thirunarayanan Balathandayuth.

    Yes, some of this could have been done already in 5.6, but I was under a time pressure to deliver correctly working online ALTER TABLE. To me, correctness was (and still is) more important than performance.

    Unfortunately, the worst-case memory consumption cannot be helped easily. If we are adding multiple secondary indexes, we will need one set of buffers for each of them.

    Roughly speaking, there are six forms of ALTER TABLE in MySQL 5.6 and later. They consist of the 3 main variants:

    ALGORITHM=COPY: this is the only option before the InnoDB Plugin, and the only option for ENGINE=MyISAM tables. This can be forced by SET old_alter_table=1;
    ALGORITHM=INPLACE: this is the default in MySQL 5.6. Some operations are refused with ALGORITHM=INPLACE, and some operations (such as DROP PRIMARY KEY without ADD PRIMARY KEY) require LOCK=SHARED. Note: the word ‘inplace’ is somewhat misleading, because the table may need to be rebuilt.
    ALGORITHM=INPLACE with LOCK=SHARED or LOCK=EXCLUSIVE: an ‘inplace’ ALTER that does not allow concurrent data modification. This will reduce memory overhead, because no log for concurrent DML operations will be allocated or applied.
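    For example, each combination can be requested explicitly; if the server cannot honor the requested combination, the statement fails with an error instead of silently falling back to a slower variant (table and index names here are taken from the blog post above):

    ```sql
    -- Force the pre-InnoDB-Plugin behavior: copy the whole table.
    ALTER TABLE test_wo_keys ADD KEY f02 (f02), ALGORITHM=COPY;

    -- The MySQL 5.6 default: in-place, with concurrent DML allowed
    -- (an online log is allocated for the concurrent changes).
    ALTER TABLE test_wo_keys ADD KEY f03 (f03), ALGORITHM=INPLACE, LOCK=NONE;

    -- In-place, but blocking concurrent DML: no online log is needed.
    ALTER TABLE test_wo_keys ADD KEY f04 (f04), ALGORITHM=INPLACE, LOCK=SHARED;
    ```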

    ALGORITHM=COPY will include both undo and redo logging overhead. Internally, InnoDB treats the operation as a combination of CREATE TABLE, INSERT, RENAME TABLE, DROP TABLE, RENAME TABLE.

    ALGORITHM=INPLACE since 5.6 will skip the undo logging overhead, but everything will be redo-logged. In 5.7, my colleague Shaohua Wang implemented WL#7277, which avoids the redo logging for the initial index build phase, but not for the DML log apply. WL#7277 also speeds up the loading of the merge-sorted index data into the B-trees; before that, the sorted records were inserted one by one.

    Now, there are three subvariants of ALGORITHM=INPLACE:
    (a) Schema-only modifications, such as renaming a column or index, or extending a VARCHAR column.
    (b) WL#5526: Operations that require data changes, but no rebuild of the whole table: ADD INDEX, DROP INDEX
    (c) WL#6255: Operations that require rebuilding the whole table, e.g., ADD/DROP COLUMN, ADD PRIMARY KEY, change column order.

    The schema-only changes (a) are basically instantaneous (requiring an exclusive table lock during the ‘commit’ phase only).

    The DROP INDEX case of (b) is basically instantaneous too: the index B-trees will be dropped in the ‘commit’ phase.

    The ADD INDEX case of (b) will allocate a sort buffer for every index that is being built. If the operation is performed with LOCK=NONE (the default), we will also allocate index->online_log for each index that is being built. The above mentioned bug fix in MySQL 5.7 will avoid some of this overhead.

    The table-rebuilding ALTER operations will allocate a sort buffer for each index that will be built, plus a single index->online_log for the table rows. With the bug fix in 5.7, we can skip sorting of the clustered index if the order of the PRIMARY KEY does not change. There have been at least 2 bugs in this skip-sort functionality. The second fix was after the 5.7.8 release.

    Because some regressions take considerable time to surface in our ongoing tests, we are reluctant to backport this kind of huge fix to GA versions, even in cases where the patches themselves could be applied with minor technical effort.

    With best regards,
    Marko Mäkelä
    Senior Principal Software Developer
    InnoDB Group, MySQL Server, Oracle Corporation

    1. Many thanks, Marko, for introducing yourself here and spending the time to comment on my post.

      Thanks also for pointing out that things have been improved in MySQL 5.7. I wanted to run tests with the latest RC version, but I was short on time. Your comment made me curious; I will probably do tests soon and update the post, especially since "diff mysql-5.{6.26,7.8-rc}/storage/innobase/row/" is massive.

      About the worst-case memory consumption, I have some thoughts about this and I put them in a private comment on Bug #78270.

      Your comment raises some questions and thoughts; I will post a separate reply for each. Let me just say here that I agree with you that correctness is more important than speed: things must first work, and then they can be improved. Something that is fast but does not work is not useful.

      Also, let me point out that I flagged those 2 bugs as performance and minor. I do not think it is worth risking regressions to have them fixed in 5.6. I however think that awareness should be raised about the side effects of setting a large innodb_sort_buffer_size.

      Thanks again for your comment and looking forward to seeing you at PLAMS or OOW.

    2. About 'inplace' being misleading: my understanding is that this 'inplace' is in fact a 'nocopy'. Some ALTERs can be done without duplicating the storage of the table (that would be a copy) but still need to scan the table and modify it (nocopy). You also write about schema-only modifications; this looks like a 3rd algorithm to me (noscan).
      Renaming a column, dropping an index, adding a value to an ENUM, or changing a VARCHAR(10) to a VARCHAR(20) should succeed with 'noscan', but changing a VARCHAR(10) to a VARCHAR(300) would fail (if I am not mistaken).

      Having an explicit way to specify that I expect a given ALTER to be a noscan would be useful: it would allow some mistakes to be avoided. I am thinking of changing a VARCHAR(60) to a VARCHAR(100) on a 100 GB table and realizing after some minutes "oops, this was UTF-8" (my understanding of VARCHAR and UTF-8 is that the length field counts bytes, not characters, but I might be mistaken as I did not test it).
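      A sketch of the mistake I have in mind (hypothetical table, and I did not test this): with utf8 at up to 3 bytes per character, a VARCHAR(100) can need 300 bytes, which crosses the 255-byte boundary where the stored length prefix grows from 1 to 2 bytes, so the change could no longer be done without touching the rows:

      ```sql
      CREATE TABLE t (c VARCHAR(60) CHARACTER SET utf8) ENGINE=InnoDB;

      -- 60 chars * 3 bytes = 180 bytes: still a 1-byte length prefix.
      -- 100 chars * 3 bytes = 300 bytes: needs a 2-byte length prefix,
      -- so requesting an in-place change should be refused.
      ALTER TABLE t MODIFY c VARCHAR(100) CHARACTER SET utf8, ALGORITHM=INPLACE;
      ```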

    3. About "ALGORITHM=COPY will include both undo and redo logging overhead": I understand that the corresponding INSERTs need REDO logging for crash recovery, but I do not understand why they need UNDO. In my still limited understanding of InnoDB, an INSERT does not need to store a previous version of the row; it is a new row. Could you explain that, please?

    4. About "ALGORITHM=INPLACE since 5.6 will skip the undo logging overhead, but everything will be redo-logged": I am not sure I fully understand: are we talking only about ADD INDEX, or about all ALTERs? This might be related to my previous comment above.

    5. About "DML log apply" and avoiding REDO logging during the initial index build phase: you mean that REDO logging is avoided during merge sorting, but not during the insert phase, right? That makes sense to me: during the sorting, if we crash, the index is lost and we do not need REDO. However, if we crash after the insert phase, we need the REDO to rebuild the data, as the corresponding pages could still be dirty. But if the insert phase is "fast", how is the missing REDO of the sort phase handled: if we crash there, how do we avoid losing something? Is a write-back of the pages without REDO forced before the end of the insert phase?

    6. Hi Jean-François, you have good ideas and questions.

      Maybe you should file a feature request regarding the ALGORITHM=NOCOPY and ALGORITHM=NOSCAN.

      Sure, there probably should be another parameter for the DML log buffer size, so that innodb_sort_buffer_size would not be overloaded for that. Maybe you could file a feature request for that as well?

      Yes, it is pretty insane for ALGORITHM=COPY to generate undo log for the inserts, but that is how it currently is. We have an internally filed bug for this, and I would definitely want to remove this completely unnecessary overhead. This overhead prompted Heikki Tuuri to ask me to implement a stupid work-around in MySQL 4.0 or 4.1: we would commit the ALTER TABLE...ALGORITHM=COPY after every 10,000 copied rows, so that if the server was killed (due to the ALTER being so slow), the recovery would not take forever rolling back all those inserts. Why not simply let the recovery drop the incomplete table instead?

      About "ALGORITHM=INPLACE since 5.6 will skip the undo logging overhead, but everything will be redo-logged": by default, all modifications to data pages will be redo-logged, be it user data pages, data dictionary pages, or undo log pages. So, I am talking about all ALTERs. Skipping the undo logging applies to ADD INDEX and to any form of table-rebuilding (WL#6255) not-really-inplace ALGORITHM=INPLACE operation. Other ALTER operations will not generate any undo log, except for modifying the data dictionary tables.

      About "DML log apply" and avoiding REDO logging during the initial index build phase: the merge sort files are temporary files; there is never redo logging for them. There are actually several phases in online ALTER:
      1. scan the clustered index, create the tuples for each index
      2. merge sort the tuples
      3. insert the sorted data into each index
      4. apply DML log into each index
      5. upgrade MDL to MDL_EXCLUSIVE
      6. [if table-rebuild] apply any DML log to each index
      7. commit or roll back the ALTER operation

      WL#7277 in MySQL 5.7 affects step 3 only. In step 4, we will still be generating redo log for applying the log from concurrent DML operations into the tables. This redo logging could be removed in 5.8.

    7. If we crash before an ALTER TABLE operation is committed, we must roll back all changes of the ALTER TABLE. This works in 5.6, with the exception of a possible *.frm file mismatch and orphan #sql-ib*.ibd files in table-rebuilding ALTER TABLE.

      With WL#7277 in MySQL 5.7, we will obviously have to flush the non-redo-logged pages before the ALTER TABLE operation is committed. And we will write redo log about any page allocations, so that crash recovery will be able to drop the incompletely created index tree(s).

    8. Many thanks for all this information, Marko.

  2. Oh, by the way, InnoDB does not yet implement any bulk-loading interface other than the internal one that WL#7277 introduced for ALTER TABLE…ALGORITHM=INPLACE.

    So, LOAD DATA INFILE will include both undo and redo logging overhead, and it will insert data into the secondary indexes in ‘random’ order.

    We are aware of the problem, but I cannot make any promises when it could be fixed. These things (especially crash recovery) are simpler in MyISAM, because in MyISAM, the keys are stored separately from the data file, and there is no system tablespace for MyISAM. Users also do not expect any durability or consistency from MyISAM. In InnoDB, proper recovery from a server kill during a fast-path LOAD DATA INFILE would require some changes to the data dictionary and crash recovery.

    1. Thanks for pointing that out, Marko. My preliminary results show that it is better to just load data with indexes in InnoDB: 542 seconds for loading with indexes vs 134 seconds for loading without indexes + 649 seconds for index creation. The surprising part here is that index creation takes more time than loading with indexes, so I tried to see if I could get this part to take less time. It is while looking in that direction that I found the bad side effects of a large innodb_sort_buffer_size.
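      For reference, the two loading strategies I compared look like this (the file name and the test_w_keys table name are illustrative, and I show only 2 of the keys):

      ```sql
      -- Strategy 1: secondary indexes defined before the load (542 seconds).
      LOAD DATA INFILE '/tmp/test_data.csv' INTO TABLE test_w_keys;

      -- Strategy 2: load a bare table (134 seconds),
      -- then build the indexes (649 seconds).
      LOAD DATA INFILE '/tmp/test_data.csv' INTO TABLE test_wo_keys;
      ALTER TABLE test_wo_keys ADD KEY f02 (f02), ADD KEY f03 (f03);
      ```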

    2. This result is very surprising, given that there are requests that mysqldump should generate SQL to create the secondary indexes afterwards. Could you share your benchmark? Can you repeat the same with 5.7.8?

    3. My results are still preliminary. Once I am sure I understand everything about them, I will publish them. I can also run on 5.7.8; my scripts are ready and server time is available.

  3. Oh, and I am loading data that fits in RAM (or at least the indexes fit in RAM), so I do not have to pay read IO latency during loading.

  4. This post is very informative, thanks for sharing your thoughts.