Tuesday, November 30, 2021

Follow-up on an Unprivileged User can Crash your MySQL Server

A year ago, I blogged about An Unprivileged User can Crash your MySQL Server.  At the time, I presented how to protect yourself against this problem without explaining how to generate a crash.  In this post, I am revisiting this vulnerability, not giving the exploit yet, but presenting the fix.  Also, because the default configuration of Group Replication in 5.7 is still vulnerable (it is not in 8.0), I show the adjustment to make to avoid problems.

Update 2022-01-19: I published the exploit in a follow-up post: Crashing MySQL with Malicious Intent and a lot of Determination.

So MySQL 8.0 and 5.7 had a crashing bug in replication: the 8.0 default configuration was vulnerable (up to 8.0.22), Group Replication in 5.7 was vulnerable (up to 5.7.32) and the default configuration of Group Replication in 5.7 still is vulnerable (at least up to 5.7.36), and write set optimized parallel replication in 5.7 was also vulnerable (up to 5.7.32).  I refer you to my previous post for mitigation details.
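As a quick reminder of the idea behind the mitigation (the previous post has the full details and caveats), it consists of disabling Write Set extraction where it is not required.  Below is a minimal sketch for a standard 8.0 source not using Group Replication (which requires XXHASH64); the order of the statements matters, as extraction cannot be disabled while dependency tracking is set to WRITESET:

```sql
-- Stop using Write Set for dependency tracking first
-- (extraction cannot be turned OFF while tracking is WRITESET).
SET GLOBAL binlog_transaction_dependency_tracking = 'COMMIT_ORDER';
-- Then disable Write Set extraction, which removes the unbounded collection.
SET GLOBAL transaction_write_set_extraction = 'OFF';
```

Obviously, also persist these in the configuration file so they survive a restart.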

I think this vulnerability was registered under CVE-2021-2002 (I am not absolutely sure because not a lot of details are disclosed in the report).  If this is the right CVE, it is probably incomplete because it only mentions pre-8.0.22 as vulnerable, and Group Replication in 5.7 is also vulnerable (100% vulnerable up to and including 5.7.32, as is the default configuration of 5.7.33 up to at least 5.7.36).

This vulnerability was fixed in MySQL 5.7.33 and 8.0.23.  In the release notes (links to 5.7.33 and 8.0.23), we can find (line breaks added for readability, and text in square brackets added for completeness):

Replication: When the system variable transaction_write_set_extraction=XXHASH64 is set, which is the default in MySQL 8.0 and a requirement for Group Replication [and a requirement for write set optimized parallel replication], the collection of writes for a transaction previously had no upper size limit. 

Now, for standard source to replica replication, the numeric limit on write sets specified by binlog_transaction_dependency_history_size is applied, after which the write set information is discarded but the transaction continues to execute.  Because the write set information is then unavailable for the dependency calculation, the transaction is marked as non-concurrent, and is processed sequentially on the replica. 

For Group Replication, the process of extracting the writes from a transaction is required for conflict detection and certification on all group members, so the write set information cannot be discarded if the transaction is to complete.  The byte limit set by group_replication_transaction_size_limit is applied instead of the numeric limit, and if the limit is exceeded, the transaction fails to execute. (Bug #32019842)

The description from the release notes is very clear about the origin of the bug: there was no limit to the "collection of writes for a transaction", also known as the Write Set.  So by crafting a big transaction, it was possible to cause an Out-Of-Memory condition, which risks triggering the OOM-killer (another failure mode is a SIGABRT, which writes "mysqld got signal 6" in the error log; full stack-trace at the end of the post).  The fix for standard replication is to stop saving the transaction write set once it grows larger than binlog_transaction_dependency_history_size.  As explained in the release notes, this is of little consequence for standard replication because there already is a limit at the global level, specified by the same variable, bounding the RAM used for dependency / interval calculation (I wrote about this in section "Write Set tracking for parallel replication" of An update on Write Set bug fix in MySQL 8.0).
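The global bound mentioned above can be inspected and resized at runtime; a quick sketch (25,000 row hashes is the documented default of binlog_transaction_dependency_history_size):

```sql
-- Check the global bound on write-set history used for dependency calculation.
SHOW GLOBAL VARIABLES LIKE 'binlog_transaction_dependency_history_size';
-- The variable is dynamic, so it can be adjusted without a restart; since the
-- fix, this value also caps the per-transaction write set collection.
SET GLOBAL binlog_transaction_dependency_history_size = 25000;
```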

As explained in the release notes, it is not possible to stop recording the write set for Group Replication because it is needed for conflict detection in a multi-master group (I wrote about this in Write Set in MySQL 5.7: Group Replication).  Instead, a configurable limit to the memory used by the write set was introduced, after which the transaction fails; this limit can be changed with the global variable group_replication_transaction_size_limit.  The default value in 8.0 is close to 150 MB (it is exactly 150,000,000), but it is 0 in 5.7, which means no limit (at least from 5.7.33 to 5.7.36; the documentation gives no details about the different default).  This means the default configuration of Group Replication in 5.7 is still vulnerable to the crashing bug.

Group Replication in 5.7 is still vulnerable
because it does not enforce a limit by default

I do not know why an unsafe default was chosen in 5.7 (if you have ideas, please share them in the comments).  The default in 8.0 introduces a new failure condition; but if it is considered acceptable for 8.0, it should also be for 5.7-latest.  The different defaults also introduce a potential breakage while upgrading from 5.7 to 8.0: something that succeeded in 5.7 might fail in 8.0.  If you are using Group Replication in 5.7, it is probably a good idea to set the value of group_replication_transaction_size_limit to the default of 8.0 (150,000,000) or to another non-zero value.  Note that this might cause regressions in your application (transactions that succeeded before the change might fail after), but it is probably better to find this now, while having full context, than when upgrading to 8.0.  Also, if you hit the limit and a transaction fails, the fix is simple: increase the value of the variable (or make your transactions smaller).
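A sketch of the adjustment suggested above, on a 5.7 Group Replication member:

```sql
-- Align 5.7 with the 8.0 default (150,000,000 bytes) instead of the
-- unsafe 5.7 default of 0 (no limit); the variable is dynamic.
SET GLOBAL group_replication_transaction_size_limit = 150000000;
```

Also add the same setting in the [mysqld] section of the configuration file so it survives a restart.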

For reference, the error message returned when a transaction fails because it reaches group_replication_transaction_size_limit is below (there is already a post on dba.stackexchange about this).

ERROR 3231 (HY000) at line 73: The size of writeset data for the current transaction exceeds a limit imposed by an external component. If using Group Replication check 'group_replication_transaction_size_limit'.

Interestingly, and as far as I know, this is the first and only situation in which MySQL aborts a transaction because of its size.  I find this failure / feature interesting, and there is more to write about it; however, it is not the current subject, so I will keep it for a future post.

About the default configuration of Group Replication in 5.7 still being vulnerable, I opened Bug#105759: Unsafe default in 5.7 for group_replication_transaction_size_limit.  I did not flag this bug as a security vulnerability because that would have made it private, and IMHO there was no reason to hide its content: all the information about this - release notes, MySQL documentation and this post - is already public.  UPDATE: the bug has since been made private.

This is all for now.  In a few weeks, I will probably publish the exploit for this bug, so if you feel uneasy about this, either upgrade and fix the configuration if you are using Group Replication in 5.7, or apply the mitigation from my previous post.  And as promised, the behavior of this vulnerability is shown below, with different results depending on the MySQL version and operating system (my two test environments are an out-of-the-box AWS Debian 10.11 VM and a hosted CentOS 6.10 VM, which produce more OOMs than SIGABRTs).

In 5.7.32, I sometimes get the stack-trace below in the error log with standard replication, a similar stack-trace for Group Replication, and sometimes an OOM.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
00:58:04 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68196 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f1828000d40
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f186c275e88 thread_stack 0x40000
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(my_print_stacktrace+0x35)[0xf8e995]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(handle_fatal_signal+0x4b9)[0x802e49]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f1876c3e730]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7f18767237bb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7f187670e535]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c983)[0x7f1876ad6983]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x928c6)[0x7f1876adc8c6]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92901)[0x7f1876adc901]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92b34)[0x7f1876adcb34]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9301c)[0x7f1876add01c]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_ZNSt6vectorIySaIyEE13_M_insert_auxEN9__gnu_cxx17__normal_iteratorIPyS1_EERKy+0xdd)[0xcc189d]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_ZN29Rpl_transaction_write_set_ctx13add_write_setEy+0x137)[0xcc0527]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_Z7add_pkeP5TABLEP3THD+0xa8a)[0xcc30ca]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_Z14binlog_log_rowP5TABLEPKhS2_PFbP3THDS0_bS2_S2_E+0x36b)[0x8545bb]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_ZN7handler12ha_write_rowEPh+0x9b)[0x85493b]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_Z12write_recordP3THDP5TABLEP9COPY_INFOS4_+0xa7)[0xedc157]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_ZN19Query_result_insert9send_dataER4ListI4ItemE+0xae)[0xedcc8e]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld[0xd1f8a8]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld[0xd24d9d]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_Z10sub_selectP4JOINP7QEP_TABb+0x323)[0xd25ec3]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_ZN4JOIN4execEv+0x27a)[0xd2538a]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_Z12handle_queryP3THDP3LEXP12Query_resultyy+0x250)[0xd907a0]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_ZN21Sql_cmd_insert_select7executeEP3THD+0x42a)[0xedb97a]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_Z21mysql_execute_commandP3THDb+0xe64)[0xd51bb4]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3bd)[0xd5609d]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x1752)[0xd57892]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(_Z10do_commandP3THD+0x194)[0xd58464]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(handle_connection+0x2ac)[0xe2b80c]
/home/jgagne/opt/mysql/mysql_5.7.32/bin/mysqld(pfs_spawn_thread+0x174)[0x13fb594]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f1876c33fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f18767e54cf]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f1828004860): <hiding query>
Connection ID (thread ID): 6
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

In 8.0.22, I sometimes get the below error message with standard replication, a similar message for Group Replication, and sometimes an OOM.

ERROR 1041 (HY000) at line 98: Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space

In 5.7.36, and only with Group Replication, I sometimes get the above error message and sometimes an OOM, but no SIGABRT.

In 8.0.21, I sometimes get the stack-trace below in the error log with standard replication, sometimes an OOM, and Group Replication only produces OOMs (I was not able to get a SIGABRT with GR).  As I was not getting the same results as with 8.0.22, I guessed something had changed, and after looking it up, I found this in the release notes: "Replication: Group Replication's handling of memory allocation issues when adding transaction write sets has been improved. (Bug #31586243)".  I guess such a fix was also done between 5.7.32 and 5.7.36, but I am not finding a reference to Bug #31586243 in the release notes.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
00:20:34 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7fc64016a7e0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fc6b423bdc8 thread_stack 0x46000
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x1861794]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(handle_fatal_signal+0x15d) [0x10906a2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fc6c498d730]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b) [0x7fc6c3caa7bb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121) [0x7fc6c3c95535]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c983) [0x7fc6c405f983]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x928c6) [0x7fc6c40658c6]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92901) [0x7fc6c4065901]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92b34) [0x7fc6c4065b34]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9301c) [0x7fc6c406601c]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(void std::vector<unsigned long, std::allocator<unsigned long> >::_M_realloc_insert<unsigned long const&>(__gnu_cxx::__normal_iterator<unsigned long*, std::vector<unsigned long, std::allocator<unsigned long> > >, unsigned long const&)+0x8e) [0x11fb016]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(Rpl_transaction_write_set_ctx::add_write_set(unsigned long)+0x35) [0x123ecd3]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(add_pke(TABLE*, THD*, unsigned char*)+0x5b6) [0x123f9c5]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(binlog_log_row(TABLE*, unsigned char const*, unsigned char const*, bool (*)(THD*, TABLE*, bool, unsigned char const*, unsigned char const*))+0x74) [0x11386b8]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(handler::ha_write_row(unsigned char*)+0x135) [0x1138abd]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(write_record(THD*, TABLE*, COPY_INFO*, COPY_INFO*)+0xa75) [0x1269689]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(Query_result_insert::send_data(THD*, List<Item>&)+0xcf) [0x126a237]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(SELECT_LEX_UNIT::ExecuteIteratorQuery(THD*)+0x310) [0x1046558]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(SELECT_LEX_UNIT::execute(THD*)+0x3a) [0x1046668]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(Sql_cmd_dml::execute_inner(THD*)+0x310) [0xffdb3a]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x440) [0x10021b0]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(mysql_execute_command(THD*, bool)+0x158a) [0xfc7c2a]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(mysql_parse(THD*, Parser_state*)+0x2bc) [0xfca60f]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0xa11) [0xfcb356]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld(do_command(THD*)+0x13c) [0xfcc76c]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld() [0x10884d7]
/home/jgagne/opt/mysql/mysql_8.0.21/bin/mysqld() [0x1b3a44a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7fc6c4982fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fc6c3d6c4cf]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fc640b06de8): <hiding query>
Connection ID (thread ID): 12
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
