Friday, January 13, 2017

Funny replication breakage of Friday, January 13

A funny replication breakage kept me at the office longer than expected today (Friday 13 is not kind with me).

So question of the day: can you guess what the below UPDATE statement does (or what is wrong with it) ?
      > CREATE TABLE test_jfg (
           id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
           status ENUM('a','b') NOT NULL DEFAULT 'a',
           txt TEXT);
      Query OK, 0 rows affected (0.00 sec)

      > INSERT INTO test_jfg (txt) VALUES ('hello world');
      Query OK, 1 row affected (0.04 sec)

      > SELECT * FROM test_jfg;
      +----+--------+-------------+
      | id | status | txt         |
      +----+--------+-------------+
      |  1 | a      | hello world |
      +----+--------+-------------+
      1 row in set (0.00 sec)

      > UPDATE test_jfg SET txt='' AND status='b' WHERE id=1;
      ...
Hint: it works on MariaDB 10.0 (but probably does not do what you think/want), but fails on MariaDB 10.1.

This hit me this afternoon: my MariaDB 10.1 slaves have replication broken, but this passes through MariaDB 10.0 intermediate masters and executes OK-ish on the master.

Note: this replication chain is running statement-based replication (I think row-based replication would not have broken, which might make me start to prefer RBR to SBR).

So this is what you get on MariaDB 10.0:
      > UPDATE test_jfg SET txt='' AND status='b' WHERE id=1;
      Query OK, 1 row affected, 1 warning (0.12 sec)
      Rows matched: 1  Changed: 1  Warnings: 1

      > SELECT * FROM test_jfg;
      +----+--------+------+
      | id | status | txt  |
      +----+--------+------+
      |  1 | a      | 0    |
      +----+--------+------+
      1 row in set (0.00 sec)
But wait, there was a warning there, was does it say?
      > UPDATE test_jfg SET txt='' AND status='b' WHERE id=1;
      Query OK, 1 row affected, 1 warning (0.00 sec)
      Rows matched: 1  Changed: 1  Warnings: 1

      > SHOW WARNINGS;
      +---------+------+---------------------------------------+
      | Level   | Code | Message                               |
      +---------+------+---------------------------------------+
      | Warning | 1292 | Truncated incorrect INTEGER value: '' |
      +---------+------+---------------------------------------+
      1 row in set (0.00 sec)
And as you saw above, the expected value in the status and txt columns are not good.

So what is the problem ?  A simple SQL error: there should be a comma instead of an AND!  Because of this error, the query evaluated as follows:
      > UPDATE test_jfg SET txt= ( '' AND status='b' ) WHERE id=1;
Just for information, this is the error you get on a MariaDB 10.1 slave:
      Last_Error: Error 'Truncated incorrect DOUBLE value: ''' on query.
      Default database: 'test'.
      Query: 'UPDATE test_jfg SET txt='' AND status='b' WHERE id=1'
And this is the error you get when running the statement on MariaDB 10.1:
      > UPDATE test_jfg SET txt='' AND status='b' WHERE id=1;
      ERROR 1292 (22007): Truncated incorrect DOUBLE value: ''
The moral or this story, please do not ignore warnings, and this will keep the DBAs happy!

I now need to chase the guy that pushed this statement on the master, and I need to keep telling people to always investigating/fixing warnings in MySQL/MariaDB.

Side note: I am happy that MariaDB 10.1 will not accept this syntax, but I do not know if this is a bug in 10.0 (parser too permissive) or in 10.1 (parser rejecting valid SQL).  Thoughts?

Final note: I have not tested this with MySQL 5.6, 5.7 and 8.0.  If someone does, please be kind enough to post results in the comments.  Now I need to leave the office and go home.

UPDATE: some results of this with Oracle MySQL in a new post.

UPDATE2: MDEV-11787 - Inconsistent behavior on bad query between MariaDB 10.0 and 10.1 - is opened to track this issue.

7 comments:

  1. We see this in DML tickets coming our way all the time. The DML sandbox tool we have written catches these and reject them :-)

    ReplyDelete
    Replies
    1. Yes, it looks like a very common thing if you look at "Truncated incorrect DOUBLE value" on Google.

      Delete
  2. Is it really a good thing to keep two different versions for master/slave machines?

    ReplyDelete
    Replies
    1. At some point, you need to upgrade, and having a slave in the new version allows you to test things. When I (we) am (are) testing, I am ready to throw away slaves in a newer version, but this allows me to detect things that would break in the new version, understand incompatible changes like this one, report bugs, and fix the application. So for me, testing with slaves with a different version of the master is a good thing.

      Delete
  3. Hi!
    Admitted, I didn't check the SQL standard for this (Peter Gulutzan would know off-hand).
    Still, this seems to be a consequence of that old MySQL preference for "ease of use" (read: dirty programming).
    IMNSHO, an "AND" combining a character string (without cast) and any other type makes little sense, and it should have caused a syntax error from the very beginning.
    However, the original developer(s) opted for "ease of use" by implicit conversion (from string to number), and you are now suffering the pains of healing.
    Jörg Brühe

    ReplyDelete
    Replies
    1. In MDEV-11787, it is written that this is valid SQL. My opinion it that is should produce a "Dangerous SQL" warning that could be avoided by putting parentheses around the AND statement.

      Delete
    2. Thanks for the pointer!
      In that report, I didn't find the claim it is valid SQL - I only found the sentence "perhaps it is possible that this is valid SQL".
      Of course, part of the problem is SQL's double use of the equals sign '=' as both an assignment and a comparison operator, to which all implementations must adhere.
      A warning "dangerous SQL" is an interesting idea. OTOH, I often feel that sloppy programmers tend to ignore warnings, so I fear it won't help.
      We agree this is bad code, but on which part exactly would you base the warning?

      Further down, I found this:
      "The above query casts an empty string to BOOLEAN, which evaluates through testing if its DOUBLE value is not zero."
      And that's my main objection: I don't think the SQL standard requires such an implicit conversion, I am sure the old (1980s) X/Open SQL standard doesn't, it makes a strict distinction between string and numeric values.
      I rather assume the implicit conversion is a MySQL (Monty Widenius) invention for "ease of use".

      Without this cast, the "AND" would combine a string (empty) and a truth value (checking the column "status" for a specific value), this type mix could be considered "dangerous" or illegal (I certainly do that).
      With this cast, it combines two numeric values, in which I see no danger.

      That's just one example why I consider this cast as a modern equivalent of opening Pandora's box in Greek mythology.
      Regards, Jörg

      Delete