Monday, June 26, 2023

Rows Examined Blindspot when Looking for non-Existing Data

When trying to understand queries in the slow log, an interesting metric to look at is rows examined.  Also, when trying to understand CPU spikes on a MySQL instance, the InnoDB Metric dml_reads and the global status Innodb_rows_read are interesting to check.  In a similar way, when trying to gather information about which queries are loading the system, SUM_ROWS_EXAMINED from the Performance Schema table events_statements_summary_by_digest is interesting to analyze.  But rows examined might not mean what you think, and I give an example in this post.
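For dml_reads and Innodb_rows_read, the commands below show how to check them (these are the same commands as used in annex #2; note that the dml_reads counter needs enabling via innodb_monitor_enable):

SET GLOBAL innodb_monitor_enable = dml_reads;
SELECT NAME, COUNT FROM information_schema.INNODB_METRICS WHERE NAME = 'dml_reads';
SHOW GLOBAL STATUS LIKE 'Innodb_rows_read';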

TL;DR: in MySQL and from what I understand, rows examined means rows returned by the storage engine, even though the name would lead you to think it means rows read.  A consequence of this is that a query looking for non-existing data has 0 rows examined, and this is what I call in this post a rows examined blindspot.

In a previous post, I already touched on the subject of Rows Examined not Trustworthy because of Index Condition Pushdown.  In that post, I wrote:

  • The bad reporting of rows examined introduces a blindspot in MySQL, making query optimization more complicated and server load harder to assess,
  • With Index Condition Pushdown, a single access to the storage engine can examine more than one row.

In that previous post, I also quickly describe the Handler / Storage Engine Architecture, and I assume you know about it in this post.  So if you are not familiar with this notion (including Storage Engine Accesses) or want to refresh your understanding, you should pause reading this post and go through that one first.
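If you just want the one-sentence version: every row the server layer fetches from the storage engine goes through a handler call, and these calls are visible in the Handler_% statuses.  A minimal illustration of this (some_table and its primary key pk are hypothetical placeholders):

FLUSH STATUS;                          -- reset the session counters
SELECT * FROM some_table WHERE pk = 1; -- a single primary key lookup
SHOW STATUS LIKE 'Handler_read_key';   -- incremented once by the lookup above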

For exploring the rows examined blindspot when looking for non-existing data, I use a dbdeployer sandbox and the sparse table created in annex #1.  Let's first see what happens for a query looking at existing data.

In the script below, we have a query looking for 5 existing rows by primary key.  We have what we expect: 5 calls to the storage engine and 5 rows examined in the slow log.

./use test_jfg --table <<< "
  TRUNCATE mysql.slow_log;
  FLUSH STATUS;
  SELECT * FROM t WHERE id IN (2,4,6,8,10);
  SHOW STATUS LIKE 'Handler_read_%';
  SELECT rows_sent, rows_examined, sql_text
    FROM mysql.slow_log WHERE sql_text LIKE '%SELECT%'"
+----+---+----+
| id | a | b  |
+----+---+----+
|  2 | 1 |  2 |
|  4 | 2 |  4 |
|  6 | 3 |  6 |
|  8 | 4 |  8 |
| 10 | 5 | 10 |
+----+---+----+
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 5     |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
+-----------+---------------+------------------------------------------+
| rows_sent | rows_examined | sql_text                                 |
+-----------+---------------+------------------------------------------+
|         5 |             5 | SELECT * FROM t WHERE id IN (2,4,6,8,10) |
+-----------+---------------+------------------------------------------+

Things become unexpected when we are looking for non-existing data.  In the script below, we have a query looking for 5 non-existing rows.  We still have 5 calls to the storage engine, but this time we have no rows examined in the slow log.  This is problematic because it does not represent the real cost of this query.  For that, I opened Bug#111570: Looking for inexistent data reports 0 rows examined.

./use test_jfg --table <<< "
  TRUNCATE mysql.slow_log;
  FLUSH STATUS;
  SELECT * FROM t WHERE id IN (1,3,5,7,9);
  SHOW STATUS LIKE 'Handler_read_%';
  SELECT rows_sent, rows_examined, sql_text
    FROM mysql.slow_log WHERE sql_text LIKE '%SELECT%'"
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 5     |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
+-----------+---------------+-----------------------------------------+
| rows_sent | rows_examined | sql_text                                |
+-----------+---------------+-----------------------------------------+
|         0 |             0 | SELECT * FROM t WHERE id IN (1,3,5,7,9) |
+-----------+---------------+-----------------------------------------+

As shown in annex #2, the rows examined blindspot of the above query is also in the Performance Schema table events_statements_summary_by_digest, in the InnoDB Metric dml_reads, and in the global status Innodb_rows_read.  However, contrary to the Index Condition Pushdown blindspot, it is not in the Performance Schema table table_io_waits_summary_by_table.

ps-collect

As I just mentioned the tables events_statements_summary_by_digest and table_io_waits_summary_by_table, this is a good place for a short digression about what I call ps-collect.  This is also relevant because the impact of the rows examined blindspots is high for what I call ps-collect-digest.

A few months ago, I designed ps-collect: a Performance Schema Metric Collection Tool.  It is inspired by a MySQL Flight Recorder used at a previous employer, and by the ps-top tool written by Simon Mudd: ps-top allows seeing real-time metrics on a running system, and by combining this with a flight recorder, we can also see historical data, which is useful for troubleshooting after an event has occurred.  For now, I have only built the ps-collect-digest part of the tool, which collects metrics from the table events_statements_summary_by_digest.  I mentioned this in my Percona Live talk MySQL and Vitess at HubSpot.
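To give an idea of what ps-collect-digest gathers, below is a simplified sketch of the kind of snapshot query such a collector runs at regular intervals (a sketch of the principle, not the actual implementation; computing deltas between snapshots is left out):

./use performance_schema <<< "
  SELECT DIGEST, COUNT_STAR, SUM_TIMER_WAIT, SUM_ROWS_SENT, SUM_ROWS_EXAMINED
    FROM events_statements_summary_by_digest
    ORDER BY SUM_TIMER_WAIT DESC LIMIT 10"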

A dedicated post on ps-collect-digest will discuss all the interesting metrics in the digest table, but one of them is rows examined.  Because of the two rows examined blindspots, the rows examined data reported by ps-collect-digest is not as useful as it could be.  Luckily, the load hidden by the non-existing data blindspot is at least partially visible in the Performance Schema table table_io_waits_summary_by_table, and this makes me think that I should implement ps-collect-table.  I also have plans for ps-collect-file and ps-collect-mutex, but this is digressing from the subject of this post.  Let's just remember for now that the rows examined blindspots make ps-collect-digest not as useful as it could be.

Back to the rows examined blindspot...

So far, we saw a query doing 5 primary key lookups showing 0 rows examined while still looking at 5 rows in the storage engine.  This might not sound that bad, but it can easily grow to larger proportions.  With a more complex where-clause, we can easily trigger a situation generating many more calls to the storage engine.  Below is an example of such a situation: a query with a where-clause containing an AND of two IN clauses with 4 values each, generating 16 (4 x 4) calls to the storage engine.

./use test_jfg --table <<< "
  TRUNCATE mysql.slow_log;
  FLUSH STATUS;
  SELECT * FROM t WHERE a IN (1,2,3,4) AND b IN (1,3,5,7);
  SHOW STATUS LIKE 'Handler_read_%';
  SELECT rows_sent, rows_examined, sql_text
    FROM mysql.slow_log WHERE sql_text LIKE '%SELECT%'"
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 16    |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
+-----------+---------------+---------------------------------------------------------+
| rows_sent | rows_examined | sql_text                                                |
+-----------+---------------+---------------------------------------------------------+
|         0 |             0 | SELECT * FROM t WHERE a IN (1,2,3,4) AND b IN (1,3,5,7) |
+-----------+---------------+---------------------------------------------------------+

Below, with an AND of two IN clauses with 16 values each, we generate 256 (16 x 16) calls to the storage engine.

a="$(seq 1 16   | paste -s -d ,)"; \
b="$(seq 1 2 32 | paste -s -d ,)"; \
./use test_jfg --table <<< "
  TRUNCATE mysql.slow_log;
  FLUSH STATUS;
  SELECT * FROM t WHERE a IN ($a) and b IN ($b);
  SHOW STATUS LIKE 'Handler_read_%';
  SELECT rows_sent, rows_examined, sql_text
    FROM mysql.slow_log WHERE sql_text LIKE '%SELECT%'"
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 256   |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------+
| rows_sent | rows_examined | sql_text                                                                                                                  |
+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------+
|         0 |             0 | SELECT * FROM t WHERE a IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) and b IN (1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31) |
+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------+

Running the query above 100 thousand times, we can easily saturate one CPU for a little over 30 seconds on my test VM (depending on the CPU speed of your test environment, the timing might differ; this is run on an AWS EC2 m6i.4xlarge instance, which is advertised as using 3rd Generation Intel Xeon Scalable processors).

a="$(seq 1 16   | paste -s -d ,)"; \
b="$(seq 1 2 32 | paste -s -d ,)"; \
sql="SELECT * FROM t WHERE a IN ($a) and b IN ($b);"; \
time yes "$sql" | head -n 100000 | ./use test_jfg

real    0m33.280s
user    0m0.823s
sys     0m1.123s

And running the above in 8 subprocesses in parallel saturates 8 CPUs.

( a="$(seq 1 16   | paste -s -d ,)"
  b="$(seq 1 2 32 | paste -s -d ,)"
  sql="SELECT * FROM t WHERE a IN ($a) and b IN ($b);"
  for i in {1..8}; do
    yes "$sql" | head -n 100000 | ./use test_jfg &
  done
  wait; )

Let me insist again: even though we have a MySQL instance working very hard and consuming a lot of CPU, rows examined in the slow log is zero, SUM_ROWS_EXAMINED in the Performance Schema table events_statements_summary_by_digest for the matching digest is not incremented, and neither are the InnoDB Metric dml_reads nor the global status Innodb_rows_read.
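This is easy to verify after running the load above (same kind of check as in annex #2, with a loose LIKE pattern on the digest text):

./use performance_schema <<< "
  SELECT DIGEST_TEXT, COUNT_STAR, SUM_ROWS_SENT, SUM_ROWS_EXAMINED
    FROM events_statements_summary_by_digest
    WHERE DIGEST_TEXT LIKE '%FROM%t%WHERE%a%IN%'\G"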

This, combined with the Index Condition Pushdown rows examined blindspot, leads me to think that the current rows examined metric of MySQL needs to be replaced or improved with something new.

Random and Sequential Rows Examined

What I suggest is two new counters: random rows examined and sequential rows examined.  These would work as described below (a rough approximation using today's Handler statuses is sketched after the list):

  • A primary key lookup increments random rows examined (many lookups in a single query would increment the counter many times),
  • A covering index lookup also increments random rows examined,
  • Looking at the next element in a range scan on the clustered index (primary key) or a secondary index increments sequential rows examined,
  • A secondary index lookup leading to a primary key lookup (the index is not covering) increments random rows examined for the primary key access, in addition to incrementing random rows examined if this is the first access to the secondary index, or sequential rows examined if this is part of a range scan on that index,
  • The first Index Condition Pushdown (ICP) storage engine call increments random rows examined (like a covering index lookup), then increments sequential rows examined for each following row scanned in the storage engine while looking for a condition match (potentially zero rows if the first lookup matches the condition); if this leads to a condition match (there is also the potential for no match), it increments the random rows examined counter (for looking at the primary key),
  • The following ICP calls increment the sequential rows examined counter for all the rows not matching the condition, and if this leads to a condition match, they increment the random rows examined counter (for looking at the primary key).
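
As mentioned above, until such counters exist, today's Handler statuses give a rough approximation (my own heuristic, to be taken with a grain of salt, and broken by ICP since a single handler call can then examine many rows): Handler_read_key and Handler_read_rnd roughly map to random rows examined, while Handler_read_next, Handler_read_prev and Handler_read_rnd_next roughly map to sequential rows examined.  Below is the non-existing data query of this post seen through that lens:

./use test_jfg --table <<< "
  FLUSH STATUS;
  SELECT * FROM t WHERE id IN (1,3,5,7,9);
  -- Handler_read_key (5 here) would feed random rows examined,
  -- Handler_read_next / _prev / _rnd_next would feed sequential rows examined.
  SHOW STATUS LIKE 'Handler_read_%'"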

This is described in Bug#111571: Provide improved counters for solving rows examined blindspots.  There are probably things I am missing in the implementation of these counters (I am thinking of how they should apply to LSM engines like MyRocks, and maybe of how they would surface InnoDB Undo Log applies for looking at old versions of rows); feel free to share your thoughts in the comments of the feature request or of this post.

In the meantime, until this feature request is implemented, the only way to get a little visibility on the load generated by queries looking for non-existing data is to look at the Handler_read statuses and at the Performance Schema table table_io_waits_summary_by_table.  Do not forget to check these when trying to understand CPU spikes on your MySQL instances: I was only looking at the global status Innodb_rows_read before, and I was sometimes missing good stuff!
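
For the latter, a query like the one below (my usual starting point, nothing more official than that) surfaces the tables with the most read accesses, including accesses for non-existing rows as shown in annex #2:

./use performance_schema <<< "
  SELECT OBJECT_SCHEMA, OBJECT_NAME, COUNT_READ, SUM_TIMER_READ
    FROM table_io_waits_summary_by_table
    WHERE OBJECT_SCHEMA NOT IN ('mysql', 'performance_schema')
    ORDER BY COUNT_READ DESC LIMIT 10"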


Annex #1: Generating a dataset for our tests

# Create a sandbox with the slow query log enabled in a table.
dbdc="-c slow_query_log=ON -c log_output=TABLE -c long_query_time=0"
dbdc="$dbdc -c innodb_buffer_pool_size=$((1024*1024*1024))"
dbdeployer deploy single mysql_8.0.33 $dbdc

# Create the schema.
./use <<< "
   CREATE DATABASE test_jfg;
   CREATE TABLE test_jfg.t (
      id INTEGER NOT NULL PRIMARY KEY,
      a INTEGER NOT NULL,
      b INTEGER NOT NULL,
      KEY (a, b))"

# Insert 1000 rows in the table, skipping every other value for id.
# This is done in such a way that the a column is set to id/2
#   and the b column is set to id.
seq 1 1000 |
  awk '{print "(" $1*2, $1, $1*2  ")"}' |
  tr " " "," | paste -s -d "$(printf ',%.0s' {1..100})\n" |
  sed -e 's/.*/INSERT INTO t values &;/' |
  ./use test_jfg
  
# Recompute statistics on the table.
# If not done, things could get weird, but I do not have time to explain why here.
./use test_jfg <<< "ANALYZE TABLE t" > /dev/null
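
# Quick sanity check: only even values of id exist (a = id/2, b = id),
#   so this returns ids 2, 4 and 6 (as also visible in the first query of the post).
./use test_jfg --table <<< "SELECT * FROM t ORDER BY id LIMIT 3"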

Annex #2: the Problem is also in Performance Schema and InnoDB

# The query, with truncating the digest table before and checking it after.
./use performance_schema <<< "TRUNCATE events_statements_summary_by_digest"; \
  ./use test_jfg <<< "SELECT * FROM t WHERE a IN (1,2,3,4,5) AND b IN (1,3,5,7,9)"; \
  ./use performance_schema <<< "
    SELECT DIGEST_TEXT, SUM_TIMER_WAIT, COUNT_STAR, SUM_ROWS_SENT, SUM_ROWS_EXAMINED
      FROM events_statements_summary_by_digest
      WHERE DIGEST_TEXT like '%FROM%t%WHERE%a%'\G"
*************************** 1. row ***************************
      DIGEST_TEXT: SELECT * FROM `t` WHERE `a` IN (...) AND `b` IN (...)
   SUM_TIMER_WAIT: 284550000
       COUNT_STAR: 1
    SUM_ROWS_SENT: 0
SUM_ROWS_EXAMINED: 0

# The query, with computing the delta of dml_read before and after.
./use <<< "SET GLOBAL innodb_monitor_enable = dml_reads"; \
  sql="SELECT COUNT FROM INNODB_METRICS WHERE NAME = 'dml_reads'"; \
  c1="$(./use -N information_schema <<< "${sql}")"; \
  ./use test_jfg <<< "SELECT * FROM t WHERE a IN (1,2,3,4,5) AND b IN (1,3,5,7,9)"; \
  c2="$(./use -N information_schema <<< "${sql}")"; \
  echo $c2 - $c1 = $(($c2 - $c1))
0 - 0 = 0

# The query, with computing the delta of Innodb_rows_read before and after.
sql="SHOW GLOBAL STATUS LIKE 'Innodb_rows_read'"; \
  c1="$(./use -N <<< "${sql}" | awk '{print $2}')"; \
  ./use test_jfg <<< "SELECT * FROM t WHERE a IN (1,2,3,4,5) AND b IN (1,3,5,7,9)"; \
  c2="$(./use -N <<< "${sql}" | awk '{print $2}')"; \
  echo $c2 - $c1 = $(($c2 - $c1))
5 - 5 = 0

# The query, with truncating the table_io_waits table before and checking it after.
./use performance_schema <<< "TRUNCATE table_io_waits_summary_by_table"; \
  ./use test_jfg <<< "SELECT * FROM t WHERE a IN (1,2,3,4,5) AND b IN (1,3,5,7,9)"; \
  ./use performance_schema <<< "
    SELECT COUNT_STAR
      FROM table_io_waits_summary_by_table
      WHERE OBJECT_SCHEMA = 'test_jfg' AND OBJECT_NAME = 't'\G"
*************************** 1. row ***************************
COUNT_STAR: 25
