Friday, June 28, 2024

Trick to Simulate Higher Latency IOs on Linux

When running tests, it is sometimes useful to slow down IOs to clearly show the impact of reading from the disk.  Modern laptops and servers usually have fast local SSDs, and most cloud vendors provide, in addition to fast local disks, relatively fast network block devices (EBS for AWS and Persistent Disk for GCP).  Even when using magnetic network block devices, IOs are not guaranteed to be slow because they might hit the cache of the storage appliance (EBS volumes sometime have sub-millisecond latency).  This blog post describes the solution I am using for simulating high latency IOs on Linux. TL&DR: device-mapper’s delay target.

This post is written in the context of my work on MySQL at Aiven : check our careers page or blog to know more about us.

My test environments are AWS Linux vm where I run dbdeployer and sometimes sysbench.  I have a dedicated mount point for the sandboxes as shown below.

( cd; ls -ld sandboxes; df -h /mnt/jgagne_sandboxes; )
lrwxrwxrwx. 1 jgagne jgagne 21 Nov 29  2023 sandboxes -> /mnt/jgagne_sandboxes
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme2n1     32G  269M   32G   1% /mnt/jgagne_sandboxes

I have a dedicated mount point mostly for cost optimization reasons.  I can usually do most tests with a few GB, but sometimes, I need larger disks.  EBS does not support shrinking a volume, so if I want to avoid paying for storage I do not use, I need to delete and re-create the volume.  Having a dedicated volume and mount point allows that without re-creating the whole vm (or mess-up with other things in my home directories).  I can also change the volume type depending on my needs : SSDs when I need speed, magnetic when I want to optimize costs.  I know the 32 GB above is magnetic (the actual EBS type is standard, but I use magnetic in the rest of the post, not to confuse with Cold SSD / sc1), but let's convince ourselves with ioping.

sudo ioping -c 100 -i 0 -q /mnt/jgagne_sandboxes | grep min
min/avg/max/mdev = 434.4 us / 11.8 ms / 121.5 ms / 14.9 ms

In the above, the average confirms it is magnetic (in the 10 ms range), the minimum shows that there might be caching involved and the maximum is probably a tail latency.  Below is what happens when I change the volume type to gp3 (these are General Purpose SSDs version 3).

sudo ioping -c 100 -i 0 -q /mnt/jgagne_sandboxes | grep min
min/avg/max/mdev = 483.2 us / 618.5 us / 2.22 ms / 239.6 us

So now, we have relatively fast IOs (relatively as this is probably the best we can get for network-attached devices, local SSDs can do 10 to 20 times faster).  If we want higher latency IOs, we can use the delayed device-mapper.  In below, I create two helper functions : create_delay for creating a delayed block device from another device, and update_delay to update the delay of the delayed block device.  I then replace my sandbox mount point with a delayed device and do tests with different delays.

function create_delay() {
  local device=$1
  local size=$(sudo blockdev --getsz $device)
  local table="0 $size delay $device 0 0"
  sudo dmsetup create delayed --table "$table"
}
function update_delay() {
  local new_delay=$1
  local old_table="$(sudo dmsetup table delayed)"
  local new_table="${old_table% *} $new_delay"
  sudo dmsetup reload delayed --table "$new_table"
  sudo dmsetup resume delayed
}
mount_point=/mnt/jgagne_sandboxes
device=$(mount | grep $mount_point | cut -d " " -f 1)
sudo umount $mount_point
create_delay $device
sudo mount /dev/mapper/delayed $mount_point
for i in 10 20 0; do
  update_delay $i
  sudo ioping -Dc 100 -i 0 -q /dev/mapper/delayed | grep min
done
min/avg/max/mdev = 9.59 ms / 10.7 ms / 14.5 ms / 807.3 us
min/avg/max/mdev = 19.8 ms / 20.4 ms / 24.1 ms / 563.4 us
min/avg/max/mdev = 329.5 us / 962.5 us / 2.91 ms / 325.7 us

In the above, we can see that the tests effectively show IO latencies of 10, 20 and 0 milliseconds.  These delays are for all IO types; there are ways to have different delays for read, write and flush operations (howto in the doc), but in the cases that interest me, configuring a single delay is enough.

For completeness, here are the commands to remove a delayed block device.

sudo umount $mount_point
sudo dmsetup remove delayed
sudo mount $mount_point

Not applicable in my case, but mentioning it for completeness : unmounting a filesystem to create a delayed device might not always be possible (as an example, if everything is installed in a single filesystem).  In this case, it is still possible to take advantage of delayed IOs, but it is a little more involved.  One way of doing this is described in Annex #1 : Creating a Delayed Device from a Loop.

I could stop this post here as the test above shows that we successfully achieved our goal of getting higher latency IOs, but this is a MySQL blog, so let's play a little with MySQL and delayed IOs.  For this, I use a table a little larger than the InnoDB Buffer Pool, so we expect some queries to be served from RAM and some to hit the disk.  The Annex #2 shows how to create this with dbdeployer and sysbench.  Then, I run below, which records sysbench output for non-delayed IOs and then for IOs delayed by 10 milliseconds.

for d in 0 10; do
  update_delay $d
  sysbench $sba --threads=1 --time=5 run --histogram=on > s_$d.txt
done

Looking at the histogram with non-delayed IOs (Annex #3), we see many queries served in RAM, others needing an IO, and what might be some IO tail latencies (it is not, delayed IOs will make obvious what happens).  The actual behavior of MySQL is easier to see with delayed IOs, I am not spoiling the surprise, check Annex #4.

Now that I am finished with this post (it has been in draft for more than a year !), I have to open a MySQL Bug where delayed IOs are making a behavior I hit in production a few weeks ago more visible (I will link to this post in the bug, and I will update the post when the bug is open), and I need to complete another post, also in draft for more than a year, where delaying IOs clearly shows a weird MySQL behavior (I will also update this post when the other post is published).  To be continued...


Annex #1 : Creating a Delayed Device from a Loop

# In below, I show how to create a loop device from a file,
#   and then a delayed device from this loop device.

# Before creating a loop device, we need a file that is its back-end.  
lf=/mnt/jgagne_sandboxes/loop_file
size_gb=2
sudo dd if=/dev/zero of=$lf bs=1M count=$((1024*$size_gb)) 2> /dev/null
ls -lh $lf
-rw-r--r--. 1 root root 2.0G Jun 28 18:51 /mnt/jgagne_sandboxes/loop_file

# From the file above, we create a loop device, then a filesystem that we mount.
ld=/dev/loop0  # ld: [l]oop [d]evice.
sudo losetup $ld $lf
sudo mkfs.xfs $ld > /dev/null
lmp=/mnt/jgagne_sandboxes/loop  # lmp: [l]oop [m]ount [p]oint.
sudo mkdir $lmp
sudo mount $ld $lmp
df -h $lmp
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      2.0G   47M  2.0G   3% /mnt/jgagne_sandboxes/loop

# From here, it is the same as before for creating a delayed device.
# In the test below, we see that non-delayed IOs are very fast,
#   it is because the back-end file of the loop device is cached.
sudo umount $lmp
create_delay $ld
sudo mount /dev/mapper/delayed $lmp
for i in 10 20 0; do update_delay $i sudo ioping -Dc 100 -i 0 -q /dev/mapper/delayed | grep min done
min/avg/max/mdev = 9.64 ms / 10.0 ms / 11.2 ms / 177.6 us min/avg/max/mdev = 19.6 ms / 20.0 ms / 20.5 ms / 96.1 us min/avg/max/mdev = 57.5 us / 78.1 us / 293.2 us / 37.1 us # And for cleaning things up, we can run below. # (the rm command is commented to make sure data is not deleted by mistake) sudo umount $lmp sudo rmdir $lmp sudo dmsetup remove delayed sudo losetup -d $ld # sudo rm $lf # Uncomment to really delete data.

Annex #2 : Creating a Test Environment

# Setup a delayed block device for the sandboxes.
# Depending on your setup, you might have to adapt below.
mount_point=/mnt/jgagne_sandboxes
device=$(mount | grep $mount_point | cut -d " " -f 1)
sudo umount $mount_point
create_delay $device
sudo mount /dev/mapper/delayed $mount_point

# Create a sandbox for our tests with 3 GB of InnoDB Buffer Pool.
# (we need mysql-native-password for our version of sysbench)
# (dbda: [dbd]eployer [a]rguments)
dbda="-c skip-log-bin"
dbda="$dbda -c innodb_buffer_pool_instances=1"
dbda="$dbda -c innodb_buffer_pool_size=$((3*1024*1024*1024))"
dbda="$dbda -c mysql-native-password=ON"
dbdeployer deploy single mysql_8.4.0 $dbda

# Common sysbench stuff.
# (the number of rows is computed for having a 3 GB table)
# (rows will be grown to have 4 rows per page)
# (sba: [s]ys[b]ench [a]rguments)
nb_rows=$((3*1024*1024*1024 / (16*1024) * 4))
socket=$(awk '$1=="socket"{print $3; exit}' my.sandbox.cnf)
sba="oltp_point_select --rand-type=uniform --mysql-socket=$socket"
sba="$sba --mysql-user=msandbox --mysql-password=msandbox --mysql-db=sbtest"
sba="$sba --tables=1 --table_size=$nb_rows"

# Create a schema for sysbench, fill it, grow the rows, flush, and check size.
# (growing needs ALTER FORCE because instant ALTER)
# (the table will be a little bigger than the InnoDB Buffer Pool)
# (sysbench creates the table with two CHAR of size 120 and 60)
# ( "| pv -t" is a trick for having an execution timer)
{ ./use <<< "CREATE DATABASE sbtest"
  sysbench $sba prepare --create_secondary=off      | pv -t > /dev/null
  { echo "ALTER TABLE sbtest1 ADD COLUMN c0 CHAR(60) DEFAULT ''"
         seq -f " ADD COLUMN c%.0f CHAR(240) DEFAULT ''" 1 15
    } | paste -s -d "," | ./use sbtest              | pv -t
  ./use sbtest <<< "ALTER TABLE sbtest1 FORCE"      | pv -t
  ./use sbtest <<< "FLUSH TABLE sbtest1 FOR EXPORT" | pv -t
  du -h data/sbtest/sbtest1.ibd; }
0:00:16
0:00:00
0:01:55
0:00:00
4.7G    data/sbtest/sbtest1.ibd

# Allocate Huge Pages to prevent the table from being in the Linux Page Cache.
# (formatting adjusted and comae added for readability)
# (depending on the amount of RAM, the numbers might have to be adjusted)
# (more detail about this trick in this post:)
# (https://jfg-mysql.blogspot.com/2021/11/trick-simulate-linux-server-with-less-ram.html)
grep -e MemTotal -e MemFree -e "^Cached" -e SwapTotal -e HugePages_Total -e Hugepagesize /proc/meminfo
MemTotal:       16,224,316 kB
MemFree:        11,288,040 kB
Cached:            306,012 kB
SwapTotal:               0 kB
HugePages_Total:         0
Hugepagesize:         2048 kB
sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"
grep -e MemFree -e "^Cached" /proc/meminfo
MemFree:        11,397,972 kB
Cached:            117,020 kB
sudo bash -c "echo 1 > /proc/sys/vm/compact_memory"
sudo bash -c "echo $((9*1024/2)) > /proc/sys/vm/nr_hugepages"
grep -e MemFree -e "^Cached" -e HugePages_Total /proc/meminfo
MemFree:         1,887,700 kB
Cached:            200,620 kB
HugePages_Total:      4608

# Run sysbench log enough for the next tests to be representative.
# (by doing as many queries as there are rows, we should touch most of the pages)
# (reminder: we grew the rows to have 4 rows per page)
# (this should put the InnoDB Buffer Pool in a somehow stable state for the next tests)
sysbench $sba --threads=1 --events=$nb_rows --time=0 run | grep -e queries: | pv -t
    queries:                             786432 (1851.60 per sec.)
0:07:05

Annex #3 : Query Latency Histogram without Delay

# In below, we see many queries served in RAM, other needing an IO,
#   and what might be some IO tail latencies (it is not,
#   delayed IOs will make obvious what happens).
Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
       0.116 |                                         2
       0.120 |                                         4
       0.122 |                                         2
       0.127 |                                         4
       0.132 |                                         3
       0.134 |*                                        10
       0.136 |*                                        13
       0.139 |**                                       25
       0.141 |***                                      27
       0.144 |****                                     40
       0.147 |****                                     41
       0.149 |*******                                  75
       0.152 |****************                         164
       0.155 |**********************                   227
       0.158 |**************************               272
       0.160 |************************************     374
       0.163 |**************************************** 416
       0.166 |*********************************        341
       0.169 |***************************              277
       0.172 |**************************               270
       0.176 |********************                     204
       0.179 |***************                          156
       0.182 |************                             128
       0.185 |*************                            134
       0.189 |************                             129
       0.192 |*************                            134
       0.196 |*************                            133
       0.199 |*********                                92
       0.203 |*********                                96
       0.206 |***********                              114
       0.210 |*******                                  73
       0.214 |*******                                  68
       0.218 |*****                                    52
       0.222 |******                                   58
       0.226 |*****                                    49
       0.230 |******                                   61
       0.234 |****                                     39
       0.238 |*****                                    49
       0.243 |*****                                    47
       0.247 |*****                                    51
       0.252 |******                                   58
       0.256 |****                                     45
       0.261 |*****                                    54
       0.266 |*****                                    50
       0.270 |*****                                    47
       0.275 |*****                                    50
       0.280 |*****                                    48
       0.285 |****                                     42
       0.291 |****                                     46
       0.296 |****                                     41
       0.301 |****                                     44
       0.307 |****                                     39
       0.312 |****                                     39
       0.318 |***                                      36
       0.324 |***                                      31
       0.330 |*****                                    48
       0.336 |***                                      29
       0.342 |**                                       25
       0.348 |****                                     39
       0.354 |****                                     43
       0.361 |***                                      31
       0.367 |***                                      30
       0.374 |**                                       18
       0.381 |***                                      30
       0.388 |***                                      32
       0.395 |**                                       21
       0.402 |**                                       21
       0.409 |**                                       23
       0.417 |**                                       19
       0.424 |**                                       23
       0.432 |**                                       24
       0.440 |**                                       20
       0.448 |***                                      27
       0.456 |**                                       22
       0.464 |**                                       18
       0.473 |**                                       21
       0.481 |*                                        12
       0.490 |**                                       16
       0.499 |*                                        11
       0.508 |*                                        14
       0.517 |*                                        15
       0.527 |*                                        10
       0.536 |*                                        8
       0.546 |*                                        15
       0.556 |*                                        12
       0.566 |*                                        11
       0.576 |*                                        14
       0.587 |*                                        9
       0.597 |*                                        10
       0.608 |*                                        13
       0.619 |*                                        14
       0.630 |*                                        13
       0.642 |**                                       18
       0.654 |**                                       20
       0.665 |***                                      27
       0.677 |*****                                    48
       0.690 |*****                                    47
       0.702 |*******                                  69
       0.715 |*******                                  76
       0.728 |********                                 83
       0.741 |************                             122
       0.755 |*************                            139
       0.768 |**************                           149
       0.782 |*************                            131
       0.797 |**********                               108
       0.811 |*************                            140
       0.826 |************                             129
       0.841 |*******                                  75
       0.856 |**********                               105
       0.872 |*********                                96
       0.888 |*********                                91
       0.904 |******                                   63
       0.920 |******                                   64
       0.937 |*******                                  75
       0.954 |******                                   58
       0.971 |*****                                    49
       0.989 |*****                                    50
       1.007 |****                                     42
       1.025 |*****                                    54
       1.044 |****                                     44
       1.063 |***                                      35
       1.082 |****                                     45
       1.102 |****                                     46
       1.122 |***                                      36
       1.142 |*****                                    49
       1.163 |***                                      28
       1.184 |***                                      27
       1.205 |**                                       25
       1.227 |****                                     39
       1.250 |****                                     38
       1.272 |****                                     44
       1.295 |****                                     43
       1.319 |****                                     43
       1.343 |*****                                    56
       1.367 |****                                     44
       1.392 |*****                                    49
       1.417 |****                                     39
       1.443 |****                                     42
       1.469 |***                                      32
       1.496 |***                                      32
       1.523 |***                                      29
       1.551 |***                                      28
       1.579 |**                                       21
       1.608 |**                                       21
       1.637 |***                                      29
       1.667 |***                                      32
       1.697 |*                                        13
       1.728 |**                                       16
       1.759 |**                                       18
       1.791 |**                                       23
       1.824 |**                                       18
       1.857 |*                                        12
       1.891 |*                                        15
       1.925 |*                                        12
       1.960 |**                                       16
       1.996 |*                                        11
       2.032 |*                                        6
       2.069 |                                         4
       2.106 |                                         1
       2.145 |*                                        8
       2.184 |                                         3
       2.223 |*                                        7
       2.264 |                                         3
       2.305 |                                         4
       2.347 |                                         3
       2.389 |                                         3
       2.433 |                                         4
       2.477 |                                         2
       2.522 |                                         4
       2.568 |                                         4
       2.615 |                                         2
       2.662 |                                         4
       2.710 |                                         2
       2.760 |                                         3
       2.810 |                                         1
       2.861 |                                         4
       2.913 |                                         1
       2.966 |                                         3
       3.020 |                                         3
       3.075 |                                         2
       3.130 |                                         2
       3.187 |                                         1
       3.245 |                                         1
       3.304 |                                         1
       3.615 |                                         3
       3.681 |                                         3
       3.816 |                                         1
       3.885 |                                         3
       3.956 |                                         2
       4.028 |                                         1
       4.101 |                                         1
       4.176 |                                         2
       4.252 |                                         2
       4.407 |                                         3
       4.569 |                                         1
       4.737 |                                         2
       5.277 |                                         1
       5.470 |                                         1
       5.570 |                                         1
       5.986 |                                         1
       6.550 |                                         1
       9.057 |                                         1

Annex #4 : Query Latency Histogram with a Delay (10 ms)

# Conclusion below...
Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
       0.125 |*                                        1
       0.132 |**                                       2
       0.134 |*                                        1
       0.136 |**                                       2
       0.141 |*****                                    6
       0.144 |*******                                  9
       0.147 |**                                       3
       0.149 |**                                       3
       0.152 |*****                                    7
       0.155 |**********                               13
       0.158 |*********                                12
       0.160 |***********                              15
       0.163 |*******                                  9
       0.166 |********                                 11
       0.169 |********                                 11
       0.172 |***********                              15
       0.176 |***********                              15
       0.179 |*******                                  9
       0.182 |***********                              14
       0.185 |********                                 11
       0.189 |********                                 11
       0.192 |********                                 11
       0.196 |***************                          20
       0.199 |****************                         21
       0.203 |**************                           19
       0.206 |**************                           19
       0.210 |*****************                        22
       0.214 |***********                              14
       0.218 |***********                              15
       0.222 |******************                       24
       0.226 |**************                           18
       0.230 |***********                              15
       0.234 |**************                           18
       0.238 |********                                 11
       0.243 |***********                              14
       0.247 |*******                                  9
       0.252 |*************                            17
       0.256 |********                                 11
       0.261 |*************                            17
       0.266 |********                                 11
       0.270 |********                                 11
       0.275 |*****                                    6
       0.280 |**********                               13
       0.285 |*********                                12
       0.291 |********                                 10
       0.296 |*****                                    6
       0.301 |*********                                12
       0.307 |***                                      4
       0.312 |****                                     5
       0.318 |*******                                  9
       0.324 |******                                   8
       0.330 |****                                     5
       0.336 |****                                     5
       0.342 |*****                                    7
       0.348 |****                                     5
       0.354 |***                                      4
       0.361 |****                                     5
       0.367 |*                                        1
       0.374 |**                                       3
       0.381 |*****                                    6
       0.388 |**                                       3
       0.395 |****                                     5
       0.402 |**                                       3
       0.409 |**                                       2
       0.417 |*****                                    6
       0.424 |****                                     5
       0.432 |**                                       2
       0.440 |**                                       3
       0.448 |**                                       3
       0.456 |****                                     5
       0.473 |**                                       3
       0.490 |*                                        1
       0.499 |*                                        1
       0.508 |**                                       3
       0.517 |**                                       2
       0.527 |*                                        1
       0.536 |**                                       2
       0.546 |*                                        1
       0.556 |*                                        1
       0.576 |*                                        1
       0.587 |*                                        1
       0.608 |**                                       2
       0.630 |*                                        1
       0.642 |*                                        1
       0.665 |*                                        1
       0.677 |*                                        1
       0.741 |*                                        1
       0.755 |*                                        1
       0.768 |**                                       2
       0.782 |*                                        1
       0.826 |*                                        1
       1.272 |**                                       2
       1.443 |*                                        1
       1.496 |**                                       2
       1.637 |*                                        1
       2.223 |*                                        1
       2.305 |*                                        1
       4.407 |*                                        1
       9.734 |***                                      4
       9.910 |******                                   8
      10.090 |***********************                  30
      10.274 |***********************                  30
      10.460 |**************************************** 53
      10.651 |*********************************        44
      10.844 |*************************************    49
      11.041 |******************************           40
      11.242 |*******                                  9
      11.446 |***                                      4
      11.654 |***                                      4
      12.524 |*                                        1
      12.752 |**                                       2
      12.984 |*                                        1
      13.219 |*                                        1
      13.704 |*                                        1
      14.465 |*                                        1
      15.268 |*                                        1
      15.545 |*                                        1
      17.319 |*                                        1
      20.002 |*********                                12
      20.366 |**************                           18
      20.736 |**********************                   29
      21.112 |*********                                12
      21.496 |****                                     5
      21.886 |**                                       2
      22.284 |*                                        1
      22.689 |**                                       2
      23.101 |*                                        1
      23.521 |*                                        1
      28.162 |*                                        1
# By delaying IOs, it becomes very clear in the histogram that some queries need 2 IOs.

No comments:

Post a Comment