The previously presented results for the co-located tests (details in Part #4) are the following (I am omitting replication results as they are relatively similar; the MySQL settings behind these two durability configurations are spelled out right after the list):
- ~220 TPS for high durability (sync_binlog = 1 and trx_commit = 1)
- ~6280 TPS for low durability (sync_binlog = 0 and trx_commit = 2)
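For clarity, and as a minimal sketch rather than the full configuration of the tests, trx_commit above is shorthand for innodb_flush_log_at_trx_commit, and the two configurations map to these my.cnf settings:

    # High durability: sync the binary log and the InnoDB redo log
    # on every transaction commit.
    sync_binlog = 1
    innodb_flush_log_at_trx_commit = 1

    # Low durability: no binlog sync by MySQL, and the redo log is
    # written at commit but flushed to disk only about once per second.
    # sync_binlog = 0
    # innodb_flush_log_at_trx_commit = 2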
So my new test environment with better sync latencies is a VM in AWS with a local SSD. The write latencies I am observing with ioping are ~50 us (they were ~590 us in my previous tests with GCP persistent disks). GCP also has local SSDs, but their write latencies are 2.27 ms and 6.22 ms for SCSI and NVMe respectively, which is worse than persistent disks (if you find this weird, you are not alone: I have a ticket open with Google about this, and as Mark Callaghan commented on my last post, this might be because GCP local SSDs are only fast for reads and still slow for writes). The details of those numbers are in the GCP vs AWS latencies for local SSD section of the annexe.
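For reference, here is roughly how such write latencies can be measured with ioping (the target path is an assumption, point it at the filesystem backing MySQL; -D requests direct I/O, -W write I/O and -Y a sync after each request, but check the man page of your ioping version):

    # 10 direct, synced 4k writes against the local SSD mount.
    ioping -c 10 -D -W -Y -s 4k /mnt/local-ssd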
Before giving the results with faster disks, we have to think about the consequences of running a database on local SSD in a cloud environment. The local SSDs are not persistent (they are volatile), so their content could be lost in some failure scenarios, including a failure of the underlying disk, an instance stop (normal shutdown or crash), or an instance termination. If you choose to run MySQL on such volatile storage, you need to design your infrastructure to cope with those failure scenarios (and you might simply want to run MySQL with low durability, unless you also want to take advantage of the very fast reads of local SSDs, but this is a different benchmark). A solution could be failing-over to slaves, but this is not trivial to implement, so I would be very careful about deploying such a volatile architecture in production.
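As an illustration only (a sketch, not a recommendation or a complete failover design): lossless semi-synchronous replication can shrink the data loss window when a master on volatile storage dies, because a commit is acknowledged only after at least one slave has received the transaction. Assuming MySQL 5.7 with the bundled semisync plugins:

    -- On the master:
    INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
    SET GLOBAL rpl_semi_sync_master_enabled = 1;

    -- On each slave:
    INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
    SET GLOBAL rpl_semi_sync_slave_enabled = 1;

Note that this trades commit latency for safety, and a real deployment also needs failure detection and a promotion mechanism, which is precisely the non-trivial part mentioned above.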
So now the results! With a VM in AWS, still using the same dbdeployer and sysbench tests as in Part #4 (a reproduction sketch follows the list below), I get the following results:
- Co-located, high durability: ~3790 TPS
- Co-located, low durability: ~11,190 TPS
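For those wanting to reproduce, the test is sysbench's oltp_insert against a single in-memory table; the commands below are a sketch (connection parameters, table size and duration are assumptions here, the exact values are in the sysbench section of the annexe of Part #4):

    # Create and load the test table.
    sysbench oltp_insert --mysql-host=127.0.0.1 --mysql-port=3306 \
      --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest \
      --tables=1 --table-size=1000000 prepare

    # Run single-threaded and report TPS every 10 seconds.
    sysbench oltp_insert --mysql-host=127.0.0.1 --mysql-port=3306 \
      --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest \
      --tables=1 --table-size=1000000 --threads=1 --time=60 \
      --report-interval=10 run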
The high durability numbers are very different from my previous tests (GCP persistent SSD disk was giving ~220 TPS). In an AWS environment with local SSD, I get throughput that is ~17 times better than GCP with persistent SSD. Performance in this AWS environment is very close to running MySQL on-premises on physical servers with SSDs or with a battery-backed-up RAID cache. If you plan to move to the cloud from on-premises, make sure you take this into account.
If you plan to move MySQL from on-premises to the cloud,
make sure to take higher disk sync latencies into account!
And I also get ~78% faster results with low durability in AWS compared to GCP (~11,190 vs ~6,280 TPS). Faster syncs should not influence the result of a low durability configuration with an in-memory benchmark (I am running the oltp_insert benchmark, details in the sysbench section of the annexe of Part #4), so we have to find another explanation. My guess is that AWS has faster VMs than GCP, and this is confirmed by more tests (a quick way to check is sketched below) whose results are in the GCP vs AWS vm performance section of the annexe.
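As a quick (and admittedly crude) way to compare raw VM CPU speed between clouds, sysbench's cpu test can be run on one VM in each environment; the prime limit and duration below are arbitrary choices:

    # Compare the reported events per second between the two VMs
    # (single thread, matching the benchmark above).
    sysbench cpu --cpu-max-prime=20000 --threads=1 --time=30 run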
AWS and GCP VMs have different performance characteristics!
AWS and GCP also have different pricing for the instances I am using for my tests:
- a GCP n1-standard-4 instance in europe-west4 is $0.2092 per hour
- an AWS m5d.xlarge instance in Ireland is $0.252 per hour
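Doing the arithmetic on the numbers above: the AWS instance is ~20% more expensive per hour (0.252 / 0.2092 ≈ 1.20) while giving ~17 times the high durability TPS; but keep in mind this compares local SSD on AWS to persistent disk on GCP, so it is not a like-for-like storage comparison.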
This is all I have for now; the next post (5b) should be about multi-threaded results.
Hi Jeff:
Thanks for writing about the benefits of running MySQL in local SSD environments and the precautions needed in such an environment. At ScaleGrid, we support MySQL deployments running on local SSDs that not only offer great performance but also ensure data reliability in case of failures. I recently wrote a blog post on this for reference and more details: https://scalegrid.io/blog/how-to-improve-mysql-aws-performance-2x-over-amazon-rds-at-the-same-cost/
Thanks!
Hi Prasad, to be perfectly clear: I do not recommend running MySQL on local SSDs. This post is more a demonstration of better TPS when syncs are fast than a description of a reference architecture. Moreover, in the case of the benchmark presented in this post, running with sync_binlog = 0 and trx_commit = 2 would also give good TPS with fewer drawbacks than local SSDs IMHO. A situation where local SSDs might be useful is when the latency to remote storage is penalising reads, but that is not the most common use case IMHO.
Cheers, JFG