HammerDB MySQL and MariaDB Best Practice for Performance and Scalability

This post complements the previous best practice guides, this time focusing on MySQL and MariaDB and on achieving top levels of performance with the HammerDB MySQL TPC-C test. As with the previous guides, as an Intel employee (#IAMINTEL) I have taken the examples from MySQL 8 on Linux on an Intel system. The approach is the same for whatever system you are testing, although some of the settings you see may differ. Similarly, throughout this guide MySQL can be swapped for a MySQL-based database such as MariaDB.

Exactly as with PostgreSQL, a 2 socket system is the optimal choice for MySQL OLTP performance. Also as with PostgreSQL, this limitation is at the database level (especially the storage engine) rather than the hardware level. InnoDB is the storage engine that will deliver the best OLTP throughput and should be chosen for this test.

HammerDB Differences from sysbench

For anyone benchmarking MySQL with HammerDB it is important to understand the differences from sysbench workloads, as HammerDB targets a different usage model from sysbench. Historically MySQL has been positioned for supporting web-based applications, in contrast to the enterprise database workloads that have been served by commercial databases such as Oracle, Db2 and SQL Server. For this reason sysbench presents a vastly simpler workload than HammerDB.

By default sysbench creates a single table with an ascending primary key and an identical pad column. 
MariaDB [sbtest]> select * from sbtest limit 10
    -> ;
| id | k | c | pad                                                |
|  1 | 0 |   | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |
|  2 | 0 |   | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |
|  3 | 0 |   | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |
|  4 | 0 |   | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |
|  5 | 0 |   | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |
|  6 | 0 |   | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |
|  7 | 0 |   | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |
|  8 | 0 |   | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |
|  9 | 0 |   | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |
| 10 | 0 |   | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |
10 rows in set (0.02 sec)

The workload then interacts with the database using simple SQL statements. For example, the read-only workload uses statements such as the following:
SELECT SUM(K) from sbtest where id between ? and ?
SELECT DISTINCT c from sbtest where id between ? and ? order by c
SELECT c from sbtest where id=?
Consequently there is no contention and the workload will scale according to the ability of the database to handle these isolated statements. 

HammerDB on the other hand is based on the TPC-C specification (but not identical to a full TPC-C workload) and more closely represents an enterprise type workload. There are 9 tables:
MariaDB [tpcc]> show tables;
| Tables_in_tpcc |
| customer       |
| district       |
| history        |
| item           |
| new_order      |
| order_line     |
| orders         |
| stock          |
| warehouse      |
9 rows in set (0.00 sec)
The tables have more complexity than sbtest:
MariaDB [tpcc]> select * from customer limit 1 \G
*************************** 1. row ***************************
          c_id: 1
        c_d_id: 1
        c_w_id: 1
       c_first: 4fTYMkzYdy8EbABc
      c_middle: OE
        c_last: BARBARBAR
    c_street_1: G30TzdyBM1xx
    c_street_2: fRveiqUZDzz54pt9FK8
        c_city: jVRLzrSBopMpcmN4WYO2
       c_state: JK
         c_zip: 603111111
       c_phone: 4610643910936129
       c_since: 2018-10-11 08:48:51
      c_credit: BC
  c_credit_lim: 50000.00
    c_discount: 0.2000
     c_balance: -10.00
 c_ytd_payment: 10.00
 c_payment_cnt: 1
c_delivery_cnt: 0
c_data: hjtWypvebZZgVhNCdKOhHd50Wn7HQG8XAm9cSHkXTf73KqBHot7IvHq8PtaHdaJ9oMXqFx6aUXrBRxQ44gqLf0k04gkPVWc6Lx3q71gFCu1vZlLhmmIaWQf5zyDD4AAqejVcwYKi50P9rHFegjzURpTf6c9SPEfpupc7378uekwBYTj4Xfm0Od3ukiQIKto8Nlx1Is51pC4qynxLEWWGULhXdBipYckk5EjpbpdKzUjEcMGJ6nCmFpNgXfKDIUpaYsw1dWwgCEhPfXiKjXMO0v0iF56wzD6AOF4w7m8CXSw0x5zKB7URuTqlGedSYK8EvmylYudXLF
1 row in set (0.00 sec)
and the workload is driven by stored procedures:
MariaDB [tpcc]> show procedure status
    -> ;
| Db    | Name               | Type      | Definer        | Modified            | Created             | Security_type | Comment | character_set_client | collation_connection | Database Collation |
| tpcc  | DELIVERY           | PROCEDURE | @              | 2018-10-11 08:57:34 | 2018-10-11 08:57:34 | DEFINER       |         | latin1               | latin1_swedish_ci    | latin1_swedish_ci  |
| tpcc  | NEWORD             | PROCEDURE | @              | 2018-10-11 08:57:34 | 2018-10-11 08:57:34 | DEFINER       |         | latin1               | latin1_swedish_ci    | latin1_swedish_ci  |
| tpcc  | OSTAT              | PROCEDURE | @              | 2018-10-11 08:57:34 | 2018-10-11 08:57:34 | DEFINER       |         | latin1               | latin1_swedish_ci    | latin1_swedish_ci  |
| tpcc  | PAYMENT            | PROCEDURE | @              | 2018-10-11 08:57:34 | 2018-10-11 08:57:34 | DEFINER       |         | latin1               | latin1_swedish_ci    | latin1_swedish_ci  |
| tpcc  | SLEV               | PROCEDURE | @              | 2018-10-11 08:57:34 | 2018-10-11 08:57:34 | DEFINER       |         | latin1               | latin1_swedish_ci    | latin1_swedish_ci  |
7 rows in set (0.00 sec)
The shortest of these is shown as an example:
CREATE PROCEDURE `SLEV` (
st_w_id                 INTEGER,
st_d_id                 INTEGER,
threshold               INTEGER
)
BEGIN
DECLARE st_o_id         INTEGER;
DECLARE stock_count     INTEGER;
DECLARE `Constraint Violation` CONDITION FOR SQLSTATE '23000';
SELECT d_next_o_id INTO st_o_id
FROM district
WHERE d_w_id=st_w_id AND d_id=st_d_id;
SELECT COUNT(DISTINCT (s_i_id)) INTO stock_count
FROM order_line, stock
WHERE ol_w_id = st_w_id AND
ol_d_id = st_d_id AND (ol_o_id < st_o_id) AND
ol_o_id >= (st_o_id - 20) AND s_w_id = st_w_id AND
s_i_id = ol_i_id AND s_quantity < threshold;
END

As a result there is deliberate contention in the HammerDB workload that is not seen in a sysbench one. Also, as HammerDB makes one call to a stored procedure that then contains multiple SQL statements, the workload throughput is significantly higher than with sysbench.

More recently MySQL and MariaDB have been positioned in the enterprise space, competing with commercial databases such as Oracle with features such as PL/SQL compatibility. Therefore if you are testing the ability of MySQL or MariaDB to support web-based applications that use SQL statements with minimal contention, sysbench is likely to present the best assessment of the system for this type of workload. On the other hand, if you are testing MySQL or MariaDB for the ability to handle a more complex workload such as the use of stored procedures, and in particular if you are looking to compare scalability with a traditional database, then HammerDB is focused more towards testing those enterprise features. Finally, it is also important to note that this comparison is focused on OLTP workloads. HammerDB also supports a TPC-H based workload for analytics with complex ad-hoc queries. Traditionally MySQL has not supported such workloads, however features such as columnstore in MariaDB now make this possible.

System Setup: CPU, Memory and I/O Configuration

System setup is covered in the PostgreSQL Best Practice post so it will not be repeated here, as the steps are the same. Make sure that the I/O is able to keep up with writing to the redo log. Also make sure that if using an Intel CPU it is correctly configured with the right drivers and set up for turbo boost, with cpupower output as follows:
./cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1000 MHz - 3.80 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 1000 MHz and 3.80 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 1.99 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
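
If you want to check this output automatically across several test systems, the following is a minimal Python sketch rather than a definitive tool. It assumes the text layout shown in the sample above, and the function name check_cpupower and the three checks (intel_pstate driver, performance governor, boost active) are my own illustrative choices based on that sample. Other cpupower versions may format the output differently.

```python
import re

def check_cpupower(output):
    """Check cpupower frequency-info text for the settings shown in the sample.

    Assumption: the layout matches the sample output in this post.
    """
    driver = re.search(r"driver:\s*(\S+)", output)
    governor = re.search(r'The governor "(\w+)" may decide', output)
    boost = re.search(
        r"boost state support:\s*Supported:\s*(\w+)\s*Active:\s*(\w+)", output
    )
    return {
        "driver_is_intel_pstate": bool(driver) and driver.group(1) == "intel_pstate",
        "governor_is_performance": bool(governor) and governor.group(1) == "performance",
        "boost_active": bool(boost)
        and boost.group(1) == "yes"
        and boost.group(2) == "yes",
    }

# Excerpt of the sample output above, used here as test input.
SAMPLE = """analyzing CPU 0:
driver: intel_pstate
current policy: frequency should be within 1000 MHz and 3.80 GHz.
                   The governor "performance" may decide which speed to use
                   within this range.
boost state support:
     Supported: yes
     Active: yes
"""

for name, ok in check_cpupower(SAMPLE).items():
    print(f"{name}: {'OK' if ok else 'CHECK'}")
```

In practice you would pass in the captured output of the cpupower command instead of the SAMPLE string.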

Finally, in testing Huge Pages has less of an impact than with PostgreSQL, although performance should not be lower than with Huge Pages disabled. Just note that in some versions of MariaDB a bug means that Huge Pages cannot be enabled, although this does not greatly impact performance.

Install MySQL or MariaDB from Source

(Note that the following section applies in particular to pre-2019 versions of MySQL and MariaDB. More recent versions of MySQL 8 have already been updated for optimal performance on multiple platforms and therefore the change in this section is not required.)

The MySQL manual advises that building from source allows you to customize build parameters and, as with PostgreSQL, this can also be the best way to achieve maximum performance. In particular, a workload such as HammerDB places a lot of emphasis on locking and latching that is not seen in a sysbench workload, and therefore you may see gains from modifying the InnoDB storage engine code. In InnoDB some of this behaviour is hardcoded, however the duration of the CPU PAUSE instruction used by the UT_RELAX_CPU macro can and does vary between CPUs. In particular, on Intel Xeon Scalable Processors (Skylake architecture) the PAUSE instruction takes much longer than on previous architectures, and therefore calling UT_RELAX_CPU can consume a lot more time, resulting in reduced performance. It may be the case that by the time you are testing this you can modify this behaviour with a parameter. Otherwise find the file ut0ut.cc in the directory storage/innobase/ut and modify the delay multiplier accordingly: by default it is set to 50, and for Skylake 5 is more appropriate.
for (i = 0; i < delay * 5; i++) {
    j += i;
    UT_RELAX_CPU();
}

If you are unsure whether to do this, run the workload and use the perf command to observe the top functions. If “ut_delay” is consuming most of the CPU (in most cases it will still be one of the top functions) then modifying this code can potentially increase performance.
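
To put rough numbers on the delay multiplier discussed above, here is a back-of-the-envelope Python sketch. It assumes that each iteration of the delay loop issues one PAUSE through UT_RELAX_CPU, and it uses approximate PAUSE latencies of about 10 cycles on pre-Skylake CPUs and about 140 cycles on Skylake, which are the figures I recall from Intel's optimization documentation. The delay argument of 6 is only an illustrative value, and the function name spin_cycles is hypothetical. Treat the results as an order-of-magnitude guide, not a measurement.

```python
# Rough cost model for the InnoDB delay loop: the time spent scales with
# delay * multiplier * cycles-per-PAUSE.

# Assumption: approximate PAUSE latencies in CPU cycles.
PAUSE_CYCLES_PRE_SKYLAKE = 10
PAUSE_CYCLES_SKYLAKE = 140

# Assumption: illustrative value for the delay argument.
DELAY = 6

def spin_cycles(delay, multiplier, pause_cycles):
    """Approximate cycles spent in PAUSE for one pass through the delay loop."""
    return delay * multiplier * pause_cycles

scenarios = [
    ("pre-Skylake, multiplier 50", 50, PAUSE_CYCLES_PRE_SKYLAKE),
    ("Skylake, multiplier 50", 50, PAUSE_CYCLES_SKYLAKE),
    ("Skylake, multiplier 5", 5, PAUSE_CYCLES_SKYLAKE),
]

for label, multiplier, pause_cycles in scenarios:
    cycles = spin_cycles(DELAY, multiplier, pause_cycles)
    print(f"{label}: ~{cycles} cycles per delay call")
```

Under these assumed figures the default multiplier of 50 costs about 14 times more on Skylake than on earlier CPUs, and reducing the multiplier to 5 brings the cost back to the same order of magnitude as before.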

Configure MySQL

An example MySQL my.cnf file is shown. Note that innodb_file_per_table is set so that when we use partitioning in HammerDB it can insert into multiple files, one per partition. This gives a performance gain. Once configured, start the MySQL or MariaDB database. For more details on MySQL and optimization see the website by Dimitri Kravtchuk.

# general
# files
# buffers
# tune
# perf special
# monitoring

Configure the HammerDB Client

Download and install HammerDB on a test client system. As with PostgreSQL, another 2 socket server is ideal. You need the client libraries, so ensure that the MySQL 5.7 library file “libmysqlclient.so.20” can be found (you only need this one file), then run hammerdbcli to check for it:

HammerDB CLI v3.1
Copyright (C) 2003-2018 Steve Shaw
Type "help" for a list of commands
The xml is well-formed, applying configuration
Checking database library for MySQL
Success ... loaded library mysqltcl for MySQL

Create the Schema and Run the Test

Just before running the schema creation, double check that you have initialised MySQL correctly and that the mysql directory exists. Otherwise the tables will be created and loaded with data, but you will get the following error when creating the stored procedures:
Error in Virtual User 1: mysqlexec/db server: Table 'mysql.proc' doesn't exist
An example of how to do this is as follows:
$ ./scripts/mysql_install_db --srcdir=/home/mariadb/mariadb-10.2 --defaults-file=./../my.cnf
Installing MariaDB/MySQL system tables in '/home/mariadb/data' ...
If using the HammerDB CLI, an example script saved as innodbbuild.tcl is shown. Update it accordingly for your system and make sure partitioning is included:
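dbset db mysql
# add your MySQL server host as the value on the next line
diset connection mysql_host
diset connection mysql_port 3306
diset tpcc mysql_count_ware 800
diset tpcc mysql_partition true
diset tpcc mysql_num_vu 64
diset tpcc mysql_storage_engine innodb
print dict
# start the schema build with the settings above
buildschema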
$ ./hammerdbcli
HammerDB CLI v3.1
Copyright (C) 2003-2018 Steve Shaw
Type "help" for a list of commands
The xml is well-formed, applying configuration 
hammerdb>source innodbbuild.tcl

It will return to the prompt when built. Then restart the MySQL instance and run a single test. On an up to date system around mid-2018, expect to see up to 2M MySQL TPM and around 650K NOPM.

Vuser 1:56 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 1994844 MySQL TPM at 658082 NOPM
Then on a load test client run a script such as the following at the command line. This does the same as autopilot in the GUI, so make sure you log to temp.
# wait until the virtual users complete or the timeout in seconds is reached
proc runtimer { seconds } {
set x 0
set timerstop 0
while {!$timerstop} {
incr x
after 1000
  if { ![ expr {$x % 60} ] } {
          set y [ expr $x / 60 ]
          puts "Timer: $y minutes elapsed"
  }
update
if {  [ vucomplete ] || $x eq $seconds } { set timerstop 1 }
    }
return
}
dbset db mysql
# add your MySQL server host as the value on the next line
diset connection mysql_host
diset connection mysql_port 3307
diset tpcc mysql_driver timed
diset tpcc my_rampup 2
diset tpcc my_duration 5
vuset logtotemp 1
loadscript
# run one timed test for each virtual user count in turn
foreach z { 1 2 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 } {
puts "$z VU TEST"
vuset vu $z
vucreate
vurun
runtimer 600
vudestroy
after 5000
}
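
Before starting a sequence like this it helps to know roughly how long it will tie up the systems. The following Python sketch estimates lower and upper bounds from the values in the script above: the virtual user list, the 2 minute rampup, the 5 minute duration, the 600 second runtimer timeout and the 5 second pause between tests. The function name sequence_bounds is hypothetical, and the lower bound ignores the time taken to create virtual users and log on, so treat it as an approximation.

```python
# Values taken from the example script above.
VU_SEQUENCE = [1, 2, 4, 8, 12, 16, 20, 24, 28, 32, 36,
               40, 44, 48, 52, 56, 60, 64, 68, 72, 76]
RAMPUP_MIN = 2      # my_rampup
DURATION_MIN = 5    # my_duration
TIMEOUT_SEC = 600   # argument passed to runtimer
PAUSE_SEC = 5       # "after 5000" between tests

def sequence_bounds(vus, rampup_min, duration_min, timeout_sec, pause_sec):
    """Return (lower, upper) estimates in seconds for the whole sequence."""
    per_test = (rampup_min + duration_min) * 60
    lower = len(vus) * (per_test + pause_sec)
    upper = len(vus) * (timeout_sec + pause_sec)
    return lower, upper

lower, upper = sequence_bounds(
    VU_SEQUENCE, RAMPUP_MIN, DURATION_MIN, TIMEOUT_SEC, PAUSE_SEC
)
print(f"{len(VU_SEQUENCE)} tests: roughly {lower / 60:.0f} to {upper / 60:.0f} minutes")
```

If the estimate is longer than you can afford, shorten the virtual user list before reducing the rampup or test duration.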

Test Results

Grab the TPM or NOPM from the logfile. If everything went well you should get a sequence as follows, peaking at around 650,000 NOPM on an up to date system in mid-2018. To be clear, this chart shows the data from 20 performance tests, each with a 2 minute rampup time and a 5 minute test. After each test completed, HammerDB increased the virtual user count and repeated the test. Fortunately HammerDB allows all of this process to be automated. Each data point shows the average transaction rate captured over the 5 minute test, not the peak performance, which may be higher.
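
Grabbing the results from the logfile can be scripted. The following is a minimal Python sketch that assumes the log contains lines in the same form as the two "Vuser 1" result lines shown earlier in this post. The function names parse_results and peak_by_nopm are my own, and you will need to supply the path to your own logfile, as it depends on your system.

```python
import re

# Patterns based on the two result lines shown earlier in this post.
ACTIVE_RE = re.compile(r"Vuser 1:(\d+) Active Virtual Users configured")
RESULT_RE = re.compile(
    r"TEST RESULT : System achieved (\d+) MySQL TPM at (\d+) NOPM"
)

def parse_results(lines):
    """Return a list of (virtual_users, tpm, nopm) tuples, one per test."""
    results = []
    current_vu = None
    for line in lines:
        match = ACTIVE_RE.search(line)
        if match:
            current_vu = int(match.group(1))
            continue
        match = RESULT_RE.search(line)
        if match:
            results.append((current_vu, int(match.group(1)), int(match.group(2))))
            current_vu = None
    return results

def peak_by_nopm(results):
    """Return the (virtual_users, tpm, nopm) tuple with the highest NOPM."""
    return max(results, key=lambda result: result[2])

# The two log lines from the single test shown earlier.
sample = [
    "Vuser 1:56 Active Virtual Users configured",
    "Vuser 1:TEST RESULT : System achieved 1994844 MySQL TPM at 658082 NOPM",
]

for vu, tpm, nopm in parse_results(sample):
    print(f"{vu} virtual users: {tpm} TPM, {nopm} NOPM")
```

To process a real run, open your logfile and pass the file object to parse_results in place of the sample list, then call peak_by_nopm on the result to find the best performing virtual user count.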

Note that after a point, in this case after 56 virtual users, performance will decrease as you increase the number of virtual users. This is to be expected and is due to the limits of the scalability of the storage engine. If you report the stored procedure response times (see the HammerDB manual for how to do this) you will see the time increase, especially for the DELIVERY stored procedure, which is updating the ORDER_LINE table while the NEWORD stored procedure is trying to insert into it. As a result the NOPM, or New Orders per minute, reduces as more contention is experienced. This is not a limitation of HammerDB, which has been observed to drive significantly higher transaction rates. Nevertheless around 2M MySQL TPM is very respectable throughput for an enterprise workload, and as there is a lot more CPU available the potential is there to go a lot higher.