Comparing HammerDB TPROC-C results with sysbench-tpcc

In a recent project comparing systems for MariaDB performance, a user had originally been using a tool called sysbench-tpcc to compare hardware platforms before migrating to HammerDB. However, the user was not aware that performance could be compared between the two. This is a brief post to highlight the metrics to use for that comparison, using a separate hardware platform for illustration purposes.

Firstly, it is worth noting that both HammerDB TPROC-C and sysbench-tpcc run workloads based on the TPC-C specification; however, as described here, the HammerDB workload is called TPROC-C to correctly comply with the TPC fair use rules.

Also note that HammerDB offers a feature to run a fixed throughput workload close to the specification; however, as only HammerDB offers both modes, in this case we will only show the workloads run without keying and thinking time. HammerDB also runs natively on Windows and Linux with GUI, CLI and Web interfaces on multiple databases, but in this case the example will be MariaDB on Linux with the CLI.

Prepare or build the schema

Firstly, before running a workload, you need to build or prepare the schema. sysbench-tpcc offers the ability to build multiple schemas to work around scalability issues; however, the TPC-C specification uses a single set of tables, which can be built as follows.

./tpcc.lua --mysql-socket=/tmp/mariadb.sock --mysql-user=root --mysql-password=maria --mysql-db=tpccsb --time=300 --threads=64 --report-interval=1 --tables=1 --scale=400 --db-driver=mysql prepare

The equivalent in HammerDB is the buildschema command, with example settings as below. The scripts can be in Python or Tcl format.

#!/bin/tclsh
# maintainer: Pooja Jain

puts "SETTING CONFIGURATION"
dbset db maria
dbset bm TPC-C

diset connection maria_host localhost
diset connection maria_port 3306
diset connection maria_socket /tmp/mariadb.sock

set vu 100
set warehouse 400
diset tpcc maria_count_ware $warehouse
diset tpcc maria_num_vu $vu
diset tpcc maria_user root
diset tpcc maria_pass maria
diset tpcc maria_dbase tpcc
diset tpcc maria_storage_engine innodb
if { $warehouse >= 200 } {
diset tpcc maria_partition true
} else {
diset tpcc maria_partition false
}
puts "SCHEMA BUILD STARTED"
buildschema
puts "SCHEMA BUILD COMPLETED"

and run as follows from the command line.

 ./hammerdbcli auto ./scripts/tcl/maria/tprocc/maria_tprocc_build.tcl

You can see that both schemas are similar when built, with the main difference being that sysbench adds prefixes to the data in a number of columns to aid with compression, whereas HammerDB is closer to the specification.

MariaDB [tpccsb]> show tables;
+------------------+
| Tables_in_tpccsb |
+------------------+
| customer1        |
| district1        |
| history1         |
| item1            |
| new_orders1      |
| order_line1      |
| orders1          |
| stock1           |
| warehouse1       |
+------------------+
9 rows in set (0.000 sec)

MariaDB [tpccsb]> select * from warehouse1 limit 1;
+------+------------+--------------------+--------------------+-----------------+---------+-----------+-------+------------+
| w_id | w_name     | w_street_1         | w_street_2         | w_city          | w_state | w_zip     | w_tax | w_ytd      |
+------+------------+--------------------+--------------------+-----------------+---------+-----------+-------+------------+
|    1 | name-ussgn | street1-suwfdxnitk | street2-sdptwkrcjd | city-wowgpzhpmq | fu      | zip-12460 |  0.12 | 8398416.00 |
+------+------------+--------------------+--------------------+-----------------+---------+-----------+-------+------------+

MariaDB [tpcc]> show tables;
+----------------+
| Tables_in_tpcc |
+----------------+
| customer       |
| district       |
| history        |
| item           |
| new_order      |
| order_line     |
| orders         |
| stock          |
| warehouse      |
+----------------+
9 rows in set (0.000 sec)

MariaDB [tpcc]> select * from warehouse limit 1;
+------+-----------+--------+----------+---------------+---------------------+--------------------+---------+-----------+
| w_id | w_ytd     | w_tax  | w_name   | w_street_1    | w_street_2          | w_city             | w_state | w_zip     |
+------+-----------+--------+----------+---------------+---------------------+--------------------+---------+-----------+
|    1 | 300000.00 | 0.1800 | kyKhVJqn | ukYR4HaaEJLVi | icFhnjwgqE3cexTJFwR | Kxf1T7pcaHNyvELEIx | lH      | 358511111 |
+------+-----------+--------+----------+---------------+---------------------+--------------------+---------+-----------+

Running the workloads

We can run the workloads as follows; in both examples we will use 80 threads, or Virtual Users in HammerDB terminology. The sysbench-tpcc workload is run as follows.

./tpcc.lua --mysql-socket=/tmp/mariadb.sock --mysql-user=root --mysql-password=maria --mysql-db=tpccsb --time=300 --threads=80 --report-interval=1 --tables=1 --scale=400 --db-driver=mysql run

As the workloads are based on the same specification, you can use HammerDB to monitor the sysbench-tpcc workload with the metstart command starting the CPU monitor and tcstart the transaction counter.

hammerdb>metstart
Starting Local Metrics Agent on ubuntu
after#1
hammerdb>Connecting to Agent to Display CPU Metrics
Metric receive port open @ 27702 on ubuntu
Connecting to HammerDB Agent @ localhost:10000
Testing Agent Connectivity...OK
Metrics Connected
Started CPU Metrics for Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz:(112 CPUs)
hammerdb>tcstart
Transaction Counter Started

hammerdb>0 MariaDB tpm
CPU all usr%-0.00 sys%-0.02 irq%-0.00 idle%-99.97
0 MariaDB tpm

and a HammerDB example script as follows:

#!/bin/tclsh
# maintainer: Pooja Jain

set tmpdir $::env(TMP)
puts "SETTING CONFIGURATION"
jobs profileid 0
dbset db maria
dbset bm TPC-C
giset timeprofile xt_gather_timeout 1200
giset commandline keepalive_margin 1200

diset connection maria_host 127.0.0.1
diset connection maria_port 3306
diset connection maria_socket /tmp/mariadb.sock
#
diset tpcc maria_user root
diset tpcc maria_pass maria
diset tpcc maria_dbase tpcc
diset tpcc maria_driver timed
diset tpcc maria_rampup 2
diset tpcc maria_duration 5
diset tpcc maria_no_stored_procs false
diset tpcc maria_allwarehouse false
diset tpcc maria_timeprofile true
diset tpcc maria_purge true
#start CPU
metstart
puts "TEST STARTED"
loadscript
vuset vu 80
vuset logtotemp 1
vucreate
tcstart
tcstatus
vurun
tcstop
vudestroy
#stop CPU
metstop
puts "TEST COMPLETE"

Note that a key parameter here is maria_no_stored_procs, set to true or false. HammerDB uses stored procedures for higher throughput, as fully explained here, but also offers a client SQL based version for comparison. sysbench implements a client SQL based version, and this is key to understanding the difference between the two workloads. The HammerDB workload is run as shown:

./hammerdbcli auto ./scripts/tcl/maria/tprocc/maria_tprocc_run.tcl
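To then compare a client SQL run against the sysbench-tpcc results, the same script can be re-used with the driver option changed before the virtual users are created, for example:

diset tpcc maria_no_stored_procs true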

Comparing the results

When running the 80 thread sysbench-tpcc workload and monitoring with HammerDB, we can see the following output.

hammerdb>0 MariaDB tpm
CPU all usr%-17.58 sys%-4.18 irq%-0.00 idle%-78.21
485784 MariaDB tpm
CPU all usr%-17.87 sys%-4.30 irq%-0.00 idle%-77.80
497376 MariaDB tpm
CPU all usr%-17.79 sys%-4.36 irq%-0.00 idle%-77.84
478644 MariaDB tpm
CPU all usr%-17.74 sys%-4.35 irq%-0.00 idle%-77.88
485718 MariaDB tpm

and when it has finished it prints output as follows:

SQL statistics:
    queries performed:
        read:                            31779084
        write:                           32983004
        other:                           4898362
        total:                           69660450
    transactions:                        2449061 (8162.57 per sec.)
    queries:                             69660450 (232173.91 per sec.)
    ignored errors:                      10685  (35.61 per sec.)
    reconnects:                          0      (0.00 per sec.)

Throughput:
    events/s (eps):                      8162.5668
    time elapsed:                        300.0356s
    total number of events:              2449061

Latency (ms):
         min:                                    0.35
         avg:                                    9.80
         max:                                  307.66
         95th percentile:                       41.10
         sum:                             23997083.58

Threads fairness:
    events (avg/stddev):           30613.2625/297.35
    execution time (avg/stddev):   299.9635/0.01

The key figure here is 8162.57 transactions per second; multiplied by 60 this gives us 489,754 TPM (transactions per minute), which is the figure we can use for comparison, as can be seen from the HammerDB transaction counter output reporting the same data.
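As a quick check of that conversion, the arithmetic can be reproduced in a Tcl shell such as hammerdbcli:

# convert the sysbench transactions per second figure to transactions per minute
set tps 8162.57
puts [ expr {round($tps * 60)} ]   ;# 489754 TPM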

Running HammerDB with stored procedures we can see the difference in CPU utilisation and transactions.

CPU all usr%-52.76 sys%-6.52 irq%-0.00 idle%-40.52
1512462 MariaDB tpm
CPU all usr%-52.85 sys%-6.43 irq%-0.00 idle%-40.55
1519824 MariaDB tpm
CPU all usr%-53.01 sys%-6.47 irq%-0.00 idle%-40.35
1515888 MariaDB tpm
CPU all usr%-52.99 sys%-6.36 irq%-0.00 idle%-40.51
1524312 MariaDB tpm

and it is the TPM value that we use for comparison and not NOPM, as both tools are measuring transactions per second/minute.

Vuser 1:80 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 632885 NOPM from 1471425 MariaDB TPM

If we use the HammerDB no stored procedures option we can see that performance drops, as would be expected.

Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:80 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 491964 NOPM from 1143635 MariaDB TPM

and referring to the previous article we can see that we are using more system time as we are now spending more time in the network.

1155714 MariaDB tpm
CPU all usr%-52.57 sys%-11.44 irq%-0.00 idle%-35.90
1163382 MariaDB tpm
CPU all usr%-52.73 sys%-11.57 irq%-0.00 idle%-35.61
1154976 MariaDB tpm
CPU all usr%-52.60 sys%-11.71 irq%-0.00 idle%-35.60
1153836 MariaDB tpm
CPU all usr%-52.56 sys%-11.67 irq%-0.00 idle%-35.68

Analyzing results

HammerDB will also automatically generate graphs for you to analyze your workload, including detailed response times per transaction.

Summary

Of course, the more benchmarks and workloads you run against a system, the more insights you can get. All benchmarks are valuable; however, it is important to ensure that you are deriving accurate results.

In our example, a user was initially using sysbench-tpcc to compare different hardware systems for MariaDB, but was drawing conclusions about both the hardware and database software capabilities that were not in keeping with our observations. Using this approach we provided an alternative measurement and showed how the two workloads compared, to illustrate the capabilities of both the hardware and software.

 

HammerDB v4.11 New Features: CPU Metrics for CLI, GUI and Jobs

One of the key metrics for measuring database performance is CPU utilisation. Therefore since version 2.16 HammerDB has included a CPU monitor in the GUI to graphically observe this utilisation during a workload.

At HammerDB v4.11 the CPU metrics functionality has been enhanced to add the CPU data to the CLI as well as recording this CPU data for HammerDB jobs to view with the HammerDB web service. This post explains the changes to the CPU metrics in HammerDB v4.11 and how to use them.

Capturing CPU utilisation data

HammerDB can record a number of metrics for a workload (such as transaction count and response times) that are stored in the SQLite databases repository for viewing with the web service. HammerDB v4.11 adds the same functionality for the CPU metrics as well as adding a number of commands to the CLI so that the CPU data can be recorded for both.

It is important, however, to note the distinction between data such as transaction count and CPU utilisation: the former can be retrieved from the target database with SQL, whereas the latter requires additional data gathering at the system level. To do this HammerDB uses a separate agent process to gather the CPU utilisation and send it to HammerDB.  If HammerDB is running on a separate system to the database being tested, then the agent needs to run on the system where the database is running.

On Linux the HammerDB agent uses the mpstat utility from the sysstat package and therefore this package must be installed on the system being tested.

$ mpstat -V
sysstat version 11.5.7
(C) Sebastien Godard (sysstat <at> orange.fr)

On Windows the mpstat utility is provided by HammerDB instead and is installed in the agent directory.  This uses the Windows PDH functions to consume the counter data.

If HammerDB is installed on the same system as the database then the agent can be run directly from the HammerDB install. If the database system is separate and HammerDB is connecting over the network then HammerDB should be installed on the database server for running the agent.

Note that you may have already installed HammerDB on both systems, as running a schema build locally on the same system will typically be much faster than running a build over the network.

Starting the agent manually

Note that if you are running HammerDB on the same host as the database then you can skip to starting the agent automatically. However, the choice is yours: even if the host is the same, you can still follow the manual process of starting the agent and then connecting the display, which is the process you have to take when the systems are separate.

The agent can be started manually by running ./agent on Linux or agent.bat on Windows.

When run without arguments the agent will randomly select a port and report the port and hostname to connect to.

Alternatively, if you provide an argument of a port number to the agent

C:\Program Files\HammerDB-4.11\agent>agent.bat 10000

then the agent will start with the specified port number.

The agent is now ready to be connected to with a display.  When connected the agent will run mpstat and gather and send CPU data to the display.

GUI Metrics interface

Under the GUI tree view you can see Metrics Options and Display. Options provides the configuration of the agent to connect to, and Display starts the GUI display that connects to the agent.

Under the metrics agent options enter the Agent ID that was either randomly chosen or specified when the agent was started and the hostname it is running on, either local or remote.

If the hostname is the localhost, then you can verify if the agent is running with the Agent Status button. As we already started the agent manually on port 10000, we can see that HammerDB reports that the agent is already running.

Note that you can also see that we have the options to start and stop the agent from the GUI when we are running both the agent and the GUI on the same host. This starts and stops the agent in exactly the same way as described for running it manually, providing the port number that you have specified under Agent ID.

With the agent running either locally or remotely, click Display in the tree view; the HammerDB Display will now connect to the agent and begin showing the CPU metrics.

Note that the agent will also report that a Display has connected.

Also the metrics window can be dragged out of HammerDB to display standalone, or if closed will return to the HammerDB notebook.

Finally, if you select the Start Display with Local Agent checkbox and click Agent Start, then both the Agent and Display will start running on the localhost in a single step.

CLI Metrics interface

The CLI metrics interface follows the same steps as the GUI. Firstly, use the print generic command to observe the metrics settings.

hammerdb>print generic
Generic Dictionary Settings
...
metrics              {
 agent_hostname = localhost
 agent_id       = 10000
}
...

The metset command can be used to change the agent_hostname or agent_id.

hammerdb>metset
Usage: metset [agent_hostname|agent_id] value
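For example, to point the CLI at the agent started earlier on port 10000:

hammerdb>metset agent_hostname localhost
hammerdb>metset agent_id 10000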

The metstart command is used to start the CLI metrics display.  If the agent host is the localhost and the agent is not already running then metstart will also start the agent.

hammerdb>metstart
Starting Local Metrics Agent on ubuntu
after#1
hammerdb>Connecting to Agent to Display CPU Metrics
Metric receive port open @ 37065 on ubuntu
Connecting to HammerDB Agent @ localhost:10000
Testing Agent Connectivity...OK
Metrics Connected
Started CPU Metrics for Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz:(112 CPUs)
CPU all usr%-0.06 sys%-0.04 irq%-0.00 idle%-99.87
CPU all usr%-0.08 sys%-0.04 irq%-0.00 idle%-99.85
CPU all usr%-0.06 sys%-0.03 irq%-0.00 idle%-99.90
CPU all usr%-0.04 sys%-0.03 irq%-0.00 idle%-99.91
...

metstatus checks whether the agent is running. Note that because the agent is a remote process, there may be some delay between starting the metrics and the status returning as running.

hammerdb>metstatus
CPU Metrics are running on ubuntu
Metrics Agent running on localhost:10000

Finally metstop stops the display and agent.

hammerdb>metstop
Stopping Metrics Agent and Display on localhost:10000

Typically these commands will be included in a script as follows:

set tmpdir $::env(TMP)
puts "SETTING CONFIGURATION"
jobs profileid 0
dbset db maria
dbset bm TPC-C
giset timeprofile xt_gather_timeout 1200
giset commandline keepalive_margin 1200

diset connection maria_host 127.0.0.1
diset connection maria_port 3306
diset connection maria_socket /tmp/mariadb.sock
#
diset tpcc maria_user root
diset tpcc maria_pass maria
diset tpcc maria_dbase tpcc
diset tpcc maria_driver timed
diset tpcc maria_rampup 2
diset tpcc maria_duration 5
diset tpcc maria_no_stored_procs false
diset tpcc maria_allwarehouse false
diset tpcc maria_timeprofile true
diset tpcc maria_purge true
#start CPU
metstart
puts "TEST STARTED"
loadscript
vuset vu 80
vuset logtotemp 1
vucreate
tcstart
tcstatus
vurun
tcstop
vudestroy
#stop CPU
metstop
puts "TEST COMPLETE"

and when the script is run you can observe that the CPU utilization is reported. Note that the difference from the GUI is that the CLI reports utilization for all CPUs whereas in the GUI you can observe the individual cores.

 ./hammerdbcli auto ./scripts/tcl/maria/tprocc/maria_tprocc_run_80vu.tcl
...
Vuser 1:Rampup 1 minutes complete ...
1616094 MariaDB tpm
CPU all usr%-56.20 sys%-7.32 irq%-0.00 idle%-36.39
1621218 MariaDB tpm
CPU all usr%-56.71 sys%-7.29 irq%-0.00 idle%-35.94
1605342 MariaDB tpm
CPU all usr%-56.55 sys%-7.18 irq%-0.00 idle%-36.19
1614732 MariaDB tpm
CPU all usr%-56.67 sys%-7.34 irq%-0.00 idle%-35.94
...
Vuser 28:FINISHED SUCCESS
Vuser 29:FINISHED SUCCESS
Vuser 1:Calculating timings...
Vuser 1:Writing timing data to /home/HammerDB-4.11/TMP/hdbxtprofile.log
CPU all usr%-4.81 sys%-0.33 irq%-0.00 idle%-89.32
111918 MariaDB tpm
CPU all usr%-2.82 sys%-0.17 irq%-0.00 idle%-96.98
0 MariaDB tpm
CPU all usr%-2.73 sys%-0.13 irq%-0.00 idle%-97.11
0 MariaDB tpm
CPU all usr%-2.84 sys%-0.11 irq%-0.00 idle%-97.03
0 MariaDB tpm
Vuser 1:Starting purge: history list length 3044852
CPU all usr%-2.67 sys%-0.08 irq%-0.00 idle%-97.23
0 MariaDB tpm
CPU all usr%-2.83 sys%-0.08 irq%-0.00 idle%-97.05
0 MariaDB tpm
CPU all usr%-2.61 sys%-0.09 irq%-0.00 idle%-97.21
0 MariaDB tpm
CPU all usr%-2.87 sys%-0.08 irq%-0.00 idle%-96.99
0 MariaDB tpm
CPU all usr%-3.00 sys%-0.07 irq%-0.00 idle%-96.88
0 MariaDB tpm
CPU all usr%-3.03 sys%-0.07 irq%-0.00 idle%-96.82
0 MariaDB tpm
CPU all usr%-2.94 sys%-0.11 irq%-0.00 idle%-96.85
0 MariaDB tpm
Vuser 1:Purge complete in 0 hrs:01 mins:11 secs
Vuser 1:Starting write back: dirty buffer pages 1161320
CPU all usr%-0.77 sys%-0.25 irq%-0.00 idle%-98.27
0 MariaDB tpm
Vuser 1:Write back complete in 0 hrs:00 mins:14 secs
Vuser 1:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Transaction Counter thread running with threadid:tid0x7f83cd7fa700
Stopping Transaction Counter
vudestroy success
Stopping Metrics Agent and Display on localhost:10000
TEST COMPLETE

When the job has finished the new job system command will record the name of the CPU and the number of cores.

hammerdb>job 667D3F1561BD03E203533353 system
[
  "Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz",
  "112"
]

and the job metrics command will show the CPU utilization data for a particular job.

hammerdb>job 667D3F1561BD03E203533353 metrics
{
  "2024-06-27 03:29:51": {
    "usr%": "6.21",
    "sys%": "0.46",
    "irq%": "0.0",
    "idle%": "90.63"
  },
  "2024-06-27 03:30:01": {
    "usr%": "24.55",
    "sys%": "1.45",
    "irq%": "0.0",
    "idle%": "72.86"
  },
  "2024-06-27 03:30:11": {
    "usr%": "39.02",
    "sys%": "2.81",
    "irq%": "0.0",
    "idle%": "57.01"
  },
  "2024-06-27 03:30:21": {
    "usr%": "50.87",
    "sys%": "5.16",
    "irq%": "0.0",
    "idle%": "42.97"
  },
...
}

Viewing Metrics with the Web Service

When you run a job with the CPU metrics running, either in the GUI or in the CLI, the job will record the CPU that the workload is running on and the metrics data. Note that if the metrics are not running then the metrics and system links will be missing for that job.

In the example we can see that the metrics was running for this job and the system and metrics links are present.

The system link reports the CPU that the job was run against.

and the metrics link shows a graph for user% and sys% across the time the job was run.

Summary

HammerDB v4.11 adds functionality to improve capturing CPU utilisation during a job and storing this data for display along with other job-related data. It also adds the metset, metstart, metstop and metstatus commands to run and capture CPU utilisation from the CLI interface.

HammerDB v4.11 New Features: Performance Profiles for TPROC-C Workloads

Arguably, the most common beginner error in database benchmarking is for a user to select a single point of utilisation (usually overconfigured) and then extrapolate conclusions about system performance from this single point. Instead, HammerDB has always encouraged building performance profiles to fully understand how a database system behaves.

HammerDB v4.11 automates this practice by introducing the concept of performance profiles for TPROC-C workloads. This functionality, available in both the GUI and CLI, enables you to group a number of related TPROC-C workloads together to build a profile across multiple benchmarks.

Typically, this functionality will be used with an automated workload where you have defined an increasing sequence of Virtual Users to identify the levels of peak performance on a system or compare different software and hardware configurations.  The performance profile allows you to group these related TPROC-C workloads together with a single profile ID.

By default, the performance profile ID is set to 0, meaning that a particular HammerDB job is not related to other jobs; setting this profile ID to a positive integer groups all jobs with the same number together.

Using Performance Profiles with Autopilot

In the GUI, when jobs are enabled, there is the new option of performance profile ID. To activate performance profiles set this ID to a positive value with the Job Options dialog.

As this setting is related to automated workloads the same setting is also available in the Autopilot options.

Then run an autopilot sequence exactly as before.

When the run has completed, view your results with the HammerDB web service and you will see a new section called Performance Profiles and a profile listed under the ID you have created. The profile lists the number of jobs grouped under that ID and pinpoints the job with the maximum NOPM, along with the TPM for that job and the number of Active Virtual Users.

Clicking on the profile ID will show a graph of both NOPM and TPM (both of which can be interactively deselected) allowing you to view the performance profile for your grouped jobs.

Using Performance Profiles with the CLI

In advanced server environments, most users will use scripts to run a performance profile sequence. The equivalent of setting the Performance Profile ID in the GUI is done with the jobs profileid command. Without an argument it will report the current profile ID (by default 0 for no profile), and with an integer argument it will set the profile ID.

hammerdb>jobs profileid
Performance profile id set to 0

hammerdb>jobs profileid 1
Setting performance profile id to 1

hammerdb>jobs profileid
Performance profile id set to 1

If running a sequence of tests as follows, setting the Profile ID requires just one command before any of the workload is run.

#!/bin/tclsh
# maintainer: Pooja Jain

set tmpdir $::env(TMP)
puts "SETTING CONFIGURATION"
jobs profileid 1
dbset db maria
dbset bm TPC-C
giset timeprofile xt_gather_timeout 1200
giset commandline keepalive_margin 1200

diset connection maria_host 127.0.0.1
diset connection maria_port 3306
diset connection maria_socket /tmp/mariadb.sock
#
diset tpcc maria_user root
diset tpcc maria_pass maria
diset tpcc maria_dbase tpcc
diset tpcc maria_driver timed
diset tpcc maria_rampup 2
diset tpcc maria_duration 5
diset tpcc maria_no_stored_procs false
diset tpcc maria_allwarehouse false
diset tpcc maria_timeprofile true
diset tpcc maria_purge true
#start CPU
metstart
puts "TEST STARTED"
foreach z { 1 2 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88
92 96 100 104 108 112 116 120 } {
loadscript
vuset vu $z
vuset logtotemp 1
vucreate
tcstart
tcstatus
vurun
tcstop
vudestroy
}
metstop
puts "TEST COMPLETE"
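To view the results you need the HammerDB web service running; a minimal sketch, assuming the hammerdbws launcher shipped in the HammerDB install directory:

./hammerdbws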

Running the webservice will then show the performance profile

and clicking on the Profile ID shows the graph of NOPM and TPM for all of the jobs under that ID.

The CLI also adds a jobs profile command that will report the results of jobs with the profile ID specified.

hammerdb>job profile 1
{
"66686B0D61A903E233236323": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 08:19:41",
"activevu": "1",
"nopm": "16254",
"tpm": "37709"
},
"66686CB661A903E243430373": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 08:26:46",
"activevu": "2",
"nopm": "35434",
"tpm": "82497"
},
"66686E6161A903E263536313": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 08:33:53",
"activevu": "4",
"nopm": "71549",
"tpm": "166405"
},
"6668700E61A903E283834363": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 08:41:02",
"activevu": "8",
"nopm": "135774",
"tpm": "316012"
},
"666871C161A903E203230333": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 08:48:17",
"activevu": "12",
"nopm": "196216",
"tpm": "455867"
},
"6668737A61A903E283131383": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 08:55:38",
"activevu": "16",
"nopm": "258999",
"tpm": "601697"
},
"6668753861A903E283437363": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 09:03:04",
"activevu": "20",
"nopm": "310608",
"tpm": "721658"
},
"666876FB61A903E293532383": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 09:10:35",
"activevu": "24",
"nopm": "358359",
"tpm": "832467"
},
"666878C361A903E203836353": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 09:18:11",
"activevu": "28",
"nopm": "399416",
"tpm": "928993"
},
"66687A8E61A903E213737353": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 09:25:50",
"activevu": "32",
"nopm": "438920",
"tpm": "1018825"
},
"66687C5E61A903E263536313": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 09:33:34",
"activevu": "36",
"nopm": "471456",
"tpm": "1095396"
},
"66687E2961A903E263635343": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 09:41:13",
"activevu": "40",
"nopm": "522600",
"tpm": "1213621"
},
"66687FF761AA03E293130303": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 09:48:55",
"activevu": "44",
"nopm": "548843",
"tpm": "1275641"
},
"666881CE61AA03E283137393": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 09:56:46",
"activevu": "48",
"nopm": "569173",
"tpm": "1322266"
},
"666883A861AA03E293734373": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 10:04:40",
"activevu": "52",
"nopm": "591268",
"tpm": "1373413"
},
"6668858561AA03E223038303": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 10:12:37",
"activevu": "56",
"nopm": "609316",
"tpm": "1415349"
},
"6668877161AA03E243737363": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 10:20:49",
"activevu": "60",
"nopm": "628853",
"tpm": "1462056"
},
"6668896861AA03E253531393": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 10:29:12",
"activevu": "64",
"nopm": "642036",
"tpm": "1491259"
},
"66688B6861AA03E203133313": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 10:37:44",
"activevu": "68",
"nopm": "656450",
"tpm": "1524772"
},
"66688D7161AA03E293331363": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 10:46:25",
"activevu": "72",
"nopm": "661515",
"tpm": "1536318"
},
"66688F8361AA03E203331333": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 10:55:15",
"activevu": "76",
"nopm": "670828",
"tpm": "1557467"
},
"6668919E61AA03E263931383": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 11:04:14",
"activevu": "80",
"nopm": "672855",
"tpm": "1563938"
},
"666893BF61AA03E213631303": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 11:13:19",
"activevu": "84",
"nopm": "680334",
"tpm": "1580944"
},
"666895E761AA03E253330363": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 11:22:31",
"activevu": "88",
"nopm": "685096",
"tpm": "1590859"
},
"6668981461AA03E213832303": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 11:31:48",
"activevu": "92",
"nopm": "680334",
"tpm": "1580664"
},
"66689A4661AA03E213938343": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 11:41:10",
"activevu": "96",
"nopm": "679016",
"tpm": "1577343"
},
"66689C7E61AA03E213730313": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 11:50:38",
"activevu": "100",
"nopm": "678694",
"tpm": "1577827"
},
"66689EB561AA03E233739373": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 12:00:05",
"activevu": "104",
"nopm": "674706",
"tpm": "1566707"
},
"6668A0F461AA03E243230353": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 12:09:40",
"activevu": "108",
"nopm": "670643",
"tpm": "1557979"
},
"6668A33361AA03E293836323": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 12:19:15",
"activevu": "112",
"nopm": "665293",
"tpm": "1545308"
},
"6668A57A61AA03E203433363": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 12:28:58",
"activevu": "116",
"nopm": "661495",
"tpm": "1536360"
},
"6668A7CF61AA03E253833343": {
"db": "MariaDB",
"bm": "TPC-C",
"tstamp": "2024-06-11 12:38:55",
"activevu": "120",
"nopm": "655745",
"tpm": "1523566"
}
}

Performance Profiles Summary

With the new performance profile functionality, HammerDB enables you to group related TPROC-C jobs under an ID and use it to compare performance across benchmarks at different levels of utilisation and different configurations.

The profile ID chart will show performance for each individual profile.

And you can then quickly pinpoint specific jobs and drill down into their performance characteristics.

If you are not already building performance profiles manually, then this functionality makes it easier than ever to fully analyze your database performance.

HammerDB v4.10 New Features: Partitioning and Advanced Statistics for SQL Server TPROC-H

 

HammerDB v4.10 adds performance enhancements for the SQL Server TPROC-H workload in the form of partitioning for the orders and lineitem tables as well as the option to create advanced statistics.

Enabling Partitioning and Advanced Statistics

These can be selected with a checkbox in the GUI.

or by setting the following options to true in the CLI.

mssqls_tpch_partition_orders_and_lineitems = true
mssqls_tpch_advanced_stats = true
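For example, in a CLI build script these might be set with diset; a sketch only, assuming the TPROC-H settings live in a tpch dictionary analogous to the tpcc dictionary used in the MariaDB scripts earlier:

diset tpch mssqls_tpch_partition_orders_and_lineitems true
diset tpch mssqls_tpch_advanced_stats true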

Benefits of Partitioning and Advanced Statistics

When partitioning is selected, both the Orders and Lineitem tables are partitioned, with each partition spanning 1 week. The partition flag is set to false by default and must be enabled. Both column store and row store tables are allowed to be partitioned. When partitioning is enabled, certain primary and foreign key definitions defined in HammerDB are no longer valid and will not be created. However, HammerDB will still create those keys/constraints when the Orders and Lineitem tables are not partitioned.

For columnstore indices, there is an initial degradation of performance; however, if the user calls ALTER INDEX … REORGANIZE or leaves the server running for approximately 48 hours after the initial load, the performance of power runs improves by 2X when compared to columnstore indices with no partitioning.

The advanced statistics option is also set to off by default. When enabled it does increase the overall load time; however, it gives users loading the schema for SQL Server the option to perform a more advanced statistical analysis to benefit query performance.

Summary

HammerDB v4.10 introduces options to partition the SQL Server TPROC-H schema and perform a more advanced statistical analysis; taking advantage of these options can benefit query performance.

HammerDB v4.10 New Features: Schema and Consistency Checks

The HammerDB TPROC-C and TPROC-H workloads are derived from the TPC-C and TPC-H workloads respectively.  Although the HammerDB workloads are not identical to TPC-C and TPC-H it is still important that the workloads implemented maintain the same data consistency as the official workloads. For this reason HammerDB has implemented the data consistency checks for TPC-C and TPC-H to be run against all databases.

The consistency checks are useful to run after a schema build, but also after a workload has been run, to ensure that the data has remained consistent throughout all of the changes implemented by the workload. As the consistency checks are the same ones stipulated by the TPC specifications, you are also welcome to run them against clustered and distributed environments and against any workloads claimed to be derived from TPC workloads. The checks ensure that your transactions are fully and correctly processed by your database.

In addition to the consistency checks, HammerDB has also added a number of schema checks to ensure that a schema build has completed successfully and the database is fully populated with the correct stored procedures.

Running the TPROC-C Schema and Consistency Check

To run the schema check, firstly build your schema and let it complete.

Then select the Check option from the main menu or treeview, accept the dialog prompt and let it run through the checks. With the CLI, run the checkschema command. Note that on some databases row counts may take longer than others and the test may take a number of minutes to complete. The first part of the schema checks is particularly relevant after the build and will check:

  1. Database Exists.
  2. Tables Exist.
  3. Warehouse count in schema is the same as the HammerDB configuration.
  4. Tables are indexed.
  5. Tables are populated.
  6. Stored Procedures Exist.

It will also run the consistency checks that we will cover after the workload has completed.

After the initial check, run the workload.

Let the workload run to completion.

You can then re-run the schema and consistency checks.

After you have run a workload, the consistency checks are the most important aspect, although the schema checks still provide benefit, especially in checking the row populations. With the consistency checks, HammerDB will create a temporary table with a sample of warehouse ids and then run the following checks:

  1. For each District within a Warehouse, the next available Order ID (d_next_o_id) minus one is equal to the most recent Order ID [max(o_id)] for the ORDER table associated with the preceding District and Warehouse. Additionally, that same relationship exists for the most recent Order ID [max(o_id)] for the NEW-ORDER table associated with the same District and Warehouse. Those relationships can be illustrated as: d_next_o_id – 1 = max(o_id) = max(no_o_id) where (d_w_id = o_w_id = no_w_id) and (d_id = o_d_id = no_d_id)
  2. For each District within a Warehouse, the value of the most recent Order ID [max(no_o_id)] minus the first Order ID [min(no_o_id)] plus one, for the NEW-ORDER table associated with the District and Warehouse, equals the number of rows in that NEW-ORDER table. That relationship can be illustrated as: max(no_o_id) – min(no_o_id) + 1 = rows in NEW-ORDER where (o_w_id = no_w_id) and (o_d_id = no_d_id)
  3. For each District within a Warehouse, the sum of Order-Line counts [sum(o_ol_cnt)] for the Orders associated with the District equals the number of rows in the ORDER-LINE table associated with the same District. That relationship can be illustrated as: sum(o_ol_cnt) = rows in the ORDER-LINE table for the Warehouse and District
  4. The sum of balances (d_ytd) for all Districts within a specific Warehouse is equal to the balance (w_ytd) of that Warehouse.

If completed successfully the check ensures that your data has remained consistent after the workload has completed.
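As an illustration, the fourth check (district balances summing to the warehouse balance) could be run manually against the MariaDB TPROC-C schema shown earlier with a query along these lines; this is a sketch only, as HammerDB itself works from a temporary table with a sample of warehouse ids rather than scanning every warehouse:

-- any rows returned indicate warehouses where the district totals do not match the warehouse total
SELECT w.w_id, w.w_ytd, SUM(d.d_ytd) AS district_ytd
FROM warehouse w
JOIN district d ON d.d_w_id = w.w_id
GROUP BY w.w_id, w.w_ytd
HAVING w.w_ytd <> SUM(d.d_ytd);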

Running the TPROC-H Schema and Consistency Check

The schema and consistency checks for TPROC-H, as expected, follow a similar approach to TPROC-C, and you can run the check after the build to ensure that the build completed successfully. If you only run the query aspect of the TPROC-H workload the data is unmodified, and therefore there is no benefit from running the schema and consistency checks again after you have verified the initial build. However, the refresh function does modify the data, so if you run a refresh function you can run the check to verify consistency. In the example below we have used one virtual user to run the power test, which runs the new sales order refresh, one query set and the old sales refresh.

After this test has run the refresh functions and query workload, run the schema and consistency check to verify that the refresh function has not impacted data consistency.

 

The TPROC-H schema and consistency checks run the following checks:

  1. Database Exists.
  2. Tables Exist.
  3. Scale Factor in schema is the same as the HammerDB configuration.
  4. Tables are indexed.
  5. Tables are populated.
  6. Consistency Check.
  1. A consistent state for the TPC-H database is defined to exist when: O_TOTALPRICE = SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT) * (1 + L_TAX)) for each ORDER and LINEITEM defined by (O_ORDERKEY = L_ORDERKEY) and can be checked by:
     SELECT DECIMAL(SUM(DECIMAL(INTEGER(INTEGER(DECIMAL(INTEGER(100 * DECIMAL(L_EXTENDEDPRICE,20,3)),20,3) * (1 - L_DISCOUNT)) * (1 + L_TAX)),20,3) / 100.0),20,3) FROM LINEITEM WHERE L_ORDERKEY = okey
     SELECT DECIMAL(SUM(O_TOTALPRICE),20,3) FROM ORDERS WHERE O_ORDERKEY = okey

Summary

HammerDB v4.10 introduces schema and data consistency checks for all databases and workloads to enable you to verify the schema has been built correctly and that the data remains consistent before and after running workloads.

HammerDB v4.10 New Features: Purge and Write back for MariaDB TPROC-C

Many of the HammerDB TPROC-C workloads have included features to prevent the database doing maintenance tasks for the previous run whilst another run is taking place. This is particularly important when running automated workloads back-to-back to generate a performance profile for a progressively increasing number of virtual users. An example of this is the “Checkpoint when complete” option for Oracle, which flushes all dirty/modified data blocks in the in-memory buffer cache to disk and then switches the logfiles, so that this does not happen during a subsequent run with database writer activity impacting performance.

History List

With MySQL and MariaDB a key maintenance activity that can impact performance is purging, which can be monitored via the history list length with a SQL statement such as the following.

select variable_value from information_schema.global_status where variable_name = 'INNODB_HISTORY_LIST_LENGTH';

When a workload deletes data from MySQL or MariaDB the row is not removed straight away, but only when it is no longer needed for read consistency and the referring undo record for the operation is deleted. The history list length that can be queried is the number of undo log pages that contain changes. If the history list length grows too large then a larger number of row versions can impact query performance; however, purging to process the undo log pages can also itself impact performance by requiring mutex locking. For this reason we have a number of parameters, notably innodb_purge_threads, innodb_purge_batch_size, innodb_max_purge_lag and innodb_max_purge_lag_delay. The number of purge threads and batch size determine how aggressively purging takes place, and the purge lag and delay allow throttling of transaction throughput if the history list grows too large, to allow purging to catch up.
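To check the current values of these parameters on your own MariaDB or MySQL instance, they can all be listed with a single statement:

SHOW GLOBAL VARIABLES LIKE 'innodb%purge%';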

It should be clear that if you want to stress a MySQL or MariaDB database then a workload that includes delete operations is essential to the overall picture, and fortunately in the HammerDB workload the DELIVERY stored procedure includes a loop that processes a large number of delete operations, so it is a great opportunity to test the effectiveness of your purge settings. However, until recently the purge parameters could only be changed after a server restart, and therefore we have not been able to accelerate a purge after a workload has completed. Fortunately, with MariaDB the change "Make number of purge threads variable dynamic", as it suggests, makes the configuration dynamic, and we can take advantage of this to run a purge to clear the history list after a workload has completed so that the purging for one test run does not unduly impact the following run.

In addition to processing the history list, when adding this functionality we also added similar functionality to write back the dirty buffers in the buffer pool at the same time.

Setting Purge and Write Back

To run the purge and write back after a MariaDB workload you need at least version 10.7.0 of MariaDB, the version in which the variables were made dynamic. If you do not have this version, HammerDB will report during a run that the settings cannot be made. Otherwise, if you have version 10.7.0 or above, in the GUI choose the Purge when complete checkbox in your driver settings.

Or in the CLI set the option maria_purge to true.

tpcc {
maria_count_ware = 30
...
maria_purge = true
}

With this setting enabled, run your MariaDB TPROC-C workload as normal.

Once the run has completed it will dynamically set the variables to run the purge and write back and restore your variables when complete.

Choosing optimal purge settings

Of course, for this notebook based example the history list length does not build up unduly over the course of the workload; however, in a larger server environment with high transaction rates it is possible to see history list lengths such as the following, depending on your purge settings.

"TEST RESULT : System achieved 916487 NOPM from 2128748 MariaDB TPM",
"Starting purge: history list length 13107078",
"Starting write back: dirty buffer pages 2289677",

Typically, for optimal performance during a HammerDB run you want the purge threads and batch size at their default settings of innodb_purge_threads=4 and innodb_purge_batch_size=1000, and then set the purge lag and lag delay to low values to minimize delays during the test, such as innodb_max_purge_lag=0 and innodb_max_purge_lag_delay=1. As described previously, even in this large server environment HammerDB will modify these settings to accelerate the purge and write back after the run has completed, and then restore your settings when it has finished, ready for the next workload, giving us the best possible results each time.
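Because these variables are dynamic from MariaDB 10.7.0 onwards, they can also be changed manually without a restart; a minimal sketch applying the low-lag values suggested above (HammerDB adjusts these for you when Purge when complete is set, so this is only needed for experimentation):

SET GLOBAL innodb_purge_threads = 4;
SET GLOBAL innodb_max_purge_lag = 0;
SET GLOBAL innodb_max_purge_lag_delay = 1;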

Summary

Now that MariaDB from version 10.7.0 allows us to set the purge settings dynamically, we can take advantage of this to complete the purge after one run finishes and before the next one starts, so that our test runs are repeatable without being impacted by purging.

Thank-you to Marko Mäkelä of MariaDB for guidance on exactly which settings were needed for purge and write back to make this new HammerDB feature possible.

Deploy HammerDB as a Web Application with CloudTK

From HammerDB v4.9 it is possible to deploy HammerDB through a web browser, giving you access to a fully functional HammerDB GUI in Cloud environments.

The quickest way to run HammerDB with CloudTK is through Docker, and in this post we will demonstrate how to pull the HammerDB CloudTK Docker image and run it to start using HammerDB through your chosen browser.

Firstly, pull the Docker image:

$ sudo docker pull tpcorg/hammerdb:latest-cloudtk
latest-cloudtk: Pulling from tpcorg/hammerdb
96d54c3075c9: Pull complete
96261b8fec13: Pull complete
d2c2e5928320: Pull complete
4f4fb700ef54: Pull complete
0dcebfea4cd0: Pull complete
4fea0d72408b: Pull complete
4d35a83249fa: Pull complete
a20f97744e61: Pull complete
181fc9db3bb2: Pull complete
442899704ccd: Pull complete
Digest: sha256:a73903ba354f9a7f826a69e2e1f3f8b3523220927323d0ba913efe168917ec02
Status: Downloaded newer image for tpcorg/hammerdb:latest-cloudtk
docker.io/tpcorg/hammerdb:latest-cloudtk

And then run it. If you want access to functionality such as the CPU metrics then you should use the --net=host option.

$ sudo docker run --net=host --name hammerdb-cloudtk tpcorg/hammerdb:latest-cloudtk
can't find package limit
Running with default file descriptor limit
/debug user "debug" password "sycu4xfc.gcw"
httpd started on port 8081
secure httpd started on SSL port 8082

Otherwise, you can expose only the ports needed, as follows.

 docker run -p 8081:8081 -p 8082:8082 -p 8080:8080 --name hammerdb-cloudtk tpcorg/hammerdb:latest-cloudtk

Once running, you can connect to port 8081 for an unencrypted connection. Once connected, click on the launcher link.

If preferred, you can use an encrypted connection on port 8082. By default, sample self-signed certificates are installed, and these should be replaced by your own certificates.

Whichever connection method is used, click Submit Query on the HammerDB-CloudTK application.

HammerDB will then run fully contained within the browser.

You can change viewing options with the settings on the left of the screen.

You can now use HammerDB exactly as you would do with the GUI displayed on a local display.

The GUI can also be used for functionality to monitor your databases such as the PostgreSQL metrics.

The drag and drop functionality of these windows can be activated by dragging the tab to the title bar of the main HammerDB window.

It is also possible to start the HammerDB Web service from within the container by using the Jobs Options dialog to view the statistics from the jobs you have run.

Confirm that the web service is running.

and then connect to the web service on port 8080 using another tab in your browser. You can then view the charts related to the workloads you have run.

Finally, when you have finished with your HammerDB session use the tab on the left to disconnect.

HammerDB with CloudTK provides you with the option to deploy the HammerDB GUI as a web application. In this post we have demonstrated the easiest way to do this with Docker; however, you can also deploy CloudTK within any HammerDB for Linux installation, as described in the readme in the HammerDB CloudTK repository.

 

HammerDB v4.9 New Feature: Accelerate SQL Server builds with BCP

This post provides an introduction to the new feature added to HammerDB v4.9 by @krithikasatish  and @JoshInnis to provide accelerated load performance for both SQL Server TPROC-C and TPROC-H schemas.

This accelerated load is implemented using the BCP utility and is turned on or off with the “Use BCP Option” checkbox, with the default for v4.9 being to have this feature enabled.

Insert based load

In previous releases, HammerDB loaded data using inserts (or supported manual use of BCP by generating flat files with the datagen option), and you can still perform schema loads with inserts by deselecting the Use BCP Option.

This earlier functionality has the advantage that there are no intermediate staging files required, and data is inserted into the database with multi-row inserts as soon as it is created. The disadvantage of this approach is that the database sees the insert as a regular database insert, maintaining consistency and recoverability, with a round-trip to the HammerDB client per multi-row insert, and this is what we see in the recent expensive queries.

In this example on a development PC the regular build started at 10:48:35 and ended at 10:54:46 meaning it took 6 minutes 11 seconds to build our 20 warehouse schema.

BCP based load

Now when we select the  “Use BCP Option”

We can see that we are now using an insert bulk command (although the item table, being fixed at 100K rows and loaded by the monitor virtual user, continues to use regular inserts).

In our PC test the build started at 11:28:16 and ended at 11:30:27 meaning it took 2 minutes 11 seconds to build our 20 warehouse schema.

Using BCP our TPROC-C build now completed almost 3X faster and tests on TPROC-H showed similar results.
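As a quick sanity check on that figure, the two build times above can be compared in a Tcl shell such as hammerdbcli:

# 6 min 11 s (insert based) vs 2 min 11 s (BCP based)
puts [ format "%.1fX" [ expr {(6*60 + 11.0) / (2*60 + 11)} ] ]   ;# ~2.8X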

Note that for the BCP functionality, temporary data files will be created and deleted in the TMP environment variable area, and you can find this location by running the command

(HammerDB-4.9) % puts $::env(TMP)
C:\Users\Hammer\AppData\Local\Temp

in the HammerDB console, and then see the files being created and deleted as the build progresses.

Summary

The new Use BCP Option for SQL Server accelerates both TPROC-C and TPROC-H schema loads, so you can get to running your benchmarks faster. We thank @krithikasatish and @JoshInnis for this excellent contribution to HammerDB.


Why you should benchmark your database using stored procedures

HammerDB uses stored procedures to achieve maximum throughput when benchmarking your database.  HammerDB has always used stored procedures as a design decision because the original benchmark was implemented as close as possible to the example workload in the TPC-C specification that uses stored procedures. Additionally, reviewing official TPC-C full disclosure reports highlighted that all vendors also use stored procedures.

However, there can be a lack of understanding of the benefits that stored procedures bring or if you have a benchmarking tool or database that doesn’t support stored procedures, then you have nothing to compare against.

This blog post introduces the new “No stored procedures” option for MariaDB and MySQL introduced with HammerDB v4.9 and explains how to measure the difference between running with and without stored procedures.

What is a stored procedure?

A stored procedure as it sounds is a procedure stored inside your database that you call with parameters. The business logic is implemented inside the routine and it returns a result. As an example from the TPC-C specification, this is the Stock Level procedure.

int slev()
{
EXEC SQL WHENEVER NOT FOUND GOTO sqlerr;
EXEC SQL WHENEVER SQLERROR GOTO sqlerr;
EXEC SQL SELECT d_next_o_id INTO :o_id FROM district WHERE d_w_id = :w_id AND d_id = :d_id;
EXEC SQL SELECT COUNT(DISTINCT(s_i_id)) INTO :stock_count
FROM order_line, stock
WHERE ol_w_id = :w_id AND
ol_d_id = :d_id AND ol_o_id < :o_id AND
ol_o_id >= :o_id - 20 AND
s_w_id = :w_id AND
s_i_id = ol_i_id AND s_quantity < :threshold;
EXEC SQL COMMIT WORK;
return(0);
sqlerr:
error();
}

and in MySQL and MariaDB we can implement this procedure as follows. We call the stored procedure, pass a warehouse id, district id and threshold, and receive a stock_count as a result.

Also, note in passing that we include a COMMIT statement in this procedure because one is included in the example code in the TPC-C specification.

CREATE PROCEDURE `SLEV` (
st_w_id INTEGER,
st_d_id INTEGER,
threshold INTEGER,
OUT stock_count INTEGER
)
BEGIN 
DECLARE st_o_id INTEGER;
DECLARE `Constraint Violation` CONDITION FOR SQLSTATE '23000';
DECLARE EXIT HANDLER FOR `Constraint Violation` ROLLBACK;
DECLARE EXIT HANDLER FOR NOT FOUND ROLLBACK;
START TRANSACTION;
SELECT d_next_o_id INTO st_o_id
FROM district
WHERE d_w_id=st_w_id AND d_id=st_d_id;
SELECT COUNT(DISTINCT (s_i_id)) INTO stock_count
FROM order_line, stock
WHERE ol_w_id = st_w_id AND
ol_d_id = st_d_id AND (ol_o_id < st_o_id) AND
ol_o_id >= (st_o_id - 20) AND s_w_id = st_w_id AND
s_i_id = ol_i_id AND s_quantity < threshold;
COMMIT;
END

Next we can run the same SQL directly without a stored procedure.

proc slev { maria_handler w_id stock_level_d_id prepare RAISEERROR } {

global mariastatus
set threshold [ RandomNumber 10 20 ]
mariaexec $maria_handler "start transaction"
set d_next_o_id [ list [ maria::sel $maria_handler "SELECT d_next_o_id FROM district WHERE d_w_id=$w_id AND d_id=$stock_level_d_id" -list ]]
set stock_count [ list [ maria::sel $maria_handler "SELECT COUNT(DISTINCT (s_i_id)) FROM order_line, stock WHERE ol_w_id = $w_id AND ol_d_id = $stock_level_d_id AND (ol_o_id < $d_next_o_id) AND ol_o_id >= ($d_next_o_id - 20) AND s_w_id = $w_id AND s_i_id = ol_i_id AND s_quantity < $threshold" -list ]]
maria::commit $maria_handler
}

In this example, when we run the client SQL version, d_next_o_id is 3001 and the stock count is 6:

SELECT d_next_o_id FROM district WHERE d_w_id=1 AND d_id=10
SELECT COUNT(DISTINCT (s_i_id)) FROM order_line, stock WHERE ol_w_id = 1
AND ol_d_id = 10 AND (ol_o_id < 3001) AND ol_o_id >= (3001 - 20)
AND s_w_id = 1 AND s_i_id = ol_i_id AND s_quantity < 11

and when we call the stored procedure, we get the same result for the stock count.

call slev(1,10,11,@stock_count);
Query OK, 2 rows
select @stock_count;
+--------------+
| @stock_count |
+--------------+
| 6            |
+--------------+
1 row in set (0.000 sec)

Although we get the same result and run exactly the same SQL, there are notable differences in how we get there. Firstly, the stored procedure is compiled on the database, but also the slev procedure on the client is compiled into bytecode. Also, using client SQL we do more parsing of SQL statements and are using literal values, although we could also use prepared statements in this scenario. However, the major difference is that whereas in the client SQL approach we make 4 round trips between the client and database to receive our stock count, in the stored procedure approach we make 2: one to call the stored procedure with the input parameters and one to select the variable where the output is stored. With a simple example such as this, the additional network traffic would not necessarily be expected to be significant between the 2 approaches. However, with more complex application logic this network round trip soon becomes a key focus area for improving performance.

Setting the HammerDB No Stored Procedures Option

The no stored procedures option is a driver script option, and you should create the schema as normal with stored procedures.  Then when selecting the driver options to use client SQL only, you select the No Stored Procedures checkbox.

In the CLI you set maria_no_stored_procs or mysql_no_stored_procs to use a client SQL driver script.

Stored Procedures and Client SQL comparison

To test the stored procedures and client implementations, we ran both workloads against a system equipped with Intel Xeon 8280L. The data shows a scripted automated workload running a number of back to back tests each time with an increasing number of virtual users.

On MySQL, we saw a 1.5X performance advantage in favour of stored procedures and on MariaDB a 1.3X performance advantage.

Note that for all tests, we used the local loopback address and port to provide the lowest possible network latency between client and server and also so we don’t have any concerns about bandwidth limitations.

# iperf -c 127.0.0.1

------------------------------------------------------------

Client connecting to 127.0.0.1, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local 127.0.0.1 port 19230 connected with 127.0.0.1 port 5001
[  4] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 19230
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  37.3 GBytes  32.1 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  37.3 GBytes  32.1 Gbits/sec

We also tested both local port and socket connections and found that a local port provided the best comparative data for both MySQL and MariaDB.

Database information and performance schemas

The best approach for database performance analysis is to take a top-down approach. Use the performance metrics available in the database first before looking at data further down in the stack.

Using MariaDB and analysing performance at a workload of 80 Virtual Users, the first place we can look is the information_schema user_statistics table to quantify the difference in database traffic. In this example, we can capture the bytes received and sent for the workload without stored procedures

mysql> select * from information_schema.user_Statistics where user='root'\G;
*************************** 1. row ***************************
          USER: root
          TOTAL_CONNECTIONS: 83
          BYTES_RECEIVED: 22847015761
          BYTES_SENT: 50668052468
...

And also for the workload with stored procedures

mysql> select * from information_schema.user_Statistics where user='root'\G;
*************************** 1. row ***************************
          USER: root
          TOTAL_CONNECTIONS: 83
          BYTES_RECEIVED: 3548506639
          BYTES_SENT: 6335812312
...

With this first step we have identified a clear difference: to achieve the same result, without stored procedures the database sent 8X and received 6.4X the amount of data compared to the run with stored procedures. With the workload running with a 2 minute rampup followed by the timed test, the throughput for the workload without stored procedures is equivalent to BYTES_RECEIVED: 54 MB/s and BYTES_SENT: 120 MB/s. This in itself is not an issue, as it means we are well within the system's bandwidth capabilities. It also makes sense that the database sends more data than it receives, as it receives SQL queries and returns results; for the client it is the opposite, receiving more data than it sends.
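These ratios and throughput figures can be reproduced directly from the user_statistics counters above. A quick check in tclsh, assuming a measurement window of 420 seconds (the 2 minute rampup plus a 5 minute timed test), which is an assumption that matches the quoted figures:

# BYTES_SENT / BYTES_RECEIVED from information_schema.user_statistics
set sent_nosp 50668052468 ;# without stored procedures
set recv_nosp 22847015761 ;# without stored procedures
set sent_sp    6335812312 ;# with stored procedures
set recv_sp    3548506639 ;# with stored procedures
set secs 420              ;# assumed 2 min rampup + 5 min timed test

puts [format "sent ratio %.1fX, received ratio %.1fX" \
    [expr {double($sent_nosp)/$sent_sp}] [expr {double($recv_nosp)/$recv_sp}]]
puts [format "without stored procedures: %.1f MB/s sent, %.1f MB/s received" \
    [expr {$sent_nosp/double($secs)/1e6}] [expr {$recv_nosp/double($secs)/1e6}]]

This prints a sent ratio of 8.0X, a received ratio of 6.4X, and roughly 120 MB/s sent and 54 MB/s received.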

Therefore, the most important question is not so much how much data was sent, but how long it took and where the wait time was spent. At the highest level we can look at an event called io/socket/sql/client_connection to see how much time, measured in picoseconds, was spent in network-related events, firstly without stored procedures

mysql> select * from events_waits_summary_global_by_event_name where event_name like '%client_connection%';
+--------------------------------------+------------+------------------+----------------+----------------+----------------+
| EVENT_NAME                           | COUNT_STAR | SUM_TIMER_WAIT   | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+--------------------------------------+------------+------------------+----------------+----------------+----------------+
| wait/io/socket/sql/client_connection |  350769863 | 2350211822039704 |              0 |        6699889 |     8921327926 |
+--------------------------------------+------------+------------------+----------------+----------------+----------------+
1 row in set (0.01 sec)

and secondly with stored procedures

mysql> select * from events_waits_summary_global_by_event_name where event_name like '%client_connection%';
+--------------------------------------+------------+-----------------+----------------+----------------+----------------+
| EVENT_NAME                           | COUNT_STAR | SUM_TIMER_WAIT  | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+--------------------------------------+------------+-----------------+----------------+----------------+----------------+
| wait/io/socket/sql/client_connection |   43053845 | 310781514128952 |              0 |        7218288 |     7889980872 |
+--------------------------------------+------------+-----------------+----------------+----------------+----------------+
1 row in set (0.01 sec)

Now we can observe that when we used client SQL we spent 7.6X the time waiting for the network compared to when we used stored procedures. The average time per event is similar in both cases; it is the far greater number of events, and therefore the cumulative time, that makes the difference.
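As the timers are reported in picoseconds, this is straightforward to sanity check in tclsh using the SUM_TIMER_WAIT values from the two runs above:

set wait_nosp 2350211822039704 ;# SUM_TIMER_WAIT without stored procedures (picoseconds)
set wait_sp    310781514128952 ;# SUM_TIMER_WAIT with stored procedures (picoseconds)

# convert picoseconds to seconds of cumulative wait across all connections
puts [format "without SP %.0f s, with SP %.0f s, ratio %.1fX" \
    [expr {$wait_nosp/1e12}] [expr {$wait_sp/1e12}] \
    [expr {double($wait_nosp)/$wait_sp}]]

That is roughly 2,350 seconds of cumulative socket wait without stored procedures against roughly 311 seconds with them, giving the 7.6X figure.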

Note that in some places, such as cloud provider documentation, the wait event io/socket/sql/client_connection is described as being related to thread creation; however, it can be observed that this metric increments when SQL statements are run against the database rather than when clients connect. We can also drill down into the socket_summary_by_instance table to look further at the io/socket/sql/client_connection event, again firstly without stored procedures

*************************** 3. row ***************************
               EVENT_NAME: wait/io/socket/sql/client_connection
               COUNT_STAR: 356410019
           SUM_TIMER_WAIT: 2415153960635112
           MIN_TIMER_WAIT: 0
           AVG_TIMER_WAIT: 6775980
           MAX_TIMER_WAIT: 9975342936
               COUNT_READ: 178204845
           SUM_TIMER_READ: 133364264753112
           MIN_TIMER_READ: 0
           AVG_TIMER_READ: 748092
           MAX_TIMER_READ: 8409523512
 SUM_NUMBER_OF_BYTES_READ: 20950542108
              COUNT_WRITE: 178204678
          SUM_TIMER_WRITE: 2281784113760856
          MIN_TIMER_WRITE: 0
          AVG_TIMER_WRITE: 12804240
          MAX_TIMER_WRITE: 9975342936
SUM_NUMBER_OF_BYTES_WRITE: 49197673794
               COUNT_MISC: 496
           SUM_TIMER_MISC: 5582121144
           MIN_TIMER_MISC: 0
           AVG_TIMER_MISC: 11254116
           MAX_TIMER_MISC: 67299264

and secondly with stored procedures

*************************** 3. row ***************************
               EVENT_NAME: wait/io/socket/sql/client_connection
               COUNT_STAR: 43029616
           SUM_TIMER_WAIT: 308592755260896
           MIN_TIMER_WAIT: 0
           AVG_TIMER_WAIT: 7171416
           MAX_TIMER_WAIT: 6103182288
               COUNT_READ: 21514643
           SUM_TIMER_READ: 15175699757784
           MIN_TIMER_READ: 0
           AVG_TIMER_READ: 705312
           MAX_TIMER_READ: 5876657352
 SUM_NUMBER_OF_BYTES_READ: 3219606535
              COUNT_WRITE: 21514479
          SUM_TIMER_WRITE: 293411103781368
          MIN_TIMER_WRITE: 0
          AVG_TIMER_WRITE: 13637520
          MAX_TIMER_WRITE: 6103182288
SUM_NUMBER_OF_BYTES_WRITE: 6082914012
               COUNT_MISC: 494
           SUM_TIMER_MISC: 5951721744
           MIN_TIMER_MISC: 0
           AVG_TIMER_MISC: 12047964
           MAX_TIMER_MISC: 87785304
3 rows in set (0.01 sec)

We observe that io/socket/sql/client_connection does indeed provide us with the same data as the events_waits_summary_global_by_event_name table. The documentation also tells us that the COUNT_READ, SUM_TIMER_READ, MIN_TIMER_READ, AVG_TIMER_READ, MAX_TIMER_READ and SUM_NUMBER_OF_BYTES_READ columns aggregate all receive operations (RECV, RECVFROM, and RECVMSG), and the COUNT_WRITE, SUM_TIMER_WRITE, MIN_TIMER_WRITE, AVG_TIMER_WRITE, MAX_TIMER_WRITE and SUM_NUMBER_OF_BYTES_WRITE columns aggregate all send operations (SEND, SENDTO, and SENDMSG). We can therefore relate the database statistics to what we observe on the system: without stored procedures we spend 8.8X the time on read operations (RECV, RECVFROM, and RECVMSG) and 7.8X the time on writes (SEND, SENDTO, and SENDMSG), meaning we are spending a lot more time on the round trip between client and server, and we can use io/socket/sql/client_connection to quantify the difference.

To put it differently, at least some of the time spent in the network accounts for the lower database throughput when we are not using stored procedures.

We could also expect that, where operations start a transaction and then take locks, spending more time in the network also means spending more time in lock waits. In this example, measuring without stored procedures

mysql> select object_name, sum_timer_wait from table_lock_waits_summary_by_table where object_name = 'warehouse';
+-------------+----------------+
| object_name | sum_timer_wait |
+-------------+----------------+
| warehouse   |  3890167468704 |
+-------------+----------------+
1 row in set (0.00 sec)

mysql> select object_name, sum_timer_wait from table_lock_waits_summary_by_table where object_name = 'district';
+-------------+----------------+
| object_name | sum_timer_wait |
+-------------+----------------+
| district    |  5874029749344 |
+-------------+----------------+
1 row in set (0.00 sec)

compared to the same workload with stored procedures

mysql> select object_name, sum_timer_wait from table_lock_waits_summary_by_table where object_name = 'warehouse';
+-------------+----------------+
| object_name | sum_timer_wait |
+-------------+----------------+
| warehouse   |  4039611353088 |
+-------------+----------------+
1 row in set (0.00 sec)
mysql> select object_name, sum_timer_wait from table_lock_waits_summary_by_table where object_name = 'district';
+-------------+----------------+
| object_name | sum_timer_wait |
+-------------+----------------+
| district    |  5383666516296 |
+-------------+----------------+
1 row in set (0.00 sec)

Moving to stored procedures, the amount of time in lock waits goes up on the warehouse table but down on the district table, as the district table is accessed with a SELECT FOR UPDATE statement that, with client SQL, includes a round trip between the SELECT and the UPDATE.

To visualize the overall difference we can generate a flame graph, again firstly without stored procedures

and secondly with

When you generate your own flame graphs you will have an interactive SVG file allowing you to drill down into both client and server functions. However, the difference should be immediately clear at first glance.

The width of the bars shows the time spent in each function, and at the base, as expected, we are spending time in mariadbd, the tclsh8.6 client (i.e. HammerDB) and idle.

With stored procedures, the time spent in the network, and indeed in the HammerDB client at all, is minimal compared to the time spent in the database. This is intentional, as the application logic sits in a stored procedure within the database. We can see that do_execute_sp and the functions it calls, in particular mysql_execute_command, perform exactly the same work as when we are not using stored procedures; the SQL is the same, but we are calling it faster.

When we run without stored procedures we can visualise what we saw from io/socket/sql/client_connection: we are now spending a lot more time in send and recv functions in both the client and the server. Note that within HammerDB this additional time is spent within the MariaDB client library.

We can also see additional time in MYSQLParse when we are not using stored procedures, because we are sending SQL statements with literal values that may be different each time and are therefore parsed each time, whereas stored procedures use parameters, meaning the identical SQL can be reused.

System Performance with LinuxKI

From the previous analysis we can observe that the client and database are spending a lot more time in the network without stored procedures; however, the previous timings only showed us active time. For a more complete picture, each Virtual User is a parallel thread that matches a client connection, meaning we also have idle time when both client and server are waiting to receive data.

mysql> select * from socket_instances;
+----------------------------------------+-----------------------+-----------+-----------+-----------+-------+--------+
| EVENT_NAME                             | OBJECT_INSTANCE_BEGIN | THREAD_ID | SOCKET_ID | IP        | PORT  | STATE  |
+----------------------------------------+-----------------------+-----------+-----------+-----------+-------+--------+
| wait/io/socket/sql/server_tcpip_socket |        94367427331136 |         1 |        15 | ::1       |  3306 | ACTIVE |
| wait/io/socket/sql/server_tcpip_socket |        94367427331456 |         1 |        16 | 127.0.0.1 |  3306 | ACTIVE |
| wait/io/socket/sql/server_unix_socket  |        94367427331776 |         1 |        17 |           |     0 | ACTIVE |
| wait/io/socket/sql/client_connection   |        94367427332096 |         9 |        38 |           |     0 | ACTIVE |
| wait/io/socket/sql/client_connection   |        94367427384896 |      2281 |        39 | 127.0.0.1 | 58684 | IDLE   |
| wait/io/socket/sql/client_connection   |        94367427385216 |      2282 |        40 | 127.0.0.1 | 58690 | IDLE   |
| wait/io/socket/sql/client_connection   |        94367427385536 |      2283 |        41 | 127.0.0.1 | 58696 | ACTIVE |
...
| wait/io/socket/sql/client_connection   |        94367427410176 |      2360 |       144 | 127.0.0.1 | 50442 | ACTIVE |
| wait/io/socket/sql/client_connection   |        94367427410496 |      2361 |       145 | 127.0.0.1 | 50444 | ACTIVE |
| wait/io/socket/sql/client_connection   |        94367427410816 |      2362 |       146 | 127.0.0.1 | 50458 | IDLE   |
+----------------------------------------+-----------------------+-----------+-----------+-----------+-------+--------+
86 rows in set (0.00 sec)

When using client SQL we are going to spend more time in the network, more time idle waiting to send and receive, and more time context switching the threads between states.

To illustrate this we will use the LinuxKI Toolset, which is designed to answer 2 key questions: if it's running, what's it doing? And if it's waiting, what's it waiting on? To do this we run LinuxKI for 20 seconds while running the workload with and without stored procedures and view the LinuxKI report.

When we observe the global CPU usage, one clear difference is the increase in softirqs, or software interrupts. (If we were running over a network using the network card, we would also see an increase in hardware interrupts.)

In the LinuxKI report we can see the difference without stored procedures

and with stored procedures

so without stored procedures we are spending 6.8X the time in the NET_RX soft IRQ, utilising 5% of the available CPU, where the main function is net_rx_action, which processes the data flow. In both client and server you can see the data being sent

received

and read (in both the MariaDB client and server)

Note the function poll_schedule_timeout in both client and server, as this is what we see in the trace events of the top processes calling sleep: we are seeing timeouts and tasks being descheduled while we poll inside MariaDB waiting for data, and then woken up when the data is ready.

and, for contrast, with stored procedures we now see futex for MariaDB mutex locking, as we are driving a greater level of database throughput, and futex_wait_queue_me as we wait for mutex acquisition.

If our MariaDB threads are polling and then timing out, they are going to be descheduled and perform more context switches, which we can observe on a system-wide basis with vmstat or, in this example, on a per-thread basis with pidstat. Capturing this data without stored procedures.

pidstat -wt 3 10
04:01:02 AM   UID      TGID       TID   cswch/s nvcswch/s  Command
04:01:02 AM  1001         -   3452701  12931.33      7.33  |__mariadbd
04:01:02 AM  1001         -   3452702  13691.00     13.33  |__mariadbd
04:01:02 AM  1001         -   3452703  11562.67     13.67  |__mariadbd
04:01:02 AM  1001         -   3452705  13753.67     14.33  |__mariadbd
04:01:02 AM  1001         -   3452716  12376.00     11.33  |__mariadbd
04:01:02 AM  1001         -   3452717  13874.00     10.00  |__mariadbd
04:01:02 AM  1001         -   3452720  11129.00      9.67  |__mariadbd
04:01:02 AM  1001         -   3452723  13543.67      9.33  |__mariadbd
04:01:02 AM  1001         -   3452726  11826.67     12.00  |__mariadbd
04:01:02 AM  1001         -   3452769   3810.67      3.00  |__mariadbd
04:01:02 AM  1001         -   3452770  12672.00      9.67  |__mariadbd
04:01:02 AM  1001         -   3452773  13539.00     11.67  |__mariadbd
04:01:02 AM  1001         -   3452774  11724.33     11.33  |__mariadbd
04:01:02 AM  1001         -   3452777  12707.67     12.00  |__mariadbd
...
04:01:02 AM     0         -   3456145   8858.33     11.33  |__tclsh8.6
04:01:02 AM     0         -   3456146   9661.00      8.33  |__tclsh8.6
04:01:02 AM     0         -   3456147   8439.67     10.00  |__tclsh8.6
04:01:02 AM     0         -   3456148   8969.33      9.00  |__tclsh8.6
04:01:02 AM     0         -   3456149   8453.00      8.00  |__tclsh8.6
04:01:02 AM     0         -   3456150   7752.67      8.00  |__tclsh8.6
04:01:02 AM     0         -   3456151   6378.33      8.67  |__tclsh8.6
04:01:02 AM     0         -   3456152   4978.00      6.33  |__tclsh8.6
04:01:02 AM     0         -   3456153   4054.67      1.67  |__tclsh8.6
04:01:02 AM     0         -   3456154   2726.67      3.00  |__tclsh8.6

and with stored procedures.

pidstat -wt 3 10
04:07:12 AM   UID      TGID       TID   cswch/s nvcswch/s  Command
04:07:12 AM  1001         -   3452701   8474.67      7.00  |__mariadbd
04:07:12 AM  1001         -   3452702   8464.33      6.67  |__mariadbd
04:07:12 AM  1001         -   3452703   8403.00      6.33  |__mariadbd
04:07:12 AM  1001         -   3452704   8339.33      5.33  |__mariadbd
04:07:12 AM  1001         -   3452705   8712.00      6.00  |__mariadbd
04:07:12 AM  1001         -   3452706   8656.00      3.33  |__mariadbd
04:07:12 AM  1001         -   3452707   8671.67      5.67  |__mariadbd
04:07:12 AM  1001         -   3452708   8585.67      9.00  |__mariadbd
04:07:12 AM  1001         -   3452716   8361.33      6.67  |__mariadbd
04:07:12 AM  1001         -   3452717   8541.67      6.00  |__mariadbd
04:07:12 AM  1001         -   3452718   8664.33      8.00  |__mariadbd
04:07:12 AM  1001         -   3452719   8574.00      6.33  |__mariadbd
04:07:12 AM  1001         -   3452720   8748.00      3.33  |__mariadbd
...
04:07:12 AM     0         -   3457711    645.33      9.33  |__tclsh8.6
04:07:12 AM     0         -   3457712    646.00     15.00  |__tclsh8.6
04:07:12 AM     0         -   3457713    674.33     15.33  |__tclsh8.6
04:07:12 AM     0         -   3457714    649.33     14.00  |__tclsh8.6
04:07:12 AM     0         -   3457715    652.67     14.00  |__tclsh8.6
04:07:12 AM     0         -   3457716    687.67     15.67  |__tclsh8.6
04:07:12 AM     0         -   3457717    634.67     14.33  |__tclsh8.6
04:07:12 AM     0         -   3457718    638.33     11.67  |__tclsh8.6
04:07:12 AM     0         -   3457719    654.00     16.00  |__tclsh8.6
04:07:12 AM     0         -   3457720    703.33     15.33  |__tclsh8.6
04:07:12 AM     0         -   3457721    669.67      7.67  |__tclsh8.6
04:07:12 AM     0         -   3457722    656.33     12.00  |__tclsh8.6
04:07:12 AM     0         -   3457723    681.33     13.00  |__tclsh8.6

We can see that HammerDB is now doing almost 14X the number of context switches in the MariaDB client (tclsh8.6) without stored procedures. This is also going to cause run queue latency to go up, as our threads are spending more time being switched off the CPU and back on again. As a thread can also be scheduled on another CPU, we also see the number of migrations between CPUs increase, as well as node migrations between different NUMA nodes. Again without stored procedures

and to contrast, with stored procedures

and thread migration can introduce a delay in NET_RX soft IRQ processing.

Although the LinuxKI report contains a wealth of further information, we can relate this back to the database statistics by observing the sendto/recvfrom activity on the MariaDB ports, again firstly without stored procedures.

For example, the data sent to port 3306 matches the BYTES_RECEIVED: 54 MB/s from the database statistics, and the data received from port 3306 matches the BYTES_SENT: 120 MB/s. The other ports show the HammerDB Virtual Users. And the same data with stored procedures.

LinuxKI enables us to have a system-wide view of database performance.

Summary

The key question in this post is why you should benchmark your database using stored procedures. Even though the example uses the local loopback address, when we run the same workload without stored procedures we can observe how much more time we spend in the network and in context switches rather than in the database. When we use stored procedures we spend as much time as possible in the database, driving higher levels of throughput, meaning we are better able to observe the performance capabilities of the database and system under test.

Nevertheless, in some cases testing without stored procedures may be useful, particularly if we are observing the impact of changes at the network layer such as encryption or compression, and the HammerDB v4.9 no stored procedures option gives you the flexibility to benchmark such a configuration.

However, if you are looking to benchmark a database with client SQL, you should account for how much time is spent in each layer, especially if you are running the test over an external network, bringing hardware interrupts into play.