HammerDB v4.6 New Features Pt2: Jobs Interface

Feedback from the Community raised the Issue Adding the enhancement for storing and retrieval of HammerDB results and configurations #352 that although HammerDB prints result output interactively and to log files, it would be preferred to have these results stored in a format that could be browsed at a later point. In particular, after a running a test it would be ideal to have a repository where we could verify the configuration of the workload that was run, the results and any timing or transaction count data generated to bring all the log output into a central location. HammerDB v4.6 does this with the “jobs” command.

Jobs Configuration and Storage

The base configuration for the jobs storage can be found in the generic.xml file in the commandline section.

<commandline>
<sqlite_db>TMP</sqlite_db> 
<jobsoutput>JSON</jobsoutput>
<jobs_disable>0</jobs_disable>
</commandline>

The default storage location is set to “TMP” where HammerDB will firstly check to see if a TMP or TEMP environment variable has been set and if not find a default temp location and either create a SQLite database called hammer.DB if one does not already exist in this location or open the existing one.  For example on Windows an example of opening a new database.

HammerDB CLI v4.6
Copyright (C) 2003-2022 Steve Shaw
Type "help" for a list of commands
Initialized new SQLite on-disk database C:/Users/Hdb/AppData/Local/Temp/hammer.DB
hammerdb>

and on Linux for opening a new database where we have set the TMP environment variable to a new directory called TMP under the HammerDB-4.6 directory.

HammerDB CLI v4.6
Copyright (C) 2003-2022 Steve Shaw
Type "help" for a list of commands
Initialized SQLite on-disk database /home/hdb/HammerDB-4.6/TMP/hammer.DB using existing tables (36,864 KB)
hammerdb>

If for any reason you wish to completely refresh all jobs data it is safe to remove the hammer.DB file and it will be recreated on the next restart. The options for jobs output and disabling jobs can be set at runtime.

An additional option is given in the timeprofile section called xt_job_storage, where if using the xt time profiler it will enable/disable the storage of the time profile data.

<timeprofile>
<profiler>xtprof</profiler>
...
<xt_job_storage>1</xt_job_storage>
</timeprofile>

By default, the jobs storage is disabled in the GUI for v4.6, however advanced users can fully enable it by modifying the following sections in the jobs and xtprof modules respectively with the output from the GUI queryable from the CLI at this release.

proc init_job_tables_gui { } {
#In the GUI, we disable the jobs output even though it works by running the jobs command in the console
 rename jobs {}
 uplevel #0 {proc hdbjobs { args } { return "" }}
#If we want to enable jobs output in the GUI comment out previous 2 lines and uncomment the following line
 #init_job_tables
}

To enable the time profiling in the GUI in the xtprof module update the following:

#If running in the GUI do not try to store output in SQLite
if { [ tsv::exists commandline sqldb ] eq 0 } {
set xtjob_storage 0
}

Disabling Jobs

If the storage of job related data is not desired, then it is possible to disable/enable this feature with the jobs disable command.

hammerdb>jobs disable 1
Disabling jobs repository, restart HammerDB to take effect

hammerdb>jobs disable 0
Enabling jobs repository, restart HammerDB to take effect


If disabled, the SQLite repository database is not opened, and it is not possible to query any jobs related data. Doing so will prompt on how to re-enable the feature. When enabled, the repository will be opened and it will be possible to query the data previously stored.

$ ./hammerdbcli
HammerDB CLI v4.6
Copyright (C) 2003-2022 Steve Shaw
Type "help" for a list of commands
hammerdb>jobs
Error: Jobs Disabled: enable with command "jobs disable 0" and restart HammerDB

$ ./hammerdbcli
HammerDB CLI v4.6
Copyright (C) 2003-2022 Steve Shaw
Type "help" for a list of commands
Initialized SQLite on-disk database /home/steve/HammerDB-4.6/TMP/hammer.DB using existing tables (36,864 KB)
hammerdb>

Creating Jobs and Formatting Output

 When enabled, a job will be created with a unique id whenever a workload is run. This happens for both a schema build and running a test.

For example, we will run the following script to build a TPROC-C schema

puts "SETTING CONFIGURATION"
dbset db maria
dbset bm TPC-C

diset connection maria_host localhost
diset connection maria_port 3306
diset connection maria_socket /tmp/mariadb.sock

set vu [ numberOfCPUs ]
set warehouse [ expr {$vu * 5} ]
diset tpcc maria_count_ware $warehouse
diset tpcc maria_num_vu $vu
diset tpcc maria_user root
diset tpcc maria_pass maria
diset tpcc maria_dbase tpcc
diset tpcc maria_storage_engine innodb
if { $warehouse >= 200 } {
diset tpcc maria_partition true
    } else {
diset tpcc maria_partition false
    }
puts "SCHEMA BUILD STARTED"
buildschema
puts "SCHEMA BUILD COMPLETED"

and the following to run a test

puts "SETTING CONFIGURATION"
dbset db maria
dbset bm TPC-C

diset connection maria_host localhost
diset connection maria_port 3306
diset connection maria_socket /tmp/mariadb.sock

diset tpcc maria_user root
diset tpcc maria_pass maria
diset tpcc maria_dbase tpcc
diset tpcc maria_driver timed
diset tpcc maria_rampup 2
diset tpcc maria_duration 5
diset tpcc maria_allwarehouse true
diset tpcc maria_timeprofile true

loadscript
puts "TEST STARTED"
vuset vu vcpu
vucreate
tcstart
tcstatus
set jobid [ vurun ]
vudestroy
tcstop
puts "TEST COMPLETE"

After running the scripts using the jobs command we can now see that we have created 2 jobs. One for building the schema and one for running the test as expected.

hammerdb>jobs
[
  "6388A0385EEC03E263531353",
  "6388A1585EEC03E243839333"
]


For querying a build, the status command will show whether all Virtual Users started and finished successfully, allowing a quick way to check the status of a build. 

hammerdb>job 6388A0385EEC03E263531353 status
[
  "0",
  "Ready to create a 20 Warehouse MariaDB TPROC-C schema\nin host LOCALHOST:\/TMP\/MARIADB.SOCK under user ROOT in database TPCC with storage engine INNODB?",
  "0",
  "Vuser 1:RUNNING",
  "0",
  "Vuser 2:RUNNING",
  "0",
  "Vuser 3:RUNNING",
  "0",
  "Vuser 4:RUNNING",
  "0",
  "Vuser 5:RUNNING",
  "0",
  "Vuser 2:FINISHED SUCCESS",
  "0",
  "Vuser 3:FINISHED SUCCESS",
  "0",
  "Vuser 5:FINISHED SUCCESS",
  "0",
  "Vuser 4:FINISHED SUCCESS",
  "0",
  "Vuser 1:FINISHED SUCCESS",
  "0",
  "ALL VIRTUAL USERS COMPLETE"
]


For formatting output, there is the option of converting the output to JSON or outputting text which is done with the jobs format command.

hammerdb>job format JSON
Setting jobs output format to json

hammerdb>jobs format text
Setting jobs output format to text

In the following example, we verify more details about the build by querying the output of Virtual User 1, we can see that it monitored all of the Virtual Users to load the data correctly and created the stored procedures and gathered statistics, so we can be sure that the schema build was fully complete.

hammerdb>job 6388A0385EEC03E263531353 1
Virtual User 1: Monitor Thread
Virtual User 1: CREATING TPCC SCHEMA
Virtual User 1: Ssl_cipher {}
Virtual User 1: CREATING DATABASE tpcc
Virtual User 1: CREATING TPCC TABLES
Virtual User 1: Loading Item
Virtual User 1: Loading Items - 50000
Virtual User 1: Loading Items - 100000
Virtual User 1: Item done
Virtual User 1: Monitoring Workers...
Virtual User 1: Workers: 4 Active 0 Done
Virtual User 1: Workers: 0 Active 4 Done
Virtual User 1: CREATING TPCC STORED PROCEDURES
Virtual User 1: GATHERING SCHEMA STATISTICS
Virtual User 1: TPCC SCHEMA COMPLETE


Querying  Job Output

As we ran a workload we can now query the configuration of HammerDB when the job was run,  in the example we query the database, benchmark, timestamp for the job and the dict at the time it was run.

hammerdb>job 6388A1585EEC03E243839333 db
["MariaDB"]

hammerdb>job 6388A1585EEC03E243839333 bm
["TPC-C"]

hammerdb>job 6388A1585EEC03E243839333 timestamp
6388A1585EEC03E243839333 {2022-12-01 12:43:04}

hammerdb>job 6388A1585EEC03E243839333 dict
{
  "connection": {
    "maria_host": "localhost",
    "maria_port": "3306",
    "maria_socket": "\/tmp\/mariadb.sock",
    "maria_ssl": "false",
    "maria_ssl_two_way": "false",
    "maria_ssl_linux_capath": "\/etc\/mysql\/certs",
    "maria_ssl_windows_capath": "C:\\mysql\\certs",
    "maria_ssl_ca": "ca-cert.pem",
    "maria_ssl_cert": "client-cert.pem",
    "maria_ssl_key": "client-key.pem",
    "maria_ssl_cipher": "server"
  },
  "tpcc": {
    "maria_count_ware": "20",
    "maria_num_vu": "4",
    "maria_user": "root",
    "maria_pass": "maria",
    "maria_dbase": "tpcc",
    "maria_storage_engine": "innodb",
    "maria_partition": "false",
    "maria_prepared": "false",
    "maria_total_iterations": "10000000",
    "maria_raiseerror": "false",
    "maria_keyandthink": "false",
    "maria_driver": "timed",
    "maria_rampup": "2",
    "maria_duration": "5",
    "maria_allwarehouse": "true",
    "maria_timeprofile": "true",
    "maria_async_scale": "false",
    "maria_async_client": "10",
    "maria_async_verbose": "false",
    "maria_async_delay": "1000",
    "maria_connect_pool": "false"
  }
}

We can also query the job result, the transaction count and the captured xtprof timings.  By default the summary timings are reported, however by adding the vuid you can also drill down into the timings for a specific virtual user.

hammerdb>job 6388A1585EEC03E243839333 result
[
  "6388A1585EEC03E243839333",
  "2022-12-01 12:43:04",
  "4 Active Virtual Users configured",
  "TEST RESULT : System achieved 28564 NOPM from 66576 MariaDB TPM"
]

hammerdb>job 6388A1585EEC03E243839333 tcount
{"MariaDB tpm": {
    "0": "2022-12-01 12:43:04",
    "67644": "2022-12-01 12:43:14",
    "82308": "2022-12-01 12:43:24",
    "80886": "2022-12-01 12:43:34",
    "79746": "2022-12-01 12:43:44",
    "80916": "2022-12-01 12:43:54",
    "81000": "2022-12-01 12:44:04",
    "80160": "2022-12-01 12:44:14",
    "78690": "2022-12-01 12:44:24",
    "81114": "2022-12-01 12:44:34",
    "79776": "2022-12-01 12:44:44",
    "79878": "2022-12-01 12:44:54",
    "77592": "2022-12-01 12:45:04",
    "73290": "2022-12-01 12:45:14",
    "76368": "2022-12-01 12:45:24",
    "76782": "2022-12-01 12:45:34",
    "73698": "2022-12-01 12:45:44",
    "54996": "2022-12-01 12:45:54",
    "62448": "2022-12-01 12:46:04",
    "66192": "2022-12-01 12:46:14",
    "62046": "2022-12-01 12:46:24",
    "63030": "2022-12-01 12:46:34",
    "64110": "2022-12-01 12:46:44",
    "63816": "2022-12-01 12:46:54",
    "66036": "2022-12-01 12:47:04",
    "65268": "2022-12-01 12:47:14",
    "64638": "2022-12-01 12:47:24",
    "61878": "2022-12-01 12:47:34",
    "64716": "2022-12-01 12:47:44",
    "61800": "2022-12-01 12:47:54",
    "63486": "2022-12-01 12:48:04",
    "64530": "2022-12-01 12:48:14",
    "64524": "2022-12-01 12:48:24",
    "65166": "2022-12-01 12:48:34",
    "65640": "2022-12-01 12:48:44",
    "68286": "2022-12-01 12:48:54",
    "68970": "2022-12-01 12:49:04",
    "70998": "2022-12-01 12:49:14",
    "68964": "2022-12-01 12:49:24",
    "69456": "2022-12-01 12:49:34",
    "69378": "2022-12-01 12:49:44",
    "69018": "2022-12-01 12:49:54",
    "69072": "2022-12-01 12:50:04"
  }}

hammerdb>job 6388A1585EEC03E243839333 timing
{
  "NEWORD": {
    "elapsed_ms": "420522.5",
    "calls": "53829",
    "min_ms": "1.284",
    "avg_ms": "4.239",
    "max_ms": "187.509",
    "total_ms": "228207.349",
    "p99_ms": "14.224",
    "p95_ms": "6.92",
    "p50_ms": "3.846",
    "sd": "3409.688",
    "ratio_pct": "54.213"
  },
  "PAYMENT": {
    "elapsed_ms": "420522.5",
    "calls": "53889",
    "min_ms": "0.612",
    "avg_ms": "1.685",
    "max_ms": "193.093",
    "total_ms": "90797.186",
    "p99_ms": "6.48",
    "p95_ms": "3.134",
    "p50_ms": "1.434",
    "sd": "1893.599",
    "ratio_pct": "21.57"
  },
  "DELIVERY": {
    "elapsed_ms": "420522.5",
    "calls": "5321",
    "min_ms": "5.065",
    "avg_ms": "13.447",
    "max_ms": "250.596",
    "total_ms": "71553.841",
    "p99_ms": "85.008",
    "p95_ms": "30.665",
    "p50_ms": "9.919",
    "sd": "15320.906",
    "ratio_pct": "16.998"
  },
  "OSTAT": {
    "elapsed_ms": "420522.5",
    "calls": "5472",
    "min_ms": "0.413",
    "avg_ms": "2.232",
    "max_ms": "127.654",
    "total_ms": "12215.15",
    "p99_ms": "34.898",
    "p95_ms": "4.028",
    "p50_ms": "1.155",
    "sd": "6872.841",
    "ratio_pct": "2.902"
  },
  "SLEV": {
    "elapsed_ms": "420522.5",
    "calls": "5393",
    "min_ms": "0.903",
    "avg_ms": "1.932",
    "max_ms": "30.993",
    "total_ms": "10419.8",
    "p99_ms": "6.303",
    "p95_ms": "3.094",
    "p50_ms": "1.706",
    "sd": "1177.113",
    "ratio_pct": "2.475"
  }
}

For a TPROC-H workload, we can query the output of the individual Virtual Users to capture the query times.

hammerdb>job 6388A4F05EEC03E233931323 1
[
  "1",
  "Ssl_cipher {}",
  "1",
  "Executing Query 14 (1 of 22)",
  "1",
  "query 14 completed in 16.743 seconds",
  "1",
  "Executing Query 2 (2 of 22)",
  "1",
  "query 2 completed in 0.719 seconds",
  "1",
  "Executing Query 9 (3 of 22)",
  "1",
  "query 9 completed in 5.79 seconds",
  "1",
  "Executing Query 20 (4 of 22)",
  "1",
  "query 20 completed in 0.299 seconds",
  "1",
  "Executing Query 6 (5 of 22)",
  "1",
  "query 6 completed in 1.932 seconds",
  "1",
  "Executing Query 17 (6 of 22)",
  "1",
  "query 17 completed in 0.103 seconds",
  "1",
  "Executing Query 18 (7 of 22)",
  "1",
  "query 18 completed in 5.682 seconds",
  "1",
  "Executing Query 8 (8 of 22)",
  "1",
  "query 8 completed in 1.99 seconds",
  "1",
  "Executing Query 21 (9 of 22)",
  "1",
  "query 21 completed in 9.813 seconds",
  "1",
  "Executing Query 13 (10 of 22)",
  "1",
  "query 13 completed in 4.022 seconds",
  "1",
  "Executing Query 3 (11 of 22)",
  "1",
  "query 3 completed in 1.451 seconds",
  "1",
  "Executing Query 22 (12 of 22)",
  "1",
  "query 22 completed in 0.116 seconds",
  "1",
  "Executing Query 16 (13 of 22)",
  "1",
  "query 16 completed in 0.276 seconds",
  "1",
  "Executing Query 4 (14 of 22)",
  "1",
  "query 4 completed in 0.631 seconds",
  "1",
  "Executing Query 11 (15 of 22)",
  "1",
  "query 11 completed in 0.196 seconds",
  "1",
  "Executing Query 15 (16 of 22)",
  "1",
  "query 15 completed in 4.406 seconds",
  "1",
  "Executing Query 1 (17 of 22)",
  "1",
  "query 1 completed in 10.899 seconds",
  "1",
  "Executing Query 10 (18 of 22)",
  "1",
  "query 10 completed in 4.585 seconds",
  "1",
  "Executing Query 19 (19 of 22)",
  "1",
  "query 19 completed in 0.127 seconds",
  "1",
  "Executing Query 5 (20 of 22)",
  "1",
  "query 5 completed in 1.258 seconds",
  "1",
  "Executing Query 7 (21 of 22)",
  "1",
  "query 7 completed in 0.948 seconds",
  "1",
  "Executing Query 12 (22 of 22)",
  "1",
  "query 12 completed in 5.403 seconds",
  "1",
  "Completed 1 query set(s) in 78 seconds",
  "1",
  "Geometric mean of query times returning rows (21) is \"1.33547\""
]

Deleting Jobs

When the data related to a job is no loger required it can be deleted with the jobs delete command.

hammerdb>jobs
[
  "6388A0385EEC03E263531353",
  "6388A1585EEC03E243839333",
  "6388A3035EEC03E243830373",
  "6388A4F05EEC03E233931323"
]


hammerdb>job 6388A0385EEC03E263531353 delete
Deleted Jobid 6388A0385EEC03E263531353


hammerdb>job 6388A3035EEC03E243830373 delete
Deleted Jobid 6388A3035EEC03E243830373


hammerdb>jobs
[
  "6388A1585EEC03E243839333",
  "6388A4F05EEC03E233931323"
]

Job Web Service

HammerDB CLI jobs are also compatible with the HammerDB Web Service command allowing the querying of the jobs output over HTTP.  It is planned to enhance this webservice to provide an interactive to more easily visualise and analyse job data.

Summary

The HammerDB jobs interface has been added to provide a central repository stored in a SQLite database where all output and configuration related to a workload is stored for ease of querying at a later point. The jobs interface provides a foundation for future development to provide more comprehensive insight into job related data.

HammerDB v4.6 New Features Pt1: Python CLI Interface

From version v4.6 HammerDB allows you to run the Command Line Interface in both interactive and scripted sessions using a Python interpreter as an alternative to the default Tcl environment on both Linux and Windows.

Python Version Dependency

HammerDB will rely on the system installed Python interpreter and at a specific version as described in the following table. On Windows you need to install Python from www.python.org rather than the Windows app store to ensure that the required libraries are installed.

Linux 3.8
Red Hat Enterprise Linux 3.6
Windows 3.10

If you need a different version of Python you can build HammerDB from source as detailed in the documentation to use any version of Python 3 that you wish.

Starting the CLI in Python Mode

To start HammerDB in Python mode run the hammerdbcli tool with “py” or “python” as the first argument.  HammerDB will show the prompt  “hammerdb>>>” to indicate that it is running a Python interpreter.

$ ./hammerdbcli py
HammerDB CLI v4.6 
Copyright (C) 2003-2022 Steve Shaw 
Type "help()" for a list of commands 
hammerdb>>>

On Windows firstly open a CMD shell, navigate to HammerDB directory and run the same command as on Linux.

C:\Program Files\HammerDB-4.6>hammerdbcli py 
HammerDB CLI v4.6 
Copyright (C) 2003-2022 Steve Shaw 
Type "help()" for a list of commands 
hammerdb>>>

In this mode HammerDB will now accept commands and scripts in Python format enabling integration with an existing Python environment. A Python script can be run from an external command or script using the auto argument and a .py extension to the script.

hammerdbcli py auto autorunbuild.py

or the script run from the interactive prompt using the source command.

hammerdb>>>source('cliexample.py')

The HammerDB commands are accessed as Python functions with all arguments given as strings or integers where appropriate.

print("SETTING CONFIGURATION") 
dbset('db','mssqls') 
diset('tpcc','mssqls_driver','timed') 
diset('tpcc','mssqls_rampup',0) 
diset('tpcc','mssqls_duration',1) 
vuset('logtotemp',1) 
loadscript() 
print("SEQUENCE STARTED") 
for z in [1,2,4]:     
     print(z," VU TEST")     
     vuset('vu',z)     
     vucreate()     
     vurun()     
     vudestroy() 

print("TEST SEQUENCE COMPLETE")

Running this script returns output as follows using the Python loop to drive HammerDB.

hammerdb>>>source('cliexample.py')
hammerdb>>>SETTING CONFIGURATION
Database set to MSSQLServer
Value timed for tpcc:mssqls_driver is the same as existing value timed, no change made
Changed tpcc:mssqls_rampup from 2 to 0 for MSSQLServer
Changed tpcc:mssqls_duration from 5 to 1 for MSSQLServer
Script loaded, Type "print script" to view
SEQUENCE STARTED
1 VU TEST
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Logging activated
to C:/Users/Hdb/AppData/Local/Temp/hammerdb.log
2 Virtual Users Created with Monitor VU
Vuser 1:RUNNING
Vuser 1:Initializing xtprof time profiler
Vuser 1:Beginning rampup time of 0 minutes
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 1 in minutes
Vuser 2:RUNNING
Vuser 2:Initializing xtprof time profiler
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 1:1 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:1 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 39945 NOPM from 92893 SQL Server TPM
Vuser 1:Gathering timing data from Active Virtual Users...
Vuser 2:FINISHED SUCCESS
Vuser 1:Calculating timings...
Vuser 1:Writing timing data to C:/Users/Hdb/AppData/Local/Temp/hdbxtprofile.log
Vuser 1:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
vudestroy success
2 VU TEST
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Vuser 3 created - WAIT IDLE
Logging activated
to C:/Users/Hdb/AppData/Local/Temp/hammerdb.log
3 Virtual Users Created with Monitor VU
Vuser 1:RUNNING
Vuser 1:Initializing xtprof time profiler
Vuser 1:Beginning rampup time of 0 minutes
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 1 in minutes
Vuser 2:RUNNING
Vuser 2:Initializing xtprof time profiler
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 3:RUNNING
Vuser 3:Initializing xtprof time profiler
Vuser 3:Processing 10000000 transactions with output suppressed...
Vuser 1:1 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:2 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 81178 NOPM from 189455 SQL Server TPM
Vuser 1:Gathering timing data from Active Virtual Users...
Vuser 3:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
Vuser 1:Calculating timings...
Vuser 1:Writing timing data to C:/Users/Hdb/AppData/Local/Temp/hdbxtprofile.log
Vuser 1:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
vudestroy success
4 VU TEST
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Vuser 3 created - WAIT IDLE
Vuser 4 created - WAIT IDLE
Vuser 5 created - WAIT IDLE
Logging activated
to C:/Users/Hdb/AppData/Local/Temp/hammerdb.log
5 Virtual Users Created with Monitor VU
Vuser 1:RUNNING
Vuser 1:Initializing xtprof time profiler
Vuser 1:Beginning rampup time of 0 minutes
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 1 in minutes
Vuser 2:RUNNING
Vuser 2:Initializing xtprof time profiler
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 3:RUNNING
Vuser 3:Initializing xtprof time profiler
Vuser 3:Processing 10000000 transactions with output suppressed...
Vuser 4:RUNNING
Vuser 4:Initializing xtprof time profiler
Vuser 4:Processing 10000000 transactions with output suppressed...
Vuser 5:RUNNING
Vuser 5:Initializing xtprof time profiler
Vuser 5:Processing 10000000 transactions with output suppressed...
Vuser 1:1 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:4 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 118762 NOPM from 276763 SQL Server TPM
Vuser 1:Gathering timing data from Active Virtual Users...
Vuser 5:FINISHED SUCCESS
Vuser 4:FINISHED SUCCESS
Vuser 3:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
Vuser 1:Calculating timings...
Vuser 1:Writing timing data to C:/Users/Hdb/AppData/Local/Temp/hdbxtprofile.log
Vuser 1:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
vudestroy success
TEST SEQUENCE COMPLETE
hammerdb>>>

Python Performance and the GIL

You should already be aware that due to the Python GIL impacting multithreading performance it is not possible to drive a high-performance database workload directly from Python.

Why Tcl is 700% faster than Python for database benchmarking

Therefore, although on both Linux and Windows, you will see Python as the top process in performance monitoring tools when running the workload.

The multithreading capabilities of Tcl are embedded into Python ensuring that the same high levels of performance can be achieved as when running directly in Tcl.

How the Python CLI Interface works

The HammerDB CLI interface uses a package called tclpy to embed HammerDB into the Python interpreter.  Therefore, advanced users can also use the command tclpy.eval to call HammerDB commands directly and  tclpy.eval(‘vurun’) is functionally equivalent to vurun() but will return output values and therefore in the example below this is how vurun can be called in v4.6 to capture the jobid.

print("SETTING CONFIGURATION")
dbset('db','maria')
dbset('bm','TPC-C')

diset('connection','maria_host','localhost')
diset('connection','maria_port','3306')
diset('connection','maria_socket','/tmp/mariadb.sock')

diset('tpcc','maria_user','root')
diset('tpcc','maria_pass','maria')
diset('tpcc','maria_dbase','tpcc')
diset('tpcc','maria_driver','timed')
diset('tpcc','maria_rampup','2')
diset('tpcc','maria_duration','5')
diset('tpcc','maria_allwarehouse','true')
diset('tpcc','maria_timeprofile','true')

loadscript()
print("TEST STARTED")
vuset('vu','vcpu')
vucreate()
tcstart()
tcstatus()
jobid = tclpy.eval('vurun')
vudestroy()
tcstop()
print("TEST COMPLETE")
fd = open("maria_tprocc.out", "w")
fd.write(jobid)
fd.close()
exit()

Similarly, as we are running an embedded interpreter any redirection of stdout to capture should be passed to the embedded Tcl level and not at the Python level. See the Python Docker script examples for how this is done.

Calling Python from Tcl

Although the examples in this post show HammerDB being called from a Python interpreter it is also possible to load the same package into HammerDB being run with a Tcl interpreter and call Python functions with the “py eval” command, for example:

HammerDB CLI v4.6
Copyright (C) 2003-2022 Steve Shaw
Type "help" for a list of commands
Initialized SQLite on-disk database /tmp/hammer.DB using existing tables (36,864 KB)
hammerdb>package require tclpy
0.4

hammerdb>py eval {divide = lambda x: 1.0/int(x)}

hammerdb>set d [py call divide 16]
0.0625

Summary

From HammerDB v4.6, Python users can run the HammerDB CLI in interactive or scripted mode using a Python interpreter, yet still take advantage of the advanced high performance mulithreading of Tcl. This way HammerDB can extended to take advantage of the features of both languages.

Why Tcl is 700% faster than Python for database benchmarking

Python is a popular programming language, especially for beginners, and consequently we see it occurring in places where it just shouldn’t be used, such as database benchmarking.  In contrast, a frequent question when it comes to HammerDB is why is it written in Tcl? Surely any language will do?

This post addresses both questions to illustrate exactly why HammerDB is written in Tcl and why using Python to benchmark a database can result in poor performance and misleading results.  To do this, we are going to rewrite HammerDB in Python and run a series of tests on a 2 socket Intel(R) Xeon(R) Platinum 8280L server to see how and why Tcl is 700% faster than Python for database benchmarking*

Background and Concepts

To begin with, you should be familiar with the concepts of parallelism vs concurrency and why it is vital that your benchmark should run in parallel so that you are testing the concurrency of the database. An introduction to these concepts is given in the following post.

What programming languages does HammerDB use and why does it matter?

You should also be familiar with the concepts of ACID compliance and locking versus multiversioning as well as database transaction isolation levels, as it is essential to ensure that the database benchmark itself is not serialized.

Rewriting HammerDB in Python

So we are not actually going to rewrite HammerDB in Python, however it is trivial to write a command line driver script in Python as all the hard work in creating the schema and writing the stored procedures has already been done. (We use stored procedures because, as the introductory post shows, using single SQL statements turns our database benchmark into a network test).  So instead we are going to take a cut down version of the HammerDB TPROC-C driver script and do the same in Python and use the HammerDB infrastructure to measure performance. For this example, we are going to use PostgreSQL stored procedures in PostgreSQL 14.1.

So firstly we have the Tcl based driver script called pgtest_thread.tcl

#!/usr/local/bin/tclsh8.6
package require Thread
set maxvuser 64
for { set vuser 0 } {$vuser < $maxvuser } {incr vuser} {
        set threadID [thread::create {
#EDITABLE OPTIONS##################################################
set library Pgtcl ;# PostgreSQL Library
set total_iterations 10000000 ;# Number of transactions before logging off
set host "localhost" ;# Address of the server hosting PostgreSQL
set port "5432" ;# Port of the PostgreSQL Server
set sslmode "disable" ;# SSLMode of the PostgreSQL Server
set user "tpcc" ;# PostgreSQL user
set password "tpcc" ;# Password for the PostgreSQL user
set db "tpcc" ;# Database containing the TPC Schema
#EDITABLE OPTIONS##################################################
#RANDOM NUMBER
  proc RandomNumber {m M} {return [expr {int($m+rand()*($M+1-$m))}]}
  proc NURand { iConst x y C } {return [ expr {((([RandomNumber 0 $iConst] | [RandomNumber $x $y]) + $C) % ($y - $x + 1)) + $x }]} 
   #RANDOM NAME
  proc randname { num } {
    array set namearr { 0 BAR 1 OUGHT 2 ABLE 3 PRI 4 PRES 5 ESE 6 ANTI 7 CALLY 8 ATION 9 EING }
    set name [ concat $namearr([ expr {( $num / 100 ) % 10 }])$namearr([ expr {( $num / 10 ) % 10 }])$namearr([ expr {( $num / 1 ) % 10 }]) ]
    return $name
  }

#LOAD LIBRARIES AND MODULES
if [catch {package require $library} message] { error "Failed to load $library - $message" }

#TIMESTAMP
proc gettimestamp { } {
    set tstamp [ clock format [ clock seconds ] -format %Y%m%d%H%M%S ]
    return $tstamp
}
#POSTGRES CONNECTION
proc ConnectToPostgres { host port sslmode user password dbname } {
    global tcl_platform
    if {[catch {set lda [pg_connect -conninfo [list host = $host port = $port sslmode = $sslmode user = $user password = $password dbname = $dbname ]]} message]} {
        set lda "Failed" ; puts $message
        error $message
    } else {
        if {$tcl_platform(platform) == "windows"} {
            #Workaround for Bug #95 where first connection fails on Windows
            catch {pg_disconnect $lda}
            set lda [pg_connect -conninfo [list host = $host port = $port sslmode = $sslmode user = $user password = $password dbname = $dbname ]]
        }
        pg_notice_handler $lda puts
        set result [ pg_exec $lda "set CLIENT_MIN_MESSAGES TO 'ERROR'" ]
        pg_result $result -clear
    }
    return $lda
}
#NEW ORDER
proc neword { lda no_w_id w_id_input } {
    #2.4.1.2 select district id randomly from home warehouse where d_w_id = d_id
    set no_d_id [ RandomNumber 1 10 ]
    #2.4.1.2 Customer id randomly selected where c_d_id = d_id and c_w_id = w_id
    set no_c_id [ RandomNumber 1 3000 ]
    #2.4.1.3 Items in the order randomly selected from 5 to 15
    set ol_cnt [ RandomNumber 5 15 ]
    #2.4.1.6 order entry date O_ENTRY_D generated by SUT
    set date [ gettimestamp ]
    set result [pg_exec $lda "call neword($no_w_id,$w_id_input,$no_d_id,$no_c_id,$ol_cnt,0.0,'','',0.0,0.0,0,TO_TIMESTAMP('$date','YYYYMMDDHH24MISS')::timestamp without time zone)" ]
    if {[pg_result $result -status] != "PGRES_TUPLES_OK"} {
            error "[pg_result $result -error]"
        pg_result $result -clear
    } else {
        #puts "New Order: $no_w_id $w_id_input $no_d_id $no_c_id $ol_cnt 0 [ pg_result $result -list ]"
        pg_result $result -clear
    }
}
#PAYMENT
proc payment { lda p_w_id w_id_input } {
    #2.5.1.1 The home warehouse id remains the same for each terminal
    #2.5.1.1 select district id randomly from home warehouse where d_w_id = d_id
    set p_d_id [ RandomNumber 1 10 ]
    #2.5.1.2 customer selected 60% of time by name and 40% of time by number
    set x [ RandomNumber 1 100 ]
    set y [ RandomNumber 1 100 ]
    if { $x <= 85 } {
        set p_c_d_id $p_d_id
        set p_c_w_id $p_w_id
    } else {
        #use a remote warehouse
        set p_c_d_id [ RandomNumber 1 10 ]
        set p_c_w_id [ RandomNumber 1 $w_id_input ]
        while { ($p_c_w_id == $p_w_id) && ($w_id_input != 1) } {
            set p_c_w_id [ RandomNumber 1  $w_id_input ]
        }
    }
    set nrnd [ NURand 255 0 999 123 ]
    set name [ randname $nrnd ]
    set p_c_id [ RandomNumber 1 3000 ]
    if { $y <= 60 } {
        #use customer name
        #C_LAST is generated
        set byname 1
    } else {
        #use customer number
        set byname 0
        set name {}
    }
    #2.5.1.3 random amount from 1 to 5000
    set p_h_amount [ RandomNumber 1 5000 ]
    #2.5.1.4 date selected from SUT
    set h_date [ gettimestamp ]
    #2.5.2.1 Payment Transaction
    #change following to correct values
    set result [pg_exec $lda "call payment($p_w_id,$p_d_id,$p_c_w_id,$p_c_d_id,$byname,$p_h_amount,'0','$name',$p_c_id,'','','','','','','','','','','','','','','','','','',TO_TIMESTAMP('$h_date','YYYYMMDDHH24MISS')::timestamp without time zone,0.0,0.0,0.0,'',TO_TIMESTAMP('$h_date','YYYYMMDDHH24MISS')::timestamp without time zone)" ]
    if {[pg_result $result -status] != "PGRES_TUPLES_OK"} {
            error "[pg_result $result -error]"
        pg_result $result -clear
    } else {
        #puts "Payment: $p_w_id $p_d_id $p_c_w_id $p_c_d_id $p_c_id $byname $p_h_amount $name 0 0 [ pg_result $result -list ]"
        pg_result $result -clear
    }
}
#ORDER_STATUS
proc ostat { lda w_id } {
    #2.5.1.1 select district id randomly from home warehouse where d_w_id = d_id
    set d_id [ RandomNumber 1 10 ]
    set nrnd [ NURand 255 0 999 123 ]
    set name [ randname $nrnd ]
    set c_id [ RandomNumber 1 3000 ]
    set y [ RandomNumber 1 100 ]
    if { $y <= 60 } {
        set byname 1
    } else {
        set byname 0
        set name {}
    }
            set date [ gettimestamp ]
            set result [pg_exec $lda "call ostat($w_id,$d_id,$c_id,$byname,'$name','','',0.0,0,TO_TIMESTAMP('$date','YYYYMMDDHH24MISS')::timestamp without time zone,0,'')" ] 
    if {[pg_result $result -status] != "PGRES_TUPLES_OK"} {
            error "[pg_result $result -error]"
        pg_result $result -clear
    } else {
        #puts "Order Status: $w_id $d_id $c_id $byname $name [ pg_result $result -list ]"
        pg_result $result -clear
    }
}
#DELIVERY
proc delivery { lda w_id } {
    set carrier_id [ RandomNumber 1 10 ]
    set date [ gettimestamp ]
            set result [pg_exec $lda "call delivery($w_id,$carrier_id,TO_TIMESTAMP('$date','YYYYMMDDHH24MISS')::timestamp without time zone)" ] 
    if {[pg_result $result -status] ni {"PGRES_TUPLES_OK" "PGRES_COMMAND_OK"}} {
            error "[pg_result $result -error]"
        pg_result $result -clear
    } else {
        #puts "Delivery: $w_id $carrier_id [ pg_result $result -list ]"
        pg_result $result -clear
    }
}
#STOCK LEVEL
proc slev { lda w_id stock_level_d_id } {
    set threshold [ RandomNumber 10 20 ]
            set result [pg_exec $lda "call slev($w_id,$stock_level_d_id,$threshold,0)"]
    if {[pg_result $result -status] ni {"PGRES_TUPLES_OK" "PGRES_COMMAND_OK"}} {
            error "[pg_result $result -error]"
        pg_result $result -clear
    } else {
        #puts "Stock Level: $w_id $stock_level_d_id $threshold [ pg_result $result -list ]"
        pg_result $result -clear
    }
}

#RUN TPC-C
set lda [ ConnectToPostgres $host $port $sslmode $user $password $db ]
if { $lda eq "Failed" } {
    error "error, the database connection to $host could not be established"
} 
pg_select $lda "select max(w_id) from warehouse" w_id_input_arr {
    set w_id_input $w_id_input_arr(max)
}
#2.4.1.1 set warehouse_id stays constant for a given terminal
set w_id  [ RandomNumber 1 $w_id_input ]  
pg_select $lda "select max(d_id) from district" d_id_input_arr {
    set d_id_input $d_id_input_arr(max)
}
set stock_level_d_id  [ RandomNumber 1 $d_id_input ]  
puts "Processing $total_iterations transactions without output suppressed..."
for {set it 0} {$it < $total_iterations} {incr it} {
    set choice [ RandomNumber 1 23 ]
    if {$choice <= 10} {
        neword $lda $w_id $w_id_input 
    } elseif {$choice <= 20} {
        payment $lda $w_id $w_id_input
    } elseif {$choice <= 21} {
        delivery $lda $w_id
    } elseif {$choice <= 22} {
        slev $lda $w_id $stock_level_d_id
    } elseif {$choice <= 23} {
        ostat $lda $w_id
    }
}
pg_disconnect $lda
thread::wait }]
}
vwait forever

and the same in Python called pgtest_thread.py

#!/usr/bin/python
import psycopg2
import sys
import random
import math
from datetime import datetime
import threading

def gettimestamp():
    tstamp = datetime.now().strftime("%Y%m%d%H%M%S")
    return tstamp

def nurand(iConst,x,y,C): 
    nrnd = ((((random.randint(0,iConst) | random.randint(x,y)) + C) % (y - x + 1 )) + x)
    return nrnd

def randname(num):
    namearr = [ "BAR", "OUGHT", "ABLE", "PRI", "PRES", "ESE", "ANTI", "CALLY", "ATION", "EING" ]
    pt1 = namearr[math.floor((num / 100) % 10)]
    pt2 = namearr[math.floor((num / 10) % 10)]
    pt3 = namearr[math.floor((num / 1) % 10)]
    name = pt1 + pt2 + pt3
    return name

def neword(cur, no_w_id, w_id_input):
    #print("neword")
    no_d_id = random.randint(1,10)
    no_c_id = random.randint(1,3000)
    ol_cnt = random.randint(5,15)
    date = gettimestamp()
    cur.execute('call neword(%s,%s,%s,%s,%s,0.0,\'\',\'\',0.0,0.0,0,TO_TIMESTAMP(%s,\'YYYYMMDDHH24MISS\')::timestamp without time zone)',(no_w_id,w_id_input,no_d_id,no_c_id,ol_cnt,date))
    rows = cur.fetchall()
    #print(rows)

def payment(cur, p_w_id, w_id_input):
    #print("payment")
    p_d_id = random.randint(1,10)
    x = random.randint(1,100)
    y = random.randint(1,100)
    if (x <= 85):
            p_c_d_id = p_d_id
            p_c_w_id = p_w_id
    else:
            p_c_d_id = random.randint(1,10)
            p_c_w_id = random.randint(1,w_id_input) 
            while (p_c_w_id == p_w_id) and (w_id_input != 1):
                p_c_w_id = random.randint(1,w_id_input) 

    nrnd = nurand(255,0,999,123)
    name = randname(nrnd)
    p_c_id = random.randint(1,3000)
    if (y <= 85):
            byname = 1
    else:
            byname = 0
            name = ""

    p_h_amount = random.randint(1,5000)
    h_date = gettimestamp()
    p_c_since = gettimestamp()
    cur.execute('call payment(%s,%s,%s,%s,%s,%s,\'0\',%s,%s,\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',\'\',TO_TIMESTAMP(%s,\'YYYYMMDDHH24MISS\')::timestamp without time zone,0.0,0.0,0.0,\'\',TO_TIMESTAMP(%s,\'YYYYMMDDHH24MISS\')::timestamp without time zone)',(p_w_id,p_d_id,p_c_w_id,p_c_d_id,byname,p_h_amount,name,p_c_id,p_c_since,h_date))
    rows = cur.fetchall()
    #print(rows)

def delivery(cur, w_id):
    #print("delivery")
    carrier_id = random.randint(1,10)
    date = gettimestamp()
    cur.execute('call delivery(%s,%s,TO_TIMESTAMP(%s,\'YYYYMMDDHH24MISS\')::timestamp without time zone)',(w_id,carrier_id,date))
    #rows = cur.fetchall()
    #print(rows)

def slev(cur, w_id, stock_level_d_id):
    #print("slev" )
    threshold = random.randint(10, 20)
    cur.execute('call slev(%s,%s,%s,0)',(w_id,stock_level_d_id,threshold))
    rows = cur.fetchall()
    #print(rows)

def ostat(cur, w_id):
    #print("ostat")
    d_id = random.randint(1, 10)
    nrnd = nurand(255,0,999,123)
    name = randname(nrnd)
    c_id = random.randint(1, 3000)
    y = random.randint(1, 100)
    if (y <= 60):
        byname = 1
    else:
        byname = 0
        name = ""

    date = gettimestamp()
    cur.execute('call ostat(%s,%s,%s,%s,%s,\'\',\'\',0.0,0,TO_TIMESTAMP(%s,\'YYYYMMDDHH24MISS\')::timestamp without time zone,0,\'\')',(w_id,d_id,c_id,byname,name,date))
    rows = cur.fetchall()
    #print(rows)

def runtpcc():
    total_iterations = 1000000
    conn = psycopg2.connect(host="localhost",port="5432",database="tpcc",user="tpcc",password="tpcc")
    conn.set_session(autocommit=True)
    cur = conn.cursor()
    cur.execute("select max(w_id) from warehouse")
    w_id_input = cur.fetchone()[0]
    w_id = random.randint(1, w_id_input)
    cur.execute("select max(d_id) from district")
    d_id_input = cur.fetchone()[0]
    stock_level_d_id = random.randint(1, d_id_input)
    print ("Processing", total_iterations, "transactions without output suppressed...")
    for x in range(total_iterations):
        choice = random.randint(1, 23)
        if (choice <= 10):
            neword (cur, w_id, w_id_input)
        elif (choice <= 20):
            payment (cur,w_id, w_id_input)
        elif (choice <= 21):
            delivery (cur, w_id)
        elif (choice <= 22):
            slev (cur, w_id, stock_level_d_id)
        elif (choice <= 23):
            ostat (cur, w_id)

        if (x == total_iterations):
            conn.close()

def main():
    threads = []
    for n in range(64):
        t = threading.Thread(target=runtpcc)
        threads.append(t)
        t.start()

    for t in threads:
        t.join()

if __name__ == '__main__':
    main()

It should be clear that  these driver scripts do exactly the same thing, they create multiple threads and loop calling the 5 TPROC-C stored procedures meaning that on the PostgreSQL database the workload itself is identical only the language calling the stored procedures is different.

Building the Test Schema and Timing Script

Next we are going to build a PostgreSQL test schema using the HammerDB CLI as follows:

dbset db pg
dbset bm TPC-C
vuset logtotemp 0
diset tpcc pg_superuser postgres
diset tpcc pg_superuserpass postgres
diset tpcc pg_storedprocs true
diset tpcc pg_count_ware 1000
diset tpcc pg_num_vu 64
buildschema
waittocomplete
quit

Finally we are going to build a HammerDB timing script. The key thing to note is that we are going to set pg_total_iterations to 1, this means that we have a special form of driver script that will time the transactions on the database but will not run any transactions itself to impact the load. This way we can run our external Tcl and Python test scripts and capture the results.

dbset db pg
dbset bm TPC-C
diset tpcc pg_superuser steve
diset tpcc pg_defaultdbase postgres
diset tpcc pg_storedprocs true
diset tpcc pg_total_iterations 1
diset tpcc pg_driver timed
diset tpcc pg_rampup 0
diset tpcc pg_duration 2
vuset logtotemp 1
loadscript
vuset vu 1
tcstart
vucreate
vurun
runtimer 130
vudestroy
tcstop

If all is configured correctly when running either our Tcl or Python driver script we will see output such as follows, note that Vuser 2 shows FINISHED SUCCESS just after it starts so the only activity that this script is doing on the database is capturing the NOPM and TPM over a time period, in this example 2 minutes.

$ ./hammerdbcli auto pgtime.tcl
HammerDB CLI v4.4
Copyright (C) 2003-2022 Steve Shaw
Type "help" for a list of commands
The xml is well-formed, applying configuration
Database set to PostgreSQL
Benchmark set to TPC-C for PostgreSQL
Changed tpcc:pg_superuser from postgres to steve for PostgreSQL
Value postgres for tpcc:pg_defaultdbase is the same as existing value postgres, no change made
Changed tpcc:pg_storedprocs from false to true for PostgreSQL
Changed tpcc:pg_total_iterations from 10000000 to 1 for PostgreSQL
Clearing Script, reload script to activate new setting
Script cleared
Changed tpcc:pg_driver from test to timed for PostgreSQL
Changed tpcc:pg_rampup from 2 to 0 for PostgreSQL
Changed tpcc:pg_duration from 5 to 2 for PostgreSQL
Script loaded, Type "print script" to view
Transaction Counter Started
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Logging activated
to /tmp/hammerdb.log
2 Virtual Users Created with Monitor VU
Vuser 1:RUNNING
Vuser 1:Beginning rampup time of 0 minutes
Vuser 1:Rampup complete, Taking start Transaction Count.
0 PostgreSQL tpm
Vuser 1:Timing test period of 2 in minutes
Vuser 2:RUNNING
Vuser 2:Processing 1 transactions with output suppressed...
Vuser 2:FINISHED SUCCESS
3650220 PostgreSQL tpm
4245132 PostgreSQL tpm
4203948 PostgreSQL tpm
4211748 PostgreSQL tpm
4203648 PostgreSQL tpm
Timer: 1 minutes elapsed
Vuser 1:1 ...,
4367244 PostgreSQL tpm
4265898 PostgreSQL tpm
4320510 PostgreSQL tpm
4258518 PostgreSQL tpm
4426578 PostgreSQL tpm
4413780 PostgreSQL tpm
Timer: 2 minutes elapsed
Vuser 1:2 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:1 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 1844256 NOPM from 4232155 PostgreSQL TPM
Vuser 1:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE

Running a single process pre-test

Before we test the multithreading capabilities it is important to callibrate how our respective scripts. For this we will use a slightly modified version for both that removes the threading. We start these running as follows for Python

python3 pgtest_proc.py

and as follows from the HammerDB directory (export the HammerDB ./lib directory into the LD_LIBRARY_PATH first)

./bin/tclsh8.6 pgtest_proc.tcl

Once the workload is started we then run the timing script in another command window and once we get the result we Ctrl-C the driver script to stop the test.

In our single process test we see 62257 NOPM for Python and 64142 NOPM for Tcl giving confidence that the test scripts and methodology are correct (remember that the workload is running in PostgreSQL stored procedures and the test is calling those stored procedures, so we expect the single process result to be very close).

We can also use VizTracer to run the same workload under Python this time in a single thread to understand a little about what the workload should look like. For each stored procedure we calculate the parameters with a number of random operations and then call psycopg2.extensions.cursor.execute, during this we sleep while we wait for the PostgreSQL database to execute the stored procedure and then we run psycopg2.extensions.cursor.fetchall to fetch the results. In total the Wall duration to run the neword stored procedure was 1.8ms.

Running the Mulithreaded Test

Now lets run a the multithreaded tests shown above. For each test we are going to edit the script to change the number of threads testing at 1,2,4,8,16,32 and 64 threads with the following results.

At 1, 2 or 4 threads we barely notice a difference, however at 64 threads Python gives us 247976 NOPM and Tcl 1844256 NOPM meaning Tcl is more than 700% faster for an identical workload once we start running multiple threads.

So why is there such a marked contrast between an identical workload in Python and Tcl? As an early indication we can take a look at the output from top when running 64 threads. Firstly with Python observe that we have 9 running tasks, 5.8% user CPU utilisation and all of the top postgres processes show as idle.

Tasks: 1144 total, 9 running, 1135 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.8 us, 1.0 sy, 0.0 ni, 93.0 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 772719.8 total, 581155.8 free, 6252.7 used, 185311.2 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 711977.4 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
52656 postgres 20 0 5347252 25784 13644 S 178.8 0.0 0:44.94 python3 pgtest_thread.py
52794 postgres 20 0 49.3g 2.4g 2.4g S 10.9 0.3 0:02.83 postgres: tpcc tpcc ::1(45366) idle
52663 postgres 20 0 49.3g 2.2g 2.2g S 10.3 0.3 0:02.88 postgres: tpcc tpcc ::1(45234) idle
52721 postgres 20 0 49.3g 2.3g 2.3g S 10.3 0.3 0:02.75 postgres: tpcc tpcc ::1(45292) idle
52778 postgres 20 0 49.3g 2.4g 2.4g S 10.3 0.3 0:02.82 postgres: tpcc tpcc ::1(45350) idle
52784 postgres 20 0 49.3g 2.3g 2.3g S 10.3 0.3 0:02.81 postgres: tpcc tpcc ::1(45356) idle

in contrast the same workload in Tcl shows that we have 64 running tasks (for 64 threads) 53.1% CPU and the postgres processes are all busy running stored procedures.

Tasks: 1143 total, 64 running, 1079 sleeping, 0 stopped, 0 zombie
%Cpu(s): 53.1 us, 6.0 sy, 0.0 ni, 40.0 id, 0.1 wa, 0.0 hi, 0.8 si, 0.0 st
MiB Mem : 772719.8 total, 578477.9 free, 9582.1 used, 184659.7 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 708649.4 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
52329 postgres 20 0 5351472 222896 10216 S 441.3 0.0 1:33.25 ./bin/tclsh8.6 pgtest_thread.tcl
52362 postgres 20 0 49.3g 5.5g 5.5g R 87.1 0.7 0:17.59 postgres: tpcc tpcc ::1(45114) CALL
52408 postgres 20 0 49.3g 6.0g 6.0g R 87.1 0.8 0:17.49 postgres: tpcc tpcc ::1(45160) CALL
52338 postgres 20 0 49.3g 6.0g 6.0g R 86.8 0.8 0:17.63 postgres: tpcc tpcc ::1(45086) CALL
52344 postgres 20 0 49.3g 6.2g 6.2g R 86.8 0.8 0:17.63 postgres: tpcc tpcc ::1(45096) CALL
52348 postgres 20 0 49.3g 4.5g 4.5g S 86.8 0.6 0:17.42 postgres: tpcc tpcc ::1(45100) CALL

Analyzing the Python GIL

So, if you have read this far you won’t be surprised that we suspect that the roadblock we have encountered with Python is a major one called the Global Interpreter Lock or GIL that means only one thread can run at any one time. But lets confirm this using the GIL load tool to time the amount of time that the GIL is held for our 64 thread workload. The tool is installed and run as follows:

pip install gil_load
python3 -m gil_load pgtest_thread_gil.py

and we add gil_load.start and gil_load.stop to the Python script and run a smaller number of transactions without timing the test. The following is the result.  The GIL is held for 77% of the time and the threads wait for the GIL 99.9% of the time. For Tcl this is 0% and 0% respectively because there is no GIL meaning Tcl threads execute in parallel and the workload continues to scale.

The difference between the two also shows that there is time when we are synchronizing between the threads and not running any workload at all.  As our results show the more threads we add the worse the performance gets due to this GIL synchronization.

held: 0.771 (0.774, 0.771, 0.771)
wait: 0.999 (0.998, 0.999, 0.999)
<140263890196288>
held: 0.0 (0.0, 0.0, 0.0)
wait: 0.0 (0.0, 0.0, 0.0)
<140263853995776>
held: 0.008 (0.008, 0.008, 0.008)
wait: 0.903 (0.887, 0.902, 0.903)
<140263845603072>
held: 0.01 (0.01, 0.01, 0.01)
wait: 0.904 (0.888, 0.903, 0.904)
...
<140257727084288>
held: 0.01 (0.01, 0.01, 0.01)
wait: 0.904 (0.892, 0.904, 0.904)

If we now return to VizTracer and perf using the method described here to trace Python with 16 threads we can now see that the Wall Duration time for the neword stored procedure has increased by 300% and we are spending a lot longer sleeping with only one thread allowed to execute at any one time. PostgreSQL can sustain higher throughput but our Python client cannot.

Python GIL PostgreSQL wait events

So if the bottleneck is in the client what wait event would we expect to see on our PostgreSQL database? To do this we can use the pg_stat_statements and pg_sentinel packages to query the active session history with the SQL as follows:

with ash as (
	 select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
	 from pg_active_session_history where ash_time>=current_timestamp - interval '10 minutes'
) select  round(100 * count(*)/sum(count(*)) over(),0) as "%", round(count(*)/samples,2) as "AAS",
   datname,wait_event_type,wait_event
 from ash
 group by samples,
  datname,wait_event_type,wait_event
 order by 1 desc
;

with the results as follows on Python:

 % | AAS | datname | wait_event_type | wait_event
----+------+---------+-----------------+--------------------
62 | 4.10 | tpcc | Client | ClientRead
33 | 2.19 | tpcc | CPU | CPU
3 | 0.18 | tpcc | LWLock | pg_stat_statements
1 | 0.06 | tpcc | IO | DataFileRead
1 | 0.03 | tpcc | LWLock | WALInsert
0 | 0.01 | tpcc | IO | SLRURead
0 | 0.01 | tpcc | IO | DataFileExtend
0 | 0.01 | tpcc | Timeout | VacuumDelay
0 | 0.00 | tpcc | IO | DataFileWrite
(9 rows)

and as follows on Tcl.

% | AAS | datname | wait_event_type | wait_event
----+-------+---------+-----------------+----------------------
60 | 34.76 | tpcc | CPU | CPU
26 | 15.12 | tpcc | LWLock | pg_stat_statements
7 | 3.88 | tpcc | LWLock | WALInsert
4 | 2.48 | tpcc | Client | ClientRead
1 | 0.52 | tpcc | IO | DataFileRead
1 | 0.32 | tpcc | Lock | extend
0 | 0.02 | tpcc | IO | SLRURead
0 | 0.03 | tpcc | LWLock | XidGen
0 | 0.26 | tpcc | Lock | transactionid
0 | 0.23 | tpcc | IPC | ProcArrayGroupUpdate
0 | 0.03 | tpcc | LWLock | LockManager
0 | 0.02 | tpcc | LWLock | BufferContent
0 | 0.12 | tpcc | LWLock | WALWrite
0 | 0.08 | tpcc | LWLock | XactSLRU
0 | 0.02 | tpcc | IO | DataFileExtend
0 | 0.26 | tpcc | LWLock | ProcArray
(16 rows)

So when using Python we are spending 62% of our time on a Client wait event called ClientRead or in other words PostgreSQL is telling us it is spending most of its time waiting for the Python client to respond, either the transaction has finished and it is waiting for Python to fetch the results or it has already fetched the results and is waiting for the next query to be sent.  Either way high waits on ClientRead means we are waiting for the client and not the database.  Also note that with the Tcl workload we now start to see some of the LWLock wait events that a database engineer working on scalability would focus on and we can use the HammerDB PostgreSQL statistics viewer to drill down on some of these wait events.

Summary

What we have observed is that the Python GIL makes it impossible to run anything but a single threaded benchmark against a database because the GIL means that only one thread can ever run at one time.  For decades databases have been designed to scale to process millions of transactions a minute whilst maintaining database consistency, a serialized benchmark client compromises the ability to compare databases.

HammerDB and its predecessor Hammerora were written in the Tcl language because they were written by database engineers looking to write scalable database benchmarks rather than by application engineers looking to use a particular language. At the time Tcl was the only multi-threaded GIL free scripting language for database access and this still holds true today. (And although the Python test script could run in multiple processes HammerDB is an application simulating multiple virtual users in a GUI and CLI environment requiring the close interaction and control of using multiple threads).

*700% Faster

Although we have seen that Tcl is 700% faster than Python for our example PostgreSQL benchmark we observed that even with Tcl we were only at approximately 50% CPU Utilization. With commerical databases HammerDB can scale to full CPU utilization and 2-3X the transaction rate seen here meaning that in reality the gap between Tcl and Python running against a commericial database is more than 700%.

HammerDB for Managers

This post is targeted towards the questions most often asked by non-technical management who want to get up to speed on what HammerDB is (what it isn’t) and how it can benefit their organization.

What is HammerDB?

HammerDB is a software application for database benchmarking.  It enables the user to measure database performance and make comparative judgements about database hardware and software.

HammerDB has graphical and command line interfaces for the Windows and Linux operating systems.

 

 

 

 

 

 

 

Why HammerDB was developed

Databases are highly sophisticated software, and to design and run a fair benchmark workload is a complex undertaking. The Transaction Processing Performance Council (TPC) was founded to bring standards to database benchmarking, and the history of the TPC can be found here. The TPC designed benchmarks for transaction processing (OLTP) and analytics (OLAP) and anyone can run these benchmarks, have them audited by the TPC and published on the official benchmark rankings.

However, although these results are the gold standard of database benchmarking to do so, requires time, expertise and not insignificant cost. Additionally, many databases contain a license clause colloquially known as a De-Witt clause that prevents the publication of non-approved benchmarks.

These factors meant that often when looking for database performance information, the results for a particular combination of software and hardware were not available.

As the TPC makes its specifications available for free, the need was seen for an open source benchmarking application that could leverage the standards written by the TPC yet implemented quickly, easily and at low cost by anyone. Also, when testing a database with a De-Witt clause, it was then possible to produce your own results without having to rely on a particular vendor to publish results of interest.

The HammerDB name

Originally, HammerDB was named  Hammerora because the first database the application supported was Oracle. Ora was a common prefix/suffix for Oracle related software, and the name inspired by a genre of classic films and the characters portrayed, in particular for testing Oracle RAC.  As more databases were added, the original name was less appropriate and the name HammerDB was suggested and adopted.

HammerDB development

HammerDB was originally developed by Steve Shaw as an employer approved  own-time, own-materials project to implement workloads derived from TPC specifications in a user accessible way.  An important concept was to simulate database users called Virtual Users in parallel (rather than concurrently) to accurately simulate a real database workload with multiple users running from separate systems. Programming languages such as Java, Lua and Python were unable to support this unique requirement (an application to simulate separate users in multiple threads that do not block or pause each other’s workloads, the Python GIL for example only permits one user to run at any one time) and the design decisions made are discussed in the following post.

What programming languages does HammerDB use and why does it matter?

As a result, HammerDB was able to scale beyond other database workload applications as the following post illustrates.

Why Tcl is 700% faster than Python for database benchmarking

Adoption by the TPC

Given the increasing importance of open source and the widespread use of HammerDB, the TPC adopted HammerDB in 2019 and now hosts the project on GitHub. Today, the TPC-OSS subcommittee oversees development and approves all modifications made through pull requests.

Usage and industry adoption

HammerDB maintains a web page under the stats link where the number of downloads can be tracked. HammerDB is used globally, with the areas of use shown in blue on the map below with a darker color showing higher levels of use, with the most popular destinations being the USA and China.

The results of HammerDB workloads have been published by all leading cloud vendors, database software vendors and systems suppliers, HammerDB maintains a collated list of these publications but does not vet or audit results before inclusion.

HammerDB Licensing

HammerDB is open source software licensed under the
The GNU General Public License v3.0 (GPLv3) and a quick guide to GPLv3 can be viewed here.  HammerDB has dependencies on external open source software and can be built from source.

How to build HammerDB from source

HammerDB is Free software and consequently engineers should consider not only how they can benefit from using the software but also how they can contribute to the community with code and documentation.

Businesses that depend on open source should consider sponsorship of open source projects or financial support to ensure that the open source they depend on remains freely available.

Supported Databases

HammerDB supports the most popular databases on the db-engines ranking, namely Oracle Database, Microsoft SQL Server, IBM Db2, TimesTen, MySQL, MariaDB, PostgreSQL, Greenplum, Postgres Plus Advanced Server, Citus Data, Amazon Aurora and Amazon Redshift.  HammerDB supports these databases running in the cloud and in the enterprise, and will also run workloads against databases derived from the most popular open source databases MySQL and PostgreSQL.

The wide range of database support gives HammerDB an advantage over other database benchmarking tools that only implement workloads against one or two databases, limiting comparison between database engines and assessing relative performance.

Derived Workloads

 When testing database performance there are 2 distinct workloads, transactional or OLTP and analytic (data warehouse, decision support) or OLAP.  HammerDB supports 2 workloads derived from TPC specifications to test these different requirements, namely TPROC-C derived from TPC-C for OLTP and TPROC-H derived from TPC-H for OLAP.

It is important to note that TPC-C and TPC-H are registered trademarks of the TPC and using the names TPC-C or TPC-H and/or as official metrics such as tpmC or QphH in a non-audited publication is considered a trademark violation and should not be used.

An additional specification called TPC-CH for hybrid transactional/analytical processing (HTAP) is under research and development for inclusion in a future release  as TPROC-CH.

In addition to the TPC-C specification for OLTP workloads, the TPC has also developed the TPC-E specification.  HammerDB will consider developing a TPC-E derived workload when official benchmark publications are made from at least 3 of the supported databases to ensure a fair representation in an open source version.

The NOPM Metric

When reporting TPROC-C workloads the key metric is known as NOPM or New Orders per Minute.  This metric measures the same value as the tpmC metric in an official TPC-C publication, however as noted previously the use of official terminology in derived workloads is not permitted and therefore HammerDB uses NOPM as a derived metric.  HammerDB also reports TPM as an engineering metric, it is not a requirement for there to be a fixed relationship between NOPM and TPM across databases and therefore NOPM can be used for comparison of performance  between databases whereas TPM is for analysing a particular database engine.

Cached vs Scaled Workloads

The official TPC-C OLTP workload is a what is known as a scaled workload, however a key difference in the HammerDB design was by default to implement a smaller more efficient cached workload based on the same specification that could still give an indication of comparative performance in the same way that the scaled workload could.

A key difference between cached and scaled workloads is the implementation of keying and thinking time to introduce a pause of time between transactions. In a scaled derived TPC-C workload allowing for this keying and thinking time, one Virtual User will complete approximately 1 New Order per Minute and therefore for example 10,000 database sessions will run at approximately 10,000 NOPM and 100,000 sessions at 100,000 NOPM.  The workload also outputted the data from the Virtual Users by simulating individual terminals.

Note that HammerDB can also implement a scaled workload with a feature called event-driven scaling.  However, this requires a large data set and middleware to manage the large database session count.  Instead, most users prefer to implement a cached workload.

When HammerDB was designed, it was clear that where the database software was scalable (initially such as Oracle), CPU performance at full utilization was the key determining factor for database performance. Therefore, a perfectly scaled implementation provided the levels of memory and I/O or disk capacity to reach full CPU utilization.

However, prior to the advent of high performance Solid State Disks (SSDs) to implement such a configuration required a high capacity of hard disk drives (HDDs) in one or many fibre attached storage arrays at considerable expense.

Instead, HammerDB implemented a cached workload by eliminating the keying and thinking time and the requirement for terminals. Now individual Virtual Users don’t pause between transactions and can run at tens of thousands NOPM each.  With the reduced I/O footprint, the data each Virtual User requires is much reduced, meaning that most of the workload is cached in memory. This means we can reach full CPU utilization (with scalable database software) quicker and without requiring middleware.

This approach gives us an indication of what an optimally configured scaled configuration with a high I/O and memory capacity can achieve, however be aware that for a production environment you will have measured the CPU’s database potential, but you will need sufficient memory, I/O and scalable database software to make full use of this CPU potential.

Summary

1. HammerDB is a software application for database benchmarking.

2. HammerDB was developed to allow anyone to run database benchmarks quickly, easily and at low cost.

3. HammerDB was designed to scale and is developed in a language that is not restrictied by a Global Interpreter Lock (GIL) that restricts workloads to being single-threaded and instead runs in parallel.

4. The HammerDB name was inspired by a genre of classic films.

5. HammerDB is hosted by Transaction Processing Performance Council (TPC) who oversee development.

6. HammerDB is used globally and provides statistics on downloads and publications.

7. HammerDB is open source software licensed under the The GNU General Public License v3.0 (GPLv3).

8. HammerDB supports the most popular relational databases on the db-engines ranking.

9. HammerDB runs workloads called TPROC-C and TPROC-H derived from the TPC specifications TPC-C and TPC-H respectively with the NOPM metric, the key metric for measuring transactional performance. Using TPC terminology for non-audited benchmarks violate TPC trademarks.

10. By default HammerDB implements a cached vs a scaled workload but can implement both types of benchmark.

 

How to build HammerDB from source

With pull requests Dependency build automation for Linux #323 and Dependency build automation for Windows #362 HammerDB has enabled the user to build their own full distribution of HammerDB from source on Linux x64 and Windows x64 with a single command. This post explains how you can build your own distribution.

Do you need to build from source?

Before building from source, the first question should be, do you need to?  HammerDB already provides pre-compiled packages from the download page so if you want to run HammerDB without building it then these packages are all you need.

WHY build from source?

If you are not already familiar with the programming languages that HammerDB uses, then this earlier post serves as an ideal introduction to what makes up the highest performing GIL free database benchmarking application.

What programming languages does HammerDB use and why does it matter?

You may want to become familiar with all the underlying source code in C or wish to build a distribution where you can verify every single line of source code that goes into HammerDB. All of the source code and dependencies in HammerDB are  open source right down to the layer above any 3rd party commercial database drivers for Oracle, SQL Server and Db2.  Building from source also enables you to build a distribution right up to date with any of the latest pull requests not yet included in the most recent HammerDB release.

BAWT by Paul Obermeier

HammerDB build automation uses an adapted version of the BAWT package by Paul Obermeier . BAWT is copyrighted by Paul Obermeier and distributed under the 3-clause BSD license.

GETTING STARTED ON LINUX

If you have read what programming languages HammerDB uses, then you will not be too surprised that you will firstly need to install a development environment using the gcc compiler for Linux. You will also need the p7zip tool as well as the Xft font library development packages.  The build has been tested on x64 Red Hat 8.X and Ubuntu 20.04.X Linux, with the following example from Red Hat Linux.

yum install p7zip-16.02-20.el8.x86_64.rpm
sudo yum install libXft-devel
yum group install "Development Tools"

The HammerDB Oracle and ODBC for SQL Server client libraries will build without the respective Oracle and SQL Server client libraries, but will need them at runtime.  However, the client or server must be installed for all of Db2, MariaDB, PostgreSQL and MySQL for the HammerDB build to be successful. The database installation must include both the include and lib directories.

For Db2 either the server or client can be installed, for example v11.5.7_linuxx64_server_dec.tar.gz.

HammerDB build automation will look for the Db2 installation in the location of the environment variable IBM_DB_DIR set using db2profile tool. Verify that this has been set as follows:

$ echo $IBM_DB_DIR
/home/ibm/sqllib

On Linux MariaDB, PostgreSQL and MySQL include a config command in the bin directory that returns details of the configuration. HammerDB uses these commands to find the headers and libraries needed for the build.

Before running the build, environment variables MARIADB_CONFIG, PG_CONFIG and MYSQL_CONFIG must be set to the location of the respective config commands for each database in the terminal running the build.

$ export MARIADB_CONFIG=/opt/mariadb/mariadb-10.6.7-linux-systemd-x86_64/bin
$ export PG_CONFIG=/opt/postgresql/bin
$ export MYSQL_CONFIG=/opt/mysql/mysql-8.0.28-linux-glibc2.12-x86_64/bin
GETTING STARTED ON WINDOWS

On Windows, download and install Visual Studio 2022 , Visual Studio is free for open source developers.  An additional gcc compiler will be downloaded and installed locally during the build. The build has been tested on x64 Windows 10 and 11.

As with Linux it is also mandatory to install a database server or client including the development environment of headers and libraries for MariaDB, Db2, MySQL and PostgreSQL.

For Db2 on Windows, there is no db2profile that sets the environment therefore the IBM_DB_DIR environment variable must be set to the location of the Db2 install.  Similarly, the MariaDB and MySQL config commands are not available on Windows either and should also be set to the database or client installation directory rather than the bin directory. PostgreSQL for Windows does include the config command and therefore the environment configuration is the same as Linux.

set MARIADB_CONFIG=C:\Program Files\MariaDB\MariaDB Connector C 64-bit
set MYSQL_CONFIG=C:\Program Files\MySQL\MySQL Server 8.0
set PG_CONFIG=C:\Program Files\PostgreSQL\pgsql\bin
set IBM_DB_DIR=C:\Program Files\IBM\SQLLIB

For all database installations on Windows whether client or server verify that the installation has the include, bin and lib directories. On Windows in particular, some installations may not include all the required files for development.

Download HAMMERDB SOURCE

At this stage you will have installed the compiler you need and database client/server installations for MariaDB, Db2, MySQL and PostgreSQL. To reiterate, HammerDB will not build correctly unless you have installed ALL the required database environments.

Next download HammerDB from gitHub by either cloning or downloading. From the main HammerDB GitHub page use the clone URL or the Download Zip link from the master branch.

$ git clone https://github.com/TPC-Council/HammerDB.git
Cloning into 'HammerDB'...
remote: Enumerating objects: 8099, done.
remote: Total 8099 (delta 0), reused 0 (delta 0), pack-reused 8099
Receiving objects: 100% (8099/8099), 40.41 MiB | 8.39 MiB/s, done.
Resolving deltas: 100% (2564/2564), done.

If cloning you will have a directory called “HammerDB” or extracting the zipfile a directory called “HammerDB-master”.

RUNNING THE BUILD

Navigate to the Build\Bawt-2.1.0 directory with the command line for both Linux and Windows. Note that during the build an InputLibs directory will be created and all HammerDB libraries downloaded and under the Build directory a BawtBuild directory will be created where the installation will take place. Therefore, sufficient disk space and permissions must be available for the build to take place.

Within the command line you are running the build make sure that you have correctly set all the MARIADB_CONFIG, MYSQL_CONFIG, PG_CONFIG, and IBM_DB_DIR environment variables used during the build and run the Build-Linux.sh command for Linux ./Build-Linux.sh x64 Setup/HammerDB-Linux.bawt update and Build-Windows.bat commands for Windows ./Build-Windows.bat x64 vs2022+gcc Setup/HammerDB-Windows.bawt update, the command sequence will look similar to the following on Linux:

$ export MYSQL_CONFIG=/opt/mysql-8.0.20-linux-glibc2.12-x86_64/bin
$ export MARIADB_CONFIG=/opt/mariadb-10.8.1-linux-x86_64/bin/
$ export PG_CONFIG=/opt/postgresql-14.1/bin
$ echo $IBM_DB_DIR
/opt/ibm/sqllib
$ ./Build-Linux.sh x64 Setup/HammerDB-Linux.bawt update

and the following on Windows:

set MARIADB_CONFIG=C:\Program Files\MariaDB\MariaDB Connector C 64-bit
set MYSQL_CONFIG=C:\Program Files\MySQL\MySQL Server 8.0
set PG_CONFIG=C:\Program Files\PostgreSQL\pgsql\bin
set IBM_DB_DIR=C:\Program Files\IBM\SQLLIB
Build-Windows.bat x64 vs2022+gcc Setup\HammerDB-Windows.bawt update

The first step the build will take is to download the required packages and build instructions from www.hammerdb.com, On Windows the MYSYS/MinGW package will also be downloaded.  These will be stored in the Bawt-2.1.0/InputLibs directory. Both checksums and modification times are verified with the remote packages. If a package is already present with the same checksum and modification time, it will not be downloaded again if already present. Also some packages such as Tcl have been modified from the original and therefore only the packages from www.hammerdb.com should be used.

awthemes-9.3.1.7z  libressl-2.6.4.7z  pgtcl-2.1.1.7z Tk-8.6.12.7z
awthemes.bawt      libressl.bawt      pgtcl.bawt        Tk.bawt
clearlooks-1.0.7z  mariatcl-0.1.7z    redis-0.1.7z tkblt-3.2.23.7z
clearlooks.bawt    mariatcl.bawt      redis.bawt        tkblt.bawt
db2tcl-2.0.1.7z    mysqltcl-3.052.7z  Tcl-8.6.12.7z tksvg-0.5.7z
db2tcl.bawt        mysqltcl.bawt      Tcl.bawt          tksvg.bawt
expect-5.45.4.7z   oratcl-4.6.7z      tcltls-1.7.22.7z
expect.bawt        oratcl.bawt        tcltls.bawt

Allow the build to complete. A summary will be given of the packages built and the location of the build given, for example for Linux as follows:

14:27:08 > Creating Distribution tar.gz in /opt/HammerDB/Build/BawtBuild/Linux/x64/Release/Distribution
14:27:08 >   TarGzip
               Source directory: /opt/HammerDB/Build/BawtBuild/Linux/x64/Release/Distribution/HammerDB-4.4
               Tar file        : /opt/HammerDB/Build/BawtBuild/Linux/x64/Release/Distribution/HammerDB-4.4-Linux.tar.gz
14:27:09 > End FinalizeStage

14:27:09 > Summary
           Setup file     : /opt/HammerDB/Build/Bawt-2.1.0/Setup/HammerDB-Linux.bawt
           Build directory: /opt/HammerDB/Build/BawtBuild/Linux/x64/Release/Build
           Architecture   : x64
           Compilers      : gcc
           Global stages  : Finalize
           #  : Library Name         Version    Build time Stages
----------------------------------------------------------------------
             1: Tcl                  8.6.12     1.52 minutes Clean Extract Configure Compile Distribute
             2: Tk                   8.6.12     0.36 minutes Clean Extract Configure Compile Distribute
             3: awthemes             9.3.1      0.00 minutes Clean Extract Configure Compile Distribute
             4: clearlooks           1.0        0.00 minutes Clean Extract Configure Compile Distribute
             5: db2tcl               2.0.1      0.10 minutes Clean Extract Configure Compile Distribute
             6: expect               5.45.4     0.12 minutes Clean Extract Configure Compile Distribute
             7: libressl             2.6.4      0.99 minutes Clean Extract Configure Compile Distribute
             8: mariatcl             0.1        0.04 minutes Clean Extract Configure Compile Distribute
             9: mysqltcl             3.052      0.04 minutes Clean Extract Configure Compile Distribute
            10: oratcl               4.6        0.05 minutes Clean Extract Configure Compile Distribute
            11: pgtcl                2.1.1      0.05 minutes Clean Extract Configure Compile Distribute
            12: redis                0.1        0.00 minutes Clean Extract Configure Compile Distribute
            13: tcltls               1.7.22     0.14 minutes Clean Extract Configure Compile Distribute
            14: tkblt                3.2.23     0.21 minutes Clean Extract Configure Compile Distribute
            15: tksvg                0.5        0.06 minutes Clean Extract Configure Compile Distribute
----------------------------------------------------------------------
           Total: 3.70 minutes

and for Windows:

You now have your own distribution of HammerDB with the latest source code.  You can run the hammerdbcli librarycheck command to verify that the libraries built correctly.

/opt/HammerDB/Build/BawtBuild/Linux/x64/Release/Distribution$ ls
HammerDB-4.4  HammerDB-4.4-Linux.tar.gz
$ cd HammerDB-4.4
/opt/HammerDB/Build/BawtBuild/Linux/x64/Release/Distribution/HammerDB-4.4$ ./hammerdbcli
HammerDB CLI v4.4
Copyright (C) 2003-2022 Steve Shaw
Type "help" for a list of commands
hammerdb>librarycheck
Checking database library for Oracle
Success ... loaded library Oratcl for Oracle
Checking database library for MSSQLServer
Success ... loaded library tdbc::odbc for MSSQLServer
Checking database library for Db2
Success ... loaded library db2tcl for Db2
Checking database library for MySQL
Success ... loaded library mysqltcl for MySQL
Checking database library for PostgreSQL
Success ... loaded library Pgtcl for PostgreSQL
Checking database library for MariaDB
Success ... loaded library mariatcl for MariaDB

You can also browse all of the C source code for the libraries you have built in the Build directory.

/opt/HammerDB/Build/BawtBuild/Linux/x64/Release/Build
VERIFYING DEPENDENT LIBRARIES

Having built the distribution yourself it should be clear that as the database libraries are dynamically linked when you install the distribution on another system you will  also need to install the database clients on that system.

For example on Linux we have a check that fails because the MariaDB libraries cannot be found and we can verify this with the ldd command.

hammerdb>librarycheck
...
Checking database library for MariaDB
Error: failed to load mariatcl - couldn't load file "/opt/HammerDB/Build/BawtBuild/Linux/x64/Release/Distribution/HammerDB-4.4/lib/mariatcl0.1/libmariatcl0.1.so": libmariadb.so.3: cannot open shared object file: No such file or directory
Ensure that MariaDB client libraries are installed and the location in the LD_LIBRARY_PATH environment variable

$ ldd libmariatcl0.1.so
    linux-vdso.so.1 (0x00007ffd534db000)
    libmariadb.so.3 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f259d2f9000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f259d513000)

When we install this dependent library and tell HammerDB where to find it:

export LD_LIBRARY_PATH=/opt/mariadb-10.2.34-linux-x86_64/lib:$LD_LIBRARY_PATH

HammerDB will correctly load the MariaDB package.


$ ldd libmariatcl0.1.so
    linux-vdso.so.1 (0x00007ffc6214d000)
    libmariadb.so.3 => /opt/mariadb-10.2.34-linux-x86_64/lib/libmariadb.so.3 (0x00007f3e028c9000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3e026bd000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3e0269a000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3e02694000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3e02545000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f3e02cd2000)

hammerdb>librarycheck
...
Checking database library for MariaDB
Success ... loaded library mariatcl for MariaDB

For an example Windows this time we cannot load the MySQL library.

HammerDB CLI v4.4
Copyright (C) 2003-2022 Steve Shaw
Type "help" for a list of commands
hammerdb>librarycheck
...
Checking database library for MySQL
Error: failed to load mysqltcl - couldn't load library "C:/HammerDB-master/Build/BawtBuild/vs2022/x64/Release/Distribution/HammerDB-4.4/lib/mysqltcl3.052/libmysqltcl.dll": No error
Ensure that MySQL client libraries are installed and the location in the PATH environment variable
hammerdb>

We run the DependenciesGui from the dependencies tool,  load libmysqltcl.dll and it shows we cannot find libmysql.dll.

After installing MySQL we set the environment variable to find the library.

and verify with the Dependencies GUI that now all dependencies can be found.

We can now confirm that the MySQL package can be correctly loaded.

HammerDB CLI v4.4
Copyright (C) 2003-2022 Steve Shaw
Type "help" for a list of commands
hammerdb>librarycheck
Checking database library for Oracle
Success ... loaded library Oratcl for Oracle
Checking database library for MSSQLServer
Success ... loaded library tdbc::odbc for MSSQLServer
Checking database library for Db2
Success ... loaded library db2tcl for Db2
Checking database library for MySQL
Success ... loaded library mysqltcl for MySQL
Checking database library for PostgreSQL
Success ... loaded library Pgtcl for PostgreSQL
Checking database library for MariaDB
Success ... loaded library mariatcl for MariaDB
hammerdb>
SUMMARY

You have now learnt how to build your own HammerDB from source. This gives you access to all of the source code and see how HammerDB is built for a released distribution to enable you to build your own.

 

HammerDB v4.3 New Features Pt2: Enhanced Webservice with SQLite repository

The most popular interface to HammerDB is using the GUI, however demand for running HammerDB workloads in environments without GUI interfaces led to the development of the CLI.  From v3.2 cloud based usage led to the development of a HammerDB HTTP based web service interface with output data stored in a SQLite database repository. From v4.3 this web interface has been enhanced to add command line functionality to the service as well as extending the ability to query the database of configuration, results and timing data to be returned in JSON format, enabling a long-term repository of HammerDB benchmark data for integrating HammerDB into service based cloud environments.

Configuring and starting the Webservice

The webservice configuration is set in the generic.xml file in the config directory. ws_port defines on which port to start the HTTP service, whilst sqlite_db sets the location of the SQLite repository database. The default value of TMP allows HammerDB to find a suitable temp directory in which to store the database.  if :memory: is given an in-memory database will be used, however some functionality such as the storing  of timing data is not available (because multiple threads cannot open the same in-memory database).

<webservice>
<ws_port>8080</ws_port> 
<sqlite_db>TMP</sqlite_db> 
</webservice>

To start hammerdb run the ./hammerdbws command on Linux or hammerdbws.bat on Windows.

./hammerdbws

Navigating to the defined port will show the Web Service interface, and clicking on the API link will return the available commands.

 

HammerDB Web Service Interface

In addition to the HTTP interface, the web service interface also provides a CLI interface to the service. This CLI interface translates the existing CLI commands to call the HTTP API whilst translating the output to JSON format.  At v4.3 the web service CLI supports all commands support by the standard CLI except for primary/replica features meaning switchmode and steprun are not available.

The web service CLI means that direct HTTP calls and CLI commands can be mixed whilst storing all output in the SQLite repository for querying at a later time.

Running a schema build

The following PostgreSQL build script will run in both the standard and web service CLI.

dbset db pg
dbset bm TPC-C
diset tpcc pg_count_ware 20
diset tpcc pg_num_vu 4
diset tpcc pg_superuser postgres
diset tpcc pg_superuserpass postgres
buildschema

Running in the web service CLI we can see that the command output is returned to the user, however the virtual user output is no longer returned and instead a jobid is given.

./hammerdbws 
HammerDB Web Service v4.3
Copyright (C) 2003-2021 Steve Shaw
Type "help" for a list of commands
The xml is well-formed, applying configuration
Initialized new SQLite on-disk database /tmp/hammer.DB
Starting HammerDB Web Service on port 8080
Listening for HTTP requests on TCP port 8080
hammerws>source pgbuild.tcl
{"success": {"message": "Database set to PostgreSQL"}}
{"success": {"message": "Benchmark set to TPC-C for PostgreSQL"}}
{"success": {"message": "Changed tpcc:pg_count_ware from 1 to 20 for PostgreSQL"}}
{"success": {"message": "Changed tpcc:pg_num_vu from 1 to 4 for PostgreSQL"}}
{"success": {"message": "Value postgres for tpcc:pg_superuserpass is the same as existing value postgres, no change made"}}
{"success": {"message": "Building 20 Warehouses with 5 Virtual Users, 4 active + 1 Monitor VU(dict value pg_num_vu is set to 4): JOBID=619CF6B05D1703E263931333"}}

The jobids can be queried directly over HTTP

List Jobs

and also for example the status of the current job.

Job status

This same output is available at the CLI prompt using the jobs command.

hammerws>jobs
["619CF6B05D1703E263931333"]

hammerws>jobs 619CF6B05D1703E263931333 status
[
"0",
"Ready to create a 20 Warehouse PostgreSQL TPROC-C schema\nin host LOCALHOST:5432 sslmode PREFER under user TPCC in database TPCC?",
"0",
"Vuser 1:RUNNING",
"0",
"Vuser 2:RUNNING",
"0",
"Vuser 3:RUNNING",
"0",
"Vuser 4:RUNNING",
"0",
"Vuser 5:RUNNING"
]

hammerws>jobs 619CF6B05D1703E263931333 1
[
"1",
"Monitor Thread",
"1",
"CREATING TPCC SCHEMA",
"1",
"CREATING DATABASE tpcc under OWNER tpcc",
"1",
"CREATING TPCC TABLES",
"1",
"Loading Item",
"1",
"Loading Items - 10000",
"1",
"Loading Items - 20000",
"1",
"Loading Items - 30000",
"1",
"Loading Items - 40000",
"1",
"Loading Items - 50000",
"1",
"Loading Items - 60000",
"1",
"Loading Items - 70000",
"1",
"Loading Items - 80000",
"1",
"Loading Items - 90000",
"1",
"Loading Items - 100000",
"1",
"Item done",
"1",
"Monitoring Workers...",
"1",
"Workers: 4 Active 0 Done"
]

In this example, we use the HTTP interface to verify the output that the build completed successfully.

Build Job Complete

Running a Test

The following script is also recognisable as a valid CLI script with the addition of jobs commands to retrieve the test output.

dbset db pg
print dict
diset tpcc pg_superuser postgres
diset tpcc pg_defaultdbase postgres
diset tpcc pg_driver timed
diset tpcc pg_rampup 1
diset tpcc pg_duration 2
diset tpcc pg_timeprofile true
tcset refreshrate 10
loadscript
foreach z {1 2 4 8} {
puts "$z VU TEST"
vuset vu $z
vucreate
tcstart
set jobid [ vurun ]
runtimer 200
tcstop
jobs $jobid result
jobs $jobid timing
vudestroy
}
puts "JOB SEQUENCE ENDED"

Note that in this example using runtimer to automate the tests means that the interactive prompt will not return whilst the test is running, however it is still possible to query the jobs using the HTTP interface whilst running.

Job status

Running this test script returns the following output.  Note that the job result and timing data is now queried from the SQLite repository during the script run.

hammerws>source pgrun.tcl
{"success": {"message": "Database set to PostgreSQL"}}
{
"connection": {
"pg_host": "localhost",
"pg_port": "5432",
"pg_sslmode": "prefer"
},
"tpcc": {
"pg_count_ware": "1",
"pg_num_vu": "1",
"pg_superuser": "postgres",
"pg_superuserpass": "postgres",
"pg_defaultdbase": "postgres",
"pg_user": "tpcc",
"pg_pass": "tpcc",
"pg_dbase": "tpcc",
"pg_tspace": "pg_default",
"pg_vacuum": "false",
"pg_dritasnap": "false",
"pg_oracompat": "false",
"pg_storedprocs": "false",
"pg_partition": "false",
"pg_total_iterations": "10000000",
"pg_raiseerror": "false",
"pg_keyandthink": "false",
"pg_driver": "test",
"pg_rampup": "2",
"pg_duration": "5",
"pg_allwarehouse": "false",
"pg_timeprofile": "false",
"pg_async_scale": "false",
"pg_async_client": "10",
"pg_async_verbose": "false",
"pg_async_delay": "1000",
"pg_connect_pool": "false"
}
}
{"success": {"message": "Value postgres for tpcc:pg_defaultdbase is the same as existing value postgres, no change made"}}
{"success": {"message": "Set driver script to timed, clearing Script, reload script to activate new setting"}}
{"success": {"message": "Changed tpcc:pg_rampup from 2 to 1 for PostgreSQL"}}
{"success": {"message": "Changed tpcc:pg_duration from 5 to 2 for PostgreSQL"}}
{"success": {"message": "Changed tpcc:pg_timeprofile from false to true for PostgreSQL"}}
{"success": {"message": "Transaction Counter refresh rate set to 10"}}
{"success": {"message": "script loaded"}}
1 VU TEST
{"success": {"message": "Virtual users set to 1"}}
{"success": {"message": "2 Virtual Users Created with Monitor VU"}}
{"success": {"message": "Transaction Counter Thread Started"}}
{"success": {"message": "Running Virtual Users: JOBID=619D05045D1703E213238373"}}
{"success": {"message": "Timer: 1 minutes elapsed"}}
{"success": {"message": "Timer: 2 minutes elapsed"}}
{"success": {"message": "Timer: 3 minutes elapsed"}}
{"success": {"message": "runtimer returned after 183 seconds"}}
{"success": {"message": "Transaction Counter thread running with threadid:tid0x7effcb1c5700"}}{"success": {"message": "Stopping Transaction Counter"}}
[
"619D05045D1703E213238373",
"2021-11-23 15:13:08",
"1 Active Virtual Users configured",
"TEST RESULT : System achieved 8089 NOPM from 18636 PostgreSQL TPM"
]
{
"NEWORD": {
"elapsed_ms": "181149.0",
"calls": "23974",
"min_ms": "1.856",
"avg_ms": "3.664",
"max_ms": "116.224",
"total_ms": "87851.402",
"p99_ms": "7.321",
"p95_ms": "5.624",
"p50_ms": "3.203",
"sd": "2550.005",
"ratio_pct": "48.497"
},
"PAYMENT": {
"elapsed_ms": "181149.0",
"calls": "23990",
"min_ms": "1.412",
"avg_ms": "2.799",
"max_ms": "119.255",
"total_ms": "67138.15",
"p99_ms": "4.552",
"p95_ms": "3.868",
"p50_ms": "2.52",
"sd": "2448.404",
"ratio_pct": "37.062"
},
"DELIVERY": {
"elapsed_ms": "181149.0",
"calls": "2473",
"min_ms": "2.609",
"avg_ms": "5.227",
"max_ms": "33.868",
"total_ms": "12927.033",
"p99_ms": "10.588",
"p95_ms": "8.638",
"p50_ms": "4.569",
"sd": "2507.252",
"ratio_pct": "7.136"
},
"SLEV": {
"elapsed_ms": "181149.0",
"calls": "2452",
"min_ms": "0.7",
"avg_ms": "2.679",
"max_ms": "13.246",
"total_ms": "6569.494",
"p99_ms": "6.72",
"p95_ms": "5.279",
"p50_ms": "2.258",
"sd": "1407.091",
"ratio_pct": "3.627"
},
"OSTAT": {
"elapsed_ms": "181149.0",
"calls": "2407",
"min_ms": "0.182",
"avg_ms": "0.74",
"max_ms": "2.494",
"total_ms": "1781.993",
"p99_ms": "1.75",
"p95_ms": "1.436",
"p50_ms": "0.614",
"sd": "392.984",
"ratio_pct": "0.984"
}
}
{"success": {"message": "vudestroy success"}}
2 VU TEST
{"success": {"message": "Virtual users set to 2"}}
{"success": {"message": "3 Virtual Users Created with Monitor VU"}}
{"success": {"message": "Transaction Counter Thread Started"}}
{"success": {"message": "Running Virtual Users: JOBID=619D05BC5D1703E263930313"}}
{"success": {"message": "Timer: 1 minutes elapsed"}}
{"success": {"message": "Timer: 2 minutes elapsed"}}
{"success": {"message": "Timer: 3 minutes elapsed"}}
{"success": {"message": "runtimer returned after 181 seconds"}}
{"success": {"message": "Transaction Counter thread running with threadid:tid0x7effca74c700"}}{"success": {"message": "Stopping Transaction Counter"}}
[
"619D05BC5D1703E263930313",
"2021-11-23 15:16:12",
"2 Active Virtual Users configured",
"TEST RESULT : System achieved 14102 NOPM from 32446 PostgreSQL TPM"
]
{
"NEWORD": {
"elapsed_ms": "179711.5",
"calls": "20902",
"min_ms": "2.13",
"avg_ms": "3.97",
"max_ms": "150.767",
"total_ms": "82979.222",
"p99_ms": "8.438",
"p95_ms": "5.836",
"p50_ms": "3.293",
"sd": "4482.874",
"ratio_pct": "46.323"
},
"PAYMENT": {
"elapsed_ms": "179711.5",
"calls": "20749",
"min_ms": "1.601",
"avg_ms": "3.56",
"max_ms": "147.367",
"total_ms": "73867.606",
"p99_ms": "6.637",
"p95_ms": "4.614",
"p50_ms": "3.201",
"sd": "4299.084",
"ratio_pct": "41.237"
},
"DELIVERY": {
"elapsed_ms": "179711.5",
"calls": "2114",
"min_ms": "2.789",
"avg_ms": "5.394",
"max_ms": "85.028",
"total_ms": "11401.954",
"p99_ms": "11.38",
"p95_ms": "8.898",
"p50_ms": "4.95",
"sd": "3780.131",
"ratio_pct": "6.365"
},
"SLEV": {
"elapsed_ms": "179711.5",
"calls": "2152",
"min_ms": "0.803",
"avg_ms": "2.459",
"max_ms": "12.131",
"total_ms": "5292.509",
"p99_ms": "6.567",
"p95_ms": "5.241",
"p50_ms": "2.039",
"sd": "1300.169",
"ratio_pct": "2.955"
},
"OSTAT": {
"elapsed_ms": "179711.5",
"calls": "2131",
"min_ms": "0.181",
"avg_ms": "0.631",
"max_ms": "2.649",
"total_ms": "1345.042",
"p99_ms": "1.656",
"p95_ms": "1.331",
"p50_ms": "0.526",
"sd": "329.533",
"ratio_pct": "0.751"
}
}
{"success": {"message": "vudestroy success"}}
4 VU TEST
{"success": {"message": "Virtual users set to 4"}}
{"success": {"message": "5 Virtual Users Created with Monitor VU"}}
{"success": {"message": "Transaction Counter Thread Started"}}
{"success": {"message": "Running Virtual Users: JOBID=619D06735D1703E263039373"}}
{"success": {"message": "Timer: 1 minutes elapsed"}}
{"success": {"message": "Timer: 2 minutes elapsed"}}
{"success": {"message": "Timer: 3 minutes elapsed"}}
{"success": {"message": "runtimer returned after 183 seconds"}}
{"success": {"message": "Transaction Counter thread running with threadid:tid0x7effc974a700"}}{"success": {"message": "Stopping Transaction Counter"}}
[
"619D06735D1703E263039373",
"2021-11-23 15:19:15",
"4 Active Virtual Users configured",
"TEST RESULT : System achieved 22392 NOPM from 51340 PostgreSQL TPM"
]
{
"NEWORD": {
"elapsed_ms": "180723.5",
"calls": "17438",
"min_ms": "2.082",
"avg_ms": "5.08",
"max_ms": "160.272",
"total_ms": "88591.161",
"p99_ms": "10.536",
"p95_ms": "7.33",
"p50_ms": "4.737",
"sd": "5223.001",
"ratio_pct": "48.826"
},
"PAYMENT": {
"elapsed_ms": "180723.5",
"calls": "17340",
"min_ms": "1.815",
"avg_ms": "4.011",
"max_ms": "133.818",
"total_ms": "69556.936",
"p99_ms": "8.386",
"p95_ms": "5.476",
"p50_ms": "3.329",
"sd": "5131.308",
"ratio_pct": "38.335"
},
"DELIVERY": {
"elapsed_ms": "180723.5",
"calls": "1723",
"min_ms": "2.601",
"avg_ms": "7.267",
"max_ms": "130.229",
"total_ms": "12520.634",
"p99_ms": "14.639",
"p95_ms": "11.303",
"p50_ms": "6.758",
"sd": "5597.635",
"ratio_pct": "6.901"
},
"SLEV": {
"elapsed_ms": "180723.5",
"calls": "1738",
"min_ms": "0.75",
"avg_ms": "2.816",
"max_ms": "7.497",
"total_ms": "4893.517",
"p99_ms": "6.261",
"p95_ms": "5.099",
"p50_ms": "2.784",
"sd": "1344.462",
"ratio_pct": "2.697"
},
"OSTAT": {
"elapsed_ms": "180723.5",
"calls": "1697",
"min_ms": "0.164",
"avg_ms": "0.781",
"max_ms": "3.79",
"total_ms": "1324.922",
"p99_ms": "2.155",
"p95_ms": "1.436",
"p50_ms": "0.755",
"sd": "414.512",
"ratio_pct": "0.73"
}
}
{"success": {"message": "vudestroy success"}}
8 VU TEST
{"success": {"message": "Virtual users set to 8"}}
{"success": {"message": "9 Virtual Users Created with Monitor VU"}}
{"success": {"message": "Transaction Counter Thread Started"}}
{"success": {"message": "Running Virtual Users: JOBID=619D072D5D1703E273631383"}}
{"success": {"message": "Timer: 1 minutes elapsed"}}
{"success": {"message": "Timer: 2 minutes elapsed"}}
{"success": {"message": "Timer: 3 minutes elapsed"}}
{"success": {"message": "runtimer returned after 184 seconds"}}
{"success": {"message": "Transaction Counter thread running with threadid:tid0x7effaaffd700"}}{"success": {"message": "Stopping Transaction Counter"}}
[
"619D072D5D1703E273631383",
"2021-11-23 15:22:21",
"8 Active Virtual Users configured",
"TEST RESULT : System achieved 36483 NOPM from 84065 PostgreSQL TPM"
]
{
"PAYMENT": {
"elapsed_ms": "181853.0",
"calls": "12384",
"min_ms": "1.85",
"avg_ms": "6.93",
"max_ms": "378.076",
"total_ms": "85816.379",
"p99_ms": "68.816",
"p95_ms": "11.981",
"p50_ms": "4.779",
"sd": "13572.668",
"ratio_pct": "48.716"
},
"NEWORD": {
"elapsed_ms": "181853.0",
"calls": "12454",
"min_ms": "2.182",
"avg_ms": "5.892",
"max_ms": "412.09",
"total_ms": "73383.456",
"p99_ms": "66.096",
"p95_ms": "10.838",
"p50_ms": "3.967",
"sd": "11271.032",
"ratio_pct": "41.658"
},
"DELIVERY": {
"elapsed_ms": "181853.0",
"calls": "1272",
"min_ms": "1.174",
"avg_ms": "7.539",
"max_ms": "103.49",
"total_ms": "9589.561",
"p99_ms": "65.102",
"p95_ms": "16.171",
"p50_ms": "5.538",
"sd": "8412.093",
"ratio_pct": "5.444"
},
"SLEV": {
"elapsed_ms": "181853.0",
"calls": "1260",
"min_ms": "1.165",
"avg_ms": "2.744",
"max_ms": "18.103",
"total_ms": "3456.896",
"p99_ms": "10.539",
"p95_ms": "7.59",
"p50_ms": "2.09",
"sd": "1989.072",
"ratio_pct": "1.962"
},
"OSTAT": {
"elapsed_ms": "181853.0",
"calls": "1270",
"min_ms": "0.205",
"avg_ms": "0.824",
"max_ms": "9.996",
"total_ms": "1045.855",
"p99_ms": "4.74",
"p95_ms": "2.312",
"p50_ms": "0.516",
"sd": "857.958",
"ratio_pct": "0.594"
}
}
{"success": {"message": "vudestroy success"}}
JOB SEQUENCE ENDED

With all jobs complete the SQLite repository can be queried either over HTTP or from the jobs command.  If the webservice is stopped and restarted an on-disk repository is retained for querying at a later point in time.

hammerws>jobs
[
"619CF6B05D1703E263931333",
"619D05045D1703E213238373",
"619D05BC5D1703E263930313",
"619D06735D1703E263039373",
"619D072D5D1703E273631383"
]
hammerws>

The help jobs command can show the available commands to query the repository.

hammerws>help jobs
jobs - Usage: jobs
list all jobs.

jobs - Usage: jobs [jobid|result|timestamp]
jobid: list VU output for jobid.
result: list result for all jobs.
timestamp: list starting timestamp for all jobs.

jobs jobid - Usage: jobs jobid [bm|db|delete|dict|result|status|tcount|timestamp|timing|vuid]
bm: list benchmark for jobid.
db: list database for jobid.
delete: delete jobid.
dict: list dict for jobid.
result: list result for jobid.
status: list status for jobid.
tcount: list count for jobid.
timestamp: list starting timestamp for jobid.
timing: list xtprof summary timings for jobid.
vuid: list VU output for VU with vuid for jobid.

jobs jobid timing - Usage: jobs jobid timing vuid
timing vuid: list xtprof timings for vuid for jobid.

The CLI jobs result command will query all jobs and return the result if it finds one. In the example below as expected the build job does not have a result but the results are returned for all of the driver jobs.

hammerws>jobs result
[
"619CF6B05D1703E263931333",
"Jobid has no test result"
]
[
"619D05045D1703E213238373",
"2021-11-23 15:13:08",
"1 Active Virtual Users configured",
"TEST RESULT : System achieved 8089 NOPM from 18636 PostgreSQL TPM"
]
[
"619D05BC5D1703E263930313",
"2021-11-23 15:16:12",
"2 Active Virtual Users configured",
"TEST RESULT : System achieved 14102 NOPM from 32446 PostgreSQL TPM"
]
[
"619D06735D1703E263039373",
"2021-11-23 15:19:15",
"4 Active Virtual Users configured",
"TEST RESULT : System achieved 22392 NOPM from 51340 PostgreSQL TPM"
]
[
"619D072D5D1703E273631383",
"2021-11-23 15:22:21",
"8 Active Virtual Users configured",
"TEST RESULT : System achieved 36483 NOPM from 84065 PostgreSQL TPM"
]

An individual job can queried for any related information such as the timestamp when the job started, the configuration dict for the job or as the example below because the transaction counter was started the transaction count data for the run can retrieved based on the jobid.

hammerws>job 619D072D5D1703E273631383 tcount
{"PostgreSQL tpm": {
"0": "2021-11-23 15:22:21",
"76662": "2021-11-23 15:22:31",
"102426": "2021-11-23 15:22:41",
"74574": "2021-11-23 15:22:51",
"71526": "2021-11-23 15:23:01",
"119334": "2021-11-23 15:23:11",
"96852": "2021-11-23 15:23:21",
"59640": "2021-11-23 15:23:31",
"79512": "2021-11-23 15:23:41",
"93750": "2021-11-23 15:23:51",
"92322": "2021-11-23 15:24:01",
"68988": "2021-11-23 15:24:11",
"95556": "2021-11-23 15:24:21",
"97512": "2021-11-23 15:24:31",
"76590": "2021-11-23 15:24:41",
"55236": "2021-11-23 15:24:51",
"97254": "2021-11-23 15:25:01",
"98364": "2021-11-23 15:25:11",
"95784": "2021-11-23 15:25:21"
}}

Summary

It should be clear that the v4.3 Webservice functionality provides a building block to integrate HammerDB functionality into wider webservice environments. The addition of the CLI provides an easy to use command line interface to the functionality whilst the SQLite repository and output in JSON format enables HammerDB workloads to be automated with results, output, timing data and configuration all able to be retrieved for a particular job at any further point in time.

HammerDB v4.3 New Features Pt1: Graphical Metrics for PostgreSQL

Introducing the PostgreSQL performance metrics viewer

Prior to  version 4.3, HammerDB included a graphical performance metrics view for the Oracle database only. At v4.3 HammerDB includes the same functionality for PostgreSQL enabling the user to drill down on database metrics in real time.  Additionally, using the Active Session History functionality, it is possible to select a previous time period of statistics in the graph and view the PostgreSQL metrics for this earlier period of time.  This enables the user to compare and contrast performance across different benchmark scenarios.

PostgreSQL Graphical Metrics

Install pg_stat_statements and pg_sentinel extensions

To use the PostgreSQL graphical metrics, it is necessary to install the pg_stat_statements and pg_sentinel extensions first in the database to be viewed.

This example uses a PostgreSQL source based build with the pg_stat_statements extension found in the contrib directory.  Running make and make install in this directory installs the extension.

/postgresql-14.1/contrib/pg_stat_statements$ make
...
make[1]: Entering directory '/opt/postgresql-4.1/src/backend'
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -O2 -fPIC -shared -o pg_stat_statements.so pg_stat_statements.o -L../../src/port -L../../src/common -Wl,--as-needed -Wl,-rpath,'/opt/postgresql-14.1/lib',--enable-new-dtags -lm 
...
./postgresql-14.1/contrib/pg_stat_statements$ make install
...
/usr/bin/install -c -m 644 ./pg_stat_statements--1.4.sql ./pg_stat_statements--1.8--1.9.sql ./pg_stat_statements--1.7--1.8.sql ./pg_stat_statements--1.6--1.7.sql ./pg_stat_statements--1.5--1.6.sql ./pg_stat_statements--1.4--1.5.sql ./pg_stat_statements--1.3--1.4.sql ./pg_stat_statements--1.2--1.3.sql ./pg_stat_statements--1.1--1.2.sql ./pg_stat_statements--1.0--1.1.sql '/opt/postgresql-14.1/share/extension/'
...

To build pgsentinel it is necessary to download it from github and to run make and make install as we did previously for pg_stat_statements.

git clone https://github.com/pgsentinel/pgsentinel.git
cd pgsentinel/src
opt/postgresql-14.1/pgsentinel/src$ make
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -O2 -fPIC -I. -I./ -I/opt/postgresql-14.1/include/server -I/opt/postgresql-14.1/include/internal -D_GNU_SOURCE -c -o pgsentinel.o pgsentinel.c
/opt/postgresql-4.1/pgsentinel/src$ make install
/bin/mkdir -p '/opt/postgresql-14.1/lib'
...

Add the following example entries to the postgresql.conf

shared_preload_libraries = 'pg_stat_statements,pgsentinel'
track_activity_query_size=2048
pg_stat_statements.save=on
pg_stat_statements.track=all
pgsentinel_pgssh.enable = true
pgsentinel_ash.pull_frequency = 1
pgsentinel_ash.max_entries = 1000000

start the database and then login and create the extensions as follows

postgres=# create extension pg_stat_statements;
CREATE EXTENSION
postgres=# create extension pgsentinel;
CREATE EXTENSION

If the build went according to plan then the  pg_active_session_history table will be found.

postgres=# select * from pg_active_session_history;
ash_time | datid | datname | pid | leader_pid | usesysid | usename | applicatio
n_name | client_addr | client_hostname | client_port | backend_start | xact_star
t | query_start | state_change | wait_event_type | wait_event | state | backend_
xid | backend_xmin | top_level_query | query | cmdtype | queryid | backend_type 
| blockers | blockerpid | blocker_state 
----------+-------+---------+-----+------------+----------+---------+-----------
-------+-------------+-----------------+-------------+---------------+----------
--+-------------+--------------+-----------------+------------+-------+---------
----+--------------+-----------------+-------+---------+---------+--------------
+----------+------------+---------------
(0 rows)

Starting the PostgreSQL Metrics

With the pg_stat_statements and pg_sentinel extensions installed, it is now possible to start the HammerDB graphical metrics to view the Active Session History.

Under the tree view there is the Metrics entry enabling the editing of options and to start the display.  Note that if a workload is running and the treeview is locked, it is also possible to access the options from the Options menu and to start the metrics from the Metrics button.

PostgreSQL Metrics treeview

Clicking on options, it shows the login credentials for the PostgreSQL superuser.  The Agent ID and Hostname are identical to the metrics option at earlier HammerDB versions, enabling the viewing of CPU performance by connecting to a running agent.

PostgreSQL Metrics Options

By clicking on the Metrics Display option, HammerDB will connect to the target database and start monitoring the performance, note that if the pg_active_session_history table is empty the viewer will report an error and refuse to start.

No rows found

If the pg_active_session_history table is populated (after running or having previously run a workload) the Metrics tab will become activated.

Started Metrics

Viewing PostgreSQL Metrics for a HammerDB workload

For full performance details, grab the Metrics tab and pull it out of the HammerDB window to expand.

Drag out Metrics tab

The example shows a TPROC-C workload running with 4 Active Virtual Users. The example has a configuration set to illustrate a number of wait events and has therefore not been configured for performance.

Workload running

The PostgreSQL performance metrics will automatically start to be populated in the viewer.  The options enable the user to drill down on the SQL running in the database, the wait events and the user statistics.  The example shows we have 4 users called tpcc running the SQL from the TPROC-C workload with key WALWrite and CPU events.  The events are colour coded and indexed in the graph to the wait event groups.

Metrics view for benchmark

When a benchmark workload has completed, use the selection tool in the graph to select the metrics for a period of time of interest. The viewer will be populated with the metrics for that specific period of time.  Clicking on the SQL, events or user will display the output relevant to that selection. The SQL entry gives further options of sql text, io and stats, the wait events show the SQL that caused that wait event and the users show a summary of statistics for that user. The following example shows the SQL view.

Active Session History SQL view

The following example shows the LWLock WalWrite event, illustrating the top SQL that caused that event.

Active Session History event view

Closing the Window will return it to the main GUI showing the performance graph summary.

Summary display

Pressing stop on the Metrics button will close the PostgreSQL metrics viewer.

Summary

The HammerDB graphical metrics viewer in v4.3 adds the functionality to view the PostgreSQL active session history for benchmark workloads, enabling the user to find and diagnose bottlenecks in hardware and software configurations in a PostgreSQL environment.  Further information on CPU and database metrics can be found in the documentation.

 

HammerDB: Using MySQL 5.7 vs 8.0 to understand performance profiles

One of the most important concepts in analysing database performance is that of understanding scalability. When a system ‘scales’ it is able to deliver higher levels of performance proportional to the system resources available to it.  In particular, this means as we add CPU cores and increase the system load, we see higher performance. The way we measure this is through a ‘performance profile’. This simply means that we run a series of tests incrementally increasing the system load until we find the peak performance. Plotting these data points enables us to understand the scalability of the database software being tested on that system.

In this example, we will compare MySQL 5.7.33 and MySQL 8.0.25 with the TPROC-C workload on a system with 2 sockets of Intel Xeon 8280L that means we have 28 cores per socket, with 56 physical CPUs and 112 logical CPUs with Hyper-Threading.

So as a first step we will install MySQL 5.7.33 and MySQL 8.0.25 and build a schema on both. In this example, we will use the HammerDB CLI and build the schema with the script as follows in both databases.

dbset db mysql
dbset bm TPC-C
vuset logtotemp 1
diset connection mysql_socket /tmp/mysql.sock
diset tpcc mysql_count_ware 800
diset tpcc mysql_num_vu 64
diset tpcc mysql_partition true
buildschema
waittocomplete
quit

Then we can run an interactive workload for a single Virtual User as follows for MySQL 8.0.25

./hammerdbcli
HammerDB CLI v4.2
Copyright (C) 2003-2021 Steve Shaw
Type "help" for a list of commands
The xml is well-formed, applying configuration
hammerdb>dbset db mysql
Database set to MySQL
hammerdb>diset tpcc mysql_driver timed
Clearing Script, reload script to activate new setting
Script cleared
Changed tpcc:mysql_driver from test to timed for MySQL
hammerdb>vuset logtotemp 1
hammerdb>vuset unique 1
hammerdb>vuset vu 1
hammerdb>loadscript
Script loaded, Type "print script" to view
hammerdb>vucreate
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Logging activated
to /tmp/hammerdb_60F169935C7303E293236333.log
2 Virtual Users Created with Monitor VU
hammerdb>vurun
Vuser 1:RUNNING
Vuser 1:Beginning rampup time of 2 minutes
Vuser 2:RUNNING
Vuser 2:Processing 10000000 transactions with output suppressed...
...
hammerdb>Vuser 1:Rampup 1 minutes complete ...
Vuser 1:Rampup 2 minutes complete ...
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 5 in minutes
Vuser 1:1 ...,
Vuser 1:2 ...,
Vuser 1:3 ...,
Vuser 1:4 ...,
Vuser 1:5 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:1 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 25469 NOPM from 76873 MySQL TPM
Vuser 1:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE

and MySQL 5.7.33

hammerdb>dbset db mysql
Database set to MySQL
hammerdb>diset tpcc mysql_driver timed
Clearing Script, reload script to activate new setting
Script cleared
Changed tpcc:mysql_driver from test to timed for MySQL
hammerdb>vuset logtotemp 1
hammerdb>vuset unique 1
hammerdb>loadscript
Script loaded, Type "print script" to view
hammerdb>vuset vu 1
hammerdb>vucreate
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Logging activated
to /tmp/hammerdb_60F183575C7303E273030333.log
2 Virtual Users Created with Monitor VU
hammerdb>vurun
Vuser 1:RUNNING
Vuser 1:Beginning rampup time of 2 minutes
Vuser 2:RUNNING
Vuser 2:Processing 10000000 transactions with output suppressed...
hammerdb>Vuser 1:Rampup 1 minutes complete ...
Vuser 1:Rampup 2 minutes complete ...
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 5 in minutes
Vuser 1:1 ...,
Vuser 1:2 ...,
Vuser 1:3 ...,
Vuser 1:4 ...,
Vuser 1:5 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:1 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 27154 NOPM from 82383 MySQL TPM
Vuser 1:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE

Extracting the NOPM value from the CLI or from the log file, we found that MySQL 8.0.25 returned 25469 NOPM to MySQL 5.7.33’s 27154 NOPM meaning the older version of MySQL gave a slightly better result.

So does this mean that we can conclude that MySQL 5.7 is 1.06X faster than MySQL 8.0? Of course not, we have only run a single Virtual User and therefore not tested MySQL’s ability to manage multiple sessions running the same workload concurrently.  Therefore, again using the interactive shell let us now run more tests, increasing the Virtual User count to 20.

Now at 20 Virtual Users MySQL 8.0 is at 390014 TPM with MySQL 5.7 at 379972 putting MySQL 8.0 at 1.02X better than MySQL 5.7. However, again this might not be giving us the full picture, so let’s run a fully automated test up to 120 Virtual Users using the script as follows:

puts "MySQL Test Started"
dbset db mysql
dbset bm TPC-C
diset connection mysql_socket /tmp/mysql.sock
diset tpcc mysql_driver timed
diset tpcc mysql_rampup 2
diset tpcc mysql_duration 5
vuset logtotemp 1
vuset unique 1
loadscript
foreach z {1 2 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100 104 108 112 116 120} {
puts "$z VU test"
vuset vu $z
vucreate
vurun
runtimer 480
vudestroy
}
puts "MySQL Test Complete"

When we graph the result, we now get a much better picture of the full capabilities of the MySQL 5.7 and 5.8 software running on the same server.

Now we can see that MySQL 8.0 reaches peak performance at 1006469 NOPM at 112 Virtual Users, with MySQL 5.7 peaking at 666407 at 68 Virtual Users, putting MySQL 8.0.25 at 1.51X over MySQL 5.7.33.

It should be clear from the performance profile that the difference in performance can be attributed to the scalability of the database software, in this case MySQL. This is the same server, same OS and same HammerDB client, the only difference is in the server software.

Crucially HammerDB can also run against Oracle, SQL Server, Db2, MariaDB and PostgreSQL on Windows and Linux, so we know that the capabilities of the server with commercial database software is actually a lot higher and also that the capabilities of the HammerDB client is also a lot higher – it is the database software and the database software alone that limits the scalability in this case.

Also important is the user experience, so we have also gathered the transaction response times with the summaries shown below for both MySQL versions at the peak performance of MySQL 5.7.33.

MySQL 8.0.25

Vuser 1:68 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 841337 NOPM from 2550191 MySQL TPM
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>>> SUMMARY OF 68 ACTIVE VIRTUAL USERS : MEDIAN ELAPSED TIME : 403363ms
>>>>> PROC: NEWORD
CALLS: 5775293 MIN: 0.810ms AVG: 2.401ms MAX: 57.323ms TOTAL: 13867195.952ms
P99: 4.005ms P95: 3.698ms P50: 2.316ms SD: 792.316 RATIO: 50.557%
>>>>> PROC: PAYMENT
CALLS: 5775133 MIN: 0.435ms AVG: 1.042ms MAX: 56.802ms TOTAL: 6022416.890ms
P99: 2.052ms P95: 1.576ms P50: 0.978ms SD: 401.213 RATIO: 21.957%
>>>>> PROC: DELIVERY
CALLS: 578532 MIN: 4.001ms AVG: 8.929ms MAX: 67.479ms TOTAL: 5165817.194ms
P99: 12.438ms P95: 11.282ms P50: 9.063ms SD: 1831.663 RATIO: 18.834%
>>>>> PROC: SLEV
CALLS: 578436 MIN: 0.752ms AVG: 2.468ms MAX: 56.045ms TOTAL: 1427926.455ms
P99: 3.585ms P95: 3.230ms P50: 2.459ms SD: 560.309 RATIO: 5.206%
>>>>> PROC: OSTAT
CALLS: 576834 MIN: 0.286ms AVG: 0.955ms MAX: 39.063ms TOTAL: 551043.284ms
P99: 1.796ms P95: 1.432ms P50: 0.912ms SD: 334.873 RATIO: 2.009%
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

MySQL 5.7.33

Vuser 1:68 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 655295 NOPM from 1986131 MySQL TPM
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>>> SUMMARY OF 68 ACTIVE VIRTUAL USERS : MEDIAN ELAPSED TIME : 403998ms
>>>>> PROC: NEWORD
CALLS: 4542033 MIN: 0.714ms AVG: 3.082ms MAX: 42.174ms TOTAL: 14000329.326ms
P99: 5.124ms P95: 4.720ms P50: 3.001ms SD: 994.571 RATIO: 50.962%
>>>>> PROC: DELIVERY
CALLS: 453821 MIN: 3.020ms AVG: 13.931ms MAX: 58.857ms TOTAL: 6322207.859ms
P99: 19.838ms P95: 17.926ms P50: 13.975ms SD: 2826.392 RATIO: 23.013%
>>>>> PROC: PAYMENT
CALLS: 4544693 MIN: 0.352ms AVG: 1.157ms MAX: 37.108ms TOTAL: 5261866.428ms
P99: 2.231ms P95: 1.734ms P50: 1.096ms SD: 429.929 RATIO: 19.154%
>>>>> PROC: SLEV
CALLS: 455338 MIN: 0.686ms AVG: 2.407ms MAX: 54.447ms TOTAL: 1096000.372ms
P99: 3.253ms P95: 3.048ms P50: 2.381ms SD: 501.128 RATIO: 3.990%
>>>>> PROC: OSTAT
CALLS: 454731 MIN: 0.252ms AVG: 0.946ms MAX: 28.175ms TOTAL: 430240.916ms
P99: 1.856ms P95: 1.458ms P50: 0.905ms SD: 321.024 RATIO: 1.566%
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

What we can see is that with MySQL 8.0.25 we have both higher throughput and lower response times. MySQL 8.0 therefore offers a higher capacity for load than MySQL 5.7 on this server.

It should also be clear that testing on a single socket environment may not give you an accurate comparison of database software designed to take advantage of multiple CPU sockets. In this case we tested MySQL vs MySQL however you should exercise caution in drawing conclusions about database software capabilities without having first tested in a server environment.

HammerDB v4.2 New Features Pt2: Increase of TPROC-C Schema Size Limits

Prior to version 4.2 the limits to the size of the schema that could be generated was set to 5000 warehouses in the GUI and 30,000 if using the datagen feature to generate flat files for external loading. From version 4.2 these limits have been increased to 100,000 for both features. This post gives a background to the schema size limits, guidance on the warehouse count and expected size limits. It also shows how to manually exceed the 100,000 limit if you wish.

Up to version 4.1 when using the GUI the number of warehouses is adjusted by a spinbox with an upper limit set to 5000. You could manually enter a figure larger, but would then receive the error shown that 5000 was the limit.

This was often interpreted as a limit of the amount of data that HammerDB could generate. Instead, however, this was an imposed limit to encourage right-sizing of the test database instead of over-sizing.

The number of warehouses is passed as one of the parameters in the last line of the build script that has no limit.  In the example below, the value 5000 can be seen as the fourth argument to the function do_tpcc. If you stop the build, manually modify this value and re-run the build, it will generate the number of warehouses you have defined. This is the same for the previous 5000 limit or the current 100,000 limit.

However, the precision of some numeric data types may not have been sufficient for larger values in some databases. For example, for Oracle as shown “W_ID” of the table WAREHOUSE was set to the NUMBER data type with scale and precision of (4,0) meaning that the maximum number of warehouses permitted was 9999 before Oracle would generate an error.

"CREATE TABLE WAREHOUSE (W_ID NUMBER(4, 0), W_YTD NUMBER(12, 2), W_TAX NUMBER(4, 4), W_NAME VARCHAR2(10), W_STREET_1 VARCHAR2(20), W_STREET_2 VARCHAR2(20), W_CITY VARCHAR2(20), W_STATE CHAR(2), W_ZIP CHAR(9)) INITRANS 4 PCTFREE 99 PCTUSED 1"

All databases have been checked and updated for v4.2 to ensure that schema builds of up to 100,000 warehouses will complete without error.

Regarding the amount of disk space that should be reserved for a schema build, the general guidance is to allow up to 100MB per warehouse. Typically, not all of this space will be needed and varies per database, however it should not exceed this value. Therefore, for 100,000 warehouses, 10TB of space should be allowed.

However, be aware that just because you can build schemas of up to 100,000 warehouses, it does not necessarily mean that this will right for you. With the default workload, each Virtual User will process 90% of its transactions against its home warehouse selected at random at the start of the test. Consequently, as shown if you are running 2 Virtual Users it makes little difference if you have created, 10, 1000, 10,000 or 100,000 warehouses, aside from the 2 selected at random most of that data will remain idle and will not affect the result of the test.  You should create a schema large enough to allow an even, uncontended, random distribution of Virtual Users for the maximum Virtual User count you will expect to create.

For further information on right sizing, see the post:

How Many Warehouses for the HammerDB TPC-C Test?

Where larger warehouse counts are of most benefit is when using advanced driver script options such as use all warehouses and event driven scaling.  These are areas where creating larger schema sizes can add additional benefit and insight to your testing above and beyond the default schema installations.

HammerDB v4.2 New Features Pt1: MariaDB Build and Test Example with the CLI

Prior to HammerDB v4.2 support for MariaDB has been provided by using the MySQL interface. However, as the two databases diverged from a shared common codebase, this presented a challenge in that it was not possible to modify the workload for MySQL or MariaDB without also changing the other.  Also, the additional installation of the MySQL client libraries was needed when testing MariaDB.  HammerDB v4.2 adds full support for MariaDB in both the GUI as shown:

And in the CLI as a separate database from MySQL, this post describes a setup and test for MariaDB using the CLI on Linux as an example running the TPROC-C workload.

First, download your favourite release and format of MariaDB from https://downloads.mariadb.org/. We will use the Linux generic file mariadb-10.2.34-linux-x86_64.tar.gz of release 10.2.34 and extract it into a directory such as /opt. We have added a my.cnf file to the file listing.

/opt/mariadb-10.2.34-linux-x86_64$ ls
bin EXCEPTIONS-CLIENT man README-wsrep support-files
COPYING include my.cnf scripts THIRDPARTY
CREDITS INSTALL-BINARY mysql-test share
data lib README.md sql-bench

Also install HammerDB v4.2 if you have not done so already and run the librarycheck command from the CLI.  As we have not told HammerDB where to find the MariaDB libraries, it will generate an error telling you to add the location of these libraries to your LIBRARY_PATH.

hammerdb>librarycheck
....
Checking database library for MariaDB
Error: failed to load mariatcl - couldn't load file "/home/HammerDB-4.2/lib/mariatcl0.1/libmariatcl0.1.so": libmariadb.so.3: cannot open shared object file: No such file or directory
Ensure that MariaDB client libraries are installed and the location in the LD_LIBRARY_PATH environment variable

Note that the error tells us we are missing the file libmariadb.so.3. Advanced users can check this using the ldd command on the HammerDB library file itself.

~/HammerDB-4.2/lib/mariatcl0.1$ ldd libmariatcl0.1.so
linux-vdso.so.1 (0x00007ffd3edff000)
libmariadb.so.3 => not found
...

If we look in the lib directory of the MariaDB installation, we can find this file

/opt/mariadb-10.2.34-linux-x86_64/lib$ ls
galera libmariadb.so.3 libmysqlclient.so libmysqlservices.a
libgalera_smm.so libmysqlclient.a libmysqld.a pkgconfig
libmariadbclient.a libmysqlclient_r.a libmysqld.so plugin
libmariadb.so libmysqlclient_r.so libmysqld.so.19

So lets add it to the LIBRARY_PATH and retrace our steps.

$ export LD_LIBRARY_PATH=/opt/mariadb-10.2.34-linux-x86_64/lib:$LD_LIBRARY_PATH

Now using ldd we can see that the library can be found right in the directory we added to the LIBRARY_PATH.

~/HammerDB-4.2/lib/mariatcl0.1$ ldd libmariatcl0.1.so
linux-vdso.so.1 (0x00007fff613f3000)
libmariadb.so.3 => /opt/mariadb-10.2.34-linux-x86_64/lib/libmariadb.so.3 (0x00007fa177941000)
...

and now librarycheck works as it knows where this file is, meaning HammerDB is ready to start testing MariaDB.

~/HammerDB-4.2$ ./hammerdbcli
HammerDB CLI v4.2
Copyright (C) 2003-2021 Steve Shaw
Type "help" for a list of commands
The xml is well-formed, applying configuration
hammerdb>librarycheck
...
Checking database library for MariaDB
Success ... loaded library mariatcl for MariaDB

As noted previously, we’ve added a my.cnf file to the MariaDB directory and tuned this for performance rather than resilience.  (A future blog post on performance profiles will explain why this is a good first test on which to baseline other changes)

$ more my.cnf
[mysqld]
skip-log-bin
lc_messages_dir=/opt/mariadb-10.2.34-linux-x86_64/share/english
datadir=/opt/mariadb-10.2.34-linux-x86_64/data
default_authentication_plugin=mysql_native_password
socket=/tmp/mariadb10234.sock
port=3306
log-error=/tmp/mariadb10234.log
bind_address=localhost
# general
max_connections=1000
table_open_cache=2000
table_open_cache_instances=16
back_log=1500
#default_password_lifetime=0
ssl=0
performance_schema=OFF
max_prepared_stmt_count=12800
skip_log_bin=1
character_set_server=latin1
collation_server=latin1_swedish_ci
transaction_isolation=REPEATABLE-READ
# files
#innodb_file_per_table
innodb_log_file_size=16384M
innodb_open_files=1000
# buffers
innodb_buffer_pool_size=64000M
innodb_buffer_pool_instances=16
innodb_log_buffer_size=64M
# tune
innodb_doublewrite=0
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit=0
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
join_buffer_size=32K
sort_buffer_size=32K
innodb_use_native_aio=1
innodb_stats_persistent=1
innodb_spin_wait_delay=6
innodb_max_purge_lag_delay=300000
innodb_max_purge_lag=0
innodb_flush_method=O_DIRECT_NO_FSYNC
#innodb_checksum_algorithm=none
innodb_io_capacity=4000
innodb_io_capacity_max=20000
innodb_lru_scan_depth=9000
innodb_change_buffering=none
innodb_read_only=0
#innodb_page_cleaners=2
#innodb_undo_log_truncate=off
# perf special
innodb_adaptive_flushing=1
innodb_flush_neighbors=0
innodb_read_io_threads=16
innodb_write_io_threads=16
innodb_purge_threads=4
innodb_adaptive_hash_index=0
# monitoring
innodb_monitor_enable='%'

Next we will install the database and start MariaDB

./scripts/mysql_install_db --defaults-file=/opt/mariadb-10.6.3-linux-x86_64/my.cnf --user=mysql
./bin/mysqld --defaults-file=./my.cnf

Finally, we will set the root password

./bin/mysql -uroot -S/tmp/mariadb10234.sock
...
MySQL [(none)]> SET PASSWORD FOR 'root'@localhost = PASSWORD("mysql");

At this point, we are ready to start testing MariaDB with HammerDB so lets create 2 scripts, a build script and a run script. For the build script the example sets the socket name we have used and opts to create 800 warehouses with 64 virtual users in a partitioned schema.

dbset db maria
dbset bm TPC-C
diset connection maria_socket /tmp/mariadb10234.sock
diset tpcc maria_count_ware 800
diset tpcc maria_num_vu 64
diset tpcc maria_partition true
buildschema
waittocomplete
quit

Save the script as mysqlbuild.tcl and run as follows:

./hammerdbcli auto mysqlbuild.tcl

Wait for the schema build to complete. In the example below it took just over 14 minutes to create 800 warehouses.

Hammerdb Log @ Wed Jul 07 07:33:23 PDT 2021
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Vuser 1:Monitor Thread
Vuser 1:CREATING TPCC SCHEMA
Vuser 1:CREATING DATABASE tpcc
Vuser 1:CREATING TPCC TABLES
Vuser 1:Loading Item
Vuser 2:Worker Thread
Vuser 2:Waiting for Monitor Thread...
Vuser 2:Loading 13 Warehouses start:1 end:13
Vuser 2:Start:Wed Jul 07 07:33:24 PDT 2021
Vuser 2:Loading Warehouse
Vuser 2:Loading Stock Wid=1
...
Vuser 30:Orders Done
Vuser 30:Loading Orders for D=10 W=377
Vuser 30:...1000
Vuser 30:...2000
Vuser 30:...3000
Vuser 30:Orders Done
Vuser 30:End:Wed Jul 14 07:17:36 PDT 2021
Vuser 1:Workers: 0 Active 64 Done
Vuser 1:CREATING TPCC STORED PROCEDURES
Vuser 1:GATHERING SCHEMA STATISTICS
Vuser 1:TPCC SCHEMA COMPLETE
Vuser 1:Workers: 0 Active 64 Done
Vuser 1:CREATING TPCC STORED PROCEDURES
Vuser 1:GATHERING SCHEMA STATISTICS

Next we are ready to run the test. In the example below, we want to collect a performance profile so will run tests without interruption from 1 to 100 Virtual Users creating a unique log file for each test.

puts "MariaDB 10.2.34 Test Started"
dbset db maria
dbset bm TPC-C
diset connection maria_socket /tmp/mariadb10234.sock
diset tpcc maria_driver timed
diset tpcc maria_rampup 2
diset tpcc maria_duration 5
vuset logtotemp 1
vuset unique 1
loadscript
foreach z {1 2 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100} {
puts "$z VU test"
vuset vu $z
vucreate
vurun
runtimer 480
vudestroy
}
puts "MariaDB 10.2.34 Test Complete"

The script is run as follows:

./hammerdbcli auto mysqlrun.tcl

and can then be left without intervention to complete the test. On completion in the /tmp directory are a number of output files with the NOPM and TPM results of the test.

Hammerdb Log @ Mon Jul 05 04:36:59 PDT 2021
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Vuser 1:Beginning rampup time of 2 minutes
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 3:Processing 10000000 transactions with output suppressed...
Vuser 4:Processing 10000000 transactions with output suppressed...
Vuser 65:Processing 10000000 transactions with output suppressed...
Vuser 1:Rampup 1 minutes complete ...
Vuser 1:Rampup 2 minutes complete ...
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 5 in minutes
Vuser 1:1 ...,
Vuser 1:2 ...,
Vuser 1:3 ...,
Vuser 1:4 ...,
Vuser 1:5 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:64 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 669086 NOPM from 2026420 MariaDB TPM

Add these results to your favourite spreadsheet and you have your first performance profile of MariaDB on your system.

Once you have explored the TPROC-C workload you can also use HammerDB to run TPROC-H for analytic workloads and test solutions such as the MariaDB ColumnStore https://mariadb.com/kb/en/mariadb-columnstore/.

Also don’t forget that HammerDB is fully open source, so if you wish to help make HammerDB even better for testing MariaDB you can contribute both ideas and code at https://github.com/TPC-Council/HammerDB.