HammerDB v4.1 New Features Pt1: Extended Time Profiling

Up to HammerDB v4.0 you have had the ability to do time profiling for the first Active Virtual User only.  This post showed you how to graph the transaction response times using this package called etprof.  v4.1 enhances time profiling by introducing a new package called xtprof that enables you to capture timing data for all Active Virtual Users simultaneously. This post will get you started with time profiling in v4.1.

Time profiling of a workload is the process of capturing transaction response times.  Response times give us multiple insights beyond just transaction rates in the form of NOPM and TPM.  NOPM shows us the new orders per minute so the number of new order transactions only.  TPM shows us the user commits and user rollbacks across the whole database, however with both values we are recording the average transaction rate across a minute.  With the transaction counter this shows us how even this transaction rate is across the measured time, however time profiling enables a much finer granular view on the workload of each Virtual User.

Within the TPROC-C workload there is a transaction mix of neword 45%, payment 43%, delivery 4%, order status 4%, stock level 4%.   The transaction to run is selected according to this mix and NOPM records only 45% of this mix. Note that for example New Order is called 45% of the time however the actual time ratio attributed to the transaction could be longer or shorter.

So using the test workload as an example for a single Virtual User they will run a sequence of transactions such as follows.

Vuser 1:order status
Vuser 1:payment
Vuser 1:payment
Vuser 1:new order
Vuser 1:new order
Vuser 1:new order
Vuser 1:stock level
Vuser 1:new order
Vuser 1:payment
Vuser 1:payment
Vuser 1:payment
Vuser 1:new order
Vuser 1:payment
Vuser 1:payment
Vuser 1:order status
Vuser 1:new order
Vuser 1:new order
Vuser 1:payment
Vuser 1:new order
Vuser 1:delivery

Looking at this single Virtual User if for example the Stock Level or Order Status transactions takes longer, then it should be clear they are going to be able to run fewer New Orders in a minute because while they are running other transactions they are not recording any New Order transactions.

However, it is in most cases not just one Virtual User,  instead it is tens, hundreds or thousands running at the same time and the database is managing the concurrency between them. Stock Level for example is querying the district, stock and order_line tables so while other virtual users are inserting and updating these tables with New Order and Payment or deleting with Delivery, Stock Level is doing a longer running query with locks and multiversioning to ensure that the data is consistent.  A very basic database approach for Stock Level would be to lock the tables to ensure consistency, however doing this would block any other transactions from running until Stock Level was complete resulting in low performance overall.  This is the very design of TPC-C specification in that the transactions are intended to do inserts, updates, deletes and queries on the data at the same time to test how well the database engine can manage the concurrency. Time profiling can give you a deeper insight into how well this is managed.

To begin using extended profiling view the settings in the generic.xml file in the config directory. From v4.1 the default profiler will be the newer xtprof, however this can be changed back to etprof to use the earlier single Virtual User profiler.  As xtprof profiles all Virtual Users output now gets written to a dedicated log file and therefore there is the option of whether a unique log name is required, 0 for a regular filename and 1 for a unique id.

<timeprofile>
<profiler>xtprof</profiler>
<xt_unique_log_name>0</xt_unique_log_name>
</timeprofile>

With the profiler setting set to xtprof HammerDB will use the new profiler automatically enabling it for all Virtual Users when selected.

To enable time profiling when configuring the driver script select the time profiling option.

Time Profile Option

In the CLI set the timeprofile option to true.

hammerdb>diset tpcc mssqls_timeprofile true
Changed tpcc:mssqls_timeprofile from false to true for MSSQLServer

When running a workload you will now see a message “Initializing xtprof time profiler” in all Virtual Users. Note that the package initializes in the Monitor Virtual User also as all data is gathered, processed and reported by the Monitor.

Time Profiling Started

When the workload is complete after reporting the NOPM/TPM the Monitor Virtual User will report messages on Gathering timing data/Calculating timings and Writing timing data to an external logfile to mark the separate stages of gathering and processing the data.

In the logfile there is a report of the timings for all of the Active Virtual Users followed by a summary of the cumulative data.

In the report the values have the following meaning with all timings in milliseconds.

Value Description
CALLS Number of times that the stored procedure was called.
MIN Minimum response time in milliseconds.
AVG Average response time in milliseconds.
MAX Maximum response time in milliseconds.
TOTAL Total time spent in that stored procedure during the measuring interval. The total time will include both rampup and timed test times.
P99 99th percentile in milliseconds.
P95 95th percentile in milliseconds.
P50 50th percentile in milliseconds.
SD Standard Deviation showing variance in captured values.
RATIO Ratio showing percentage time taken for that stored procedure. The total of these values may be less than 100% as only timings for the stored procedures are shown.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
MSSQLServer Hammerdb Time Profile Report @ Thu May 06 13:37:27 BST 2021
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>>> VIRTUAL USER 2 : ELAPSED TIME : 420128ms
>>>>> PROC: NEWORD
CALLS: 173439 MIN: 0.481ms AVG: 0.938ms MAX: 1247.509ms TOTAL: 162638.013ms
P99: 2.147ms P95: 1.358ms P50: 0.842ms SD: 40379.430 RATIO: 38.712%
>>>>> PROC: PAYMENT
CALLS: 174701 MIN: 0.370ms AVG: 0.774ms MAX: 1191.065ms TOTAL: 135295.676ms
P99: 2.108ms P95: 1.261ms P50: 0.648ms SD: 39317.110 RATIO: 32.203%
>>>>> PROC: DELIVERY
CALLS: 17433 MIN: 1.483ms AVG: 3.351ms MAX: 552.191ms TOTAL: 58422.210ms
P99: 7.287ms P95: 5.305ms P50: 3.121ms SD: 47931.042 RATIO: 13.906%
>>>>> PROC: SLEV
CALLS: 17686 MIN: 0.637ms AVG: 2.044ms MAX: 640.690ms TOTAL: 36157.299ms
P99: 2.260ms P95: 1.740ms P50: 1.214ms SD: 194033.832 RATIO: 8.606%
>>>>> PROC: OSTAT
CALLS: 17189 MIN: 0.236ms AVG: 0.907ms MAX: 425.793ms TOTAL: 15597.346ms
P99: 1.776ms P95: 1.346ms P50: 0.716ms SD: 62930.031 RATIO: 3.713%
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>>> VIRTUAL USER 3 : ELAPSED TIME : 419621ms
>>>>> PROC: NEWORD
CALLS: 175953 MIN: 0.492ms AVG: 0.940ms MAX: 1031.020ms TOTAL: 165474.149ms
P99: 2.172ms P95: 1.373ms P50: 0.842ms SD: 37665.637 RATIO: 39.434%
>>>>> PROC: PAYMENT
CALLS: 176330 MIN: 0.360ms AVG: 0.770ms MAX: 942.985ms TOTAL: 135763.464ms
P99: 2.131ms P95: 1.269ms P50: 0.648ms SD: 28595.491 RATIO: 32.354%
>>>>> PROC: DELIVERY
CALLS: 17424 MIN: 1.449ms AVG: 3.351ms MAX: 1051.259ms TOTAL: 58383.716ms
P99: 7.215ms P95: 5.295ms P50: 3.110ms SD: 81873.726 RATIO: 13.913%
>>>>> PROC: SLEV
CALLS: 17662 MIN: 0.670ms AVG: 1.759ms MAX: 1719.874ms TOTAL: 31063.587ms
P99: 2.240ms P95: 1.755ms P50: 1.226ms SD: 181722.634 RATIO: 7.403%
>>>>> PROC: OSTAT
CALLS: 17431 MIN: 0.240ms AVG: 0.984ms MAX: 366.508ms TOTAL: 17150.844ms
P99: 1.771ms P95: 1.351ms P50: 0.716ms SD: 78557.793 RATIO: 4.087%
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>>> VIRTUAL USER 4 : ELAPSED TIME : 419456ms
>>>>> PROC: NEWORD
CALLS: 187196 MIN: 0.481ms AVG: 0.923ms MAX: 871.146ms TOTAL: 172755.826ms
P99: 2.133ms P95: 1.274ms P50: 0.821ms SD: 34184.264 RATIO: 41.186%
>>>>> PROC: PAYMENT
CALLS: 187555 MIN: 0.362ms AVG: 0.729ms MAX: 1011.880ms TOTAL: 136729.814ms
P99: 1.894ms P95: 1.177ms P50: 0.629ms SD: 26574.336 RATIO: 32.597%
>>>>> PROC: DELIVERY
CALLS: 18767 MIN: 1.430ms AVG: 2.560ms MAX: 141.821ms TOTAL: 48042.982ms
P99: 4.802ms P95: 3.548ms P50: 2.415ms SD: 19616.865 RATIO: 11.454%
>>>>> PROC: SLEV
CALLS: 18839 MIN: 0.648ms AVG: 1.700ms MAX: 630.934ms TOTAL: 32031.203ms
P99: 2.085ms P95: 1.651ms P50: 1.190ms SD: 144279.102 RATIO: 7.636%
>>>>> PROC: OSTAT
CALLS: 18747 MIN: 0.225ms AVG: 0.951ms MAX: 359.600ms TOTAL: 17822.935ms
P99: 1.781ms P95: 1.331ms P50: 0.705ms SD: 72273.598 RATIO: 4.249%
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>>> VIRTUAL USER 5 : ELAPSED TIME : 419300ms
>>>>> PROC: NEWORD
CALLS: 186330 MIN: 0.478ms AVG: 0.917ms MAX: 1004.861ms TOTAL: 170810.498ms
P99: 2.079ms P95: 1.281ms P50: 0.826ms SD: 33795.633 RATIO: 40.737%
>>>>> PROC: PAYMENT
CALLS: 185915 MIN: 0.359ms AVG: 0.737ms MAX: 1051.469ms TOTAL: 136976.940ms
P99: 1.893ms P95: 1.186ms P50: 0.633ms SD: 38761.422 RATIO: 32.668%
>>>>> PROC: DELIVERY
CALLS: 18506 MIN: 1.436ms AVG: 2.584ms MAX: 184.669ms TOTAL: 47813.602ms
P99: 4.927ms P95: 3.574ms P50: 2.444ms SD: 22552.186 RATIO: 11.403%
>>>>> PROC: SLEV
CALLS: 18723 MIN: 0.642ms AVG: 1.759ms MAX: 709.370ms TOTAL: 32932.960ms
P99: 2.057ms P95: 1.654ms P50: 1.189ms SD: 150924.608 RATIO: 7.854%
>>>>> PROC: OSTAT
CALLS: 18558 MIN: 0.230ms AVG: 1.009ms MAX: 554.386ms TOTAL: 18718.672ms
P99: 1.763ms P95: 1.349ms P50: 0.711ms SD: 89937.214 RATIO: 4.464%
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>>> SUMMARY OF 4 ACTIVE VIRTUAL USERS : MEDIAN ELAPSED TIME : 419539ms
>>>>> PROC: NEWORD
CALLS: 722918 MIN: 0.478ms AVG: 0.929ms MAX: 1247.509ms TOTAL: 671678.272ms
P99: 2.134ms P95: 1.323ms P50: 0.832ms SD: 36516.643 RATIO: 40.025%
>>>>> PROC: PAYMENT
CALLS: 724501 MIN: 0.359ms AVG: 0.752ms MAX: 1191.065ms TOTAL: 544765.715ms
P99: 2.016ms P95: 1.222ms P50: 0.639ms SD: 33766.268 RATIO: 32.462%
>>>>> PROC: DELIVERY
CALLS: 72130 MIN: 1.430ms AVG: 2.948ms MAX: 1051.257ms TOTAL: 212662.428ms
P99: 6.728ms P95: 4.638ms P50: 2.661ms SD: 49195.620 RATIO: 12.672%
>>>>> PROC: SLEV
CALLS: 72910 MIN: 0.637ms AVG: 1.813ms MAX: 1719.871ms TOTAL: 132185.007ms
P99: 2.165ms P95: 1.700ms P50: 1.204ms SD: 168407.157 RATIO: 7.877%
>>>>> PROC: OSTAT
CALLS: 71925 MIN: 0.225ms AVG: 0.963ms MAX: 554.387ms TOTAL: 69289.775ms
P99: 1.775ms P95: 1.344ms P50: 0.712ms SD: 76749.333 RATIO: 4.129%
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Some key observations are to note that time profiling begins immediately for each Virtual User and includes both rampup and timed durations. For this reason by default the captured time for the first Virtual User will be longer than the second and so on because there is a  pause between each starting. It is a proposal to Exclude ramp up duration for time profiling #233 and this may be an enhancement for a future version. Also note that for the summary the elapsed times will be proprtionally longer than the duration of the test as it is recording the elapsed time for all Virtual Users.

Typically the key metrics you will want to observe will be P95 & P99 as this shows that 95% and 99% respectively of transactions completed inside this time. You would normally expect the total elapsed time to be in the order of NEWORD, PAYMENT, DELIVERY, SLEV, OSTAT however this may vary for some databases when higher levels of locking are experienced.  Finally the SD value or standard deviation can give an indication of the variance between the recorded values. A higher variation gives an indication of less consistent transaction processing times.

Finally for advanced users comfortable with examining the HammerDB source code in the xtprof module there is the following comment as a reference point.

#At this point [dict get $monitortimings $vutr $sproc clickslist] will return all unsorted data points for vuser $vutr for stored proc $sproc
#To record all individual data points for a virtual user write the output of this command to a file

As shown in the comment at this section of the code if desired you can use the commands given to print out all individual data points for all Virtual Users for advanced plotting and analysis of timing data. This is also functionality considered for future releases beyond v4.1 and a potential area for future code contribution.

 

 

Interacting with Open Source for HammerDB Code and Documentation

So the HammerDB project is open source.  That shouldn’t come as too much of a surprise.  When you install it you accept the license agreement and once installed there is a file called LICENSE headed GNU GENERAL PUBLIC LICENSE Version 3,  29 June 2007 – so you know that the code is open source under GPLv3.

It is rare these days for people downloading and using open source not to have awareness of open source etiquette and responsibilities. We all know by now that ‘Free’ software means free as in freedom and by taking you also take the responsibility to give according to your ability. This doesn’t mean that if you are not an expert programmer or developer you have nothing to contribute. To the contrary everyone downloading and using HammerDB can at a minimum contribute to Issues and Discussions on the HammerDB GitHub site or by publishing performance results.

For example if you find a bug and then create an Issue this start the conversation of what needs to be done to make the software better for everyone.  On the other hand if you don’t interact or only tweet or blog about bugs then that doesn’t make the software better either for you or anyone else.  Even better practice is creating an Issue and then the Pull Request to resolve it, here is just one  example of good practice that typifies the open source approach.

GitHub User sravanigomatam raised the Issue Add limit clause to query10 for TPC-H Oracle #172 this correctly identified that HammerDB was missing the limit clauses in one of the TPC-H queries for Oracle that meant that this query took longer than it should. Further investigation showed that more queries were impacted by the same issue.  sravanigomatam then created the Pull Request Add limit clause to TPC-H Oracle queries #186 to resolve the Issue. This fix was included in the release HammerDB v4.1 for everyone’s benefit.

If you don’t have the skills to do the Pull Request just raising an Issue you have identified can be a way to contribute to the project. Similarly, if you have a question GitHub discussions is the best place to ask HammerDB related questions to receive an informed answer.  Even better is answering questions on GitHub discussions can be a way to provide your unique insights to the HammerDB community.

And if you don’t have the skills right now for Pull Requests or answering questions then everyone can contribute Ideas through Issues and Discussions or share their performance results. Of course the best way to see your idea included in HammerDB is to improve those development skills and do the Pull Request but even if you can’t then all new features start as an idea.

But what about documentation? Many open source users can be unaware that most open source project documentation is open source as well. This means you have the freedom to contribute to the documentation as well. If you feel that open source documentation is insufficient then this is a great opportunity for you to give back and improve it without programming skills.

In the case of HammerDB the documentation is published under the GNU Free Documentation License. HammerDB documentation is written in Docbook format  meaning anyone can edit the documentation and submit their changes via a GitHub Pull Request to the HammerDB project.

To get started go to the HammerDB project under the Docbook directory.  Here you will find a docs.xml file containing the documentation in Docbook v5.1 standard and the images included in the HammerDB documentation. If you clone or download the project you will already have a copy of the documentation and images that you need to start editing.

There are many Docbook editors that you can use to edit the documentation such as XMLmind Personal Edition that is free to use for open source projects. Using XMLmind as an example we have downloaded a ZIP copy of the HammerDB project extracted it and navigated to the Docbook directory. There we can open the docs.xml file and begin writing documentation.

When you have added to the docs.xml file save the contents not forgetting to include any new images in the images directory and submit a Pull Request with your changes.  Once reviewed these can be converted to HTML by HammerDB and uploaded to the HammerDB website for everyone to benefit from your insights.

So if you find HammerDB useful whether writing code, raising and resolving issues, answering questions, submitting ideas or writing documentation remember that open source like any community thrives on what you give back keeping software Free for everyone’s benefit.

 

 

 

 

HammerDB CLI 101

This post is to give anyone starting out with HammerDB a guide on using the CLI or command line interface for text based environments.  As the workflow in the CLI and GUI are the same we will show equivalent commands side by side to help you quickly get up to speed on using the CLI in both interactive and scripted scenarios.

Help and Navigation

To begin with run the hammerdbcli command in Linux or hammerdbcli.bat in Windows and type help at the hammerdb prompt.

help command

This displays the available CLI commands with “help command” providing detailed information about the command and arguments required.

To navigate and edit at the CLI use the standard Ctrl commands as follows:

Ctrl Command Action
Ctrl-P Move to previous command
Ctrl-N Move to next command
Ctrl-F Move cursor forward
Ctrl-B Move cursor backward
Ctrl-A Move cursor to the start
Ctrl-E Move cursor to the end
Ctrl-G Clear Line
Ctrl-K Cut
Ctrl-Y Paste
Ctrl-H Backspace

Librarycheck

One of the first things you will want to do is make sure that we can access the 3rd party driver libraries for the database that we want to use. This is done with the librarycheck command.  In this example we are using SQL Server so the message shows that everything is in order and we can proceed with running tests.  If the library failed to load consult the HammerDB documentation on installing and configuring your libraries with the PATH environment variable for Windows or LIBRARY_PATH environment for Linux.

librarycheck

Selecting a database

Select a database

The next thing you will want to do is to select your preferred database. In the GUI we can select from the menu or double right-click the database heading.

This will show the benchmark options dialog.

Benchmark Options

In the CLI this corresponds to the dbset command with the database set using the db argument  according to the prefix in the XML configuration which are ora, mssqls, db2, mysql, pg for Oracle, Microsoft SQL Server, IBM Db2, MySQL and PostgreSQL respectively.

dbset db command

and benchmark set with the bm argument.  You can use either the TPROC or TPC terminology at the CLI.

dbset bm command

Building the Schema

Expanding the GUI menu presents the workflow with our first task of building the schema.

GUI Workflow

Selecting schema build and options presents the schema build options dialog. In the example below we have modified the SQL Server, number of warehouses to build and the virtual users to build them.

Build Options

In the CLI the print dict command shows us the available options.

print dict

These can be modified with the diset command specifying the option and the value to be changed. The example below shows the same settings made in the GUI. We have set the connection value of mssqls_server and then the tpcc value of the warehouse count and the number of virtual users to build them. Note that for the mssqls_server value there is the backslash special character and therefore the entered value is wrapped with curly brackets {…} to preserve the special character.

diset command

In the GUI clicking on Build presents the build dialog. Clicking yes will start the schema build.

Create Schema Dialog

In the CLI buildschema shows the same prompt and accepts automatically.

buildschema command

A key aspect is being able to visualise the multithreaded nature of the Virtual Users. In the GUI the Virtual User output is shown in a grid and status to a table easily enabling us to see the multithreaded nature of the workload. In the CLI all output is printed to the console preceded by the name of Virtual User producing it.  Nevertheless the CLI is multithreaded in the same way as the GUI.  For both the time it will take for the build to complete will depend on the HammerDB client CPU and the performance of the database server being loaded, during this time each action will be printed to the display. You may need a number of minutes for the build to complete.

Build in progress

When the build is complete Virtual User 1 will show TPCC SCHEMA COMPLETE.  The schema build is the same process whether built from the GUI or the CLI.

Build Complete

Using the vustatus command we can now see the status of the Virtual Users as having completed successfully. Note that as the CLI is running in interactive mode the vustatus command can be also be run while a workload is running. Press return for a prompt and then type the command needed.  vudestroy will perform the equivalent action as pressing the stop button in the GUI to close the Virtual Users Down.  Similarly doing the same while a workload is running will also do the same action as pressing the stop button while a workload is running in the GUI.

vustatus

Loading the Driver and running the test

We have now built a schema. The next step in the workflow is to define the driver script options.  In the GUI we are presented with an options dialog to set the configuration.  In the example we have again set the server name, have chosen the timed workload and also selected the Use All Warehouses option.

GUI Driver Options

To do the same in the CLI we again use the diset command.

diset Driver Options

Once the options have been chosen the driver script is loaded automatically in the GUI or can be re-loaded with the Load command.

GUI Driver Script

The loadscript command does the same at the CLI with the print script command showing the script loaded.  Note that the driver script is identical in both the GUI and CLI (as long as you have chosen the same options) meaning that the workload that is run is also identical regardless of the interface chosen. You can also load a modified script using the customscript command meaning that you can edit a script in the HammerDB GUI save it and then load it to run in the CLI.

loadscript

Referring back to the GUI for our workflow the next step is the creation of Virtual Users for running the driver script loaded.

GUI Virtual User Options

In the CLI the vuset command sets the Virtual User options and the print vuconf command displays the setting.

CLI Virtual User Options

With the Virtual User configuration set the next stage is to create the Virtual Users.  Having chosen the timed workload we see a monitor Virtual User in addition to the active Virtual Users chosen.

GUI Create Virtual Users

In the CLI the vucreate command creates the Virtual Users and the vustatus command shows the status that is shown in the status column of the Virtual User table in the GUI.

vucreate

ln the GUI we would then run the Virtual Users, in the CLI the workload is started with with the vurun command.

vurun

When the workload is complete we see the TEST RESULT output and the status of the Virtual Users. The vudestroy command will close down the Virtual Users in the same way as pressing the red stop button in the GUI.

TEST RESULT

Additional CLI Functionality

At this stage we have followed the GUI workflow to use the CLI to create the schema and run the TPROC-C workload with a number of Virtual Users.  It is of note that much of the additional GUI functionality is also available with CLI commands, for example primary and replica instances can be created and connected in the CLI and also as shown the CLI transaction counter.

CLI Transaction Counter

Scripting the workloads

We have seen how to run the HammerDB CLI interactively by typing commands in a similar manner that we would use the GUI to build schemas and run workloads. However one major benefit of the command line use is also the ability to script workloads. There are 2 approaches to scripting HammerDB CLI commands. Firstly we can run a script from the interactive prompt using the source command. Secondly we can use the auto argument to run a script directly without the interactive prompt.

To run a script using the source command we can take a text editor and enter the commands into a file with a .tcl extension. In this example we are running a timed workload with 2 active virtual users, we are logging the output and also running the transaction counter with logged output. Note one additional command has been added to what has been seen previously when running interactively, namely runtimer.  This is where having run the same workload in the GUI helps understand the concepts. HammerDB ins multithreaded and the Virtual Users run independently as operating system threads. Consequently if vurun is followed by vudestroy in a script then the Virtual Users will be immediately terminated by the main thread as soon as they are started. This is unlikely to be the desired effect. Therefore runtimer keeps the main HammerDB thread busy and will not continue to the next command until one of 2 things happen. Firstly if the vucomplete command returns true or the seconds value is reached. For this reason the runtimer seconds value should exceed both the rampup and duration time.  Then only when the Virtual Users have completed the workload will vudestroy be run.

sqlrun.tcl

Now we can run the source command giving our script as an argument and the commands will be run without further interaction.

source sqlrun.tcl

We can see how the script ran to completion, called vudestroy and returned us to the interactive prompt. If desired the quit command returns from the interactive prompt to the shell prompt.

ALL VIRTUAL USERS COMPLETE

The HammerDB CLI is not restricted only to the commands shown in the help menu.  The CLI instead supports the full syntax of the TCL language meaning you can build more complex workloads.

There is an introductory tutorial to TCL here.  We also recommend the following book as a comprehensive reference to TCL :  TCL Programming Language by Ashok P. Nadkarni.

A simple example is shown using the foreach command to implement the autopilot feature from the GUI.

foreach loop

Running this script we are now executing a loop of tests with 1 then 2 then 4 Active Virtual Users in an unattended manner.

Unattended CLI Test

When the final iteration in the loop is complete the CLI returns to the prompt.

TEST SEQUENCE COMPLETE

and the log file provides a summary of all of the workloads.

Vuser 1:1 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 33643 NOPM from 77186 SQL Server TPM
...
Vuser 1:2 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 66125 NOPM from 152214 SQL Server TPM
...
Vuser 1:4 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 106080 NOPM from 243964 SQL Server TPM
...

Running the CLI from the OS Shell

Of course you can use TCL scripting to configure complex build and test scenarios including the execution of host commands.  Another way is to run multiple CLI scripts from the OS shell such as Bash on Linux and Powershell on Windows. The following example shows a build and test script for SQL Server. Note the additional waittocomplete command in the build script. This command is a subset of runtimer and causes the CLI to wait indefinitely until all of the Virtual Users return a complete status. At this point the CLI will return. In this case it is followed by a quit command to exit the CLI.  These scripts are called sqlbuild2.tcl and sqlrun2.tcl respectively.

sqlbuild2.tcl
sqlrun2.tcl

In Windows we can now write a powershell script called buildrun2.ps1 that calls the build and run scripts in turn. In Linux we would do the same with a bash script. In this case we use the auto argument to run the script in a non-interactive mode.  These commands can be interspersed with other operating system or database commands at the shell level. In this case we have only written to the output however any additional database configuration commands can be used to build a complex test scenario.

buildrun2.ps1

Starting the powershell we change to the HammerDB directory and run the powershell script.  (You should always run the scripts after changing to the HammerDB directory rather than running them from another directory). This starts running  the build script to build the schema.

Once the build is complete the build script exits, the powershell takes over and follows it by running the driver script.

After the test is complete the powershell exits and returns to the command prompt.

Summary

Once you have an overview of the HammerDB workflow by following the GUI menu driven system using the CLI should be straightforward using the same approach. The key concept is to understand that both the GUI and the CLI are multithreaded and the Virtual Users themselves run entirely independently as operating system threads and therefore you interact with the Virtual Users by interacting with the main interface thread and passing messages to the VUs. This means if you exit too early from the main interface in either GUI or CLI the entire workload will be stopped.

Once you have familiarity with how the CLI works it is then not difficult to adapt this understanding to build complex automated workflows using TCL scripting, shell scripting or a combination of both.

 

What programming languages does HammerDB use and why does it matter?

HammerDB is a load testing and benchmarking application for relational databases. All the databases that HammerDB tests implement a form of MVCC (multi-version concurrency control). This helps to minimise locking allowing multiple sessions to access the same data at the same time. On high-performance multi-core systems all the supported databases can return performance in the many millions of transactions per minute. However, it is crucial that the benchmarking application does not have inherent bottlenecks that artificially limits the scalability of the database. This is why the choice of programming language is so important from the outset.

This post explains why HammerDB made the language decisions it made to make it the best performing and most usable database benchmarking software.

Basic Benchmarking Concepts

As we have seen databases are designed to handle multiple database sessions at the same time. To benchmark a database we introduce the concept of a Virtual User. The benchmarking software simulates the actions of multiple individual users and these users must run in parallel to test the MVCC (Multiversion Concurrency Control) capabilities of the database. There is a key distinction here between parallelism and concurrency. It is important that the concurrency between sessions is handled at the database not at the client because that is how databases are accessed in the real world. When we have multiple CPU cores on both the benchmark client and database server it is crucial that these database sessions run independently of each other at the same time, in parallel. For simplicity, we do not include networking or transaction management middleware in this discussion because although important in the real world they do not affect the key concepts.

Database benchmarking in parallel

SQL

Firstly, for a database benchmarking application it should not come as a huge surprise that the key language used for testing databases is Structured Query Language known as SQL. For HammerDB both TPROC-C and TPROC-H run all of their workloads on the database being tested in SQL. The following is an example from TPROC-C from SQL Server.

SELECT @st_o_id = district.d_next_o_id
FROM dbo.district
WHERE district.d_w_id = @st_w_id AND district.d_id = @st_d_id

SELECT @stock_count = count_big(DISTINCT stock.s_i_id)
FROM dbo.order_line
, dbo.stock
WHERE order_line.ol_w_id = @st_w_id
AND order_line.ol_d_id = @st_d_id
AND (order_line.ol_o_id < @st_o_id) AND order_line.ol_o_id >= (@st_o_id - 20)
AND stock.s_w_id = @st_w_id
AND stock.s_i_id = order_line.ol_i_id
AND stock.s_quantity < @threshold
OPTION (LOOP JOIN, MAXDOP 1) 

and the following from TPROC-H

select top 100 
s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment 
from part, supplier, partsupp, nation, region 
where p_partkey = ps_partkey 
and s_suppkey = ps_suppkey 
and p_size = 47 
and p_type like '%COPPER' 
and s_nationkey = n_nationkey 
and n_regionkey = r_regionkey 
and r_name = 'EUROPE' 
and ps_supplycost = ( 
select min(ps_supplycost) 
from partsupp, supplier, nation, region 
where p_partkey = ps_partkey 
and s_suppkey = ps_suppkey 
and s_nationkey = n_nationkey 
and n_regionkey = r_regionkey 
and r_name = 'EUROPE'
) 
order by s_acctbal desc, n_name, s_name, p_partkey option (maxdop 2)

Application Logic in Stored Procedures

So the interaction with the database is in SQL. For the TPROC-H workload this is all we need the queries are long-running analytics queries so once executed on the database do not need to wait for the benchmarking client. TPROC-C however is derived from the TPC-C specification and requires application logic around the SQL. HammerDB implements the TPROC-C application logic in the form of stored procedures for all the supported databases.

DatabaseApplication Logic
OraclePL/SQL
SQL ServerT-SQL
Db2SQL PL
PostgreSQLPL/pgSQL
MySQLstored program language
HammerDB Stored Procedures.

So now our TPROC-C example from the Stock Level stored procedure on SQL Server begins as follows.

CREATE PROCEDURE [dbo].[slev]  
 @st_w_id int,
 @st_d_id int,
 @threshold int
 AS 
 BEGIN
 DECLARE
 @st_o_id int, 
 @stock_count int 
 BEGIN TRANSACTION
 BEGIN TRY
 SELECT @st_o_id = district.d_next_o_id 
 FROM dbo.district 
 WHERE district.d_w_id = @st_w_id AND district.d_id = @st_d_id
 ....

Why does it matter that we implement the application logic in the form of stored procedures? To prevent the roundtrip between the client and database becoming the bottleneck. As an illustration if we compare HammerDB with a sysbench workload both running on the same system and same MySQL database we can observe that both workloads are driving 75-80% of the CPU on database throughput however sysbench is utilising 20% system time compared to 3% for HammerDB. (The HammerDB workload shows that database locking prevents full CPU utilisation on the MySQL server).

Why is this happening? sysbench has the application logic in the client and is therefore spending 6-7X of the time on socket communication because every single SQL statement requires a separate round trip to the client (recvfrom and sendto).

Top 10 Processes
Top Files

However, having the application logic in the client is even worse because you are sacrificing key database efficiencies of prepared statements. Of course, you can prepare individual statements, however you can see major efficiency gains by using natively compiled stored procedures. With HammerDB the stored procedures are compiled (where supported by the database) at the time of schema creation by the HammerDB build. When the HammerDB driver is run we parse the statement to call the procedure once (and only once) per session. The example below is for the NEW ORDER stored procedure for the Oracle database.

"BEGIN neword(:no_w_id,:no_max_w_id,:no_d_id,:no_c_id,:no_o_ol_cnt,:no_c_discount,:no_c_last,:no_c_credit,:no_d_tax,:no_w_tax,:no_d_next_o_id,TO_DATE(:timestamp,'YYYYMMDDHH24MISS')); END;"

We therefore don’t use CPU repeatedly parsing any of the SQL used for the workload. Instead, we are using bind variables and each time we call the stored procedure we bind the INPUT variables, execute the stored procedure and fetch the OUTPUT variables. The driver only needs to generate strings that correspond to the bind variables and therefore not only is it doing much more work per roundtrip it is sending and fetching a lot less data as well.

Database interfaces in C

So our application logic is in SQL and stored procedures and we are preparing once then binding and executing statements multiple times. So lets take a look at an extract of how this is done in HammerDB for Db2 and the language used.

int Db2_bind_exec (ClientData cData, Tcl_Interp * interp, int argc,
CONST84 char *argv[])
{
int i = 1;
Tcl_Channel conn_channel;
Db2Connection *conn;
char id[MAX_ID_LENGTH + 1];
char buff[MAX_ID_LENGTH + 1];
SQLHANDLE hdbc, hstmt;
SQLLEN ival = SQL_NULL_DATA;
short num_params;
int nparam;
char **paramList;

...

conn->rc = SQLFreeStmt(hstmt, SQL_RESET_PARAMS);
    if (conn->rc != SQL_SUCCESS)
    {
        SQLError (henv, conn->hdbc, hstmt,
                (SQLCHAR *) & conn->sql_state,
                &conn->native_error,
                (SQLCHAR *) & conn->error_msg,
                sizeof (conn->error_msg), &conn->size_error_msg);
        Tcl_AppendResult (interp, conn->error_msg, (char *)NULL);
        if (paramList) ckfree((char *) paramList);
        return TCL_ERROR;
    }

for (i = 0; i < nparam; i++)
{
if (strncmp (paramList[i], "NULL", 4) == 0)
{
/* null bind / conn->rc = SQLBindParameter (hstmt, i + 1, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR, 0, 0, NULL, 0, &ival); 
} 
else 
{ / bind value */
conn->rc = SQLBindParameter (hstmt, i + 1, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR, 0, 0, paramList[i], 0, NULL);
}

 if (conn->rc != SQL_SUCCESS)
    {
        SQLError (henv, conn->hdbc, hstmt,
                (SQLCHAR *) & conn->sql_state,
                &conn->native_error,
                (SQLCHAR *) & conn->error_msg,
                sizeof (conn->error_msg), &conn->size_error_msg);
        Tcl_AppendResult (interp, conn->error_msg, (char *)NULL);
        if (paramList) ckfree((char *) paramList);
        return TCL_ERROR;
    }
}

conn->rc = SQLExecute (hstmt);

...

If this looks like a lot like the C programming language, that is because it is or to be more precise it is a C program using the Db2 Call Level Interface (CLI) that is the ‘C’ ‘C++’ programming interface for Db2 ‘to pass dynamic SQL statements as function arguments’ which is exactly what we want to do. For all the databases supported we are using a compiled C interface on the client to use the lowest level of most efficient form of access possible.

Database Programming Interface
OracleOCI
SQL ServerODBC
Db2CLI
PostgreSQLLibpq
MySQLMySQL Native Driver
HammerDB Database Interfaces

Glue language

So now we have all of our application logic in SQL and stored procedures, natively compiled that we can bind and execute. We also have our low-level C level interfaces that can run both SQL and stored procedures at high performance. The next thing we need is what is known as a Glue language. One that can stick all the components together in an application that we can use to run databases benchmarks.

Why not Python as a Glue Language

Surely we can just use any language that we are familiar with? Unfortunately it is not quite this simple. Let’s take a look at using Python as a language to build a driver for a workload derived from TPC-C. There are different language implementations of Python, as we have seen for our high performance benchmarking application our interfaces are written in C so CPython is our only choice. So lets take a look at how this would work in practice.

Referring back to our benchmarking basics we want to run database sessions in parallel. For this reason we need to implement our database sessions in the form of operating system threads. We could use processes, however given we may want to create hundreds or thousands of virtual users multithreading is the best approach to implement a Virtual User.

This is where we hit a roadblock with using Python as a glue language for a benchmarking application, the Python GIL. The GIL or Global Interpreter Lock is a mutex that stops multiple Python threads executing Python bytecodes at the same time.

The Python GIL

In other words our Virtual Users instead of executing in parallel are now effectively running their database sessions in serial and the more sessions we have the more performance will degrade as each session tries to acquire the GIL. As it says on the Python Wiki ‘it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.’ – this describes exactly the scenario that we use for database benchmarking and is discussed further in the section Bytecode Execution.

So having an application client mutex lock that artificially stalls our Virtual Users does not sound good. Unfortunately, however our glue language doesn’t know that we are testing a high performance database processing numerous databases sessions concurrently. Our sessions will also be taking out locks on the database itself meaning that blocked Virtual Users on the client could themselves be blocking running Virtual Users on the database. Conversely, if throughput is limited we could also be preventing the database from handling the database locking that results from running multiple sessions accessing the same data at the same time. In either scenario we are preventing the database from managing sessions concurrently when this is precisely the scenario we want to test.

Why Tcl as a Glue Language

HammerDB abandoned Python as a glue language at the design stage because of the lack of multithreading and parallel capabilities. The only language that met such specific requirements for high throughput database benchmarking was Tcl or Tool Command Language. Not only was it designed from the ground up to interface with applications built in C, but it also supports true multithreading enabling our Virtual Users to be implemented as an independent operating system thread and genuinely run in parallel.

Tcl Multithreading in parallel

How does Tcl do this if Python can’t? Whereas within Python all threads run in a single interpreter (after acquiring the GIL) in Tcl each thread has its own copy of the interpreter. This is possible because the Tcl interpreter is exceptionally compact and lightweight (Also for this reason Tcl is often used as an embedded language in hardware such as Cisco Routers). By default, there is no shared data between threads, instead each thread runs an event loop completely independently and communication is done by passing messages to run events in those threads.

So what if we pre-created a number of operating system threads, loaded a C level interface to communicate with a database and then passed a script for the threads to evaluate in their event loop. That script would create strings of data for parameters and then either run SQL statements or call databases stored procedures? That is exactly what HammerDB does. In this scenario we have true multithreading and linear scalability. When our benchmarking client has multiple cores and threads we can take advantage of them and run entirely in parallel ensuring a true concurrent workload on the database server. Of course if the threads need to communicate they can through messages in a thread safe way, so for example if you press the stop button in HammerDB you send a message to all Virtual Users to end the current running workload and exit the thread. When you create Virtual Users in HammerDB you can see that you have created threads as follows. (run this command in the console or in the CLI).

puts [ thread::names ]
tid000000000000360C tid0000000000000944 tid00000000000032A8 tid0000000000002E10 tid00000000000019E4 tid0000000000001B38 tid0000000000002E80 tid00000000000004F4 tid00000000000006D0 tid0000000000003C44 tid0000000000002CF4

There is one small exception to this parallelism. Of course, you can only update a graphical user interface with the main application thread and therefore any output to be displayed must be sent via a message to this main application thread. Therefore, the Test workload in HammerDB that prints the output from all user sessions requires every session to pass its output to the main display. For this reason the Timed workload suppresses output unless there is an error.

Bytecode Execution

So as we saw in Python ‘it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.’ so lets see what happens in HammerDB. Running a performance profiling tool such as perf in Linux we can see that the top event is TEBCresume standing for Tcl Execute Byte Code.

Samples: 67K of event 'cycles:ppp', Event count (approx.): 33450114923
 Overhead  Shared Object                  Symbol
   33.56%  libtcl8.6.so                   [.] TEBCresume
    7.68%  libtcl8.6.so                   [.] Tcl_GetDoubleFromObj
    6.28%  libtcl8.6.so                   [.] EvalObjvCore
    6.14%  libtcl8.6.so                   [.] TclNRRunCallbacks

It is not a coincidence that as a Tcl proc is compiled to bytecode, HammerDB implements the calling of the database stored procedures as procs. You can see the bytecode generated with a command such as follows.

puts "bytecode:[::tcl::unsupported::disassemble proc slev]"

with output such as the following for the SQL Server slev stored procedure we saw earlier.

bytecode:ByteCode 0x00000211562A4AA0, refCt 1, epoch 17, interp 0x00000211560F30D0 (epoch 17) 
 Source 
 set threshold [ RandomNumber 10 20 ]
 if  {[catch {se… Cmds 21, src 514, inst 314, litObjs 14, aux 0, stkDepth 8, code/src 0.00 Proc 0x0000021155F1E060, refCt 1, args 4, compiled locals 8 
 slot 0, scalar, arg, slev_st 
 slot 1, scalar, arg, w_id 
 slot 2, scalar, arg, stock_level_d_id 
 slot 3, scalar, arg, RAISEERROR 
 slot 4, scalar, threshold 
 slot 5, scalar, message 
 slot 6, scalar, resultset 
 slot 7, scalar, slrows 
...

So lets look at the slev proc before disassembly. We are calling the stored procedure with the warehouse id and stock level district id parameters, setting the threshold, executing the stored procedure on the database and fetching the results.

#STOCK LEVEL
 proc slev { slev_st w_id stock_level_d_id RAISEERROR } {
 set threshold [ RandomNumber 10 20 ]
 if  {[catch {set resultset [ $slev_st execute [ list st_w_id $w_id st_d_id $stock_level_d_id threshold $threshold ]]} message ]} {
     if { $RAISEERROR } {
 error "Stock Level : $message"
     } else {
 puts "Stock Level : $message"
     }
       } else {
 if {[catch {set slrows [ $resultset allrows ]} message ]} {
 catch {$resultset close}
 if { $RAISEERROR } {
 error "Stock Level Fetch : $message"
     } else {
 puts "Stock Level Fetch : $message"
     }} else {
 catch {$resultset close}
     }
     }
 }

As seen before the application logic and our workload is on the database. Our Virtual User is generating and passing strings of data to call these stored procedures meaning the client logic is exceptionally lightweight. Not only is it lightweight but as we have seen each Virtual User is an operating system thread running compiled bytecode generating strings so not only is it lightweight and parallel it is also very fast. But let’s quantify fast by running the same calculation in SQL Server (see the HammerDB documentation for the Oracle example)

USE [tpcc]
 GO
 / Object:  StoredProcedure [dbo].[CPUSIMPLE]    Script Date: 25/02/2021 17:41:35 /
 SET ANSI_NULLS ON
 GO
 SET QUOTED_IDENTIFIER ON
 GO
 ALTER PROCEDURE [dbo].[CPUSIMPLE] 
 AS
    BEGIN
       DECLARE
          @n numeric(16,6) = 0,
          @a DATETIME,
          @b DATETIME
       DECLARE
          @f int
       SET @f = 1
       SET @a = CURRENT_TIMESTAMP
       WHILE @f <= 10000000 
          BEGIN
       SET @n = @n % 999999 + sqrt(@f)
             SET @f = @f + 1
          END
          SET @b = CURRENT_TIMESTAMP
 PRINT 'Timing = ' + ISNULL(CAST(DATEDIFF(MS, @a, @b)AS VARCHAR),'')
 PRINT 'Res = ' + ISNULL(CAST(@n AS VARCHAR),'')
    END

Timing = 7767
Res = 873729.721235
(1 row affected)
Completion time: 2021-02-25T17:40:16.1261747+00:00

and in Tcl Bytecode.

proc runcalc {} {
 set n 0
 for {set f 1} {$f <= 10000000} {incr f} {
 set n [ expr {[::tcl::mathfunc::fmod $n 999999] + sqrt($f)} ] 
 }
 return $n
 }
 puts "bytecode:[::tcl::unsupported::disassemble proc runcalc]"
 set start [clock milliseconds]
 set output [ runcalc ]
 set end [ clock milliseconds]
 set duration [expr {($end - $start)}]
 puts "Res = [ format %.02f $output ]"
 puts "Time elapsed : [ format %.03f [ expr $duration/1000.0 ] ]"

hammerdb>source runcalc.tcl
Res = 873729.72
Time elapsed : 3.553
hammerdb>

So on the same test system SQL Server completed the T-SQL calculation in 7.7 secs and Tcl completed the calculation in 3.5 secs. So for this example the client language is 2X faster than one of the fastest server languages. Yet remember the client logic is only generating strings to call the server side stored procedures (or SQL statements for TPROC-H) and all the Virtual Users are running independently of each other meaning that the client side of the workload is minimal compared to the database side.

As HammerDB tests multiple databases we also have the insight into client performance from comparing and contrasting the throughput from both commercial and open source databases. Consequently, we know that when we see a result for a highly tuned commercial database that is more than 10X higher throughput than a comparative database on the same hardware system we have 100% confidence that the limitation does not reside in the HammerDB client side of the test but instead in the database.

What about Coroutines?

We have seen that for database benchmarking it is important that our Virtual Users run in parallel meaning that each Virtual User should operate as an operating system thread. However, there is one scenario where you could raise an objection. What about when we want to run thousands of database sessions? In this scenario it would not be possible to run thousands of operating system threads due to the overhead on system resources and therefore couldn’t we run an implementation using coroutines instead? The answer is yes and when we want to run thousands of database sessions this is exactly what HammerDB does.

proc promise::async {name paramdefs body} {
     # Defines an procedure that will run a script asynchronously as a coroutine.

This option is chosen when you select the Asynchronous Scaling checkbox. You define the number of Virtual Users (operating system threads) and the number of clients per Virtual User sessions managed with coroutines. However, when you select Asynchronous Scaling note that it also activates keying and thinking time for you, this is not coincidental.

A coroutine implementation is appropriate for managing many sessions concurrently (rather than in parallel) when there is a clear and defined time delay between transactions per session. This is why HammerDB uses threads for Virtual Users for maximum throughput and threads and coroutines for asynchronous scaling when sessions will sleep for keying and thinking time. Coroutines alone would not enable the parallelism required for maximum throughput.

Building a GUI with Tk

HammerDB can run in command line mode but has always had a GUI that runs on both Linux and Windows platforms. For most Python applications the graphical interface used is called Tkinter which is the Python interface to Tcl/Tk. HammerDB bypasses this interface and uses Tk directly meaning that all the features available to a Tkinter application are also available to HammerDB but also more as well, meaning for example that HammerDB could take advantage of SVG graphics for high definition displays before a Python Tkinter application could, creating a native display for both Linux and Windows.

HammerDB GUI

Summary

We have discussed why HammerDB is written in the programming languages it uses and why running the clients in parallel in operating system threads is so important when we want to test concurrency on the database being tested. We have seen that the workloads are written in SQL and stored procedures and the client logic is compiled into Bytecode for performance. All of this is wrapped in an application interface that is simple and intuitive so all you need to do is point HammerDB at your database and start testing.

HammerDB v4.0 Signed Installer for Windows

There is now a signed installer available for HammerDB v4.0 for Windows from the HammerDB releases. The installation steps from HammerDB-4.0-Win-x64-Signed-Setup.exe is identical to HammerDB-4.0-Win-x64-Setup.exe and the contents in the installed directory are the same.

The key difference between the 2 installers is that the Signed Setup has been signed with a code signing certificate, whereas for the unsigned installer you can verify the installer itself manually with the provided checksums. When running the signed setup it will confirm the Verified publisher as the TPC Council as shown.

If you choose to Show more details you can view the Certificate Information from the TPC Council.

For Linux installations and the provided zip file for Windows, verification of the release files continues to be done through checksums.

HammerDB v4.0 New Features Pt4: Connect Pooling for Clusters

Prior to HammerDB v4.0 for the TPROC-C test there was the option to connect to one database instance only. If it was required to connect to multiple instances in a cluster then the Primary/Replica modes were used to create multiple HammerDB instances to connect to the separate database instances simultaneously. HammerDB has introduced a connect pool feature whereby a single instance of HammerDB can create a pool of multiple database instance connections with policies defined at the stored procedure level to determine how the individual stored procedures are run on which connections to the database instances. For example in an environment where there are primary read-write instances and secondary read-only it would be possible to define a policy whereby the neworder, payment and delivery stored procedures run against the read-write instance and stocklevel and orderstatus run against the read-only instance. Where there are multiple instances serving a similar purpose the policy can determine how an individual transaction is assigned. For example if there are three read write-instances then the neworder stored procedure can be defined to execute a transaction at each in a round-robin fashion or instead select an instance at random.

Connect Pool

To define the connect pool there are new XML files in the config/connectpool directory. These provide a template for the multiple connections with the same connection options for the standard interface defined in the connections section. The connections are named c, c2, c3 and so on with no limit on the number of connections that you define. There is also an sprocs section where you define what connections each stored procedure should use and what policy to apply, the policy can be  first_named, last_named, random or round_robin.

pgcpool.xml

When you have defined your configuration, select the XML Connect Pool option when loading the driver script. Your active Virtual Users will then use your XML defined connections and connect to each defined one thereby holding a pool of connections open to distribute the transactions to. For all databases the connect pool connections use prepared statements and once the connection is established will prepare statements for all of the stored procedures against each connection.

PostgreSQL XML Connect Pool

Within the driver script there is a commented line that can be uncommented to report details on all of the connections and prepared statements that are made.

postgresql connection information

Finally it is important to note that the main monitor connection continues to connect to the standard defined connection and reports NOPM and TPM from that single instance. Where a clustered environment such as Oracle RAC reports performance data for the entire cluster this will report cluster performance. If however you have defined connections to separate unrelated instances then this monitor connection will only report out for the instance it is connected to. For this reason where a database will not report performance data across the cluster the XML connect pool driver script will also report client side transactions for each Virtual User and in total to provide a guide to the workload directed to each instance.

The connect pool can be used in conjunction with other features such as use all warehouses and asynchronous connections. For more information on configuring the XML connect pool see the relevant section in the HammerDB documentation.

HammerDB v4.0 New Features Pt3: Refactored Stored Procedures

Another key feature introduced with HammerDB v4.0 is the refactoring of the stored procedures for some of the TPROC-C workloads. This means that the performance metrics reported in NOPM/TPM could be different from previous releases as well as the ratio between NOPM and TPM for these workloads. Therefore results from v4.0 may not be directly comparable with the results from previous releases for your database. The reason for these changes is that over time for some databases more advanced features or drivers have been introduced that improve performance since the time that the original HammerDB transactions or interfaces were written. These changes were contributed and accepted into HammerDB v4.0.

NOPM vs TPM

Firstly it is important to understand the metrics NOPM, TPM and the difference between them. The TPROC-C workload is derived from the TPC-C workload, the primary metric for TPC-C is called tpmC, the number of new order transactions processed per minute. As HammerDB TPROC-C is a derived workload it is not permitted to use tpmC and therefore instead uses a metric called NOPM that records the number of new order transactions processed per minute. Although a similar metric to tpmC it is not considered to be directly comparable. From HammerDB v4.0 NOPM should be considered the primary metric and is the only one that should be used for a cross database comparison. For this reason from HammerDB v4.0 NOPM is printed first.

NOPM Primary Metric

However for backward compatibility the generic.xml configuration file contains an option called first_result, setting this to TPM will print the results in the same format as v3.3 and earlier.

<benchmark>
...
<first_result>TPM</first_result>
...
</benchmark>

So why not just print NOPM and report a single metric for TPROC-C as per the official TPC-C workloads? The key reason is that HammerDB is intended to give us more insights into database performance and whereas NOPM is a cross-database comparable metric TPM is a database specific metric that can give more details on the workload and be related to other database specific metrics but cannot be compared between different databases. For example for the Oracle database the TPM value is calculated from the number of user commits + user rollbacks and you can query these metrics in the v$sysstat table during the test. This metric is the same used in Oracle tools such as AWR reports and as shown the TPM metric captured by HammerDB is exactly the same as Transactions and Rollbacks in the Load Profile section when these values are multiplied by a value of 60. This can then be used to compare with other Oracle specific metrics such as redo size and logical reads.

Load Profile

Per SecondPer TransactionPer ExecPer Call
DB Time(s):151.20.00.000.00
DB CPU(s):124.90.00.000.00
Background CPU(s):0.00.00.000.00
Redo size (bytes):862,578,321.35,390.2  
Logical read (blocks):10,369,967.864.8  
Block changes:5,009,042.831.3  
Physical read (blocks):498.80.0  
Physical write (blocks):23.60.0  
Read IO requests:329.70.0  
Write IO requests:2.20.0  
Read IO (MB):6.80.0  
Write IO (MB):0.20.0  
IM scan rows:0.00.0  
Session Logical Read IM:0.00.0  
User calls:122,685.10.8  
Parses (SQL):74,085.20.5  
Hard parses (SQL):0.00.0  
SQL Work Area (MB):155.90.0  
Logons:0.00.0  
User logons:0.00.0  
Executes (SQL):3,303,622.120.6  
Rollbacks:270.40.0  
Transactions:160,026.2 
Oracle AWR Report Load Profile

Similarly for SQL Server TPM records Batches/sec which is the same value seen in the Activity Monitor in SSMS. Using the database specific metric means that we can sample the transaction rate in real time for the HammerDB transaction counter whereas doing so for the NOPM value would impact the test by reading from a table being modified during the test. It can be seen from the example below that the HammerDB Transaction Counter

Transaction Counter

reports the same data per minute as Activity Monitor reports per second and can therefore be related directly to the other data that Activity Monitor provides such as Database I/O.

Activity Monitor

Refactoring Differences

When using HammerDB v4.0 the notable differences between TPM and NOPM rates than can be observed are as follows:

SQL Server

For SQL Server note that a change in the TPM metric was introduced at v3.3 and continues into v4.0. Initially these releases were planned close together however v4.0 was delayed to introduce UHD scalable graphics. The key difference is that in v3.2 the TPM value is reported as 2X the value of v4.0 (and v3.3) due to a change in the SQL Server driver interface used. The reason for the change was that there was a bug raised for the previous ODBC v1 interface tclodbc the bug meant that the 3rd party library would crash if SQL Server was shutdown during a test and then the Virtual Users subsequently closed after they had correctly reported the error. The fix to this was to move to a newer ODBCv3 interface called tdbc::odbc. The previous tclodbc interface required 2 commits per transaction one in the stored procedure and one outside in the driver script. If the external one was removed or commented then the driver would error saying that the connection was already in use by a previous command. Moving to the newer interface this external commit was no longer needed and was removed, as a result the TPM value has reduced however the NOPM value is exactly the same. (From tests it should be slightly higher with the newer ODBC v3 interface). It is important to note that as far as the HammerDB TPROC-C workload is concerned it is not doing any more work as such – just doing one less commit for the same work. If you compare the driver scripts for the different versions you can observe where the extra commit that was removed. For comparisons with previous releases NOPM can be used unmodified as the primary metric or the TPM value multiplied by 2X to account for the removed commit.

Oracle

For Oracle from v4.0 the stored procedures have been updated to use bulk processing features. As we are now doing bulk operations  we are processing more changes per database commit and this can seen in an AWR report that the redo generated per transaction has also increased from v3.3 to v4.0.  For this reason the NOPM value has increased whilst the TPM value has remained level, therefore the TPM to NOPM ratio has changed for Oracle between these releases. This has been measured at 3.0X TPM to NOPM for v3.3 and earlier but has reduced to 2.11X for v4.0. Therefore HammerDB v4.0 improves the Oracle performance by approximately 1.35-1.45X measured in NOPM for the same TPM. This improvement in NOPM is a recognition of actual improved TPROC-C transactional throughput by using advanced PL/SQL features.

PostgreSQL

The PostgreSQL stored procedures and functions have undergone similar changes as Oracle to take advantage of bulk processing features. These changes have improved NOPM and TPM by approximately 1.35X in HammerDB v4.0 compared to previous releases and is also a recognition of the use of more advanced features.

In conclusion in v4.0 HammerDB has undergone a number of evolutionary changes for some of the database workloads that impact performance results when compared to earlier versions. These differences should be accounted for when comparing results. Although the aim is to keep performance as consistent as possible between releases wherever possible as some databases had introduced new features or newer more advanced interfaces were available it was considered reasonable to modify the stored procedures or use these new interfaces to use available features even though reported results are different in the newer version.

It is also important to note that HammerDB is open source and therefore if it is considered that there are underused features in any of the supported databases then a pull request can be made to ensure that those features are fairly represented in the results reported by HammerDB. The overall goal is to remain consistent as possible whilst providing an accurate representation of the capabilities that each supported database provides as their features are updated and improved.

HammerDB v4.0 New Features Pt2: Scalable UHD Graphics

Up to version 3.3 the HammerDB GUI operated with a fixed scaling factor of 1.33 pixels per point. That means regardless of monitor resolution HammerDB was set to use a fixed number of pixels. Therefore on a higher resolution screen as shown the HammerDB GUI would appear smaller than on a standard DPI display.

HammerDB v3.3 on UHD

For most purposes this did not present an Issue and the HammerDB display rendered correctly up to 1920 x 1080 Full HD displays. However a GitHub Issue was raised because once moving to UHD displays with a pixel density beyond 1920×1080 the HammerDB interface became too small for use. Additionally these type of displays on devices such as PixelSense on Microsoft Surface Book were popular for presentations and demos and therefore the task was to update the HammerDB display to support scalable graphics. This meant that HammerDB would detect the display density and size the application interface accordingly. At HammerDB v4.0 the graphical interface has been extensively updated as shown to support UHD displays with pixel densities beyond 1920 x 1080 on both Windows and Linux.

HammerDB v4.0 on UHD

To achieve this scaling it was necessary to move from fixed PNG based graphics to SVG requiring all graphics and icons to be regenerated in the updated format and the interface rewritten to use SVG. Similarly it was necessary to use scalable fonts to ensure that the text scaled alongside the graphics. It was also needed to move to updated themes so that all dialogs and options scaled alongside the main display. With the move to the updated nomenclature of TPROC-C and TPROC-H already needing extensive re-documentation it was decided to delay the release of v4.0 in summer 2020 to invest the time in rewriting the interface to ensure usability on devices such as PixelSense displays.

Once the main interface was updated it was also necessary to update the transaction counter, otherwise the fixed display would have continue to render at a smaller size within the updated interface.

Transaction Counter

Similarly the CPU Metrics display was modified to scale in proportion to the main interface.

CPU Metrics

Finally it was needed to update the graphical Oracle metrics with the same scalability settings. Although this feature is only available for Oracle at v4.0 the plan is to make it available for other databases with future releases.

Oracle Metrics

With these changes the entire HammerDB application supports scalable graphics with the feature tuneable by the settings in the generic.xml file in the config directory.

<theme>
<scaling>auto</scaling>
<scaletheme>auto</scaletheme>
<pixelsperpoint>auto</pixelsperpoint>
</theme>

By default all of the theme settings are set to auto and this is the recommended configuration. However they can be modified if required. Firstly the scaling setting can be changed to fixed to revert to the previous fixed non-scaling graphics. This could for example be used when using a remote X-windows display for faster graphics rendering when a faster alternative such as VNC is not available. For the scaletheme you can use settings of “auto”, “awlight”, “arc” or “breeze”, with “awlight” the default on Linux and “breeze” the default on Windows, with the plan to introduce more theme options over time.

awlight theme

Finally pixelsperpoint is for expert usage to fine tune the scaling of HammerDB according to different screen settings.

Now that HammerDB is available with scalable graphics from v4.0 there is no longer any barrier to running database performance presentations and demos from devices with UHD displays.

HammerDB v4.0 New Features Pt1: TPROC-C & TPROC-H

One of the key differences that stands out with HammerDB v4.0 compared to previous releases is that the workload names have changed from TPC-C and TPC-H to TPROC-C and TPROC-H respectively and therefore a key question is how are the v4.0 workloads different from the previous releases of v3.3 and earlier, what has changed and how does this impact interpreting results? The simple answer is nothing, the workloads are exactly the same workloads derived from the TPC-C and TPC-H specifications and HammerDB v4.0 can be seen as a continuation and enhancement from previous releases, only the name has changed, not the workload itself.  From an engineering perspective this may be all that you need to know. However a reasonable follow-on question is then why change the name at all? This post aims to give some of the background to the change and provide you with the information of where and how the TPROC-C and TPROC-H differ from a ‘real’ TPC-C and TPC-H respectively.

First of all it has always been clear in the HammerDB documentation that the TPROC-C/TPC-C and TPROC-H/TPC-H workloads have not been ‘real’ audited and published TPC results instead providing a tool to run workloads based on these specifications.  For example HammerDB has not used tpmC terminology to report TPC-C based metrics instead using TPM and NOPM nomenclature.  Initially from TPC documentation it was not specified whether using TPC-C and TPC-H terminology for derived workloads was permitted. Additionally both commercial and open source tools based on the specifications also continued to use TPC-C and TPC-H to describe these workloads.  This has now been made explicitly clear, using TPC-C or TPC-H for a non-compliant workload to the full specification is a trademark violation.  For this reason HammerDB has changed the workload names to be legally compliant.  Any tools using TPC-C, TPC-H (or other trademarked TPC workload names) for tools or workloads both commercial and open source should also consider renaming for trademark compliance purposes or take legal guidance.

But why is this important? Surely it was clear that as a derived workload HammerDB results were not actual fully audited TPC results so that this non-compliance was implicit? In the vast majority of cases this was the case and even when using TPC terminology it was clear that using for example TPC-C meant “derived from TPC-C”. However there were cases where intermediate understanding based on some familiarity with the specifications (but without the implicit understanding that the difference was already clear) have questioned the validity of published performance data by users based on this implicit understanding. This change makes that difference explicit and puts everyone on the same page, TPROC-C means derived from TPC-C.

So why derive a workload from TPC-C or TPC-H at all and instead just rely on vendor published and fully audited results? This question is the very essence of what HammerDB is, a tool that takes the well designed and scalable TPC-C and TPC-H specifications and implements a workload derived from them that is accurate and repeatable yet tests database capabilities to the full (compared to alternative simple 1-table no contention workloads) so that running database performance benchmarks becomes fast, low cost and accessible to all, making database benchmarks open source thereby allowing anyone to compare the performance of their databases.

A full understanding of why this is important requires some knowledge of the evolution of database hardware and software.  The HammerDB TPROC-C workload by design intended as CPU and memory intensive workload derived from TPC-C – so that we get to benchmark at maximum CPU performance at a much smaller database footprint.  In the days before highly performant SSDs and persistent memory, database benchmarks had a significant challenge in comparing performance due to the available I/O performance.  For TPC-C this meant enough available spindles to reduce I/O latency and for TPC-H enough bandwidth for data throughput.  Fully audited configurations would require multiple racks of I/O capacity to reach maximum CPU performance. This was both expensive and time consuming to configure.  Even with superior I/O performance today the I/O configuration required on-premise or in the cloud remains costly.

Additionally a fully audited benchmark requires multiple client servers to sustain the large volume of clients as well as a TP Monitor (Transaction Processing Monitor) This TP monitor acts as middleware transferring the transactions between the clients and the database server.  The challenge was how to take the essence of the TPC specifications that are made available for free and implement the workloads in a method that maximises CPU performance and can be run on anything from laptops to servers without the significant expense of I/O capacity and multiple client servers to run the workloads.  Therefore for HammerDB TPROC-C we eliminated keying and thinking time and eliminated the requirement for terminals. This meant we could dispense with the TP-monitor, reduce the number of clients, reduce the storage and the schema size but still run similar transactions to the TPC-C specification so that it was running a scalable and repeatable workload in a CPU and memory intensive manner.

Over time since we first ran HammerDB workloads we noticed that the CPU generation to generation performance ratios between systems was the same with this CPU intensive default mode as the official published TPC-C benchmarks. I.e. if system A generated 1.5X more transactions than system B in the fully audited benchmark then the HammerDB result was also 1.5X better.  Consequently we were getting very similar insights both faster and cheaper meaning we could test orders of magnitude more configurations in the same amount of time generating relative performance ratios.

Why would this be the case?  Surely if the database schema is smaller, the workload more intensive, and there is not the same I/O capacity demands then the results would be different. The key aspect is the presence of the TP Monitor and this is arguably the area where the most confusion arose in stating that a HammerDB TPC-C workload was not an actual TPC-C workload which should have already been clear. Typically in a fully audited TPC-C the client sessions are not connecting directly to the database with many 1000’s of sessions. Instead all of the client servers are connecting to the TP monitor.  This is not a new concept, as given in the description for the earlier TPC-B workload  “a transaction monitor can multiplex transaction streams to match the processing profile of the database subsystem. For example, 1000 user terminals would present transactions with human think times and delays, and the transaction monitor will concentrate them down to a steady stream of, say, 50 concurrent processes.”   Therefore the actual database server workload is both CPU and memory intensive with actual connections typically numbered in the hundreds processing transactions for the clients managed by the TP monitor.  HammerDB eliminates this stage instead implementing a workload that connects the steady stream of transactions directly to the database.  Additionally because the number of Virtual Users is lower for the HammerDB   default mode (see below for alternatives) each Virtual User will choose a home warehouse at random. Once that home warehouse is chosen then most of the work takes place against that warehouse – therefore for example you hit max CPU at 200 VUsers then most of the work is on 200 warehouses regardless of how many you have created.  As the home warehouse is chosen at random however you want enough to ensure an even spread of Virtual Users across the warehouses. Therefore the general advice is 250-500 warehouse per socket and for example a starting point of 1000 warehouses for a 2 socket server regardless of the database is a good starting point. This is also the reason why the default maximum number of warehouses is 5000. You can change this if you wish, however this limit is set to prevent a typical error of over configuring the database size in the expectation that it will improve performance.

Nevertheless more recently SSDs and persistent memory are lowering the price points for high performance I/O increasing the desire to test more I/O that wasn’t there when people needed to buy a dedicated storage array to do so. For this reason beyond the default mode there are the 3 more advanced features:

  1. Use All Warehouses (Choose a new warehouse per transaction)
  2. Event driven scaling (asynchronous clients with keying and thinking time)
  3. XML Connect Pooling (test distributed clusters)

These can be used separately or together for the sort of scenarios where it is desired to increase the I/O load, the number of Virtual Users or to test a distributed environment.  However to test a large number of Virtual Users with event driven scaling will need increasing the number of HammerDB clients with primary and replica modes and you will also need to implement a form of middleware to concentrate these connections. Therefore you are moving away from a more agile and rapid form of testing to more complex configurations.  You are entirely in control of your test environment and therefore the choice is yours dependent on the scenarios you wish to test rather than being bound by a more rigid specification.

In terms of analytics and the TPROC-H workload derived from TPC-H, this specification  does not require middleware so when running TPROC-H you are close to the specification,  however there remain differences.  For example the TPC-H specification includes measuring database load times that is beyond the scope of HammerDB and you may wish to implement features such as in-memory column stores, partitioning and compression to improve performance. HammerDB cannot implement advanced features that may not be available in all test environments and therefore builds a base configuration that would be available to all. The user can then modify the schema as they wish. For this reason TPROC-H is also a workload derived from TPC-H and has moved away from providing information on calculating the QphH figure to focusing on actual query times for power and throughput tests and a total geometric mean of query times.

In conclusion, TPROC-C and TPROC-H are the new names for the same HammerDB specific workloads that mean “derived from TPC-C” and “derived from TPC-H” respectively to make running workloads based on these specification both faster and cheaper and available to all.  Official TPC-C and TPC-H compliant results can as has always been the case only be found on the official TPC website.

 

 

Automating CLI Tests on Windows

The information in this post is a duplicate of this GitHub Issue https://github.com/TPC-Council/HammerDB/issues/84. The issue regards running build and driver scripts automatically in Windows.  As both build and driver examples are not given together in previous posts, the examples are copied here. This is intended as a template that you can take and modify for your own needs.

A few things to note. Firstly these scripts are multithreaded so once you do build schema and vurun it is running multiple threads. We need a different approach for both because in the schema build we are waiting for all of the virtual users to finish (vwait forever), however for the driver we are waiting for a set period of time before terminating them. For the build I have created more warehouses but with 5 virtual users to show the multithreaded nature of the build. For the driver we have set a rampup and duration of 3 minutes in total (180 secs) – this runs in the virtual users, then in the main monitor virtual user it runs a timer – this is set to 200 seconds more than the rampup and driver combined (if it is less it will terminate the test before ending).

With both I copied the scripts to C:\Program Files\HammerDB-3.3. I then did:

hammerdbcli.bat auto autorunbuild.tcl

and

hammerdbcli.bat auto autorundrive.tcl

When both are complete the shell will exit as intended therefore it is a good idea to set logtotemp to ensure that you receive both the output and errors (for example if the build script exits because the database already exists).

Build Script – autorunbuild.tcl

puts "SETTING CONFIGURATION"
global complete
proc wait_to_complete {} {
global complete
set complete [vucomplete]
if {!$complete} { after 5000 wait_to_complete } else { exit }
}
puts "SETTING CONFIGURATION"
dbset db mssqls
diset connection mssqls_server {(local)\SQLDEVELOP}
diset tpcc mssqls_count_ware 10
diset tpcc mssqls_num_vu 5
vuset logtotemp 1
buildschema
wait_to_complete
vwait forever

Driver Script – autorundrive.tcl

#!/bin/tclsh
proc runtimer { seconds } {
set x 0
set timerstop 0
while {!$timerstop} {
incr x
after 1000
  if { ![ expr {$x % 60} ] } {
          set y [ expr $x / 60 ]
          puts "Timer: $y minutes elapsed"
  }
update
if {  [ vucomplete ] || $x eq $seconds } { set timerstop 1 }
    }
return
}
puts "SETTING CONFIGURATION"
dbset db mssqls
diset connection mssqls_server {(local)\SQLDEVELOP}
diset tpcc mssqls_driver timed
diset tpcc mssqls_rampup 1
diset tpcc mssqls_duration 2
vuset logtotemp 1
loadscript
puts "SEQUENCE STARTED"
foreach z { 1 2 4 } {
puts "$z VU TEST"
vuset vu $z
vucreate
vurun
#Runtimer in seconds must exceed rampup + duration
runtimer 200
vudestroy
after 5000
}
puts "TEST SEQUENCE COMPLETE"

A recommended editor that recognises TCL syntax for editing HammerDB script is Vim.