HammerDB v4.10 New Features: Partitioning and Advanced Statistics for SQL Server TPROC-H

 

HammerDB v4.10 adds performance enhancements for the SQL Server TPROC-H workload in the form of partitioning for the orders and lineitem tables as well as the option to create advanced statistics.

Enabling Partitioning and Advanced Statistics

These can be selected with a checkbox in the GUI.

or the following options being set to true in the CLI.

mssqls_tpch_partition_orders_and_lineitems = true
mssqls_tpch_advanced_stats = true

Benefits of Partitioning and Advanced Statistics

When partitioning is selected both the order and Lineitem tables are partitioned with each partition spanning 1 week. By default t partition flag is set to false by default and must be enabled. Both Column Store and Row store are allowed to be partitioned. When partioning is enabled. Certain primary and foreign key definitions defined in HammerDB are no longer valid and will not be created. However HammerDB will still create those keys/constraints when the Orders and Lineitem tables are not partitioned.`

For columnstore indices, the is an initial degradation of performance, however if the user calls Alter Index … Reorganize or leaves the server running after the initial load for approximately 48 hours, the performance of power runs improves by 2X when compared to column store indices with no partitioning.

The advanced statistics option is also set to off by default, and when enabled it does increase the overall load time however it creates an option for users loading TPCH for MSSQL to perform a more advanced statistical analysis to benefit query performance.

Summary

HammerDB v4.10 introduces an option to partition the SQL Server TPROC-H schema and perform a more advanced statistical analysis, taking advantage of these options can benefit query performance.

 

 

 

 

HammerDB v4.10 New Features: Schema and Consistency Checks

The HammerDB TPROC-C and TPROC-H workloads are derived from the TPC-C and TPC-H workloads respectively.  Although the HammerDB workloads are not identical to TPC-C and TPC-H it is still important that the workloads implemented maintain the same data consistency as the official workloads. For this reason HammerDB has implemented the data consistency checks for TPC-C and TPC-H to be run against all databases.

The consistency checks are useful to be run after a schema build but also after a workload has been run to ensure that the data has remained consistent throughout all of the changes implemented by the workload.  As the consistency checks are the same ones stipulated by the TPC specification you are also welcome to run them against tests run against clustered and distributed environments and any workloads claimed to be derived from TPC workloads. The checks ensure that your transactions are fully and correctly processed by your database.

In addition to the consistency checks, HammerDB has also added a number of schema checks to ensure that the a schema build has completed successfully and the database is fully populated with the correct stored procedures.

Running the TPROC-C Schema and Consistency Check

To run the schema check, firstly build your schema and let it complete.

Then select the Check option from the main menu or treeview, accept the dialog prompt and let it run through the checks.  With the CLI run the checkschema command. Note that on some databases row counts may take longer than others and the test may take a number of minutes to complete.  The first part of the schema checks are particularly relevant after the build and it will check.

  1. Database Exists.
  2. Tables Exist.
  3. Warehouse count in schema is the same as the HammerDB configuration.
  4. Tables are indexed.
  5. Tables are populated.
  6. Stored Procedures Exist.

It will also run the consistency checks that we will cover after the workload has completed.

After the initial check, run the workload.

Let the workload run to completion.

You can then re-run the schema and consistency checks.

After you have run a workload the consistency checks are the most important aspect although the schema checks provide benefit especially in checking the row populations. With the consistency checks, HammerDB will create a temporary table with a sample of warehouse ids and then run the following checks.

  1. For each District within a Warehouse, the next available Order ID (d_next_o_id) minus one is equal to the most recent Order ID [max(o_id)] for the ORDER table associated with the preceding District and Warehouse. Additionally, that same relationship exists for the most recent Order ID [max(o_id)] for the NEW-ORDER table associated with the same District and Warehouse. Those relationships can be illustrated as: d_next_o_id – 1 = max(o_id) = max(no_o_id) where (d_w_id = o_w_id = no_w_id) and (d_id = o_d_id = no_d_id)
  2. For each District within a Warehouse, the value of the most recent Order ID [max(no_o_id)] minus the first Order ID [min(no_o_id)] plus one, for the NEW-ORDER table associated with the District and Warehouse, equals the number of rows in that NEW-ORDER table. That relationship can be illustrated as: max(no_o_id) – min(no_o_id) + 1 = rows in NEW-ORDER where (o_w_id = no_w_id) and (o_d_id = no_d_id)
  3. For each District within a Warehouse, the sum of Order-Line counts [sum(o_ol_cnt)] for the Orders associated with the District equals the number of rows in the ORDER-LINE table associated with the same District. That relationship can be illustrated as: sum(o_ol_cnt) = rows in the ORDER-LINE table for the Warehouse and District
  4. The sum of balances (d_ytd) for all Districts within a specific Warehouse is equal to the balance (w_ytd) of that Warehouse.

If completed successfully the check ensures that your data has remained consistent after the workload has completed.

Running the TPROC-H Schema and Consistency Check

The schema and consistency checks for the TPROC-H as expected follow a similar approach to TPROC-C and you can run the check after the build to ensure that the build completed successfully. If you only run the query aspect of the TPROC-H workload the data is unmodified and therefore there will not be benefit from running the schema and consistency checks again after you have verified the initial build. However the refresh function does modify the data and therefore if you run a refresh function then you can run the check to verify the consistency. In the example below we have used one virtual user to run the power test which runs the new sales order refresh, one query set and the old sales refresh.

After this test running the refresh function and query workload run the schema and consistency check to verify the refresh function has not impacted the data consistency.

 

The TPROC-H schema and consistency checks run the following checks:

  1. Database Exists.
  2. Tables Exist.
  3. Scale Factor in schema is the same as the HammerDB configuration.
  4. Tables are indexed.
  5. Tables are populated.
  6. Consistency Check.
    1. A consistent state for the TPC-H database is defined to exist when: O_TOTALPRICE = SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)(1+L_TAX) for each ORDER and LINEITEM defined by (O_ORDERKEY=L_ORDERKEY) and can be checked by: SELECT DECIMAL(SUM(DECIMAL(INTEGER(INTEGER(DECIMAL (INTEGER(100DECIMAL(L_EXTENDEDPRICE,20,3)),20,3)* (1-L_DISCOUNT)) * (1+L_TAX)),20,3)/100.0),20,3) FROM LINEITEM WHERE L_ORDERKEY = okey SELECT DECIMAL(SUM(O_TOTALPRICE, 20, 3)) from ORDERS WHERE O_ORDERKEY = okey

    Summary

    HammerDB v4.10 introduces schema and data consistency checks for all databases and workloads to enable you to verify the schema has been built correctly and that the data remains consistent before and after running workloads.

HammerDB v4.10 New Features: Purge and Write back for MariaDB TPROC-C

Many of the HammerDB TPROC-C workloads have included features to prevent the database doing maintenance tasks for the previous run whilst another run is taking place. This is particularly important when running automated workloads back-back to generate a performance profile for a progressively increasing number of virtual users.  An example of this is the “Checkpoint when complete” option for Oracle which will flush all dirty/modified data blocks in the in-memory buffer cache to disk and then switch the logfiles so this will not happen during a subsequent run with database writer activity impacting performance.

History List

With MySQL and MariaDB a key maintenance activity that can impact performance is purging which can be monitored with the history list length with a SQL statement such as follows.

select variable_value from information_schema.global_status where variable_name = 'INNODB_HISTORY_LIST_LENGTH'"

When a workload deletes data from MySQL or MariaDB the row is not deleted straight away and only when it is not needed for read consistency and the referring undo record for the operation is deleted. The history list length that can be queried is the number of undo log pages that contain changes.  If the history list length grows too large then a larger number of row versions can impact query performance, however purging to process the undo log pages can also itself impact performance requiring mutex locking.  For this reason we have a number of parameters, notably innodb_purge_threads, innodb_purge_batch_size, innodb_max_purge_lag and innodb_max_purge_lag_delay.  The number of purge threads and batch size determine how aggressively purging takes place and the purge lag and delay allow throttling of throughput of transactions if the history list grows too large to allow purging to catch up.

It should be clear that if you want to stress a MySQL or MariaDB databases then a workload that includes delete operations is essential to the overall picture and fortunately in the HammerDB workload the DELIVERY stored procedure includes a loop that processes a large number of delete operations and therefore is a great opportunity to test the effectiveness of your purge settings.  However until recently the purge parameters have only been able to be changed after a server restart and therefore we haven’t been able to accelerate a purge after a workload has completed.  Fortunately with MariaDB this change  Make number of purge threads variable dynamic as it suggests makes the configuration dynamic and we can take advantage of this to do a purge to clear the history list after a workload has completed so the purging for one test run does not unduly impact the following run.

In addition to the history list when adding this functionality we can also add similar functionality to write back the dirty buffers in the buffer pool at the same time.

Setting Purge and Write Back

To run the purge and write back after a MariaDB workload you need at least version 10.7.0 of MariaDB when the variables were made dynamic.  If you do not have this version HammerDB will report during a run that the settings cannot be made.  Otherwise if you have version 10.7.0 or above, in the GUI choose the Purge when complete checkbox in your driver settings.

Or in the CLI set the option maria_purge to true.

tpcc {
maria_count_ware = 30
...
maria_purge = true
}

With this setting enabled, run your MariaDB TPROC-C workload as normal.

Once the run has completed it will dynamically set the variables to run the purge and write back and restore your variables when complete.

Choosing optimal purge settings

Of course for this notebook based example the history list length does build up unduly over the course of the workload, however in a larger server environment with high transaction rates it is possible to see history list lengths such as follows depending on your purge settings.

"TEST RESULT : System achieved 916487 NOPM from 2128748 MariaDB TPM",
"Starting purge: history list length 13107078",
"Starting write back: dirty buffer pages 2289677",

Typically for optimal performance during a HammerDB run you want purge_threads and batch_size to be at the default settings as follows innodb_purge_threads =4 , innodb_purge_batch_size = 1000, and then set max_purge_lag and purge lag_delay to low values so we minimize delays during the test, such as innodb_max_purge_lag=0 and innodb_max_purge_lag_delay=1.  As described previously, even in this large server environment HammerDB will modify these settings to accelerate the purge and write back after the run has completed and then restore your settings when it has finished ready for the next workload giving us the best possible results each time.

Summary

Now that MariaDB from version 10.7.0 allows us to dynamically set the purge settings we can take advantage of this to complete the purge after a run has completed before the next one starts so that our test runs are entirely repeatable without being impacted by purging.

Thank-you to Marko Mäkelä of MariaDB for guidance on exactly which settings were needed for purge and write back to make this new HammerDB feature possible.