HammerDB v4.10 New Features: Schema and Consistency Checks

The HammerDB TPROC-C and TPROC-H workloads are derived from the TPC-C and TPC-H workloads respectively.  Although the HammerDB workloads are not identical to TPC-C and TPC-H it is still important that the workloads implemented maintain the same data consistency as the official workloads. For this reason HammerDB has implemented the data consistency checks for TPC-C and TPC-H to be run against all databases.

The consistency checks are useful to be run after a schema build but also after a workload has been run to ensure that the data has remained consistent throughout all of the changes implemented by the workload.  As the consistency checks are the same ones stipulated by the TPC specification you are also welcome to run them against tests run against clustered and distributed environments and any workloads claimed to be derived from TPC workloads. The checks ensure that your transactions are fully and correctly processed by your database.

In addition to the consistency checks, HammerDB has also added a number of schema checks to ensure that the a schema build has completed successfully and the database is fully populated with the correct stored procedures.

Running the TPROC-C Schema and Consistency Check

To run the schema check, firstly build your schema and let it complete.

Then select the Check option from the main menu or treeview, accept the dialog prompt and let it run through the checks.  With the CLI run the checkschema command. Note that on some databases row counts may take longer than others and the test may take a number of minutes to complete.  The first part of the schema checks are particularly relevant after the build and it will check.

  1. Database Exists.
  2. Tables Exist.
  3. Warehouse count in schema is the same as the HammerDB configuration.
  4. Tables are indexed.
  5. Tables are populated.
  6. Stored Procedures Exist.

It will also run the consistency checks that we will cover after the workload has completed.

After the initial check, run the workload.

Let the workload run to completion.

You can then re-run the schema and consistency checks.

After you have run a workload the consistency checks are the most important aspect although the schema checks provide benefit especially in checking the row populations. With the consistency checks, HammerDB will create a temporary table with a sample of warehouse ids and then run the following checks.

  1. For each District within a Warehouse, the next available Order ID (d_next_o_id) minus one is equal to the most recent Order ID [max(o_id)] for the ORDER table associated with the preceding District and Warehouse. Additionally, that same relationship exists for the most recent Order ID [max(o_id)] for the NEW-ORDER table associated with the same District and Warehouse. Those relationships can be illustrated as: d_next_o_id – 1 = max(o_id) = max(no_o_id) where (d_w_id = o_w_id = no_w_id) and (d_id = o_d_id = no_d_id)
  2. For each District within a Warehouse, the value of the most recent Order ID [max(no_o_id)] minus the first Order ID [min(no_o_id)] plus one, for the NEW-ORDER table associated with the District and Warehouse, equals the number of rows in that NEW-ORDER table. That relationship can be illustrated as: max(no_o_id) – min(no_o_id) + 1 = rows in NEW-ORDER where (o_w_id = no_w_id) and (o_d_id = no_d_id)
  3. For each District within a Warehouse, the sum of Order-Line counts [sum(o_ol_cnt)] for the Orders associated with the District equals the number of rows in the ORDER-LINE table associated with the same District. That relationship can be illustrated as: sum(o_ol_cnt) = rows in the ORDER-LINE table for the Warehouse and District
  4. The sum of balances (d_ytd) for all Districts within a specific Warehouse is equal to the balance (w_ytd) of that Warehouse.

If completed successfully the check ensures that your data has remained consistent after the workload has completed.

Running the TPROC-H Schema and Consistency Check

The schema and consistency checks for the TPROC-H as expected follow a similar approach to TPROC-C and you can run the check after the build to ensure that the build completed successfully. If you only run the query aspect of the TPROC-H workload the data is unmodified and therefore there will not be benefit from running the schema and consistency checks again after you have verified the initial build. However the refresh function does modify the data and therefore if you run a refresh function then you can run the check to verify the consistency. In the example below we have used one virtual user to run the power test which runs the new sales order refresh, one query set and the old sales refresh.

After this test running the refresh function and query workload run the schema and consistency check to verify the refresh function has not impacted the data consistency.

 

The TPROC-H schema and consistency checks run the following checks:

  1. Database Exists.
  2. Tables Exist.
  3. Scale Factor in schema is the same as the HammerDB configuration.
  4. Tables are indexed.
  5. Tables are populated.
  6. Consistency Check.
    1. A consistent state for the TPC-H database is defined to exist when: O_TOTALPRICE = SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)(1+L_TAX) for each ORDER and LINEITEM defined by (O_ORDERKEY=L_ORDERKEY) and can be checked by: SELECT DECIMAL(SUM(DECIMAL(INTEGER(INTEGER(DECIMAL (INTEGER(100DECIMAL(L_EXTENDEDPRICE,20,3)),20,3)* (1-L_DISCOUNT)) * (1+L_TAX)),20,3)/100.0),20,3) FROM LINEITEM WHERE L_ORDERKEY = okey SELECT DECIMAL(SUM(O_TOTALPRICE, 20, 3)) from ORDERS WHERE O_ORDERKEY = okey

    Summary

    HammerDB v4.10 introduces schema and data consistency checks for all databases and workloads to enable you to verify the schema has been built correctly and that the data remains consistent before and after running workloads.

Author