Chapter 13. Generating and Loading Bulk Datasets

Table of Contents

1. Generate the Dataset
2. Generate the Dataset with the CLI
3. Generate the template database
3.1. Capture and run the table creation DDL
3.1.1. Oracle
3.1.2. SQL Server
3.1.3. Db2
3.1.4. MySQL
3.1.5. PostgreSQL/Amazon Redshift
4. Run the bulk data load
4.1. Oracle
4.2. SQL Server
4.3. Db2
4.4. MySQL
4.5. MariaDB
4.6. PostgreSQL/Amazon Redshift

For all workloads, HammerDB can create the schema and generate and load the data without requiring a staging area; in many circumstances this is the preferred method of loading, especially for OLTP workloads. Nevertheless, in some circumstances it is preferable to generate the data externally as flat files and then use a database vendor's bulk loading command to load the data into pre-created tables. This option may be preferred, for example, where the target database is located in the cloud, or where the target database has a columnar (column-store) structure, meaning that load performance using batch inserts is poor. Additionally, bulk loading enables more flexibility to modify the schema according to preference and to reload during testing.

This chapter details how to generate and load large datasets with HammerDB. From version 4.2 the limit for generating the TPROC-C schema has increased from 30,000 to 100,000 warehouses. Note that this is an interface limit to prevent over-provisioning (100,000 warehouses may generate up to 10TB of data); it is straightforward to exceed this capacity by manually modifying the generated datagen build script to increase the value.
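As a rough illustration of the CLI-driven flow covered in the sections below, a dataset-generation script might look like the following. This is a minimal sketch only: the `dgset` option names and values shown here are assumptions that may differ between HammerDB versions, and the directory path and counts are hypothetical, so check them against the documentation for your release before use.

```tcl
# Hypothetical HammerDB CLI sketch: generate TPROC-C flat files
# into a staging directory instead of loading the database directly.
dbset db pg                      ;# target database dialect (illustrative)
dbset bm TPC-C                   ;# the TPROC-C (TPC-C derived) workload
dgset directory /tmp/hammerdata  ;# staging area for the generated files
dgset vu 8                       ;# virtual users generating data in parallel
datagenrun                       ;# start dataset generation
```

The generated flat files can then be loaded into pre-created tables with the vendor bulk loader for the target database, as described in the sections that follow.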