Chapter 11. Introduction to Analytic Testing (TPROC-H derived from TPC-H) and Cloud Queries

Table of Contents

1. What is TPROC-H derived from TPC-H?
2. Running the Power and Throughput Test and Calculating the Geometric Mean
3. Choosing a Database for running TPROC-H workloads
3.1. Oracle
3.2. Microsoft SQL Server
3.3. Db2
3.4. PostgreSQL
3.5. MySQL
3.6. MariaDB
4. Benchmarking Database Cloud Services
4.1. Redshift Cloud Analytic Workload

Analytic workloads can also be interchangeably described as Decision Support, Data Warehousing or Business Intelligence, the basis of these workloads is the ability to process complex ad-hoc queries on large volumes of data. In contrast to a transactional workload the focus is upon reading as opposed to modifying data and therefore requires a distinct approach. The ability of a database to process transactions gives limited information towards the ability of a database to support query based workloads and vice-versa, therefore both TPROC-C and TPROC-H based workloads complement each other in investigating the capabilities of a particular database. When reading large volumes of data to satisfy query workloads it should be apparent that if multiple CPU cores are available reading with a single processing thread is going to leave a significant amount of resources underutilized. Consequently the most effective Analytic Systems employ a feature called Parallel Query to break down such queries into multiple sub tasks to complete the query more quickly. Additional features such as column orientation, compression and partitioning can also be used to improve parallel query performance. Advances in server technologies in particular large numbers of CPU cores available with large memory configurations have popularised both in-memory and column store technologies as a means to enhance Parallel Query performance. Examples of databases supported by HammerDB that support some or all of these enhanced query technologies are the Oracle Database, SQL Server, Db2, MariaDB and PostgreSQL, databases that do not support any of these technologies are single threaded query workloads and cannot be expected to complete these workloads as quickly. If you are unfamiliar with row-oriented and column-store technologies then it is beneficial to read one of the many guides explaining the differences and familiarising with the technologies available in the database that you have chosen to test. With commercial databases you should also ensure that your license includes the ability to run Parallel workloads as you may have a version of a database that supports single-threaded workloads only.