{"id":378,"date":"2019-05-05T10:05:00","date_gmt":"2019-05-05T10:05:00","guid":{"rendered":"http:\/\/www.hammerdb.com\/blog\/?p=378"},"modified":"2019-07-08T10:22:56","modified_gmt":"2019-07-08T10:22:56","slug":"testing-mysql-8-0-16-on-skylake-with-innodb_spin_wait_pause_multiplier","status":"publish","type":"post","link":"https:\/\/www.hammerdb.com\/blog\/uncategorized\/testing-mysql-8-0-16-on-skylake-with-innodb_spin_wait_pause_multiplier\/","title":{"rendered":"Testing MySQL 8.0.16 on Skylake with innodb_spin_wait_pause_multiplier"},"content":{"rendered":"<p>In the recent MySQL 8.0.16 <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/mysql-nutshell.html\">release<\/a> there is a new variable for the InnoDB storage engine called <a class=\"link\" href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/innodb-parameters.html#sysvar_innodb_spin_wait_pause_multiplier\"><code class=\"literal\">innodb_spin_wait_pause_multiplier<\/code><\/a> described as providing &#8220;greater control over the duration of spin-lock polling delays that occur when a thread waits to acquire a mutex or rw-lock&#8221; and &#8220;delays can be tuned more finely to account for differences in PAUSE instruction duration on different processor architectures&#8221;<\/p>\n<p>This post aims to address what this about and whether you really need to be concerned about the difference in instructions on the different processor architectures for MySQL\u00a0 by testing the performance with HammerDB.\u00a0 Note that the main developer of HammerDB is Intel employee (#IAMINTEL) however HammerDB is a personal open source project and HammerDB has no optimization whatsoever for a database running on any particular architecture.\u00a0 Fortunately the HammerDB TPC-C\/OLTP workload intentionally has a great deal of contention between threads and is therefore ideal for testing spin-locks.<\/p>\n<p>So the PAUSE instruction is\u00a0 an optimization over NOP when a thread is waiting to acquire a lock and is particularly important in spin-locks on x86 CPUs for<a href=\"https:\/\/software.intel.com\/en-us\/articles\/benefitting-power-and-performance-sleep-loops\"> power and performance.<\/a> However in the Skylake microarchitecture (you can see a list of CPUs <a href=\"https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/codename\/37572\/skylake.html\">here<\/a>) the PAUSE instruction changed and in the <a href=\"https:\/\/software.intel.com\/sites\/default\/files\/managed\/9e\/bc\/64-ia-32-architectures-optimization-manual.pdf?countrylabel=Colombia\">documentation<\/a> it says &#8220;the latency of the PAUSE instruction in prior generation microarchitectures is about 10 cycles, whereas in Skylake microarchitecture it has been extended to as many as 140 cycles.&#8221; and &#8220;as the PAUSE latency has been increased significantly, workloads that are sensitive to PAUSE latency will suffer some performance loss.&#8221;\u00a0 The impact on a database workload is not necessarily straightforward however as it depends on how much time that workload actually spends in spin-locks, fortunately as noted a HammerDB deliberately induces a great deal of contention so is at the top end of workloads that do.<\/p>\n<p>Also it is interesting to note that the impact of this change was not observed in other databases or other MySQL storage engines such as MyRocks the only noticeable impact with HammerDB workloads occurred in MySQL\u00a0 with InnoDB where in the source code ut0ut.cc in the directory storage\/innobase\/ut the following section has UT_RELAX_CPU defined to call the PAUSE instruction on x86.<\/p>\n<pre>for (i = 0; i &lt;delay * 50; i++) {\n\u00a0\u00a0\u00a0 j += i;\u00a0 \n\u00a0\u00a0    UT_RELAX_CPU();\n  }<\/pre>\n<p>Note that this fixed value of 50 is multiplied by the parameter\u00a0 <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/5.7\/en\/innodb-parameters.html#sysvar_innodb_spin_wait_delay\">innodb_spin_wait_delay<\/a> which has a default of 6 (but selects a value at random up to this value) so could be calling PAUSE up to 300 times. So <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/5.7\/en\/innodb-parameters.html#sysvar_innodb_spin_wait_delay\">inn<\/a>odb_spin_wait_delay has always been configurable in recent versions but now from MySQL <a class=\"link\" href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/innodb-parameters.html#sysvar_innodb_spin_wait_pause_multiplier\"><code class=\"literal\">innodb_spin_wait_pause_multiplier<\/code><\/a> is configurable also rather than requiring modification of the source code to do so.\u00a0\u00a0 However as noted many factors affect spin-locks including how much time you actually spend in locks in the first place and therefore the best way to really determine how much time you have to spend in fine-tuning spin-locks in\u00a0 the real world can be done through testing.<\/p>\n<p>So to test I took a system with Skylake CPUs and all storage on a <a href=\"https:\/\/www.intel.co.uk\/content\/www\/uk\/en\/products\/memory-storage\/solid-state-drives\/data-center-ssds\/optane-dc-p4800x-series.html\">P4800X SSD<\/a>.<\/p>\n<pre>model name : Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz<\/pre>\n<p>The OS was:<\/p>\n<pre>\u00a0Ubuntu 18.04.1 LTS (GNU\/Linux 4.15.0-23-generic x86_64)<\/pre>\n<p>I downloaded and installed the pre-built Linux binary from <a href=\"https:\/\/dev.mysql.com\/downloads\/mysql\/\">here<\/a>:<\/p>\n<pre>mysql-8.0.16-linux-glibc2.12-x86_64<\/pre>\n<p>and set the following my.cnf<\/p>\n<pre>[mysqld]\ndatadir=\/home\/mysql\/mysql-8.0.16\/mysql-8.0.16-linux-glibc2.12-x86_64\/data\nlanguage=\/home\/mysql\/mysql-8.0.16\/mysql-8.0.16-linux-glibc2.12-x86_64\/share\/english\ndefault_authentication_plugin=mysql_native_password\nsocket=\/tmp\/mysql.sock\nport=3307\nbind_address=127.0.0.1\n# general\n max_connections=4000\t\n table_open_cache=8000\n table_open_cache_instances=16\n back_log=1500\n default_password_lifetime=0\n ssl=0\n performance_schema=OFF\n max_prepared_stmt_count=128000\n skip_log_bin=1\n character_set_server=latin1\n collation_server=latin1_swedish_ci\n transaction_isolation=REPEATABLE-READ\n\n# files\t\t\t\t\t\t\n innodb_file_per_table\n innodb_log_file_size=1024M\n innodb_log_files_in_group=32\n innodb_open_files=4000\n\t\t\t\t\t\t\n# buffers\n innodb_buffer_pool_size=64000M\n innodb_buffer_pool_instances=16\n innodb_log_buffer_size=64M\n\n# tune\n innodb_doublewrite=0\n innodb_thread_concurrency=0\n innodb_flush_log_at_trx_commit=0\n innodb_max_dirty_pages_pct=90\n innodb_max_dirty_pages_pct_lwm=10\n\t\t\t\t\t\t\n join_buffer_size=32K\n sort_buffer_size=32K\n innodb_use_native_aio=1\n innodb_stats_persistent=1\n innodb_spin_wait_delay=6\n innodb_spin_wait_pause_multiplier=50\n\t\t\t\t\t\t\n innodb_max_purge_lag_delay=300000\n innodb_max_purge_lag=0\n innodb_flush_method=O_DIRECT_NO_FSYNC\n innodb_checksum_algorithm=none\n innodb_io_capacity=4000\n innodb_io_capacity_max=20000\n innodb_lru_scan_depth=9000\n innodb_change_buffering=none\n innodb_read_only=0\n innodb_page_cleaners=4\n innodb_undo_log_truncate=off\n\n# perf special\n innodb_adaptive_flushing=1\t\n innodb_flush_neighbors=0\n innodb_read_io_threads=16\n innodb_write_io_threads=16\n innodb_purge_threads=4\n innodb_adaptive_hash_index=0\n\n# monitoring\t\t\t\t\t\t\n innodb_monitor_enable='%'<\/pre>\n<p>Initially I set doublewrite and flush_log_at_trx_commit and then deleted and rebuilt the schema to re-test with these parameters turned on.<\/p>\n<p>I installed HammerDB 3.1 and used the client library from MySQL 5.7 installed in the home directory<\/p>\n<pre>libmysqlclient.so.20<\/pre>\n<p>and added this to the library path as follows:<\/p>\n<pre>export LD_LIBRARY_PATH=\/home\/mysql:$LD_LIBRARY_PATH<\/pre>\n<p>I created a HammerDB build file and called it using the source command with the hammerdbcli tool.<\/p>\n<pre>HammerDB-3.1$ cat innodbbuild400.tcl\n#!\/bin\/tclsh\nputs \"SETTING CONFIGURATION\"\ndbset db mysql\ndiset connection mysql_host 127.0.0.1\ndiset connection mysql_port 3307\ndiset tpcc mysql_count_ware 400\ndiset tpcc mysql_partition true\ndiset tpcc mysql_num_vu 64\ndiset tpcc mysql_storage_engine innodb\nprint dict\nbuildschema<\/pre>\n<p>Once the schema was built I shutdown and restarted the database and configured a test script as follows:<\/p>\n<pre>HammerDB-3.1$ cat innodbtest1.tcl\n#!\/bin\/tclsh\nproc runtimer { seconds } {\nset x 0\nset timerstop 0\nwhile {!$timerstop} {\nincr x\nafter 1000\nif { ![ expr {$x % 60} ] } {\nset y [ expr $x \/ 60 ]\nputs \"Timer: $y minutes elapsed\"\n}\nupdate\nif { [ vucomplete ] || $x eq $seconds } { set timerstop 1 }\n}\nreturn\n}\nputs \"SETTING CONFIGURATION\"\ndbset db mysql\ndiset connection mysql_host 127.0.0.1\ndiset connection mysql_port 3307\ndiset tpcc mysql_driver timed\ndiset tpcc mysql_rampup 1\ndiset tpcc mysql_duration 2\nvuset logtotemp 1\nloadscript\nputs \"SEQUENCE STARTED\"\nforeach z { 1 2 4 8 16 24 32 40 48 56 64 72 80 88} {\nputs \"$z VU TEST\"\nvuset vu $z\nvucreate\nvurun\nruntimer 240\nvudestroy\nafter 20000\n}\nputs \"TEST SEQUENCE COMPLETE\"<\/pre>\n<p>and then ran the test leaving it unattended until it reported the message &#8220;TEST SEQUENCE COMPLETE&#8221;. In particular note that the parameter innodb_spin_wait_pause_multiplier remained at the default throughout.<\/p>\n<pre>HammerDB-3.1$ .\/hammerdbcli \nHammerDB CLI v3.1 \nCopyright (C) 2003-2018 Steve Shaw \nType \"help\" for a list of commands \nThe xml is well-formed, applying configuration \nhammerdb&gt;source innodbtest1.tcl<\/pre>\n<p>The following was the results in graph form:<\/p>\n<p><a href=\"https:\/\/www.hammerdb.com\/blog\/wp-content\/uploads\/2019\/05\/mysql8016.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-393\" src=\"https:\/\/www.hammerdb.com\/blog\/wp-content\/uploads\/2019\/05\/mysql8016.png\" alt=\"\" width=\"599\" height=\"394\" srcset=\"https:\/\/www.hammerdb.com\/blog\/wp-content\/uploads\/2019\/05\/mysql8016.png 599w, https:\/\/www.hammerdb.com\/blog\/wp-content\/uploads\/2019\/05\/mysql8016-300x197.png 300w\" sizes=\"auto, (max-width: 599px) 100vw, 599px\" \/><\/a><\/p>\n<p>with a peak performance of the following:<\/p>\n<pre>Vuser 1:64 Active Virtual Users configured \nVuser 1:TEST RESULT : System achieved 2159378 MySQL TPM at 713488 NOPM<\/pre>\n<p>so we are generating more than 2 million MySQL Transactions per minute and more than 700,000 New Orders per minute without having modified any of the spin-loop parameters.<\/p>\n<p>Now that the parameter is configurable I could also run some experiments:<\/p>\n<pre>mysql&gt; show global variables like '%pause%'; \n+-----------------------------------+-------+ \n| Variable_name | Value | \n+-----------------------------------+-------+ \n| innodb_spin_wait_pause_multiplier | 50 | \n+-----------------------------------+-------+ \n1 row in set (0.00 sec) \n\nmysql&gt; set global innodb_spin_wait_pause_multiplier=25; \nQuery OK, 0 rows affected (0.00 sec) \n\nmysql&gt; show global variables like '%pause%'; \n+-----------------------------------+-------+ \n| Variable_name | Value | \n+-----------------------------------+-------+ \n| innodb_spin_wait_pause_multiplier | 25 | \n+-----------------------------------+-------+ \n1 row in set (0.00 sec)<\/pre>\n<p>what I found was when reducing innodb_spin_wait_pause_multiplier signficantly to a value such as 5 I could achieve marginally improved performance up to 40 virtual users\u00a0 643423 NOPM at a value of 5 compared to 626630 NOPM at 50 however beyond this point performance was signficantly lower and at 64 virtual users was only 278909 NOPM.\u00a0\u00a0 The optimal performance remained at the default.<\/p>\n<p>Further testing involved doubling the number of warehouses with the same results, however not partitioning the schema did bring some benefits in reducing innodb_spin_wait_pause_multiplier to a lower value.<\/p>\n<p>The key finding was remarkably consistent with that of spin-lock configuration on other databases in that in the vast majority of use cases you simply do not have to worry about modifying these parameters from the default regardless of the processor architecture. However it is noted that there are many factors that input into home much time is spent in spin-locks in the first place until these parameters come into play, examples such as the speed of the memory and the storage and configuration of the schema with partitioning as well as the operating system.\u00a0 In particular the number of concurrent users is going to be a significant factor.<\/p>\n<p>Nevertheless if you have an application generating millions of\u00a0 transactions per minute and have the opportunity to test and optimize surely the availability of more parameters to fine tune a database to a particular application is welcome because with each release MySQL is continuing to push the boundaries of performance over previous releases.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the recent MySQL 8.0.16 release there is a new variable for the InnoDB storage engine called innodb_spin_wait_pause_multiplier described as providing &#8220;greater control over the duration of spin-lock polling delays that occur when a thread waits to acquire a mutex or rw-lock&#8221; and &#8220;delays can be tuned more finely to account for differences in PAUSE &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.hammerdb.com\/blog\/uncategorized\/testing-mysql-8-0-16-on-skylake-with-innodb_spin_wait_pause_multiplier\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Testing MySQL 8.0.16 on Skylake with innodb_spin_wait_pause_multiplier&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"ppma_author":[5],"class_list":["post-378","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"authors":[{"term_id":5,"user_id":2,"is_guest":0,"slug":"hammerdb","display_name":"HammerDB","avatar_url":{"url":"https:\/\/www.hammerdb.com\/blog\/wp-content\/uploads\/2018\/10\/logo-white.png","url2x":"https:\/\/www.hammerdb.com\/blog\/wp-content\/uploads\/2018\/10\/logo-white.png"},"author_category":"","user_url":"http:\/\/www.hammerdb.com","last_name":"","first_name":"","job_title":"","description":""}],"_links":{"self":[{"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/posts\/378","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/comments?post=378"}],"version-history":[{"count":24,"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/posts\/378\/revisions"}],"predecessor-version":[{"id":404,"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/posts\/378\/revisions\/404"}],"wp:attachment":[{"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/media?parent=378"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/categories?post=378"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/tags?post=378"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.hammerdb.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=378"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}