Testing to Destruction: Part two

Presentation Transcript

    1. Testing to Destruction: Part two. OOW 2010, Session ID: S316246. Mike Ault, Oracle Guru, Texas Memory Systems

    2. Testing to Destruction: Performance Testing Your Oracle Database. Things we need to predict when dealing with databases: operating system memory needs, operating system CPU needs, operating system storage requirements, and growth.

    3. How? We must be able to control two specific parts of the database environment: user load and transaction mix. If you cannot control the users or the transactions impacting the database, it becomes impossible to accurately predict the trends.

    4. Use Standard Tools One way to completely control both the users and the transaction load is to utilize benchmarking tools.

    5. Types of Benchmarks: TPC-C, TPC-E, TPC-H, and Home Grown. Results from the standard benchmarks are posted to the www.tpc.org website.

    6. TPC-C TPC-C is an online transaction processing (OLTP) benchmark. TPC-C involves a mix of five concurrent transactions of different types and complexity, executed either online or queued for deferred execution. The database comprises nine types of tables with a wide range of record and population sizes. TPC-C is measured in transactions per minute (tpmC).

    7. TPC-E The TPC-E benchmark is slated to replace the TPC-C. It was felt that the TPC-C didn't reflect the real world. TPC-E is OLTP but adds a required network component.

    8. TPC-H The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications.

    9. Home Grown You select the transactions You select the number of users Can be exact for your environment

    10. Part Two: In part one of this presentation we looked at several home grown tests. In part two we will look at a TPC-C and a TPC-H test. In both, the HDD versus SSD configurations for the respective database types will be evaluated.

    11. OLTP Testing Comparison of a disk based system to SSD

    12. Benchmark Tables

    13. Sizing a Benchmark Benchmarks use scale factors. Scale factors determine the size of the tables and the amount of data. You must also allow for temporary, system, rollback, and log space. For OLTP, 10 users per scale factor integer is recommended. We will use a SF of 1000.
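
The scale-factor arithmetic can be sketched as below. This is not the presenter's script; the per-warehouse cardinalities are the nominal TPC-C specification values, and the 10-users-per-warehouse rule is the one quoted on the slide.

```python
# Rough TPC-C sizing sketch. WAREHOUSE count equals the scale factor;
# most tables scale per warehouse, ITEM is fixed at 100,000 rows.

ROWS_PER_WAREHOUSE = {
    "DISTRICT": 10,
    "CUSTOMER": 30_000,   # 10 districts x 3,000 customers each
    "STOCK": 100_000,
    "ORDERS": 30_000,
}
FIXED_ROWS = {"ITEM": 100_000}  # ITEM does not scale

def size_benchmark(scale_factor: int) -> dict:
    rows = {"WAREHOUSE": scale_factor}
    rows.update({t: n * scale_factor for t, n in ROWS_PER_WAREHOUSE.items()})
    rows.update(FIXED_ROWS)
    rows["recommended_users"] = 10 * scale_factor  # slide's OLTP rule of thumb
    return rows

sizing = size_benchmark(1000)
print(sizing["CUSTOMER"])           # 30000000 rows at SF 1000
print(sizing["recommended_users"])  # 10000
```

At SF 1000 this puts CUSTOMER at 30 million rows before indexes, undo, redo, and temp space are added on top.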

    14. SF 1000 Row Counts

    15. Choosing Realities Every benchmark has latency built in to simulate reality. It simulates the keyboard and thinking latencies of an average user and can add 1-2 seconds per transaction (randomized). The more non-system latency, the more users are needed to reach the maximum tpmC. In these tests, latency other than system related was eliminated.
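
The effect of think time on user counts is simple arithmetic: each simulated user can only complete one transaction per service-plus-think cycle. A sketch, where the 1.5 s think time is within the slide's 1-2 s range and the 50 ms service time is a hypothetical figure:

```python
# How many users are needed to drive a target transaction rate?
# Each user contributes 1/(service + think) transactions per second,
# so head count = target_tps * (service + think), rounded.

def users_needed(target_tps: float, service_s: float, think_s: float) -> int:
    return round(target_tps * (service_s + think_s))

# With ~1.5 s of keyboard/think latency per transaction:
print(users_needed(1000, 0.05, 1.5))  # 1550
# With non-system latency eliminated, as in these tests:
print(users_needed(1000, 0.05, 0.0))  # 50
```

This is why eliminating think time lets a modest client count saturate the database, which is what these tests set out to do.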

    16. Tests Ramp from 5-500 users spread over several client machines. Loads generated with the Quest Software Benchmark Factory software tool. Two copies of the database: one on solid state devices, one on two JBOD arrays of 45 15K, 144 gigabyte hard drives. We want to document the effects on the number of transactions caused by placing the data and indexes on SSD versus placing them on disk arrays. Also: RAC scaling test (1-4 nodes), memory stress test, and an attempt to force non-use of the interconnect.

    17. System Setup The system consists of 4 dual socket, quad core 2 GHz DELL servers with 16 GB memory each; Infiniband interconnects and switch; an Oracle 11g RAC database; a Fibre Channel switch; 2 RamSan-400s and a single RamSan-500; two JBOD arrays of 45 10K disks each; and ASM and OCFS for disk/SSD management.

    18. System Setup

    19. Data Setup 4 major tablespaces: DATA, INDEXES, HDDATA, and HDINDEXES. The DATA and INDEXES tablespaces were placed on the RamSan-500. The HDDATA and HDINDEXES tablespaces were placed on 2 sets of 45 disk drives configured via ASM as a single diskgroup with a failover group (RAID10). The RamSan-400 SSDs were used for undo tablespaces, redo logs, and temporary tablespaces.

    20. Reasoning One set of tables and indexes owned by schema/user TPCC is placed in the DATA and INDEXES tablespaces. An identical set owned by schema/user HDTPCC is placed in the HDDATA and HDINDEXES tablespaces. We can therefore test the effects of having the database reside on SSD assets or on disk drive assets using the exact same database configuration as far as memory and other internal assets. By placing the redo, undo, and temporary tablespaces on the RamSan-400s, we eliminate the concerns of undo, redo, and temporary activities. This isolates the results due to the relative placement of data and indexes.

    21. Data Layout

    22. Test Software Oracle with: RAC Parallel Query Partitioning Linked for Infiniband (RDP) Benchmark Factory with custom load scripts Allows loading in parallel streams

    23. Scalability Size of the SGA was kept constant for each node while additional nodes were added The number of users was cycled from 5 to 300. The transactions per second (TPS) and peak user count at the peak TPS were noted for both the disk and SSD-based systems.

    24. Scalability Results: 1-4 Instances

    25. Scalability Results The HDD results peak at 1051 TPS and 55 users. The SSD results peak at 3775 TPS and 245 users. The HDD results fall off from 1051 TPS with 55 users to 549 TPS and 15 users, going from 4 down to 1 server. The SSD results fall from 3775 TPS and 245 users down to 1778 TPS and 15 users. However, the 1778 TPS seems to be a transitory spike in the SSD data, with an actual peak occurring at 1718 TPS and 40 users.

    26. Scalability Conclusions Even at 1 node, the SSD-based system outperforms the HDD-based system with 4 full nodes, with the exception of 3 data points, across the entire range from 5 to 295 users. In the full data set out to 500 users, the final TPS for the 4 node run on SSD is 3027, while for the HDD 4 node run it is a paltry 297 TPS. That is a factor of 10 better performance for SSD over HDD on identical hardware and configuration.

    27. SSD Top Five Wait Events

    Top 5 Timed Foreground Events
    Event                  | Waits      | Time(s) | Avg Wait (ms) | % DB time | Wait Class
    gc buffer busy acquire | 82,716,122 | 689,412 |  8            | 47.5      | Cluster
    gc current block busy  |  3,211,801 | 148,974 | 46            | 10.3      | Cluster
    DB CPU                 |            | 120,745 |               |  8.3      |
    log file sync          | 15,310,851 |  70,492 |  5            |  4.9      | Commit
    resmgr:cpu quantum     |  6,986,282 |  58,787 |  8            |  4.0      | Scheduler

    28. Evaluation The only physical I/O related event in the SSD 4 node top 5 events is the log file sync event with a timing of 5 ms per event. The rest are interconnect issues

    29. HDD Top 5 Wait Events

    30. Evaluation The predominant event is the db file sequential read, with 15,034,486 waits and an average wait of 39 ms per wait. This was with 579.3 read requests per second per instance. The free buffer waits were probably caused by waits for writes (there were no free buffer waits on the SSD tests).

    31. Did we see IO waits on the SSD?

    32. Conclusions The main problem for the HDD runs was latency related. The SSD system has a latency that is 10 percent of the HDD system's, and the physical I/O waits are lower by at least a factor of 10 overall. The reduction in latency allows more processing to be accomplished and more transactions; more users can be supported.

    33. Transaction and Response Times For HDD, to get IOPS we can add disks and eventually get latency down to between 2-5 ms. We would need to increase our number of disks from 90 in a RAID10 to 900 to achieve the minimal latency possible. Purchasing that many disks, HBAs, cabinets, and controllers and providing heating and cooling to support less than a 100 gigabyte database would be ludicrous. Throughput is also shown by looking at transaction time and response time in a given configuration. Three statistics are used: minimum time, maximum time, and 90th percentile time for both transactions and response. Transaction time is the time to complete the full transaction. Response time is the amount of time needed to get the first response back from the system for a transaction.
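
The 90-to-900 disk jump can be sanity-checked with back-of-envelope arithmetic. The per-disk IOPS figure and the utilization cap below are assumptions, not numbers from the presentation: a 15K RPM disk delivers very roughly 180 IOPS when saturated, but to keep per-request latency in the 2-5 ms range each disk must be kept nearly idle, so its usable IOPS drop sharply.

```python
# Illustrative spindle-count estimate: if each disk may only run at
# ~10% utilization to keep queueing delay (and thus latency) low,
# its usable IOPS is a tenth of its saturated rate.

import math

def disks_needed(required_iops: float, per_disk_iops: float,
                 max_utilization: float) -> int:
    usable_per_disk = per_disk_iops * max_utilization
    return math.ceil(required_iops / usable_per_disk)

# Hypothetical: ~16,000 IOPS of demand, 180 IOPS/disk, 10% busy cap:
print(disks_needed(16_000, 180, 0.10))  # 889, i.e. roughly 900 disks
```

Under these assumed numbers the estimate lands at roughly 900 spindles, the same order of magnitude the slide quotes.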

    34. HDD Results

    35. HDD Evaluation Transaction times and the response times for the HDD system track closely to each other. At around 150 users the 90th percentile times begin to exceed 1 second and soon reach several seconds as the user load continues to increase. Maximum response and transaction times reach over 200 seconds at around 200 users and hover there for the remainder of the test

    36. SSD Results

    37. SSD Evaluation At no time, even at 500 users, do the average, 90th percentile, or maximum transaction or response times ever exceed 6 seconds. The average and 90th percentile transaction and response times for the SSD run never exceed 0.4 seconds.

    38. Scalability Summary The SSD out-performed the HDD-based database in every scalability test. In total transactions, maximum number of users served at peak transactions, and lowest transaction and response times the SSD-based system showed a factor of 5-10 times better performance and scalability. A single node SSD test outperformed the 4 node HDD test. A single server with 8 CPUs and 16GB of memory hooked to a RamSan-500 can outperform a 4-node, 8 CPU, and 16GB per node RAC system when it is running against a 90 disk RAID10 array setup with identical configurations for the Oracle database.

    39. Additional Note:

    40. Memory Restriction Results: 1 to 9 Gigabyte Cache Sizes

    41. Evaluation The SSD handles the memory stress better by a factor ranging from 3 at the top end (9 GB total cache size) to a huge 7.5 at the low end (comparing a 1.05 GB cache on the SSD run to a 3 GB cache on the HDD run). The HDD run was limited to 3 GB at the lower end by time constraints. Of course, the reason for the wide range can be traced to the increase in physical I/O that resulted from not being able to cache results and the subsequent increase in physical IOPS.

    42. Physical IOPS

    43. Evaluation The physical IOPS for the SSD ranged from 86,354 at 1.050 GB down to 33,439 at 9 GB. The HDD was only able to achieve a maximum of 14,158 at 1.050 GB down to 13,500 at 9 GB of memory utilizing the 90 disk RAID10 ASM controlled disk array. The flat response curve of the HDD tests indicates that the HDD array was becoming saturated with I/O requests and had reached its maximum IOPS. The IOPS was allowed to reach the natural peak for the data requirements in the SSD-based tests, while for the HDD tests it was being artificially capped by the limits of the hardware.

    44. Evaluation For the SSD runs, the time spent doing db file sequential reads varied from a high of 169,290 seconds for the entire measurement period at 1.05 GB down to 53,826 seconds for the 9 GB run. The HDD required 848,389 seconds at 3 GB down to 579,571 seconds at 9 GB. So even at 9X as much memory per node, the SSD configuration beat the HDD configuration by a factor of 10.

    45. Memory Starvation Summary The SSD array handles a reduction in available memory much better than the HDD array. At a little over 1 GB of total cache area per node for a 4-node RAC environment, the SSD outperformed the HD array at a 9 GB total cache size per node for a 4-node RAC using identical servers and database parameters. Due to bugs in the production release of Oracle 11g, we were unable to test the Automatic Memory Management feature of Oracle; the bug limits total SGA size to less than 3-4 gigabytes per server in Linux.

    46. Limiting Cache Fusion with GC_FILES_TO_LOCKS In the prior TPC-C tests the number one wait event was related to cache fusion. The average wait time per GC event was 8 milliseconds. If we double or triple the I/O wait time of the SSD setup to 2 or 3 milliseconds to allow for the write and read activity of a real ping, we could theoretically save anywhere between 5-6 ms per event if we could force the database to not use cache fusion. Per Oracle documentation, GC_FILES_TO_LOCKS will defeat cache fusion and force behavior into the old OPS style pinging. In these tests we attempted to see if cache fusion could be defeated and if we could actually improve performance by saving those 5-6 milliseconds per wait event by converting the transfer across the interconnect into a ping using the RamSan-500 as the transfer medium. NOTE: Supposedly in Oracle 12 the kernel will look at I/O time versus interconnect transfer time and decide which to use.

    47. Setting GC_FILES_TO_LOCKS In our first test we set the GC_FILES_TO_LOCKS parameter to 1-13:0, which was supposed to turn on fine grain locking (the 0 setting) for files 1 to 13 (all of our data files). In our second test we researched the GC_FILES_TO_LOCKS parameter a bit more and found that we shouldn't have set the parameter for the UNDO tablespaces, and even though the examples showed the 0 setting for setting fine grain locks, we decided to set it to a hard number of locks equal to or greater than the number of blocks in each data file. The documentation also showed that, generally speaking, the SYSTEM and SYSAUX tablespaces don't need as many locks, so we decided to set the value to 1000 for those tablespaces. You don't set the parameter for temporary tablespaces. This led to a new setting of 1-2:1000EACH:7-13:4128768EACH.
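
The setting string has a regular file-range:lock-count structure, so it can be composed programmatically. The helper below is hypothetical (not from the presentation); the two buckets mirror the slide's final value.

```python
# Hypothetical helper to compose a GC_FILES_TO_LOCKS value like the
# slide's "1-2:1000EACH:7-13:4128768EACH": each bucket maps a range
# of datafile numbers to a fixed (hashed) lock count per file.

def gc_files_to_locks(buckets):
    """buckets: list of ((first_file, last_file), locks_per_file) tuples."""
    parts = []
    for (lo, hi), locks in buckets:
        parts.append(f"{lo}-{hi}:{locks}EACH")
    return ":".join(parts)

# SYSTEM/SYSAUX (files 1-2) get 1000 locks each; the data/index files
# (7-13) get a count >= the blocks per file, per the slide's reasoning.
setting = gc_files_to_locks([((1, 2), 1000), ((7, 13), 4_128_768)])
print(setting)  # 1-2:1000EACH:7-13:4128768EACH
```

The resulting string would then be set as the init parameter value (files 3-6, the undo and temp files, are deliberately omitted).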

    48. Testing Results

    49. Evaluation Our attempts to limit or eliminate cache fusion failed and resulted in poorer performance overall. While we reduced the gc buffer busy acquire waits and in some cases increased the db file sequential reads, the increase in the wait times for the GC related waits offset any gains that were made

    50. Wait Events

    51. TPC-C Tests Summary The SSD array outperforms the HDD array in every aspect of the OLTP environment. The SSD array achieved a minimum of 3 times the performance of the HDD array, many times giving a 10 fold or better performance boost. Between buying additional servers and memory and an HDD SAN versus getting fewer servers, less memory, and an SSD array, you will get better performance from the SSD system in every configuration tested. The only way to get near the performance of the SSD array would be to over-purchase the number of disks required by almost two orders of magnitude. When the costs of purchasing the additional servers, memory, and disks are combined with the ongoing energy and cooling costs for a large disk array system, it should be clear that SSDs are a better choice.

    52. DSS and DWH Testing Evaluation of a TPC-H on SSD versus HDD systems

    53. TPC-H Tests This benchmark is used to mimic applications that have long or complex selects, reports, and large inserts and updates, but few deletes. This benchmark mimics DSS and DWH environments

    54. Configuration The same configuration is used for this test as was used for the TPC-C tests

    55. Benchmark Details The database structure for the TPC-H has only 8 tables. The benchmark consists of 22 queries, a delete, and an insert/update transaction in its full form. The first phase of the TPC-H, called the power phase, runs a single thread of queries; the speed with which the system executes the full query set, a 10% delete, and a 10% insert/update transaction is used to calculate the power values for the specific system. The second phase of the TPC-H, called the throughput stage, utilizes multiple threads, with the number of threads determined by the scale factor used to create the database.

    56. Benchmark Tables

    57. Scale factor in TPC-H The scale factor (SF) in a TPC-H determines the size of the database that is built for the test. The standard TPC-H sizes are: 100 gigabytes, roughly a SF of 107; 300 gigabytes, roughly a SF of 322; and 1 terabyte, which would correspond to a SF of about 1066. For our tests the 300 gigabyte size was selected. The 300 gigabytes corresponds to over a terabyte of actual database size once the needed undo, redo, and temporary tablespaces and indexes are added to the 300 gigabytes of data.
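
The slide's three GB-to-SF pairs imply a nearly constant ratio of about 1.07 scale-factor units per gigabyte of target size. A minimal sketch of that interpolation, using only the slide's own data points:

```python
# Estimate a TPC-H scale factor for a target raw-data size, using the
# (gigabytes, SF) pairs quoted on the slide. Purely illustrative.

PAIRS = [(100, 107), (300, 322), (1000, 1066)]  # (gigabytes, SF)

def estimate_sf(target_gb: float) -> int:
    ratio = sum(sf / gb for gb, sf in PAIRS) / len(PAIRS)  # ~1.07 SF/GB
    return round(target_gb * ratio)

print(estimate_sf(300))  # 321, close to the slide's 322
```

Remember this estimates only the raw data size; as the slide notes, indexes, undo, redo, and temp space push the 300 GB case to over a terabyte of actual database.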

    58. Table Row Counts

    59. Building the Database We utilized a manual script to create the needed partitioned tables. The Benchmark Factory (BMF) application with a custom build script was used to populate the resulting schema. The SSD-based tables were the first built and loaded. As each table finished loading, a simple INSERT INTO HDTPCH.table_name SELECT * FROM TPCH.table_name command was used to populate the HD-based tables. Once all of the tables were built in both locations, the indexes were then built using a custom script. After the database build, statistics were gathered on each schema using the DBMS_STATS.GATHER_SCHEMA_STATS() Oracle provided PL/SQL package.
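
The schema-copy step above is mechanical, so a small generator can emit one statement per table. The table list below is the standard 8-table TPC-H schema; assuming this build used those names (the presentation does not list them), a sketch:

```python
# Generate the copy statements described on the slide: one
# INSERT ... SELECT per table, from the TPCH schema to HDTPCH.

TPCH_TABLES = ["REGION", "NATION", "SUPPLIER", "CUSTOMER",
               "PART", "PARTSUPP", "ORDERS", "LINEITEM"]

def copy_statements(src: str = "TPCH", dst: str = "HDTPCH") -> list:
    return [f"INSERT INTO {dst}.{t} SELECT * FROM {src}.{t}"
            for t in TPCH_TABLES]

for stmt in copy_statements():
    print(stmt)
```

Each generated statement would then be executed (and committed) against the database, e.g. via a SQL*Plus script or a driver such as cx_Oracle.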

    60. Single Stream Results Limited Bandwidth In this set of tests the bandwidth for the SSD arrays was limited to levels similar to the disk arrays. The disk arrays' total I/O capability is limited to approximately 5,600 IOPS, which with an 8K I/O size yields 43.75 megabytes per second; since this is within the limits of a single 4 gigabit Fibre Channel connection, the hard disk arrays will not be limited by the bandwidth. However, since the total I/O capability of the SSD arrays is well over 1,000,000 IOPS, the performance of the SSD arrays will be limited by this configuration.

    61. Performance Factor Results

    62. Evaluation For all queries the SSD drives outperformed the HDD arrays by factors (averaged over 4 individual runs each) ranging from a low of 1.28 for query 3 to a high of 15.21 for query 19, with an average of 2.95. Overall, the average performance improvement was 295% (averaging to the nearest whole percent) even with the bandwidth restrictions in place. The ratio of the times to perform the query gives the performance ratio, which when multiplied by 100 gives the percent increase or decrease in performance between the HDD and SSD. Note that query 9 was removed from this data since no run was able to solve it due to internal Oracle errors.
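
The performance factor (PF) used throughout these results is just the ratio of elapsed times. A one-line sketch, with hypothetical query times chosen for illustration:

```python
# Performance factor as used in these slides: HDD time / SSD time.
# Multiplying by 100 gives the percentage figures quoted in the text.

def performance_factor(hdd_seconds: float, ssd_seconds: float) -> float:
    return hdd_seconds / ssd_seconds

pf = performance_factor(hdd_seconds=59.0, ssd_seconds=20.0)  # hypothetical times
print(f"PF {pf:.2f} = {pf * 100:.0f}%")  # PF 2.95 = 295%
```

A PF above 1 means SSD was faster; a PF below 1 (as seen later for queries 16 and 18 in the 8-stream tests) means HDD was faster.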

    63. Complete HDD and SSD Results

    64. Query 9 Issue Query number 9 failed to execute, issuing the following error: ORA-00600: internal error code, arguments [kxfrGraDistNum3],[65535],[4]. Investigations at Oracle Support and other online forums revealed no solutions or work-arounds. Oddly, when the ORDER table was empty or the PART table had no indexes, the query would execute and complete. Changing the number of parallel query processes seemed to have no effect on the query (they were increased from 150 to 400).

    65. Single Stream Maximized Throughput

    66. Evaluation The performance factor varied from a low of 1.33 for query 3 to a high of 24.66 for query 19. Overall the SSD arrays outperformed the HDD arrays by a factor of 5.01 (501%) for the entire query set (less query 12). Maximizing the bandwidth for the SSD arrays improved SSD performance by nearly a factor of 2 (2.95 to 5.01 PF).

    67. Full Results

    68. Query 12 Issues Query number 12 failed to execute on any run in the SSD or HDD tests, issuing the following error: Agent(amd-d31) Error: ORA-12801: error signaled in parallel query server P014, instance dpe-d70:TMSTPCH3 (3) ORA-00600: internal error code, arguments: [qesblMerge], [369164678], [0], [], [], [], [], [], [], [], [], []. Investigations at Oracle Support and other online forums revealed no solutions or work-arounds. There was one bug filed for this error, but no further information, projected fix date, or release was listed.

    69. Wait Events

    70. Evaluation Waits on direct path reads and writes consumed 62% of the total wait time in spite of being located on SSD arrays. For the HDD run, 92.3% of time was spent waiting on I/O related events. The high waits on SSD for temporary related I/O may be due to only having two 4Gb FC links on each of the RamSan-400s. The problem may also lie with the type of I/O being done to the SSD devices, that is, large sequential writes instead of single block writes. The writes to a temporary tablespace will generally be the size of a sort segment; the SSD may not be as efficient writing these large files as it is writing small blocks such as 8K.

    71. SSD IOPS Evaluation With both the data and indexes and temporary tablespaces on SSD, the peak Oracle-based (phyreads and phywrites - gv$filestat and gv$tempstat statistics) IOPS reached 1,714,861 for all I/O sources in the database other than redo logs. Of course the actual IOPS to the I/O subsystem are subject to batching by a factor of up to 16, so all numbers in the graphs have been reduced by a factor of 16.

    72. IOPS

    73. HDD IOPS Evaluation With the data and indexes on a hard disk array and the temporary tablespace still on SSD the peak Oracle IOPS reached 852,875, with 848,184 IOPS being credited to the temporary tablespace still on SSD. The HDD arrays reached a peak of 91,348 Oracle-based IOPS. Just to remind you, the IOPS reported by Oracle may be as much as a factor of 16 higher than those actually seen by the I/O subsystem. The graph shows projected IOPS at the I/O subsystem based on a factor of 16 reduction

    74. HDD IOPS

    75. IOPS Notes Note that the IOPS shown by monitoring the SSDs from their monitor utility rarely showed any one SSD array exceeding 2500 IOPS. This disparity between what Oracle was recording as phyreads and phywrites and what was actually being seen at the SSD array interfaces may be due to the grouping of read and write requests by both the operating system and the host bus adapters (HBAs); thus the actual total I/Os may be up to 16 times smaller than reported by Oracle internal read and write request monitoring. Normal I/O grouping runs to approximately 16 I/O requests per actual I/O issued to the SSDs; with 4 servers doing this grouping, that leads to a maximum 16 fold decrease in the number of actual requests reaching the SSDs. The actual number is probably less than 16.
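
The batching correction applied to the IOPS graphs is a single division. A sketch, using the peak Oracle-reported figure from the earlier slide and the up-to-16x factor described here (which the text itself hedges as an upper bound):

```python
# Convert Oracle-reported IOPS (gv$filestat/gv$tempstat phyreads +
# phywrites) to an estimate of requests actually reaching the array,
# assuming OS/HBA coalescing of up to ~16 Oracle I/Os per request.

def subsystem_iops(oracle_iops: int, batching_factor: int = 16) -> int:
    return oracle_iops // batching_factor

print(subsystem_iops(1_714_861))  # 107178 estimated at the I/O subsystem
```

Since the true coalescing factor is "probably less than 16", the real subsystem rate likely sits somewhere between this floor and the raw Oracle figure.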

    76. 8 Stream Results In the 8 stream test the queries had to be reordered on each stream to counteract the results of caching on stream performance and to prevent overloading of the process counts in the parallel streams for the larger queries. Interestingly enough the HDD system outperformed (on the average) the SSD system on queries 16 (PF 0.17) and 18 (PF 0.91) with the other performance factors ranging from a low of 1.6 to a high of 20.5.

    77. Results Graph

    78. Temp IO Times

    79. Evaluation The performance issues on queries 16 and 18 were possibly due to bandwidth limits on the SSD side, since the RamSan-400s were connected with only two 1-Gb FC connections each to the 4 servers. High write times for the 400 devices were noted. Even with the HDD arrays beating the SSD arrays on 2 queries, overall the SSD systems outperformed the HDD system on the aggregate set of queries by a factor of 3.91 (391%).

    80. Overall Results Multi stream

    81. Latency of the Data and Index Tablespaces Average overall data and index read latency for the SSD-based system was 1.05 milliseconds Average read latency for the HDD-based systems was 27.47 milliseconds

    82. Performance Irregularities During Multi-Stream Tests It was noted that during the multi-stream tests certain query response times would vary widely. In the SSD runs the ratio of high time to low time ranged from a low of 1.67 to a high of 49.66 with an average of 10.077. With the HDD system the range for the ratio between high and low query times ranged from a low of 1.69 to a high of 43.87 and an average of 8.65. For comparison purposes, during the single stream tests the difference between the minimum and maximum response times for the SSD queries for the 4 runs was between 1.0022 and 1.67 and the time ratios for the HDD runs were between 1.010 and 1.796

    83. Note on Oracle Version for Multi-Thread Tests On the multi-thread tests reported here, Oracle 11g was utilized. In that release, total memory management works and internal memory is adjusted within specified ranges determined by the settings of the MEMORY_MAX_TARGET and MEMORY_TARGET parameters. Once the systems were upgraded to take advantage of bug fixes related to query pathway determinations, a bug caused the memory control to be limited to 4 gigabytes or less when MEMORY_MAX_TARGET and MEMORY_TARGET are set; therefore the systems had to be reverted to using the manual parameters: DB_CACHE_SIZE, LARGE_POOL_SIZE, SHARED_POOL_SIZE, and PGA_AGGREGATE_TARGET. However, even with the manual parameters set to values greater than had been set using the automated features, I was unable to generate a satisfactory 8 stream TPC-H run on either the SSD or HD tablespaces. Therefore, the results reported here are on the earlier release of Oracle 11g with limited bandwidth. The cause of multiple query failures per run was usually due to parallel query slave failures on one or more nodes due to operating system limits becoming exhausted.

    84. Single Stream HDD Results: Moving Temporary Tablespace to HD Arrays As a final test, the temporary tablespace, which was located on the SSD arrays for all previous tests, was moved to the HDD arrays, alternating 6-31 gigabyte temporary files between the two HDD arrays to evenly distribute I/O between the data and index areas.

    85. Results

    86. Evaluation Notice that only 4 queries showed major changes when the temporary tablespace was moved to the HDD arrays. This indicates that a majority of sorts and hashes are being done in the PGA areas of the user processes rather than being shifted to disk. Changes in just these 4 queries resulted in an increase in total time of 21.5 minutes over the average HDD execution time

    87. Major Oracle Waits with Temporary on HDD

    88. Evaluation In the HDD run with temporary on HDD the direct path read and write to temp are showing extreme increases in latency. The direct path read temp wait event has increased from an average of 1 millisecond per wait to 96 milliseconds per wait and the direct path write temp increased from 1 millisecond to 63 milliseconds

    89. IOPS

    90. Evaluation The Oracle Internal IOPS with the temporary tablespace moved from SSD to HDD arrays peaked at 318,940 total IOPS with 318,914 IOPS credited to the temporary tablespace.

    91. TPC-H Summary The overriding issue with the performance of TPC-H queries is the latency of the underlying storage subsystem. Secondary to latency is the bandwidth available to transfer the low latency information between the I/O subsystem and the related servers. A properly configured data warehouse system should have low latency storage with high bandwidth connections to the data warehouse servers. The SSD system provided performance improvements of up to a factor of 20 (2000%) for some queries, with the average improvement being a factor of 3.91 (391%) for a multi-stream environment and an average improvement factor of 2.95 (295%) for the single stream tests when bandwidth was limited on the RamSan. A factor of 5.01 (501%) was noted with no bandwidth restrictions. The SSD technologies provided a large improvement for the Oracle application on the TPC-H benchmark, just as they did on the TPC-C benchmark.

    92. Questions? www.ramsan.com www.statspackanalyzer.com Mike.ault@texmemsys.com www.tpc.org