1 / 49

Database Performance and Tuning for developers (RAC issues and examples)

Database Performance and Tuning for developers (RAC issues and examples). WLCG Service Reliability Workshop 26 November 2007 Miguel Anjo , CERN-Physics Databases team. Outline. Motivation Basics (what you should knew before start developing an application with Oracle backend)

berit
Download Presentation

Database Performance and Tuning for developers (RAC issues and examples)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Performance and Tuning for developers(RAC issues and examples) WLCG Service Reliability Workshop 26 November 2007 Miguel Anjo, CERN-Physics Databases team

  2. Outline • Motivation • Basics (what you should knew before start developing an application with Oracle backend) • Oracle way of executing a query, cursors • Constraints (PK, FK, NN, unique) • Bind variables • Transaction definition • Optimizations • Execution Plan • Oracle RAC architecture • Connection management • Views, materialized views • Partitioning • Way Oracle uses indexes • Index Organized Tables • Index – function based, reversed, bitmap • Composite indexes (FTS example) • Analytical functions • PL/SQL - advantages of use • Sequences (VOMS problem) • Inserts vs updates (Phedex example) • Hints and plan stability (SAM example) • Conclusions • Reference Database Performance and Tuning for developers - 2

  3. Motivation (1/4) • Applications must scale to many users • Many performance problems are not seen in small environments • Very good performance can be achieved by good application coding practice • Try to make the application performant from the beginning Basics • If too slow later Optimization (tunning)

  4. Motivation (2/4) – FTS example Database Performance and Tuning for developers - 4

  5. Motivation (3/4) Sources of performance problems • Using too many resources, such as CPU or disk I/O • Potential cause of poor response time (SQL statement takes too long to execute) • Waiting for others holding a single resource, such as a lock • Potential cause of poor scalability (adding more CPU doesn’t allow to run more concurrent users) • Causes contention for the resource • Want to avoid these from the beginning!

  6. Motivation (4/4) • Tuning Cost/Benefit Tuning cost increases in time Tuning benefit decreases in time Benefit Cost Taking a look at tuning cost and benefit over time from application design till full production use Time Design Development Implementation Production

  7. Basics – Oracle way of executing select FILEID from CNS_FILE_REPLICA where SFN=:B0 • Hard parse – check syntax, tables, • Optimization – finds best way to get data (stats, indexes) • Soft parse – check access rights • Bind – change variables with values • Execute – go to index, get rowid, go to table • Fetch – (selects) send rows through network to application • Cursors – queries in memory – there is a maximum number per session Database Performance and Tuning for developers - 7

  8. Basics – Constraints • PK – Primary key • Unique, indexed, not null • FK – Foreign key • Reference to PK, should be always indexed • Ux – Unique key • Unique, indexed • NN – Not null • Indexes do not include NULL values • Reference: http://www.ixora.com.au/tips/not_null.htm Database Performance and Tuning for developers - 8

  9. Basics – Bind variables (1/2) • Key to application performance • No bind • select FILEID from CNS_FILE_REPLICA where SFN=23434 • Hard parse, optimization and all the rest • Very CPU intensive, latches/locks • Bind • select FILEID from CNS_FILE_REPLICA where SFN=:B1 • Soft parse and all the rest • Fast • USE BIND VARIABLES - 100x faster, friendly to others • Reference: • http://www.akadia.com/services/ora_bind_variables.html Database Performance and Tuning for developers - 9

  10. Basics – Bind variables (2/2) • Big brother is watching you! • For complex single time queries might be better not to use bind variables, as it hides the current value to the optimizer Database Performance and Tuning for developers - 10

  11. Basics – Transactions (1/3) • What if the database crashes in middle of several updates? • Transaction is a unit of work that can be either saved to the database (COMMIT) or discarded (ROLLBACK). • Objective: Read consistency, preview changes before save, group logical related SQL • Start: Any SQL operation • End: COMMIT, ROLLBACK, DDL operation (CREATE TABLE,...) • Changes (UPDATE, DELETE, INSERT) are invisible to other users until end of transaction • Changed rows are locked to other users • If others try to change locked rows, they wait until end of other transaction (READCOMMITTEDmode) • Get error if try to change locked rows (SERIALIZABLE mode) • If crashes, rollbacks.

  12. User LCG_FTS_PROD1 SELECT status FROM t_file WHERE file_id= :B1; (status = ‘ready’) SELECT status FROM t_file WHERE file_id = :B1; (status = ‘ready’) SELECT status FROM t_file WHERE file_id = :B1; (status = ‘done’) User LCG_FTS_PROD2 UPDATE t_file SET status = ‘Transfering’ WHERE file_id= :B1; SELECT status FROM t_file WHERE file_id = :B1; (status = ‘Transfering’) UPDATE t_file SET status = ‘Done’ WHERE file_id = :B1; COMMIT; Basics – Transactions (2/3)

  13. User LCG_FTS_PROD1 UPDATE t_file SET status = ‘Done’ WHERE file_id = :B1; …wait… …what’s going on?… …damn… 1 row updated (aleluia!) User LCG_FTS_PROD2 UPDATE t_file SET status = ‘Transfering’ WHERE file_id= :B1; 1 row updated COMMIT; Basics – Transactions (3/3)

  14. Optimizations (1/2) • “Optimize just up to achieve the application needs” • Check what are the needs • This graph should take max 3 seconds to appear • Profile, where time is spent, • Optimize up to the needs, • Set as a baseline, • Write tests that check if this baseline is not met • Use the database features (Oracle is not MySQL) • Or else you can try to use Coral for generic applications • Involve your experiment DBA or PhyDB DBAs in the loop Database Performance and Tuning for developers - 14

  15. Optimizations (2/2) The steps for Tuning/Optimization • Identify what is slow: an application step is often thousands of lines of code -> intuition, code instrument, profiling • Understand what happens in this step, (execution plan) • Modify application / data so that it is better, sometimes it can be as simple as • Adding an index • Removing an index • Changing the definition of an index • Change the syntax of the select statement

  16. Hash Scans Joins Nested Loops Hash Joins SortMerge Joins Cartesian Joins Outer Joins Execution plan (1/3) • Series of steps that Oracle will perform to execute the SQL statement • Generated by the optimizer • Describes steps with meaningful operators - Access Paths • Table Access Full • Access by Index RowID • Index Scans • Index Unique Scan • Index Range Scan • Index Skip Scans • Full Scans • Fast Full Index Scans • Index Joins • Bitmap Joins

  17. Execution plan (2/3) • EXPLAIN PLAN • SQL command that allows to find out what is the executionplan before the SQL statement runs SQL> EXPLAIN PLAN FOR SELECT file_state FROM lcg_fts_prod.t_file WHERE file_id = :B1; SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); ------------------------------------------------------- | Id | Operation | Name | ------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | TABLE ACCESS BY INDEX ROWID| T_FILE | |* 2 | INDEX UNIQUE SCAN | FILE_FILE_ID_PK | ------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("FILE_ID"=TO_NUMBER(:B1)) • Use a tool (e.g. Benthic Golden - Ctrl-P)

  18. Execution plan (3/3) • The real one - from SQL*Plus SQL> set autotracetraceonly SQL> var :b1 number; SQL> exec :b1 := 3423 SQL> SELECT file_state FROM lcg_fts_prod.t_file WHERE file_id = :B1; ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 11 | 1 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T_FILE | 1 | 11 | 1 (0)| 00:00:01 | |* 2 | INDEX UNIQUE SCAN | FILE_FILE_ID_PK | 1 | | 0 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("FILE_ID“=TO_NUMBER(:B1)) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 1 consistent gets 1 physical reads 0 redo size 279 bytes sent via SQL*Net to client 385 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed

  19. Oracle RAC architecture Database Performance and Tuning for developers - 19

  20. Oracle RAC architecture • Maximum a 3-way protocol Database Performance and Tuning for developers - 20

  21. Developing for RAC • Cache Fusion technology  shared cache • Better than go to disk • We should avoid too much interconnect communication • Concurrent access to same blocks to not scale • Normal B*-Tree Indexes on sequences • The most efficient execution plan in single instance is also the best in RAC • Large cache for sequences • And different sequences for different objects • Avoid DDL (data dictionary is shared among everyone) • Anyway you should do NO DDL in production • Replace frequent column updates by insert + deletes • Example later on Database Performance and Tuning for developers - 21

  22. Connection Management • Connection creation is slow and heavy • Connection pooling • Java, C++ • Persistent connections • PHP, Python • Connections can end – reconnect • Transactions can be aborted – retry • High load, slow SQL are not solved by more connections – limit max connections(!) • If DB not available, buffer queries for while? • Use row pre-fetch to reduce trips to the server Database Performance and Tuning for developers - 22

  23. Connection Management (FTS) Database Performance and Tuning for developers - 23

  24. Views • Way to hide complex SQL • Or hide some data you don’t want to expose • Use it for: • Give access to certain amount of data to the “reader”/”writer” accounts (see updatable views) • Hide complex SQL to the application layer • Do not: • Stack views • select * from view_x / view_x = select xx from view_y • Query data that can be better queried without view • Select t1_c2 from view_x where t1_c1=y • View_x = select t1_c1, t1_c2, t2_c1 from t1, t2 where t1_c1=t2_c2 Database Performance and Tuning for developers - 24

  25. Materialized views • Tables created as subqueries are stored but do not follow changes in base tables • Views defined as subqueries follow changes in base tables but are not stored • Impractical if querying big base table is costly • Materialized views created as subqueries are tables whose stored values follow changes in base tables! • They occupy space, but they significantly speed up queries! • Excellent for not real time data

  26. Materialized views and query rewrite • Typical syntax for materialized views: CREATE MATERIALIZED VIEW mv2 BUILD IMMEDIATE REFRESH ON COMMIT ENABLE QUERY REWRITE AS (SELECT… FROM tab1) • Automatic query re-write: CREATE MATERIALIZED VIEW mv_sal_per_deptno BUILD IMMEDIATE REFRESH START WITH ROUND(SYSDATE + 1) + 11/24 NEXT NEXT_DAY(TRUNC(SYSDATE), 'MONDAY')) + 15/24 ENABLE QUERY REWRITE AS (SELECT deptno count(empno), sum(sal) FROM emp GROUP BY deptno); • Now: SELECT depto, count(empno) FROM emp GROUP BY deptno; • Will probably use the mv_sal_per_depno

  27. Partitioning • Problem: Myqueries are gettingslow as mytable is enormous... • Partitioning is the key concept to ensure the scalability of a database to a very large size • data warehouses (large DBs loaded with data accumulated over many years, optimized for read only data analysis) • online systems (periodic data acquisition from many sources) • Tables and indices can be decomposed into smaller and more manageable pieces called partitions • Manageability:data management operations at partition level • parallel backup, parallel data loading on independent partitions • Query performance: partition pruning • queries restricted only to the relevant partitions of the table • Partitioning is transparent to user applications • tables/indices logically unchanged even if physically partitioned!

  28. Types of partitioning Partitioning according to values of one (or more) column(s) • Range:partition by predefined ranges of continuous values (historic) • Hash:partition according to hashing algorithm applied by Oracle • List: partition by lists of predefined discrete values (ex: VOs) • Composite:e.g. range-partition by key1, hash-subpartition by key2 (R+H) Composite (L+H) Composite

  29. INSERT INTO sales ( …, sale_date, … ) VALUES ( …, TO_DATE(’3-MARCH-2001’,’dd-mon-yyyy’), … ); JAN2001 FEB2001 MAR2001 … DEC2001 JAN2001 FEB2001 MAR2001 DEC2001 … SELECT … FROM sales WHERE sales_date = TO_DATE (’14-DEC-2001’,’dd-mon-yyyy’); Partitioning benefits: partition pruning Loading data into a table partitioned by date range Querying data from a table partitioned by date range

  30. … JAN2001 JAN2001 FEB2001 FEB2001 MAR2001 MAR2001 DEC2001 DEC2001 join … … JAN2001 JAN2001 FEB2001 FEB2001 MAR2001 MAR2001 DEC2001 DEC2001 tab1 joins Partition benefits: partition-wise joins • Without partitioning: global join (query time ~ N x N) • With partitioning: local joins (query time ~ N) SELECT … FROM tab1, tab2 WHERE tab1.key = tab2.key AND … tab1

  31. Partitioned (local) indexes • Indexes for partitioned tables can be partitioned too • Local indices: defined within the scope of a partition CREATE INDEX i_sale_date ON sales (sale_date) LOCAL • In contrast to global indexes: defined on the table as a whole • Combine the advantages of partitioning and indexing: • Partitioning improves query performance by pruning • Local index improves performance on full scan of partition • Prefer local indexes, but global indexes are also needed • Primary Key constraint automatically builds for it a global B*-tree index (as PK is globally unique within the table) • Bitmap indexes on partitioned tables are always local • The concept of global index only applies to B*-tree indexes

  32. Oracle and indexes • 3 indexes on a table  insert 10x slower • Limit indexes on very dynamic tables • Indexes are not read in parallel to tables • single block io -- read root block • single block io -- read branch block • single block io -- read leaf block which has row id • single block io -- read table block • 1, 2, 3, 4.... in order • index range scan is 1,2,3,4,3,4,3,4,3,4,3,4..... (in general) • Composite indexes faster (skip step 4) • “Index Range Scan” is usually scalable • “Index Fast Full Scan” is not scalable Database Performance and Tuning for developers - 32

  33. Index organized tables (IOT) • If a table is most often accessed via a PK, it may be useful to build the table itself like a B*-tree index! • In contrast to standard “heap” tables • Advantages and disadvantages: • Faster queries (no need to look up the real table) • Reduced size (no separate index, efficient compression) • But performance may degrade if access is not via the PK • IOT syntax (LHCb Bookkeeping example) CREATE TABLE joboptions ( job_id, name, recipient, value, CONSTRAINT pk_joboptions PRIMARY KEY (job_id) )ORGANIZATION INDEX;

  34. Bitmap indexes • Indexes with a bitmap of the column values • When to use? • low cardinalities (columns with few discrete values/<1%) • Merge of several AND, OR, NOT and = in WHERE clause SELECT * FROM costumers WHERE mar_status=‘MARRIED’ AND region =‘CENTRAL’ OR region =‘WEST’; CREATE BITMAP INDEX i_costumers_region ON costumers(region);

  35. Function-based indexes • Indexes created after applying function to column • They speed up queries that evaluate those functions to select data • Typical example, if customers are stored as “ROSS”, “Ross”, “ross” (design problem!): CREATE INDEX customer_name_index ON sales (UPPER(customer_name)); • Index only some items (the ones to be searched): CREATE INDEX criticality_iscritical ON criticality( CASE WHEN is_critical = ‘Y' THEN ‘Y'ELSE NULLEND); • Bitmap indices can also be function-based • Allowing to map continuous ranges to discrete cardinalities • For instance, map dates to quarters: CREATE BITMAP INDEX sale_date_index ON sales (UPPER TO_CHAR(sale_date, ‘YYYY”Q”Q’)); • Combining bitmap indices separately built on different columns speeds up multidimensional queries (“AND” of conditions along different axes)

  36. Reverse key indexes • Index with key reversed (last characters first) • When to use? • Most of keys share first characters (filenames with path) • No use of range SELECTs (BETWEEN, <, >, ...) • Sequencial values • 123, 124, 125 willbeindexed as 321, 421, 521 • How to create? CREATE INDEX i_ename ON emp (ename) REVERSE;

  37. Composite indexes (FTS example) • Index over multiple columns in a table • When to use? • When WHERE clause uses more than one column • To increase selectivity joining columns of low selectivity • How to create? • Columns with higher selectivity first • Columns that can be alone in WHERE clause first SELECT max(jobid) FROM t_job WHERE channel_name = :b1 group by vo_name; CREATE INDEX job_report ON t_job(channel_name, vo_name, job_id); DEPTNO MGR

  38. Analytic functions (FTS example) • Compute an aggregate value based on a group of rows • Sliding windows (group of rows) • AVG, COUNT, MAX, MIN, SUM • DENSE_RANK, RANK, LEAD • Example (FTS): Get next files to transfer SELECT id FROM (SELECT DISTINCT t_job.job_id id, DENSE_RANK() OVER ( ORDER BY t_job.priority DESC, SYS_EXTRACT_UTC(t_job.submit_time) ) TopJob FROM t_job, t_file WHERE t_job.job_id = t_file.job_id AND (( t_job.job_state IN ('Pending','Active') AND t_file.file_state = 'Pending') OR ( t_job.job_state = 'Pending' AND t_file.file_state = 'Hold' AND t_job.cancel_job = 'Y')) AND t_job.vo_name = :1 ) WHERE :2=0 OR TopJob<=:3 Database Performance and Tuning for developers - 38

  39. PL/SQL advantages of use • PL/SQL is a portable, high-performance transaction processing language • Tight Integration with SQL • Better performance • Full portability (runs on platform where Oracle runs) • Tight security • Large API Database Performance and Tuning for developers - 39

  40. PL/SQL advantages of use • Procedures • Small programs to execute a bunch of operations • Bulk deletes based on input date • Change values of rows based on several conditions • Restrict execution of queries on tables (called from R/W accounts) • Triggers • Starts automatically on a event (insert/delete/update) • Change value of another table if condition is met • Functions • Return value after changing an input • Convert unix timestamp Database Performance and Tuning for developers - 40

  41. Sequences • Example (LCG VOMS): • select seq from seqnumber; • update seqnumber set seq=55; • (no bind variables) • “sequence” is a database object that generates (in/de)creasing unique integer numbers • Can be used as Primary Keyfor the rows of a table • In the absence of a more “natural” choice for row identifier • Better than generating ID in application code • Very efficient thanks to caching • Uniqueness over multiple sessions, transaction safe, no locks • No guarantee that ID will be continuous • rollback, use in >1 tables, concurrent sessions • Gaps less likely if caching switched off • On RAC might be contention on sequences • Have big cache • Consider use of reverse indexes Database Performance and Tuning for developers - 41

  42. Inserts vs updates (Phedex example) • Problem: Contention due concurrent updates of transfer state of a file (single table) • Solution: • Created tables by transfer state • T_XFER_REQUEST, T_XFER_DONE, T_XFER_ERROR, etc • Insert on the right table when status change • Bulk deletes Database Performance and Tuning for developers - 42

  43. Hints • Instructions that are passed to the Optimizer to favor one query plan vs. another • /*+ hint hint hint … hint */ • Performance Tuning Guide and Reference manual • Many different types, e.g. hints for Optimization Approaches and Goals, Access Paths, Query Transformations, Join Orders, Join Operations, Parallel Execution, … • Our advise: avoid as much as possible! • complex, not stable across releases • CBO w/hints same as RBO w/developer setting rules instead of optimizer! • Warning: if they are wrongly set, Oracle will plainly ignore them • No error condition is raised • Need to check the query plan to be sure..

  44. Hints - Most famous • ALL_ROWS optimizes for best throughput • FIRST_ROWS optimizes for best response time to get the first rows… • FULL chooses a full table scan • It will disable the use of any index on the table • INDEX chooses an index scan for the table • INDEX_JOINwill merge the scans on several (single-)column indexes

  45. Hints – (SAM Example) • Problem: Query slow due usage of wrong index • Found by checking that execution got slower after index creation (to improve other queries) • Solution: Force usage of index by adding hint SELECT /*+ index(data TESTDATA__PK) */ data.EnvID, ... • Before: | Id | Operation | Name | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 559 (1)| 00:00:07 | | 1 | SORT ORDER BY | | 559 (1)| 00:00:07 | | 2 | HASH GROUP BY | | 559 (1)| 00:00:07 | | 3 | NESTED LOOPS | | 557 (1)| 00:00:07 | | 4 | NESTED LOOPS | | 557 (1)| 00:00:07 | |* 5 | INDEX RANGE SCAN | TESTCRITIC_TESTVOID3 | 12 (0)| 00:00:01 | | 6 | PARTITION RANGE ITERATOR | | 64 (0)| 00:00:01 | | 7 | TABLE ACCESS BY LOCAL INDEX ROWID| TESTDATA | 64 (0)| 00:00:01 | |* 8 | INDEX RANGE SCAN | TESTDATA_INVPK_IX | 60 (0)| 00:00:01 | |* 9 | INDEX RANGE SCAN | TESTDEF_SERVICEID | 0 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------- Database Performance and Tuning for developers - 45

  46. Hints – (SAM Example) • Problem: Query slow due usage of wrong index • New execution plan has higher cost but… is faster • Due the values used in the query • Maybe with more statistics on table and indexes the good plan good be automatic • After: | Id | Operation | Name | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11670 (1)| 00:02:21 | | 1 | SORT ORDER BY | | 11670 (1)| 00:02:21 | | 2 | HASH GROUP BY | | 11670 (1)| 00:02:21 | | 3 | NESTED LOOPS | | 11668 (1)| 00:02:21 | |* 4 | HASH JOIN | | 11668 (1)| 00:02:21 | |* 5 | INDEX RANGE SCAN | TESTCRITIC_TESTVOID3 | 12 (0)| 00:00:01 | | 6 | PARTITION RANGE ITERATOR | | 11655 (1)| 00:02:20 | | 7 | TABLE ACCESS BY LOCAL INDEX ROWID| TESTDATA | 11655 (1)| 00:02:20 | |* 8 | INDEX RANGE SCAN | TESTDATA__PK | 9559 (1)| 00:01:55 | |* 9 | INDEX RANGE SCAN | TESTDEF_SERVICEID | 0 (0)| 00:00:01 | Database Performance and Tuning for developers - 46

  47. Conclusion • Optimize just to achieve the application needs • Use the database specific features (Oracle is not MySQL) • Try to use Coral for generic applications • Involve your experiment DBA or PhyDB DBAs in the optimization • Do not play on production

  48. References & Resources • oradoc.cern.ch • Performance Planning manual • Performance Tuning Guide and Reference manual • Tom Kyte’sEffective Oracle by Design • Physics Databases wiki: https://twiki.cern.ch/twiki/bin/view/PSSGroup/PhysicsDatabasesSection

  49. Q & A Database Performance and Tuning for developers - 49

More Related